We'll now turn to a different type of Twitter data -- static data, either recent tweets or user-level information. This type of data can be retrieved with Twitter's REST API. We will use the `tweetscores` package here -- a package that I created to facilitate the collection and analysis of Twitter data.

### Searching recent tweets

It is possible to download recent tweets, but only those less than 7 days old, and in some cases not all of them.

```{r}
load("~/my_oauth")
library(tweetscores)
library(streamR)

searchTweets(q=c("kennedy", "supreme court"),
  filename="../data/kennedy-tweets.json",
  n=1000, until="2018-07-01",
  oauth=my_oauth)

tweets <- parseTweets("../data/kennedy-tweets.json")
```

What are the most popular hashtags?

```{r}
library(stringr)
ht <- str_extract_all(tweets$text, "#(\\d|\\w)+")
ht <- unlist(ht)
head(sort(table(ht), decreasing = TRUE))
```

You can check the documentation about the available search operators [here](https://dev.twitter.com/rest/public/search).

### Extracting users' profile information

This is how you would extract information from user profiles:

```{r}
wh <- c("realDonaldTrump", "POTUS", "VP", "FLOTUS")
users <- getUsersBatch(screen_names=wh, oauth=my_oauth)
str(users)
```

Which of these has the most followers?

```{r}
users[which.max(users$followers_count),]
users$screen_name[which.max(users$followers_count)]
```

Download up to 3,200 recent tweets from a Twitter account:

```{r}
getTimeline(filename="../data/realDonaldTrump.json",
  screen_name="realDonaldTrump", n=1000, oauth=my_oauth)
```

What are the most common hashtags?

```{r}
tweets <- parseTweets("../data/realDonaldTrump.json")
ht <- str_extract_all(tweets$text, "#(\\d|\\w)+")
ht <- unlist(ht)
head(sort(table(ht), decreasing = TRUE))
```

### Building friend and follower networks

Download friends and followers:

```{r}
followers <- getFollowers("MethodologyLSE", oauth=my_oauth)
friends <- getFriends("MethodologyLSE", oauth=my_oauth)
```

What are the most common words that friends of the LSE Methodology Twitter account use to describe themselves on Twitter?

```{r, fig.height=6, fig.width=6}
# extract profile descriptions
users <- getUsersBatch(ids=friends, oauth=my_oauth)
# create table with frequency of word use
library(quanteda)
tw <- corpus(users$description[users$description!=""])
dfm <- dfm(tw, remove=c(stopwords("english"), stopwords("spanish"),
  "t.co", "https", "rt", "rts", "http"), remove_punct=TRUE)
topfeatures(dfm, n = 30)
# create wordcloud
par(mar=c(0,0,0,0))
textplot_wordcloud(dfm, rotation=0, min_size=1, max_size=5, max_words=100)
```
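Since `getFollowers` and `getFriends` both return vectors of user IDs, base R set operations are enough for some quick network summaries. As a minimal sketch (assuming the `followers` and `friends` objects created above), this is how you could check how many ties are reciprocated:

```{r}
# accounts that both follow and are followed by @MethodologyLSE
reciprocal <- intersect(followers, friends)
length(reciprocal)
# proportion of friends who follow back
length(reciprocal) / length(friends)
```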
### Estimating ideology based on Twitter networks

The `tweetscores` package also includes functions to replicate the method developed in the Political Analysis paper [__Birds of a Feather Tweet Together. Bayesian Ideal Point Estimation Using Twitter Data__](https://doi.org/10.1093/pan/mpu011). For an application of this method, see also [this Monkey Cage blog post](http://www.washingtonpost.com/blogs/monkey-cage/wp/2015/06/16/who-is-the-most-conservative-republican-candidate-for-president/).

```{r}
# download list of friends for an account
user <- "p_barbera"
friends <- getFriends(user, oauth=my_oauth)
# estimating ideology with MCMC methods
results <- estimateIdeology(user, friends, verbose=FALSE)
# trace plot to monitor convergence
tracePlot(results, "theta")
# comparing with other ideology estimates
plot(results)
```

### Other types of data

The REST API also offers a long list of other endpoints that could be of use at some point, depending on your research interests.

1) You can search for users related to specific keywords:

```{r}
users <- searchUsers(q="london school of economics", count=100, oauth=my_oauth)
users$screen_name[1:10]
```

2) If you know the IDs of the tweets, you can download them directly from the API. This is useful because tweets cannot be redistributed as part of the replication materials of a published paper, but the list of tweet IDs can be shared:

```{r}
# downloading tweets when you know their IDs
getStatuses(ids=c("474134260149157888", "266038556504494082"),
  filename="../data/old-tweets.json",
  oauth=my_oauth)
parseTweets("../data/old-tweets.json")
```

3) Lists of Twitter users, compiled by other users, are also accessible through the API:

```{r}
# download user information from a list
MCs <- getList(list_name="new-members-of-congress",
  screen_name="cspan", oauth=my_oauth)
head(MCs)
```

This is also useful if, for example, you're interested in compiling lists of journalists, since media outlets often link to such lists from their profiles.

4) Lists of users who retweeted a particular tweet -- unfortunately, limited to the 100 most recent retweets:

```{r}
# download list of users who retweeted a tweet (only up to 100)
rts <- getRetweets(id='942123433873281024', oauth=my_oauth)
# https://twitter.com/realDonaldTrump/status/942123433873281024
users <- getUsersBatch(ids=rts, oauth=my_oauth)
# create table with frequency of word use
library(quanteda)
tw <- corpus(users$description[users$description!=""])
dfm <- dfm(tw, remove=c(stopwords("english"), stopwords("spanish"),
  "t.co", "https", "rt", "rts", "http"), remove_punct = TRUE)
# create wordcloud
par(mar=c(0,0,0,0))
textplot_wordcloud(dfm, rotation=0, min_size=1, max_size=5, max_words=100)
```

5) And one final function, to convert dates from Twitter's internal format into one we can work with in R:

```{r}
# format Twitter dates to facilitate analysis
tweets <- parseTweets("../data/realDonaldTrump.json")
tweets$date <- formatTwDate(tweets$created_at, format="date")
hist(tweets$date, breaks="month")
```
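Once the dates are stored as R `Date` objects, standard base R tools apply. As a minimal sketch (reusing the `tweets` data frame from the chunk above), this is how you could tabulate tweet frequency by month instead of plotting it:

```{r}
# count tweets per month
counts <- table(format(tweets$date, "%Y-%m"))
head(counts)
```

Now time for another challenge!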