We’ll now turn to a different type of Twitter data – static data, either recent tweets or user-level information. This type of data can be retrieved with Twitter’s REST API. We will use the tweetscores package here – this is a package that I created to facilitate the collection and analysis of Twitter data.

Searching recent tweets

It is possible to download recent tweets, but only those less than 7 days old, and in some cases not all of them.

load("~/my_oauth")
library(tweetscores)
## Loading required package: R2WinBUGS
## Loading required package: coda
## Loading required package: boot
## ##
## ## tweetscores: tools for the analysis of Twitter data
## ## Pablo Barbera (USC)
## ## www.tweetscores.com
## ##
library(streamR)
## Loading required package: RCurl
## Loading required package: rjson
## Loading required package: ndjson
searchTweets(q="halloween", 
  filename="../data/halloween-tweets.json",
  n=1000, until="2022-10-31", 
  oauth=my_oauth)
## 100 tweets. Max id: 1586870565041840128
## 178 hits left
## 196 tweets. Max id: 1586870553964584960
## 177 hits left
## 296 tweets. Max id: 1586870543210483712
## 176 hits left
## 396 tweets. Max id: 1586870533714583552
## 175 hits left
## 496 tweets. Max id: 1586870524210479104
## 174 hits left
## 596 tweets. Max id: 1586870513787355136
## 173 hits left
## 696 tweets. Max id: 1586870503876481024
## 172 hits left
## 796 tweets. Max id: 1586870494418161664
## 171 hits left
## 896 tweets. Max id: 1586870484519591936
## 170 hits left
## 992 tweets. Max id: 1586870473786376192
## 169 hits left
## 1092 tweets. Max id: 1586870462055084032
tweets <- parseTweets("../data/halloween-tweets.json")
## 1092 tweets have been parsed.
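Because the search index only reaches back about 7 days, an `until` date older than that returns nothing. A minimal sketch of a pre-flight check follows. It assumes the window is 7 calendar days counted back from today, and `in_search_window()` is a hypothetical helper, not part of tweetscores:

```r
# Hypothetical helper (not part of tweetscores): is `until` recent enough
# for the standard search endpoint, assuming a 7-day index window?
in_search_window <- function(until, today = Sys.Date(), window_days = 7) {
  until <- as.Date(until)
  until >= today - window_days
}

in_search_window("2022-10-31", today = as.Date("2022-11-02"))  # TRUE
in_search_window("2022-10-20", today = as.Date("2022-11-02"))  # FALSE
```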

What are the most popular hashtags?

library(stringr)
ht <- str_extract_all(tweets$text, '#[A-Za-z0-9_]+')
ht <- unlist(ht)
head(sort(table(ht), decreasing = TRUE))
## ht
##     #Halloween #GenshinImpact     #halloween        #Amazon        #childe 
##             68             25             11              9              9 
##          #povo 
##              8
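The regular expression is case-sensitive, so `#Halloween` and `#halloween` are counted as different hashtags. The sketch below lowercases hashtags before counting. It uses base R (`regmatches()` and `gregexpr()` instead of stringr) and wraps the steps in a hypothetical `top_hashtags()` helper that can be reused on any vector of tweet text. The example tweets are made up:

```r
# Hypothetical helper: extract hashtags, lowercase them, and count them
top_hashtags <- function(text, n = 6) {
  ht <- regmatches(text, gregexpr("#[A-Za-z0-9_]+", text))
  ht <- tolower(unlist(ht))            # merge #Halloween and #halloween
  head(sort(table(ht), decreasing = TRUE), n)
}

# made-up example tweets
txt <- c("Happy #Halloween!", "#halloween costumes #DIY", "no hashtags here")
top_hashtags(txt)
```

With the real data, `top_hashtags(tweets$text)` would be the drop-in replacement for the three lines above.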

See the stringr documentation for more options for string search.

Extracting users’ profile information

This is how you would extract information from user profiles:

wh <- c("JoeBiden", "POTUS", "VP", "FLOTUS")
users <- getUsersBatch(screen_names=wh,
                       oauth=my_oauth)
## 1--4 users left
str(users)
## 'data.frame':    4 obs. of  9 variables:
##  $ id_str         : chr  "1349154719386775552" "803694179079458816" "939091" "1349149096909668363"
##  $ screen_name    : chr  "FLOTUS" "VP" "JoeBiden" "POTUS"
##  $ name           : chr  "Jill Biden" "Vice President Kamala Harris" "Joe Biden" "President Biden"
##  $ description    : chr  "First Lady of the United States Jill Biden. Community college educator. Military mother. Grandmother. Wife of @POTUS." "Vice President of the United States. Wife to the first @SecondGentleman. Momala. Auntie. Fighting for the people." "Husband to @DrBiden, proud father and grandfather. Ready to build back better for all Americans. Official account is @POTUS." "46th President of the United States, husband to @FLOTUS, proud dad & pop. Tweets may be archived: https://t.co/"| __truncated__
##  $ followers_count: int  4115264 13790391 36357676 26867220
##  $ statuses_count : int  603 7987 8650 4415
##  $ friends_count  : int  6 6 48 12
##  $ created_at     : chr  "Wed Jan 13 00:57:47 +0000 2021" "Tue Nov 29 20:16:39 +0000 2016" "Sun Mar 11 17:51:24 +0000 2007" "Wed Jan 13 00:37:08 +0000 2021"
##  $ location       : chr  "" "" "Washington, DC" ""

Which of these has the most followers?

users[which.max(users$followers_count),]
##   id_str screen_name      name
## 3 939091    JoeBiden Joe Biden
##                                                                                                                    description
## 3 Husband to @DrBiden, proud father and grandfather. Ready to build back better for all Americans. Official account is @POTUS.
##   followers_count statuses_count friends_count                     created_at
## 3        36357676           8650            48 Sun Mar 11 17:51:24 +0000 2007
##         location
## 3 Washington, DC
users$screen_name[which.max(users$followers_count)]
## [1] "JoeBiden"
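`which.max()` returns only the single most-followed account. To rank all of them, `order()` can sort the whole data frame. So that it runs without API access, the sketch below builds a small data frame by hand from the follower counts printed above:

```r
# follower counts copied from the str(users) output above
toy_users <- data.frame(
  screen_name = c("FLOTUS", "VP", "JoeBiden", "POTUS"),
  followers_count = c(4115264, 13790391, 36357676, 26867220),
  stringsAsFactors = FALSE
)

# sort from most to least followed
ranked <- toy_users[order(toy_users$followers_count, decreasing = TRUE), ]
ranked$screen_name
# "JoeBiden" "POTUS" "VP" "FLOTUS"
```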

The API allows downloading up to the 3,200 most recent tweets from a Twitter account. Here we download the 200 most recent:

getTimeline(filename="../data/uscpoir.json",
            screen_name="uscpoir", n=200, oauth=my_oauth)
## 200 tweets. Max id: 1432375952936427523
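The progress messages print a `Max id` after each batch. In the v1.1 REST API, timeline and search results are paged backwards in time with the `max_id` parameter. Each new request asks only for tweets whose IDs are at most one less than the smallest ID already received. The sketch below simulates that loop with small made-up integer IDs. The `fetch_page()` function is a hypothetical stand-in for the API call, so this illustrates the idea and is not tweetscores' internal code. Real tweet IDs are too large for R's doubles, so a real implementation would have to handle them as strings or 64-bit integers:

```r
# Simulated timeline of hypothetical tweet IDs, newest (largest) first
all_ids <- 1000:1

# Hypothetical stand-in for one API request: up to `count` IDs <= max_id
fetch_page <- function(max_id = NULL, count = 200) {
  ids <- if (is.null(max_id)) all_ids else all_ids[all_ids <= max_id]
  head(ids, count)
}

# Keep requesting older pages until we have n IDs or run out of tweets
paginate <- function(n, count = 200) {
  collected <- integer(0)
  max_id <- NULL
  while (length(collected) < n) {
    page <- fetch_page(max_id, count)
    if (length(page) == 0) break       # no older tweets left
    collected <- c(collected, page)
    max_id <- min(page) - 1L           # next page: strictly older tweets only
  }
  head(collected, n)
}

length(paginate(450))   # 450
length(paginate(5000))  # 1000, because the simulated timeline runs out
```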

What are the most common hashtags?

tweets <- parseTweets("../data/uscpoir.json")
## 200 tweets have been parsed.
ht <- str_extract_all(tweets$text, '#[A-Za-z0-9_]+')
ht <- unlist(ht)
head(sort(table(ht), decreasing = TRUE))
## ht
##         #COP26         #ICYMI   #Afghanistan   #humanrights #environmental 
##              3              3              2              2              1 
##       #Eurasia 
##              1

Building friend and follower networks

Download the IDs of an account’s followers and friends (the accounts it follows):

followers <- getFollowers("uscpoir", 
    oauth=my_oauth)
## 15 API calls left
## 1053 followers. Next cursor: 0
## 14 API calls left
friends <- getFriends("uscpoir", 
    oauth=my_oauth)
## 15 API calls left
## 221 friends. Next cursor: 0
## 14 API calls left
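`getFollowers()` and `getFriends()` return vectors of user IDs, so base R set operations are enough to find reciprocal ties. They are also enough to assemble a directed edge list that could then be passed to a network package such as igraph (for example via `graph_from_data_frame()`). A minimal sketch with made-up IDs follows. With real data, the `followers` and `friends` objects above would take the place of the toy vectors:

```r
# made-up user IDs, kept as character strings like real Twitter IDs should be
toy_followers <- c("11", "22", "33", "44")
toy_friends   <- c("22", "44", "55")

mutual  <- intersect(toy_followers, toy_friends)   # follow each other
one_way <- setdiff(toy_followers, toy_friends)     # follow the account, not followed back
reciprocity <- length(mutual) / length(toy_friends) # share of friends who follow back

# directed edge list (from -> to) centered on the account
account <- "uscpoir"
edges <- rbind(
  data.frame(from = toy_followers, to = account, stringsAsFactors = FALSE),
  data.frame(from = account, to = toy_friends, stringsAsFactors = FALSE)
)
nrow(edges)  # 7
```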

What are the most common words that friends of the uscpoir account use to describe themselves on Twitter?

# extract profile descriptions
users <- getUsersBatch(ids=friends, oauth=my_oauth)
## 1--221 users left
## 2--121 users left
## 3--21 users left
# create table with frequency of word use
library(quanteda)
## Package version: 3.2.3
## Unicode version: 14.0
## ICU version: 70.1
## Parallel computing: 8 of 8 threads used.
## See https://quanteda.io for tutorials and examples.
tw <- tokens(corpus(users$description[users$description != ""]), remove_punct = TRUE)
dfmat <- dfm_remove(dfm(tw), c(stopwords("english"), stopwords("spanish"),
                               "t.co", "https", "rt", "rts", "http"))
topfeatures(dfmat, n = 20)
##             |           usc     political       science      politics 
##            98            63            56            37            29 
##          @usc      official    university      research        public 
##            27            24            24            22            21 
##     professor      @uscpoir           phd    california      southern 
##            20            20            20            19            19 
##          news international        policy       twitter       account 
##            16            16            16            15            15
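The most frequent "word" in the output above is `|`. Unicode classifies it as a symbol rather than punctuation, so `remove_punct = TRUE` keeps it. quanteda's `tokens()` also has a `remove_symbols` argument that should drop it. To make the tokenize, remove-stopwords, and count steps explicit, here is a rough base-R equivalent. It uses made-up descriptions, and a hypothetical, deliberately tiny stopword list stands in for quanteda's `stopwords()`:

```r
# made-up profile descriptions (one empty, as in real data)
descriptions <- c("Political science professor at USC | public policy research",
                  "Official account of the USC political science department",
                  "")
# hypothetical, deliberately tiny stopword list
my_stopwords <- c("the", "of", "at", "a", "and")

count_words <- function(text, stopwords, n = 20) {
  text <- text[text != ""]                                      # drop empty bios
  # split on anything that is not a letter, digit, @, # or apostrophe
  words <- unlist(strsplit(tolower(text), "[^[:alnum:]@#']+"))
  words <- words[words != "" & !(words %in% stopwords)]
  head(sort(table(words), decreasing = TRUE), n)
}

count_words(descriptions, my_stopwords)
```

Because the splitting pattern treats `|` as a separator, it never becomes a token here.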

Estimating ideology based on Twitter networks

The tweetscores package also includes functions to replicate the method developed in the Political Analysis paper "Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data". For an application of this method, see also this Monkey Cage blog post.

# download list of friends for an account
user <- "DonaldJTrumpJr"
friends <- getFriends(user, oauth=my_oauth)
## 14 API calls left
## 1702 friends. Next cursor: 0
## 13 API calls left
# estimating ideology with correspondence analysis method
(theta <- estimateIdeology2(user, friends, verbose=FALSE))
## DonaldJTrumpJr follows 157 elites: ABC, AEI, AlbertBrooks, AllenWest, AmbJohnBolton, andersoncooper, AndreaTantaros, AnnCoulter, AP, AriFleischer, azizansari, BBCBreaking, BBCWorld, benshapiro, BillHemmer, BillyHallowell, BreakingNews, BreitbartNews, BretBaier, brithume, BuzzFeed, ByronYork, CharlieDaniels, ChrisLoesch, chrisrock, chucktodd, chuckwoolery, CNBC, CNN, cnnbrk, DailyCaller, daveweigel, davidgregory, DavidLimbaugh, davidwebbshow, DineshDSouza, DLoesch, DRUDGE, DRUDGE_REPORT, ericbolling, EWErickson, FareedZakaria, FinancialTimes, ForAmerica, Forbes, foxandfriends, FoxNews, foxnewspolitics, funnyordie, Gabby_Hoffman, GarySinise, Gawker, ggreenwald, glennbeck, GOP, gopleader, GovChristie, GovernorPerry, GovMikeHuckabee, greggutfeld, GStephanopoulos, hardball_chris, Heritage, HeyTammyBruce, HouseGOP, HuffingtonPost, hughhewitt, iamjohnoliver, IngrahamAngle, iowahawkblog, JamesRosenFNC, jasoninthehouse, jim_jordan, JimDeMint, jimmyfallon, Judgenap, JudicialWatch, kanyewest, KatiePavlich, kilmeade, kimguilfoyle, krauthammer, KurtSchlichter, larryelder, LindaSuhler, loudobbsnews, LukeRussert, marcorubio, marklevinshow, MarkSteynOnline, marshablackburn, marthamaccallum, megynkelly, michellemalkin, mitchellreports, mkhammer, MonicaCrowley, mtaibbi, NASA, neiltyson, newsbusters, newtgingrich, NolteNC, NRA, NRO, nytimes, oreillyfactor, politico, ppppolls, RealBenCarson, RealJamesWoods, RedState, Reince, repdianeblack, repgosar, repjeffduncan, reploubarletta, replouiegohmert, repmobrooks, reppaulryan, repseanduffy, repthomasmassie, Reuters, SarahPalinUSA, scrowder, seanhannity, Senate_GOPs, senatortimscott, senmikelee, sentedcruz, ShannonBream, SharylAttkisson, stevekingia, SteveMartinToGo, tamronhall, TeamCavuto, tedcruz, TedNugent, TEDTalks, tgowdysc, TheAtlantic, theblaze, TheEconomist, thenation, TheOnion, ThomasSowell, TODAYshow, toddstarnes, townhallcom, TuckerCarlson, TwitchyTeam, VanityFair, washingtonpost, WayneDupreeShow, wikileaks, WSJ, YoungCons
## [1] 1.566959
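`estimateIdeology2()` is based on correspondence analysis of which elite accounts users follow. The sketch below illustrates the underlying idea and is not the package's own code. It runs a plain correspondence analysis on a made-up user-by-elite follow matrix using only base R. The first singular vector of the matrix of standardized residuals places users who follow similar accounts close together. The sign of the resulting dimension is arbitrary, so it has to be oriented (for example, using known liberal and conservative accounts) before it can be read as left or right:

```r
# made-up adjacency matrix: rows = users, columns = elite accounts (1 = follows)
A <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 1, 1, 1,
              0, 0, 1, 1,
              1, 0, 0, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("user", 1:5), paste0("elite", 1:4)))

ca_first_dim <- function(A) {
  P  <- A / sum(A)                  # correspondence matrix
  r  <- rowSums(P)                  # row masses
  cm <- colSums(P)                  # column masses
  # standardized residuals: D_r^{-1/2} (P - r c') D_c^{-1/2}
  S  <- (P - outer(r, cm)) / outer(sqrt(r), sqrt(cm))
  s  <- svd(S)
  list(rows   = s$u[, 1] / sqrt(r),   # standard coordinates of users
       cols   = s$v[, 1] / sqrt(cm),  # standard coordinates of elites
       masses = r)
}

res <- ca_first_dim(A)
round(res$rows, 2)
```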

Other types of data

The REST API also offers a long list of other endpoints that may be useful, depending on your research interests.

  1. You can search users related to specific keywords:
users <- searchUsers(q="uscpoir", count=100, oauth=my_oauth)
users$screen_name[1:10]
##  [1] "uscpoir"         "p_barbera"       "nolahaynes_"     "as_hartnett"    
##  [5] "JennCryer"       "audryewong"      "clairebcrawford" "shallow__state" 
##  [9] "pongkwans"       "miguelmaria"
  2. If you know the IDs of tweets, you can download them directly from the API. This is useful because tweets cannot be redistributed as part of the replication materials of a published paper, but lists of tweet IDs can be shared:
# Downloading tweets when you know the ID
getStatuses(ids=c('1454115859534950406', '1452687910055002115',
                  '1451896893743767555'),
            filename="../data/old-tweets.json",
            oauth=my_oauth)
## 900 API calls left
## 3 tweets left.
## 0 tweets left.
## 899 API calls left
parseTweets("../data/old-tweets.json")
## 3 tweets have been parsed.
##                                                                                                                                                                                                                                                                                                      text
## 1 Signing the Paris Agreement to fight climate change was one of my proudest moments in office. But it was always a foundation to build on. As world leaders gather for COP26, I shared some reflections on the road to Paris and the young activists who are pushing us further. https://t.co/EebSdTQ6QS
## 2                When you look at the history of big social movements, they’re usually started and sustained by young people who put in the work to make it happen. As we look to COP26, I’m inspired by the young people using their voices in the fight against climate change. https://t.co/IBdciqn8ml
## 3                                                        Some of the most important changes often start in state legislatures. That's why I'm proud to support these candidates for the Virginia state legislature. I hope you'll join me and the @DLCC in giving them your vote. https://t.co/jt3brA4F3B
##   retweet_count favorite_count favorited truncated              id_str
## 1          2491          14289     FALSE     FALSE 1454115859534950406
## 2          2384          14864     FALSE     FALSE 1452687910055002115
## 3          2860          12432     FALSE     FALSE 1451896893743767555
##   in_reply_to_screen_name
## 1                      NA
## 2                      NA
## 3                      NA
##                                                                               source
## 1       <a href="https://studio.twitter.com" rel="nofollow">Twitter Media Studio</a>
## 2 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 3            <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
##   retweeted                     created_at in_reply_to_status_id_str
## 1     FALSE Fri Oct 29 16:00:11 +0000 2021                        NA
## 2     FALSE Mon Oct 25 17:26:02 +0000 2021                        NA
## 3     FALSE Sat Oct 23 13:02:49 +0000 2021                        NA
##   in_reply_to_user_id_str lang listed_count verified       location user_id_str
## 1                      NA   en       220276     TRUE Washington, DC      813286
## 2                      NA   en       220276     TRUE Washington, DC      813286
## 3                      NA   en       220276     TRUE Washington, DC      813286
##                         description geo_enabled                user_created_at
## 1 Dad, husband, President, citizen.       FALSE Mon Mar 05 22:08:25 +0000 2007
## 2 Dad, husband, President, citizen.       FALSE Mon Mar 05 22:08:25 +0000 2007
## 3 Dad, husband, President, citizen.       FALSE Mon Mar 05 22:08:25 +0000 2007
##   statuses_count followers_count favourites_count protected
## 1          16651       133438013                5     FALSE
## 2          16651       133438013                5     FALSE
## 3          16651       133438013                5     FALSE
##                  user_url         name time_zone user_lang utc_offset
## 1 https://t.co/kHvnxozw8x Barack Obama        NA        NA         NA
## 2 https://t.co/kHvnxozw8x Barack Obama        NA        NA         NA
## 3 https://t.co/kHvnxozw8x Barack Obama        NA        NA         NA
##   friends_count screen_name country_code country place_type full_name
## 1        577332 BarackObama           NA      NA         NA        NA
## 2        577332 BarackObama           NA      NA         NA        NA
## 3        577332 BarackObama           NA      NA         NA        NA
##   place_name place_id place_lat place_lon lat lon
## 1         NA       NA       NaN       NaN  NA  NA
## 2         NA       NA       NaN       NaN  NA  NA
## 3         NA       NA       NaN       NaN  NA  NA
##                                             expanded_url
## 1                                                   <NA>
## 2 https://twitter.com/YouTube/status/1452022851222048774
## 3    https://twitter.com/DLCC/status/1451896240543764494
##                       url
## 1                    <NA>
## 2 https://t.co/IBdciqn8ml
## 3 https://t.co/jt3brA4F3B
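Sharing tweet IDs only works if the IDs survive being saved and reloaded. They are 64-bit integers, larger than R's doubles can represent exactly, which is why the API output includes the string column `id_str`. A minimal sketch using the three IDs above shows how converting an ID to a number silently corrupts it, while saving and reading the IDs as text preserves them:

```r
ids <- c("1454115859534950406", "1452687910055002115", "1451896893743767555")

# Converting to numeric silently changes the ID (doubles hold ~15-16 significant digits)
sprintf("%.0f", as.numeric(ids[1])) == ids[1]   # FALSE

# Save the IDs for replication materials and read them back as character
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id_str = ids, stringsAsFactors = FALSE), path, row.names = FALSE)
shared <- read.csv(path, colClasses = "character")
identical(shared$id_str, ids)                   # TRUE
```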
  3. Lists of Twitter users, compiled by other users, are also accessible through the API.
# download user information from a list
govs <- getList(list_id="7560205", oauth=my_oauth)
## 900 API calls left
## 20 users in list. Next cursor: 5427968139194679296
## 899 API calls left
## 40 users in list. Next cursor: 4611686018662284436
## 898 API calls left
## 54 users in list. Next cursor: 0
## 897 API calls left
head(govs)
##             id              id_str                       name     screen_name
## 1 1.201314e+18 1201313519662067712 Governor Ralph DLG. Torres    GovernorCNMI
## 2 1.192550e+18 1192549822865297409      Governor Andy Beshear  GovAndyBeshear
## 3 1.099290e+18 1099290316920799233   Governor Albert Bryan Jr        govbryan
## 4 1.086522e+18 1086521832671346688 Governor Lou Leon Guerrero louleonguerrero
## 5 1.084925e+18 1084924525232513025              Gov. Bill Lee      GovBillLee
## 6 1.084818e+18 1084817523378454529       Governor JB Pritzker     GovPritzker
##                   location
## 1 Northern Mariana Islands
## 2      Frankfort, Kentucky
## 3     Virgin Islands, U.S.
## 4                     Guam
## 5       State of Tennessee
## 6                         
##                                                                                                                             description
## 1      Father, Husband, Public Servant, and the 9th Governor of the Commonwealth of the Northern Mariana Islands (CNMI) 🇲🇵 #TheMarianas
## 2 The official account of the 63rd Governor of the Commonwealth of Kentucky. Tweets from Andy are signed ^AB. #TeamKentucky #TogetherKy
## 3                                                                   Father. Husband. Ninth Elected Governor of the U.S. Virgin Islands.
## 4                                                      Nurse. Businesswoman. Policymaker. Mother. Grandmother. 9th Governor of Guam. 🇬🇺
## 5                                                                                                            50th Governor of Tennessee
## 6                                                                       Husband and father. Proudly serving as Illinois’ 43rd governor.
##                       url followers_count friends_count
## 1 https://t.co/ckMjZPpuMk             443            28
## 2 https://t.co/rkzEPFKuGH          163925           136
## 3 https://t.co/yERaQ1VBdj            2099            76
## 4 https://t.co/Q2O0kVQua9            3187           158
## 5 https://t.co/cmW7jEv3Wn          110668           158
## 6 https://t.co/35JwCLCMrB          249656           172
##                       created_at time_zone lang
## 1 Mon Dec 02 01:34:16 +0000 2019        NA   NA
## 2 Thu Nov 07 21:10:34 +0000 2019        NA   NA
## 3 Sat Feb 23 12:50:10 +0000 2019        NA   NA
## 4 Sat Jan 19 07:12:46 +0000 2019        NA   NA
## 5 Mon Jan 14 21:25:38 +0000 2019        NA   NA
## 6 Mon Jan 14 14:20:27 +0000 2019        NA   NA

This is also useful if, for example, you’re interested in compiling lists of journalists, because many media outlets maintain such lists on their profiles.

  4. The list of users who retweeted a particular tweet. Unfortunately, this is limited to the 100 most recent retweets.
# Download list of users who retweeted a tweet (unfortunately, only up to 100)
rts <- getRetweets(id='653733796408377344', oauth=my_oauth)
## 75 API calls left
## 36 retweeters. Next cursor: 0
## 74 API calls left
# https://twitter.com/HillaryClinton/status/653733796408377344
  5. Finally, a function to convert dates from Twitter’s internal format into a date format we can work with in R:
# format Twitter dates to facilitate analysis
tweets <- parseTweets("../data/uscpoir.json")
## 200 tweets have been parsed.
tweets$date <- formatTwDate(tweets$created_at, format="date")
## Warning in Sys.setlocale("LC_TIME", "English"): OS reports request to set locale
## to "English" cannot be honored
hist(tweets$date, breaks="month")
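The warning above appears because "English" is a Windows-style locale name that other operating systems do not recognize. Day and month abbreviations such as `Fri` and `Oct` are parsed according to the `LC_TIME` locale, so a portable alternative is to switch temporarily to the "C" locale. The sketch below does this in base R using the `created_at` values from the Obama tweets above. `parse_tw_date()` is a hypothetical helper, not part of tweetscores:

```r
# Parse Twitter's created_at format in base R (a sketch, not formatTwDate() itself)
parse_tw_date <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # restore the user's locale
  Sys.setlocale("LC_TIME", "C")   # "C" locale: English day/month names on every OS
  as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
}

created <- c("Fri Oct 29 16:00:11 +0000 2021", "Mon Oct 25 17:26:02 +0000 2021",
             "Sat Oct 23 13:02:49 +0000 2021")
dt <- parse_tw_date(created)
table(format(dt, "%Y-%m"))   # tweets per month
```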