We’ll now turn to a different type of Twitter data – static data,
either recent tweets or user-level information. This type of data can be
retrieved with Twitter’s REST API. We will use the tweetscores package
here – this is a package that I created to facilitate the collection and
analysis of Twitter data.
It is possible to download recent tweets, but only those less than 7 days old, and in some cases not all of them.
load("~/my_oauth")
library(tweetscores)
## Loading required package: R2WinBUGS
## Loading required package: coda
## Loading required package: boot
## ##
## ## tweetscores: tools for the analysis of Twitter data
## ## Pablo Barbera (USC)
## ## www.tweetscores.com
## ##
library(streamR)
## Loading required package: RCurl
## Loading required package: rjson
## Loading required package: ndjson
searchTweets(q="halloween",
filename="../data/halloween-tweets.json",
n=1000, until="2022-10-31",
oauth=my_oauth)
## 100 tweets. Max id: 1586870565041840128
## 178 hits left
## 196 tweets. Max id: 1586870553964584960
## 177 hits left
## 296 tweets. Max id: 1586870543210483712
## 176 hits left
## 396 tweets. Max id: 1586870533714583552
## 175 hits left
## 496 tweets. Max id: 1586870524210479104
## 174 hits left
## 596 tweets. Max id: 1586870513787355136
## 173 hits left
## 696 tweets. Max id: 1586870503876481024
## 172 hits left
## 796 tweets. Max id: 1586870494418161664
## 171 hits left
## 896 tweets. Max id: 1586870484519591936
## 170 hits left
## 992 tweets. Max id: 1586870473786376192
## 169 hits left
## 1092 tweets. Max id: 1586870462055084032
tweets <- parseTweets("../data/halloween-tweets.json")
## 1092 tweets have been parsed.
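Because the search only reaches back about 7 days, it can help to check the date you plan to pass as until before spending API calls. The snippet below is a minimal sketch of such a check in base R. The function in_search_window and its window_days argument are hypothetical names introduced here for illustration, they are not part of tweetscores, and the exact boundary behavior of the API is an assumption.

```r
# Hypothetical helper (not part of tweetscores): is a date inside the
# roughly 7-day window that the search covers? The window length comes
# from the text above; the inclusive boundaries are an assumption.
in_search_window <- function(until, today = Sys.Date(), window_days = 7) {
  until <- as.Date(until)
  until <= today & until >= today - window_days
}

# Fixed reference date so the example is reproducible
ref_day <- as.Date("2022-11-01")
in_search_window("2022-10-31", today = ref_day)  # inside the window
in_search_window("2022-10-01", today = ref_day)  # too old to be searchable
```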
What are the most popular hashtags?
library(stringr)
ht <- str_extract_all(tweets$text, '#[A-Za-z0-9_]+')
ht <- unlist(ht)
head(sort(table(ht), decreasing = TRUE))
## ht
## #Halloween #GenshinImpact #halloween #Amazon #childe
## 68 25 11 9 9
## #povo
## 8
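Note that #Halloween and #halloween are counted separately in the table above, because the match is case-sensitive. One possible way to merge them is to lowercase the hashtags before tabulating. The sketch below uses base R (regmatches and gregexpr play the role of str_extract_all) on a small made-up vector called example_text, which is an assumption for illustration and not data from the search.

```r
# Made-up example tweets (hypothetical, for illustration only)
example_text <- c("Happy #Halloween!",
                  "trick or treat #halloween #candy",
                  "#HALLOWEEN night")

# Extract hashtags with the same pattern used above
ht_example <- unlist(regmatches(example_text,
                                gregexpr("#[A-Za-z0-9_]+", example_text)))

# Lowercase before counting so case variants are merged
ht_counts <- sort(table(tolower(ht_example)), decreasing = TRUE)
ht_counts
```

With the real data, the same idea would be table(tolower(ht)) in place of table(ht).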
The options available for the search query string are described in Twitter’s documentation for the search API.
This is how you would extract information from user profiles:
wh <- c("JoeBiden", "POTUS", "VP", "FLOTUS")
users <- getUsersBatch(screen_names=wh,
oauth=my_oauth)
## 1--4 users left
str(users)
## 'data.frame': 4 obs. of 9 variables:
## $ id_str : chr "1349154719386775552" "803694179079458816" "939091" "1349149096909668363"
## $ screen_name : chr "FLOTUS" "VP" "JoeBiden" "POTUS"
## $ name : chr "Jill Biden" "Vice President Kamala Harris" "Joe Biden" "President Biden"
## $ description : chr "First Lady of the United States Jill Biden. Community college educator. Military mother. Grandmother. Wife of @POTUS." "Vice President of the United States. Wife to the first @SecondGentleman. Momala. Auntie. Fighting for the people." "Husband to @DrBiden, proud father and grandfather. Ready to build back better for all Americans. Official account is @POTUS." "46th President of the United States, husband to @FLOTUS, proud dad & pop. Tweets may be archived: https://t.co/"| __truncated__
## $ followers_count: int 4115264 13790391 36357676 26867220
## $ statuses_count : int 603 7987 8650 4415
## $ friends_count : int 6 6 48 12
## $ created_at : chr "Wed Jan 13 00:57:47 +0000 2021" "Tue Nov 29 20:16:39 +0000 2016" "Sun Mar 11 17:51:24 +0000 2007" "Wed Jan 13 00:37:08 +0000 2021"
## $ location : chr "" "" "Washington, DC" ""
Which of these has the most followers?
users[which.max(users$followers_count),]
## id_str screen_name name
## 3 939091 JoeBiden Joe Biden
## description
## 3 Husband to @DrBiden, proud father and grandfather. Ready to build back better for all Americans. Official account is @POTUS.
## followers_count statuses_count friends_count created_at
## 3 36357676 8650 48 Sun Mar 11 17:51:24 +0000 2007
## location
## 3 Washington, DC
users$screen_name[which.max(users$followers_count)]
## [1] "JoeBiden"
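If you want the full ranking rather than just the top account, order() is one option. The sketch below rebuilds a small data frame from the screen names and follower counts printed above, so that it runs without API access; users_example is a name introduced here for illustration.

```r
# Screen names and follower counts copied from the str(users) output above
users_example <- data.frame(
  screen_name = c("FLOTUS", "VP", "JoeBiden", "POTUS"),
  followers_count = c(4115264, 13790391, 36357676, 26867220),
  stringsAsFactors = FALSE
)

# Sort rows from most to fewest followers
ranked <- users_example[order(users_example$followers_count,
                              decreasing = TRUE), ]
ranked$screen_name
```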
Download up to 3,200 recent tweets from a Twitter account:
getTimeline(filename="../data/uscpoir.json",
screen_name="uscpoir", n=200, oauth=my_oauth)
## 200 tweets. Max id: 1432375952936427523
What are the most common hashtags?
tweets <- parseTweets("../data/uscpoir.json")
## 200 tweets have been parsed.
ht <- str_extract_all(tweets$text, '#[A-Za-z0-9_]+')
ht <- unlist(ht)
head(sort(table(ht), decreasing = TRUE))
## ht
## #COP26 #ICYMI #Afghanistan #humanrights #environmental
## 3 3 2 2 1
## #Eurasia
## 1
Download friends and followers:
followers <- getFollowers("uscpoir",
oauth=my_oauth)
## 15 API calls left
## 1053 followers. Next cursor: 0
## 14 API calls left
friends <- getFriends("uscpoir",
oauth=my_oauth)
## 15 API calls left
## 221 friends. Next cursor: 0
## 14 API calls left
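Once you have both vectors of IDs, base R set operations are a simple way to compare them, for example to find mutual connections. The sketch below assumes that friends and followers are character vectors of user IDs, and it uses short made-up IDs (friends_example and followers_example are hypothetical) so that it runs without API access.

```r
# Made-up user IDs, stored as character strings (hypothetical data)
followers_example <- c("101", "102", "103", "104")
friends_example <- c("103", "104", "105")

# Accounts that appear in both vectors: followed and following back
mutual <- intersect(friends_example, followers_example)

# Accounts that are followed but do not follow back
not_followed_back <- setdiff(friends_example, followers_example)

mutual
not_followed_back
```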
What are the most common words that friends of the uscpoir account use to describe themselves on Twitter?
# extract profile descriptions
users <- getUsersBatch(ids=friends, oauth=my_oauth)
## 1--221 users left
## 2--121 users left
## 3--21 users left
# create table with frequency of word use
library(quanteda)
## Package version: 3.2.3
## Unicode version: 14.0
## ICU version: 70.1
## Parallel computing: 8 of 8 threads used.
## See https://quanteda.io for tutorials and examples.
tw <- tokens(corpus(users$description[users$description!=""]),remove_punct=TRUE)
descr_dfm <- dfm_remove(dfm(tw), c(stopwords("english"), stopwords("spanish"),
"t.co", "https", "rt", "rts", "http"))
topfeatures(descr_dfm, n = 20)
## | usc political science politics
## 98 63 56 37 29
## @usc official university research public
## 27 24 24 22 21
## professor @uscpoir phd california southern
## 20 20 20 19 19
## news international policy twitter account
## 16 16 16 15 15
The tweetscores package also includes functions to replicate the method
developed in the Political Analysis paper “Birds of a Feather Tweet Together.
Bayesian Ideal Point Estimation Using Twitter Data”. For an application of
this method, see also this Monkey Cage blog post.
# download list of friends for an account
user <- "DonaldJTrumpJr"
friends <- getFriends(user, oauth=my_oauth)
## 14 API calls left
## 1702 friends. Next cursor: 0
## 13 API calls left
# estimating ideology with correspondence analysis method
(theta <- estimateIdeology2(user, friends, verbose=FALSE))
## DonaldJTrumpJr follows 157 elites: ABC, AEI, AlbertBrooks, AllenWest, AmbJohnBolton, andersoncooper, AndreaTantaros, AnnCoulter, AP, AriFleischer, azizansari, BBCBreaking, BBCWorld, benshapiro, BillHemmer, BillyHallowell, BreakingNews, BreitbartNews, BretBaier, brithume, BuzzFeed, ByronYork, CharlieDaniels, ChrisLoesch, chrisrock, chucktodd, chuckwoolery, CNBC, CNN, cnnbrk, DailyCaller, daveweigel, davidgregory, DavidLimbaugh, davidwebbshow, DineshDSouza, DLoesch, DRUDGE, DRUDGE_REPORT, ericbolling, EWErickson, FareedZakaria, FinancialTimes, ForAmerica, Forbes, foxandfriends, FoxNews, foxnewspolitics, funnyordie, Gabby_Hoffman, GarySinise, Gawker, ggreenwald, glennbeck, GOP, gopleader, GovChristie, GovernorPerry, GovMikeHuckabee, greggutfeld, GStephanopoulos, hardball_chris, Heritage, HeyTammyBruce, HouseGOP, HuffingtonPost, hughhewitt, iamjohnoliver, IngrahamAngle, iowahawkblog, JamesRosenFNC, jasoninthehouse, jim_jordan, JimDeMint, jimmyfallon, Judgenap, JudicialWatch, kanyewest, KatiePavlich, kilmeade, kimguilfoyle, krauthammer, KurtSchlichter, larryelder, LindaSuhler, loudobbsnews, LukeRussert, marcorubio, marklevinshow, MarkSteynOnline, marshablackburn, marthamaccallum, megynkelly, michellemalkin, mitchellreports, mkhammer, MonicaCrowley, mtaibbi, NASA, neiltyson, newsbusters, newtgingrich, NolteNC, NRA, NRO, nytimes, oreillyfactor, politico, ppppolls, RealBenCarson, RealJamesWoods, RedState, Reince, repdianeblack, repgosar, repjeffduncan, reploubarletta, replouiegohmert, repmobrooks, reppaulryan, repseanduffy, repthomasmassie, Reuters, SarahPalinUSA, scrowder, seanhannity, Senate_GOPs, senatortimscott, senmikelee, sentedcruz, ShannonBream, SharylAttkisson, stevekingia, SteveMartinToGo, tamronhall, TeamCavuto, tedcruz, TedNugent, TEDTalks, tgowdysc, TheAtlantic, theblaze, TheEconomist, thenation, TheOnion, ThomasSowell, TODAYshow, toddstarnes, townhallcom, TuckerCarlson, TwitchyTeam, VanityFair, washingtonpost, WayneDupreeShow, wikileaks, WSJ, YoungCons
## [1] 1.566959
The REST API also offers a long list of other endpoints that could be of use at some point, depending on your research interests.
users <- searchUsers(q="uscpoir", count=100, oauth=my_oauth)
users$screen_name[1:10]
## [1] "uscpoir" "p_barbera" "nolahaynes_" "as_hartnett"
## [5] "JennCryer" "audryewong" "clairebcrawford" "shallow__state"
## [9] "pongkwans" "miguelmaria"
# Downloading tweets when you know the ID
getStatuses(ids=c('1454115859534950406', '1452687910055002115',
'1451896893743767555'),
filename="../data/old-tweets.json",
oauth=my_oauth)
## 900 API calls left
## 3 tweets left.
## 0 tweets left.
## 899 API calls left
parseTweets("../data/old-tweets.json")
## 3 tweets have been parsed.
## text
## 1 Signing the Paris Agreement to fight climate change was one of my proudest moments in office. But it was always a foundation to build on. As world leaders gather for COP26, I shared some reflections on the road to Paris and the young activists who are pushing us further. https://t.co/EebSdTQ6QS
## 2 When you look at the history of big social movements, they’re usually started and sustained by young people who put in the work to make it happen. As we look to COP26, I’m inspired by the young people using their voices in the fight against climate change. https://t.co/IBdciqn8ml
## 3 Some of the most important changes often start in state legislatures. That's why I'm proud to support these candidates for the Virginia state legislature. I hope you'll join me and the @DLCC in giving them your vote. https://t.co/jt3brA4F3B
## retweet_count favorite_count favorited truncated id_str
## 1 2491 14289 FALSE FALSE 1454115859534950406
## 2 2384 14864 FALSE FALSE 1452687910055002115
## 3 2860 12432 FALSE FALSE 1451896893743767555
## in_reply_to_screen_name
## 1 NA
## 2 NA
## 3 NA
## source
## 1 <a href="https://studio.twitter.com" rel="nofollow">Twitter Media Studio</a>
## 2 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 3 <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
## retweeted created_at in_reply_to_status_id_str
## 1 FALSE Fri Oct 29 16:00:11 +0000 2021 NA
## 2 FALSE Mon Oct 25 17:26:02 +0000 2021 NA
## 3 FALSE Sat Oct 23 13:02:49 +0000 2021 NA
## in_reply_to_user_id_str lang listed_count verified location user_id_str
## 1 NA en 220276 TRUE Washington, DC 813286
## 2 NA en 220276 TRUE Washington, DC 813286
## 3 NA en 220276 TRUE Washington, DC 813286
## description geo_enabled user_created_at
## 1 Dad, husband, President, citizen. FALSE Mon Mar 05 22:08:25 +0000 2007
## 2 Dad, husband, President, citizen. FALSE Mon Mar 05 22:08:25 +0000 2007
## 3 Dad, husband, President, citizen. FALSE Mon Mar 05 22:08:25 +0000 2007
## statuses_count followers_count favourites_count protected
## 1 16651 133438013 5 FALSE
## 2 16651 133438013 5 FALSE
## 3 16651 133438013 5 FALSE
## user_url name time_zone user_lang utc_offset
## 1 https://t.co/kHvnxozw8x Barack Obama NA NA NA
## 2 https://t.co/kHvnxozw8x Barack Obama NA NA NA
## 3 https://t.co/kHvnxozw8x Barack Obama NA NA NA
## friends_count screen_name country_code country place_type full_name
## 1 577332 BarackObama NA NA NA NA
## 2 577332 BarackObama NA NA NA NA
## 3 577332 BarackObama NA NA NA NA
## place_name place_id place_lat place_lon lat lon
## 1 NA NA NaN NaN NA NA
## 2 NA NA NaN NaN NA NA
## 3 NA NA NaN NaN NA NA
## expanded_url
## 1 <NA>
## 2 https://twitter.com/YouTube/status/1452022851222048774
## 3 https://twitter.com/DLCC/status/1451896240543764494
## url
## 1 <NA>
## 2 https://t.co/IBdciqn8ml
## 3 https://t.co/jt3brA4F3B
# download user information from a list
govs <- getList(list_id="7560205", oauth=my_oauth)
## 900 API calls left
## 20 users in list. Next cursor: 5427968139194679296
## 899 API calls left
## 40 users in list. Next cursor: 4611686018662284436
## 898 API calls left
## 54 users in list. Next cursor: 0
## 897 API calls left
head(govs)
## id id_str name screen_name
## 1 1.201314e+18 1201313519662067712 Governor Ralph DLG. Torres GovernorCNMI
## 2 1.192550e+18 1192549822865297409 Governor Andy Beshear GovAndyBeshear
## 3 1.099290e+18 1099290316920799233 Governor Albert Bryan Jr govbryan
## 4 1.086522e+18 1086521832671346688 Governor Lou Leon Guerrero louleonguerrero
## 5 1.084925e+18 1084924525232513025 Gov. Bill Lee GovBillLee
## 6 1.084818e+18 1084817523378454529 Governor JB Pritzker GovPritzker
## location
## 1 Northern Mariana Islands
## 2 Frankfort, Kentucky
## 3 Virgin Islands, U.S.
## 4 Guam
## 5 State of Tennessee
## 6
## description
## 1 Father, Husband, Public Servant, and the 9th Governor of the Commonwealth of the Northern Mariana Islands (CNMI) 🇲🇵 #TheMarianas
## 2 The official account of the 63rd Governor of the Commonwealth of Kentucky. Tweets from Andy are signed ^AB. #TeamKentucky #TogetherKy
## 3 Father. Husband. Ninth Elected Governor of the U.S. Virgin Islands.
## 4 Nurse. Businesswoman. Policymaker. Mother. Grandmother. 9th Governor of Guam. 🇬🇺
## 5 50th Governor of Tennessee
## 6 Husband and father. Proudly serving as Illinois’ 43rd governor.
## url followers_count friends_count
## 1 https://t.co/ckMjZPpuMk 443 28
## 2 https://t.co/rkzEPFKuGH 163925 136
## 3 https://t.co/yERaQ1VBdj 2099 76
## 4 https://t.co/Q2O0kVQua9 3187 158
## 5 https://t.co/cmW7jEv3Wn 110668 158
## 6 https://t.co/35JwCLCMrB 249656 172
## created_at time_zone lang
## 1 Mon Dec 02 01:34:16 +0000 2019 NA NA
## 2 Thu Nov 07 21:10:34 +0000 2019 NA NA
## 3 Sat Feb 23 12:50:10 +0000 2019 NA NA
## 4 Sat Jan 19 07:12:46 +0000 2019 NA NA
## 5 Mon Jan 14 21:25:38 +0000 2019 NA NA
## 6 Mon Jan 14 14:20:27 +0000 2019 NA NA
This is also useful if, for example, you are interested in compiling lists of journalists, because media outlets offer such lists on their profiles.
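Note that in the head(govs) output the id column prints in scientific notation, whereas id_str keeps every digit. Twitter IDs of this size exceed what an R double can represent exactly, so it is generally safer to work with the string version when matching or merging users. The sketch below illustrates the problem with two different ID strings that become indistinguishable once converted to numbers. The first ID is taken from the output above; the second is a made-up neighbor used only for the comparison.

```r
id_a <- "1192549822865297409"  # id_str from the output above
id_b <- "1192549822865297408"  # hypothetical ID, one less than id_a

# As strings, the two IDs are different
id_a == id_b

# As doubles, the last digits are lost and the two compare as equal
as.numeric(id_a) == as.numeric(id_b)
```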
# Download list of users who retweeted a tweet (unfortunately, only up to 100)
rts <- getRetweets(id='653733796408377344', oauth=my_oauth)
## 75 API calls left
## 36 retweeters. Next cursor: 0
## 74 API calls left
# https://twitter.com/HillaryClinton/status/653733796408377344
# format Twitter dates to facilitate analysis
tweets <- parseTweets("../data/uscpoir.json")
## 200 tweets have been parsed.
tweets$date <- formatTwDate(tweets$created_at, format="date")
## Warning in Sys.setlocale("LC_TIME", "English"): OS reports request to set locale
## to "English" cannot be honored
hist(tweets$date, breaks="month")
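The warning above shows that the English locale could not be set on this machine, and parsing English month names may then depend on your system settings. If that causes trouble, one possible workaround is to parse the created_at strings without relying on the locale at all. The sketch below is a hypothetical helper called parse_tw_date, not part of tweetscores; it assumes the "Wed Jan 13 00:57:47 +0000 2021" layout seen in the output earlier and returns only the calendar date as given in the string (UTC), ignoring the time of day.

```r
# Hypothetical helper (not part of tweetscores): convert Twitter's
# created_at strings to Date without depending on the system locale.
# Assumes the format "Wed Jan 13 00:57:47 +0000 2021".
parse_tw_date <- function(created_at) {
  parts <- strsplit(created_at, " ", fixed = TRUE)
  # month.abb is a built-in constant with English month abbreviations
  month <- match(vapply(parts, `[`, character(1), 2), month.abb)
  day <- as.integer(vapply(parts, `[`, character(1), 3))
  year <- as.integer(vapply(parts, `[`, character(1), 6))
  as.Date(sprintf("%04d-%02d-%02d", year, month, day))
}

# Dates copied from the created_at column shown earlier
example_dates <- parse_tw_date(c("Wed Jan 13 00:57:47 +0000 2021",
                                 "Sun Mar 11 17:51:24 +0000 2007"))
example_dates
```

The result could then be passed to hist() with breaks="month" in the same way as tweets$date.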