To scrape data from Facebook’s API, we’ll use the Rfacebook package.
library(Rfacebook)
## Loading required package: httr
## Warning: package 'httr' was built under R version 3.4.1
## Loading required package: rjson
## Loading required package: httpuv
## Warning: package 'httpuv' was built under R version 3.4.1
##
## Attaching package: 'Rfacebook'
## The following object is masked from 'package:methods':
##
## getGroup
To access the Facebook API, you need an OAuth access token. You can get one by going to the following URL: https://developers.facebook.com/tools/explorer
Once you’re there:
1. Click on “Get Access Token”
2. Copy the long code (“Access Token”) and paste it below, replacing the fake one I wrote:
fb_oauth = 'EAACEdEose0cBAIG0ZAqsxAlZCDoINqOvDPSOsOx45MSsCbmZBVTyoqU8E1mkILz4eFzo4mLHLP4O8z14Kq50S75Ook8J3sL1pGPNtWplwKigTGuJZA5FOjYrbp4sm9PdqwY4MADBnUcbuZAbAJ1S7abAgI5Ist3ztlBEDAMxsERnS3oiXUKE6LgRT8L4tdjMZD'
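Pasting the token directly into a script makes it easy to share it by accident. One option, sketched below under the assumption that you store the token in an environment variable named FB_TOKEN (a hypothetical name, for example set in ~/.Renviron), is to read it at run time with base R. The helper get_fb_token is also hypothetical and not part of Rfacebook.

```r
# Hypothetical helper: read the Facebook token from an environment variable.
# FB_TOKEN is an assumed name; Rfacebook itself does not look for it.
get_fb_token <- function(var = "FB_TOKEN") {
  token <- Sys.getenv(var)          # returns "" when the variable is unset
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set")
  }
  token
}

# Usage, once FB_TOKEN has been set:
# fb_oauth <- get_fb_token()
```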
Now try running the following line:
getUsers("me", token=fb_oauth, private_info=TRUE)
## Warning in strptime(x, fmt, tz = "GMT"): unknown timezone 'default/America/Los_Angeles'
## id name username first_name middle_name last_name gender
## 1 557698085 Pablo Barberá NA NA NA NA NA
## locale likes picture birthday location hometown relationship_status
## 1 NA NA NA NA NA NA NA
Does it return your public Facebook information? If so, we’re ready to go. See also ?fbOAuth for information on how to get a long-lived OAuth token.
At the moment, the information that can be scraped from Facebook is essentially limited to the content of public pages (and of groups the authenticated user is a member of).
The following line downloads the 20 most recent posts on Donald Trump’s Facebook page, together with the count of each reaction type (reactions=TRUE):
page <- getPage("DonaldTrump", token=fb_oauth, n=20, reactions=TRUE, api="v2.9")
## 20 posts
What information is available for each of these posts?
page[1,]
## id likes_count from_id from_name
## 1 153080620724_10160021837600725 36086 153080620724 Donald J. Trump
## message
## 1 Republicans are going for the big Budget approval today, first step toward massive tax cuts. I think we have the votes, but who knows?
## created_time type
## 1 2017-10-19T16:56:02+0000 photo
## link
## 1 https://www.facebook.com/DonaldTrump/photos/a.10156483516640725.1073741830.153080620724/10160021837560725/?type=3
## story comments_count shares_count love_count haha_count wow_count
## 1 <NA> 4907 1887 3346 653 89
## sad_count angry_count
## 1 46 169
Which posts got the most likes, the most comments, and the most shares?
page[which.max(page$likes_count),]
## id likes_count from_id from_name
## 14 153080620724_10160043106930725 78612 153080620724 Donald J. Trump
## message
## 14 Perhaps no Administration has done more in its first 9 months than this Administration. We are bringing back the AMERICAN DREAM!🇺🇸
## created_time type
## 14 2017-10-23T19:27:00+0000 video
## link story
## 14 https://www.facebook.com/DonaldTrump/videos/10160043106930725/ <NA>
## comments_count shares_count love_count haha_count wow_count sad_count
## 14 9632 15132 NA NA NA NA
## angry_count
## 14 NA
page[which.max(page$comments_count),]
## id likes_count from_id from_name
## 6 153080620724_10160027734720725 51492 153080620724 Donald J. Trump
## message
## 6 Watch REAL news to learn how President Trump is working harder than ever for you, the American people!
## created_time type
## 6 2017-10-20T20:42:32+0000 video
## link story
## 6 https://www.facebook.com/DonaldTrump/videos/10160027734720725/ <NA>
## comments_count shares_count love_count haha_count wow_count sad_count
## 6 12083 8761 5495 2328 115 41
## angry_count
## 6 282
page[which.max(page$shares_count),]
## id likes_count from_id from_name
## 8 153080620724_10160032085795725 67164 153080620724 Donald J. Trump
## message
## 8 Stock Market hits another ALL TIME HIGH on Friday. 5.3 trillion dollars up since Election. Fake News doesn't spend much time on this!
## created_time type
## 8 2017-10-21T17:39:14+0000 video
## link story
## 8 https://www.facebook.com/DonaldTrump/videos/10160032085795725/ <NA>
## comments_count shares_count love_count haha_count wow_count sad_count
## 8 10277 18230 7793 603 1354 26
## angry_count
## 8 103
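The same which.max() pattern is used for the reaction counts. Two details of base R’s which.max() matter here: on ties it returns the position of the first maximum, and it skips NA values, which do appear in some reaction columns (for example, love_count for the most-liked post above). A small illustration on made-up vectors:

```r
# which.max() skips NA values and, on ties, returns the FIRST maximum
x <- c(3, NA, 7, 7)
which.max(x)                        # 3

# If a column is entirely NA, which.max() returns integer(0), so
# page[which.max(page$love_count), ] would give a data frame with zero rows
which.max(c(NA_real_, NA_real_))    # integer(0)
```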
What about other reactions?
page[which.max(page$love_count),]
## id likes_count from_id from_name
## 15 153080620724_10160044761135725 65837 153080620724 Donald J. Trump
## message
## 15 Today we gathered to tell the world of Captain Gary Rose’s valor and to proudly present him with our nation’s highest military honor. http://bit.ly/2i0P6ZY
## created_time type
## 15 2017-10-23T23:53:41+0000 photo
## link
## 15 https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10160044753440725/?type=3
## story comments_count shares_count love_count haha_count wow_count
## 15 <NA> 4011 5934 10831 108 335
## sad_count angry_count
## 15 18 42
page[which.max(page$haha_count),]
## id likes_count from_id from_name
## 18 153080620724_10160048592495725 19085 153080620724 Donald J. Trump
## message
## 18 Bob Corker, who helped President Obama give us the bad Iran Deal & couldn't get elected dog catcher in Tennessee, is now fighting Tax Cuts....
## created_time type
## 18 2017-10-24T17:45:31+0000 video
## link story
## 18 https://www.facebook.com/DonaldTrump/videos/10160048592495725/ <NA>
## comments_count shares_count love_count haha_count wow_count sad_count
## 18 7521 3184 1033 3824 448 268
## angry_count
## 18 2312
page[which.max(page$wow_count),]
## id likes_count from_id from_name
## 8 153080620724_10160032085795725 67164 153080620724 Donald J. Trump
## message
## 8 Stock Market hits another ALL TIME HIGH on Friday. 5.3 trillion dollars up since Election. Fake News doesn't spend much time on this!
## created_time type
## 8 2017-10-21T17:39:14+0000 video
## link story
## 8 https://www.facebook.com/DonaldTrump/videos/10160032085795725/ <NA>
## comments_count shares_count love_count haha_count wow_count sad_count
## 8 10277 18230 7793 603 1354 26
## angry_count
## 8 103
page[which.max(page$sad_count),]
## id likes_count from_id from_name
## 5 153080620724_10160027237345725 39703 153080620724 Donald J. Trump
## message
## 5 Just out report: "United Kingdom crime rises 13% annually amid spread of Radical Islamic terror." Not good, we must keep America safe!
## created_time type
## 5 2017-10-20T17:43:00+0000 photo
## link
## 5 https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10160027235790725/?type=3
## story comments_count shares_count love_count haha_count wow_count
## 5 <NA> 7513 4771 2880 881 828
## sad_count angry_count
## 5 467 423
page[which.max(page$angry_count),]
## id likes_count from_id from_name
## 18 153080620724_10160048592495725 19085 153080620724 Donald J. Trump
## message
## 18 Bob Corker, who helped President Obama give us the bad Iran Deal & couldn't get elected dog catcher in Tennessee, is now fighting Tax Cuts....
## created_time type
## 18 2017-10-24T17:45:31+0000 video
## link story
## 18 https://www.facebook.com/DonaldTrump/videos/10160048592495725/ <NA>
## comments_count shares_count love_count haha_count wow_count sad_count
## 18 7521 3184 1033 3824 448 268
## angry_count
## 18 2312
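Rather than writing one which.max() call per metric, the top post for every metric can be collected in a single step with sapply(). The sketch below runs on a small hypothetical data frame that only mimics a few of the column names returned by getPage(); with the real data you would drop the toy page and extend metrics to the full list of *_count columns.

```r
# Hypothetical toy data using some of the column names returned by getPage()
page <- data.frame(
  id             = c("post_a", "post_b", "post_c"),
  likes_count    = c(120, 450, 300),
  comments_count = c(40, 15, 90),
  shares_count   = c(5, 60, 20),
  love_count     = c(NA, 30, 10),
  stringsAsFactors = FALSE
)

metrics <- c("likes_count", "comments_count", "shares_count", "love_count")

# For each metric, the id of the post with the highest value (NAs are skipped)
top_posts <- sapply(metrics, function(m) page$id[which.max(page[[m]])])
top_posts
```

If some metric column were entirely NA, that element would be character(0) and sapply() would return a list instead of a named character vector.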
Let’s do another example, looking at the Facebook page of USC POIR:
page <- getPage("USCPOIR", token=fb_oauth, n=100, reactions=TRUE, api="v2.9")
## 25 posts
# most popular posts
page[which.max(page$likes_count),]
## id likes_count from_id from_name
## 10 476158409070913_1570298249656918 26 476158409070913 USC POIR
## message
## 10 Tom Jamieson, POIR PhD student, was awarded the Distinguished Jr Scholars Award in Political Psychology at the American Political Science Association 2017 Annual Meeting. Congratulations Tom!
## created_time type
## 10 2017-09-11T19:34:27+0000 photo
## link
## 10 https://www.facebook.com/USCPOIR/photos/a.1566900939996649.1073741826.476158409070913/1570297326323677/?type=3
## story comments_count shares_count love_count haha_count wow_count
## 10 <NA> 0 1 0 0 0
## sad_count angry_count
## 10 0 0
page[which.max(page$comments_count),]
## id likes_count from_id from_name
## 15 476158409070913_1571988159487927 20 476158409070913 USC POIR
## message
## 15 PhD student Kelly Zvobgo & POIR Professor Ben Graham presented research on the World Bank at the American Political Science Association's annual meeting earlier this month. Their paper, entitled The World Bank as an Enforcer of Human Rights was presented in the International Collaboration section. Kelly & Professor Graham are pictured here talking with other attendees on the first day of the conference. They gave their presentation the same day.
## created_time type
## 15 2017-09-13T19:34:43+0000 photo
## link
## 15 https://www.facebook.com/USCPOIR/photos/a.1566900939996649.1073741826.476158409070913/1571979469488796/?type=3
## story comments_count shares_count love_count haha_count wow_count
## 15 <NA> 3 2 4 0 0
## sad_count angry_count
## 15 0 0
page[which.max(page$shares_count),]
## id likes_count from_id from_name
## 11 476158409070913_1570387856314624 16 476158409070913 USC POIR
## message
## 11 POIR Professor Jane Junn teamed up with POIR students Michelle Cornelius and Dave Ebner and POIR alum Justin Berry (Assistant Professor of Political Science at Kalamazoo College) to present research at the American Political Science Association 2017 Annual Meeting. Their paper is entitled Identifying White Racial Identifiers and was presented as a part of the Race & Ethnicity section.
## created_time type
## 11 2017-09-11T22:10:37+0000 photo
## link
## 11 https://www.facebook.com/USCPOIR/photos/a.1566900939996649.1073741826.476158409070913/1570385642981512/?type=3
## story comments_count shares_count love_count haha_count wow_count
## 11 <NA> 2 2 2 0 0
## sad_count angry_count
## 11 0 0
We can also subset by date. For example, suppose we want all the posts from early November 2012 on Barack Obama’s Facebook page:
page <- getPage("barackobama", token=fb_oauth, n=100,
                since='2012/11/01', until='2012/11/10')
## 25 posts 29 posts
page[which.max(page$likes_count),]
## from_id from_name message created_time type
## 4 6815841748 Barack Obama Four more years. 2012-11-07T04:15:08+0000 photo
## link
## 4 https://www.facebook.com/barackobama/photos/a.53081056748.66806.6815841748/10151255420886749/?type=3
## id story likes_count comments_count
## 4 6815841748_10151255420886749 <NA> 4826415 218939
## shares_count
## 4 658914
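The created_time column comes back as a text string such as "2012-11-07T04:15:08+0000". To check which dates a download actually covers, it can be converted to a date-time. The helper parse_fb_time below is a hypothetical name; it is a minimal base-R sketch that assumes the timestamps always follow this exact format, and it relies on strptime’s %z to apply the UTC offset.

```r
# Hypothetical helper: convert Facebook's created_time strings to POSIXct (UTC)
parse_fb_time <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

# Example timestamps (the first one is the "Four more years." post above)
times <- parse_fb_time(c("2012-11-07T04:15:08+0000", "2012-11-03T18:00:00+0000"))
range(times)

# With the real data: range(parse_fb_time(page$created_time))
```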
If needed, we can also extract the individual comments on each post.
post_id <- page$id[which.max(page$likes_count)]
post <- getPost(post_id, token=fb_oauth, n.comments=1000, likes=FALSE)
This is how you can view those comments:
comments <- post$comments
head(comments)
## from_id from_name message created_time
## 1 509226872540260 Jesse Talafili OBAMA ! 2012-11-07T04:15:16+0000
## 2 485613484893917 Zain Ahmed Turk yayy 2012-11-07T04:15:17+0000
## 3 675870897427 Gary D Ploski <3 2012-11-07T04:15:17+0000
## 4 802034289809838 David Furka YES 2012-11-07T04:15:18+0000
## 5 10201918108506766 Pinky Keys :X 2012-11-07T04:15:18+0000
## 6 10102278537299904 Zac Bowling Hell yes! 2012-11-07T04:15:19+0000
## likes_count comments_count id
## 1 18 0 10151255420886749_11954305
## 2 3 0 10151255420886749_11954306
## 3 2 0 10151255420886749_11954307
## 4 5 0 10151255420886749_11954309
## 5 1 0 10151255420886749_11954311
## 6 9 0 10151255420886749_11954315
Note that users can also like comments. Which comment got the most likes?
comments[which.max(comments$likes_count),]
## from_id from_name message created_time
## 1 509226872540260 Jesse Talafili OBAMA ! 2012-11-07T04:15:16+0000
## likes_count comments_count id
## 1 18 0 10151255420886749_11954305
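which.max() only returns the single most-liked comment. To rank several, one option is to sort with order(). The sketch below rebuilds, as a small stand-in data frame, only the message and likes_count values of the six comments printed by head(comments) above; with the real data you would apply the last two lines to post$comments directly.

```r
# Stand-in for the six comments shown above (message and likes_count only)
comments <- data.frame(
  message     = c("OBAMA !", "yayy", "<3", "YES", ":X", "Hell yes!"),
  likes_count = c(18, 3, 2, 5, 1, 9),
  stringsAsFactors = FALSE
)

# Sort by likes, most-liked first, and keep the top three
top3 <- head(comments[order(comments$likes_count, decreasing = TRUE), ], 3)
top3$message
```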
This is how you get nested comments (the replies to a given comment):
page <- getPage("barackobama", token=fb_oauth, n=1)
## 1 posts
post <- getPost(page$id, token=fb_oauth, comments=TRUE, n=100, likes=FALSE)
comment <- getCommentReplies(post$comments$id[1],
                             token=fb_oauth, n=500, likes=TRUE)
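If we want to scrape an entire page that contains many posts, it is a good idea to call getPage within a loop and collect the data in smaller time windows (here, three-month intervals). Because the API can sometimes return an error, this way a single failed request does not throw away everything collected so far.
# list of dates delimiting the 3-month windows
dates <- seq(as.Date("2011/01/01"), as.Date("2017/08/01"), by="3 months")
n <- length(dates)-1
df <- list()
# loop over 3-month windows; if a request fails, log it and move on
for (i in seq_len(n)){
  message(as.character(dates[i]))
  df[[i]] <- tryCatch(
    getPage("GameOfThrones", token=fb_oauth, n=1000, since=dates[i],
            until=dates[i+1], verbose=FALSE),
    error = function(e) {
      message("Failed for window starting ", dates[i], ": ", conditionMessage(e))
      NULL
    })
  Sys.sleep(0.5)
}
# combine all windows into one data frame (failed windows are NULL and are dropped)
df <- do.call(rbind, df)
write.csv(df, file="../data/gameofthrones.csv", row.names=FALSE)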
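If a window fails only intermittently, one option is to retry it a few times before moving on. The helper with_retry below is a hypothetical name, not part of Rfacebook; it is a minimal base-R sketch that wraps any zero-argument function, so it can be placed around the getPage call inside the loop.

```r
# Hypothetical helper: call f() up to `tries` times, pausing `wait` seconds
# between attempts; stop with an error if every attempt fails
with_retry <- function(f, tries = 3, wait = 1) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < tries) Sys.sleep(wait)
  }
  stop("All ", tries, " attempts failed")
}

# Possible use inside the loop:
# df[[i]] <- with_retry(function() getPage("GameOfThrones", token=fb_oauth,
#                       n=1000, since=dates[i], until=dates[i+1], verbose=FALSE))
```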
And we can then look at the popularity over time:
library(netdemR)
## Loading required package: ROAuth
## Loading required package: jsonlite
##
## Attaching package: 'jsonlite'
## The following objects are masked from 'package:rjson':
##
## fromJSON, toJSON
## ##
## ## netdemR: tools for analysis of Twitter data
## ## Networked Democracy Lab at USC
## ## netdem.org
## ##
##
## Attaching package: 'netdemR'
## The following objects are masked from 'package:Rfacebook':
##
## getFriends, getUsers
library(stringr)
library(reshape2)
df <- read.csv("../data/gameofthrones.csv", stringsAsFactors=FALSE)
# parse date into month
df$month <- df$created_time %>% str_sub(1, 7) %>% paste0("-01") %>% as.Date()
# computing average by month
metrics <- aggregate(cbind(likes_count, comments_count, shares_count) ~ month,
                     data=df, FUN=mean)
# reshaping into long format
metrics <- melt(metrics, id.vars="month")
# visualize evolution in metric
library(ggplot2)
library(scales)
ggplot(metrics, aes(x = month, y = value, group = variable)) +
  geom_line(aes(color = variable)) +
  scale_x_date(date_breaks = "years", labels = date_format("%Y")) +
  scale_y_log10("Average count per post",
                breaks = c(10, 100, 1000, 10000, 100000, 200000), labels=scales::comma) +
  theme_bw() + theme(axis.title.x = element_blank())
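The data-preparation steps before the plot (parsing the month, averaging by month, reshaping to long format) can also be written in base R, without stringr or reshape2. The sketch below runs on a small hypothetical data frame with three posts; the long data frame it builds has the same month / variable / value layout that melt() produces and that the ggplot call expects.

```r
# Hypothetical toy data with the columns used above
df <- data.frame(
  created_time   = c("2016-04-03T10:00:00+0000", "2016-04-20T12:30:00+0000",
                     "2016-05-02T08:15:00+0000"),
  likes_count    = c(100, 300, 50),
  comments_count = c(10, 30, 5),
  stringsAsFactors = FALSE
)

# parse date into the first day of its month
df$month <- as.Date(paste0(substr(df$created_time, 1, 7), "-01"))

# average per month
metrics <- aggregate(cbind(likes_count, comments_count) ~ month,
                     data = df, FUN = mean)

# reshape into long format: one row per month and metric
vars <- c("likes_count", "comments_count")
long <- data.frame(
  month    = rep(metrics$month, times = length(vars)),
  variable = rep(vars, each = nrow(metrics)),
  value    = unlist(metrics[vars], use.names = FALSE)
)
long
```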
As with public Facebook pages, posts from public groups can be downloaded with the getGroup function. Note that this only works for groups the authenticated user is a member of.
group <- getGroup("150048245063649", token=fb_oauth, n=50)
## 25 posts 50 posts