Scraping web data from Facebook

To scrape data from Facebook’s API, we’ll use the Rfacebook package.

library(Rfacebook)
## Loading required package: httr
## Warning: package 'httr' was built under R version 3.4.1
## Loading required package: rjson
## Loading required package: httpuv
## Warning: package 'httpuv' was built under R version 3.4.1
## 
## Attaching package: 'Rfacebook'
## The following object is masked from 'package:methods':
## 
##     getGroup

To access the Facebook API, you need an OAuth access token. You can get one by going to the following URL: https://developers.facebook.com/tools/explorer

Once you’re there:
1. Click on “Get Access Token”
2. Copy the long code (“Access Token”) and paste it below, replacing the fake one I wrote:

fb_oauth = 'EAACEdEose0cBAIG0ZAqsxAlZCDoINqOvDPSOsOx45MSsCbmZBVTyoqU8E1mkILz4eFzo4mLHLP4O8z14Kq50S75Ook8J3sL1pGPNtWplwKigTGuJZA5FOjYrbp4sm9PdqwY4MADBnUcbuZAbAJ1S7abAgI5Ist3ztlBEDAMxsERnS3oiXUKE6LgRT8L4tdjMZD'
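
Rather than pasting the token into the script, one option is to keep it in an environment variable. The sketch below assumes a variable named FB_OAUTH_TOKEN and a helper called read_fb_token; both names are made up for illustration and are not part of Rfacebook.

```r
# Hypothetical helper (not part of Rfacebook): read the token from an
# environment variable so it never has to appear in the script itself.
read_fb_token <- function(var = "FB_OAUTH_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set")
  }
  token
}

# For demonstration only: set a dummy value, then read it back.
Sys.setenv(FB_OAUTH_TOKEN = "dummy-token-for-illustration")
fb_oauth_demo <- read_fb_token()
```

In a real session you would set the variable outside R (for example in ~/.Renviron) and skip the Sys.setenv line.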

Now try running the following line:

getUsers("me", token=fb_oauth, private_info=TRUE)
## Warning in strptime(x, fmt, tz = "GMT"): unknown timezone 'default/America/
## Los_Angeles'
##          id          name username first_name middle_name last_name gender
## 1 557698085 Pablo Barberá       NA         NA          NA        NA     NA
##   locale likes picture birthday location hometown relationship_status
## 1     NA    NA      NA       NA       NA       NA                  NA

Does it return your public Facebook information? If so, we’re ready to go. See also ?fbOAuth for information on how to get a long-lived OAuth token.

At the moment, the only information that can be scraped from Facebook is the content of public pages.

The following line downloads the 20 most recent posts on the Facebook page of Donald Trump:

page <- getPage("DonaldTrump", token=fb_oauth, n=20, reactions=TRUE, api="v2.9") 
## 20 posts

What information is available for each of these posts?

page[1,]
##                               id likes_count      from_id       from_name
## 1 153080620724_10160021837600725       36086 153080620724 Donald J. Trump
##                                                                                                                                  message
## 1 Republicans are going for the big Budget approval today, first step toward massive tax cuts. I think we have the votes, but who knows?
##               created_time  type
## 1 2017-10-19T16:56:02+0000 photo
##                                                                                                                link
## 1 https://www.facebook.com/DonaldTrump/photos/a.10156483516640725.1073741830.153080620724/10160021837560725/?type=3
##   story comments_count shares_count love_count haha_count wow_count
## 1  <NA>           4907         1887       3346        653        89
##   sad_count angry_count
## 1        46         169

Which posts got the most likes, the most comments, and the most shares?

page[which.max(page$likes_count),]
##                                id likes_count      from_id       from_name
## 14 153080620724_10160043106930725       78612 153080620724 Donald J. Trump
##                                                                                                                                                                             message
## 14 Perhaps no Administration has done more in its first 9 months than this Administration. We are bringing back the AMERICAN DREAM!🇺🇸
##                created_time  type
## 14 2017-10-23T19:27:00+0000 video
##                                                              link story
## 14 https://www.facebook.com/DonaldTrump/videos/10160043106930725/  <NA>
##    comments_count shares_count love_count haha_count wow_count sad_count
## 14           9632        15132         NA         NA        NA        NA
##    angry_count
## 14          NA
page[which.max(page$comments_count),]
##                               id likes_count      from_id       from_name
## 6 153080620724_10160027734720725       51492 153080620724 Donald J. Trump
##                                                                                                  message
## 6 Watch REAL news to learn how President Trump is working harder than ever for you, the American people!
##               created_time  type
## 6 2017-10-20T20:42:32+0000 video
##                                                             link story
## 6 https://www.facebook.com/DonaldTrump/videos/10160027734720725/  <NA>
##   comments_count shares_count love_count haha_count wow_count sad_count
## 6          12083         8761       5495       2328       115        41
##   angry_count
## 6         282
page[which.max(page$shares_count),]
##                               id likes_count      from_id       from_name
## 8 153080620724_10160032085795725       67164 153080620724 Donald J. Trump
##                                                                                                                                 message
## 8 Stock Market hits another ALL TIME HIGH on Friday. 5.3 trillion dollars up since Election. Fake News doesn't spend much time on this!
##               created_time  type
## 8 2017-10-21T17:39:14+0000 video
##                                                             link story
## 8 https://www.facebook.com/DonaldTrump/videos/10160032085795725/  <NA>
##   comments_count shares_count love_count haha_count wow_count sad_count
## 8          10277        18230       7793        603      1354        26
##   angry_count
## 8         103

What about other reactions?

page[which.max(page$love_count),]
##                                id likes_count      from_id       from_name
## 15 153080620724_10160044761135725       65837 153080620724 Donald J. Trump
##                                                                                                                                                        message
## 15 Today we gathered to tell the world of Captain Gary Rose’s valor and to proudly present him with our nation’s highest military honor. http://bit.ly/2i0P6ZY
##                created_time  type
## 15 2017-10-23T23:53:41+0000 photo
##                                                                                                        link
## 15 https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10160044753440725/?type=3
##    story comments_count shares_count love_count haha_count wow_count
## 15  <NA>           4011         5934      10831        108       335
##    sad_count angry_count
## 15        18          42
page[which.max(page$haha_count),]
##                                id likes_count      from_id       from_name
## 18 153080620724_10160048592495725       19085 153080620724 Donald J. Trump
##                                                                                                                                           message
## 18 Bob Corker, who helped President Obama give us the bad Iran Deal & couldn't get elected dog catcher in Tennessee, is now fighting Tax Cuts....
##                created_time  type
## 18 2017-10-24T17:45:31+0000 video
##                                                              link story
## 18 https://www.facebook.com/DonaldTrump/videos/10160048592495725/  <NA>
##    comments_count shares_count love_count haha_count wow_count sad_count
## 18           7521         3184       1033       3824       448       268
##    angry_count
## 18        2312
page[which.max(page$wow_count),]
##                               id likes_count      from_id       from_name
## 8 153080620724_10160032085795725       67164 153080620724 Donald J. Trump
##                                                                                                                                 message
## 8 Stock Market hits another ALL TIME HIGH on Friday. 5.3 trillion dollars up since Election. Fake News doesn't spend much time on this!
##               created_time  type
## 8 2017-10-21T17:39:14+0000 video
##                                                             link story
## 8 https://www.facebook.com/DonaldTrump/videos/10160032085795725/  <NA>
##   comments_count shares_count love_count haha_count wow_count sad_count
## 8          10277        18230       7793        603      1354        26
##   angry_count
## 8         103
page[which.max(page$sad_count),]
##                               id likes_count      from_id       from_name
## 5 153080620724_10160027237345725       39703 153080620724 Donald J. Trump
##                                                                                                                                  message
## 5 Just out report: "United Kingdom crime rises 13% annually amid spread of Radical Islamic terror." Not good, we must keep America safe!
##               created_time  type
## 5 2017-10-20T17:43:00+0000 photo
##                                                                                                       link
## 5 https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10160027235790725/?type=3
##   story comments_count shares_count love_count haha_count wow_count
## 5  <NA>           7513         4771       2880        881       828
##   sad_count angry_count
## 5       467         423
page[which.max(page$angry_count),]
##                                id likes_count      from_id       from_name
## 18 153080620724_10160048592495725       19085 153080620724 Donald J. Trump
##                                                                                                                                           message
## 18 Bob Corker, who helped President Obama give us the bad Iran Deal & couldn't get elected dog catcher in Tennessee, is now fighting Tax Cuts....
##                created_time  type
## 18 2017-10-24T17:45:31+0000 video
##                                                              link story
## 18 https://www.facebook.com/DonaldTrump/videos/10160048592495725/  <NA>
##    comments_count shares_count love_count haha_count wow_count sad_count
## 18           7521         3184       1033       3824       448       268
##    angry_count
## 18        2312
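
The repeated which.max calls above can be condensed into a single pass over every column whose name ends in _count. This is only a sketch: page_demo is a made-up data frame with invented values that mimics a few of the columns returned by getPage, so it runs without a token. Note that which.max skips NA values, which matters because some posts above have NA reaction counts.

```r
# Made-up data with the same column naming as the getPage output above.
page_demo <- data.frame(
  id = c("a", "b", "c"),
  likes_count = c(10, 30, 20),
  comments_count = c(5, 1, 9),
  love_count = c(NA, 4, 2),
  stringsAsFactors = FALSE
)

# Find every metric column, then look up the id of the top post for each.
count_cols <- grep("_count$", names(page_demo), value = TRUE)
top_ids <- sapply(count_cols, function(col) {
  page_demo$id[which.max(page_demo[[col]])]
})
print(top_ids)
```

The result is a named character vector with one post id per metric, which is easier to scan than a separate printout for each reaction.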

Let’s do another example, looking at the Facebook page of USC POIR:

page <- getPage("USCPOIR", token=fb_oauth, n=100, reactions=TRUE, api="v2.9") 
## 25 posts
# most popular posts
page[which.max(page$likes_count),]
##                                  id likes_count         from_id from_name
## 10 476158409070913_1570298249656918          26 476158409070913  USC POIR
##                                                                                                                                                                                            message
## 10 Tom Jamieson, POIR PhD student, was awarded the Distinguished Jr Scholars Award in Political Psychology at the American Political Science Association 2017 Annual Meeting. Congratulations Tom!
##                created_time  type
## 10 2017-09-11T19:34:27+0000 photo
##                                                                                                              link
## 10 https://www.facebook.com/USCPOIR/photos/a.1566900939996649.1073741826.476158409070913/1570297326323677/?type=3
##    story comments_count shares_count love_count haha_count wow_count
## 10  <NA>              0            1          0          0         0
##    sad_count angry_count
## 10         0           0
page[which.max(page$comments_count),]
##                                  id likes_count         from_id from_name
## 15 476158409070913_1571988159487927          20 476158409070913  USC POIR
##                                                                                                                                                                                                                                                                                                                                                                                                                                                               message
## 15 PhD student Kelly Zvobgo & POIR Professor Ben Graham presented research on the World Bank at the American Political Science Association's annual meeting earlier this month. Their paper, entitled The World Bank as an Enforcer of Human Rights was presented in the  International Collaboration section. Kelly & Professor Graham are pictured here talking with other attendees on the first day of the conference. They gave their presentation the same day.
##                created_time  type
## 15 2017-09-13T19:34:43+0000 photo
##                                                                                                              link
## 15 https://www.facebook.com/USCPOIR/photos/a.1566900939996649.1073741826.476158409070913/1571979469488796/?type=3
##    story comments_count shares_count love_count haha_count wow_count
## 15  <NA>              3            2          4          0         0
##    sad_count angry_count
## 15         0           0
page[which.max(page$shares_count),]
##                                  id likes_count         from_id from_name
## 11 476158409070913_1570387856314624          16 476158409070913  USC POIR
##                                                                                                                                                                                                                                                                                                                                                                                                message
## 11 POIR Professor Jane Junn teamed up with POIR students Michelle Cornelius and Dave Ebner and POIR alum Justin Berry (Assistant Professor of Political Science at Kalamazoo College) to present research at the American Political Science Association 2017 Annual Meeting. Their paper is entitled Identifying White Racial Identifiers and was presented as a part of the Race & Ethnicity section.
##                created_time  type
## 11 2017-09-11T22:10:37+0000 photo
##                                                                                                              link
## 11 https://www.facebook.com/USCPOIR/photos/a.1566900939996649.1073741826.476158409070913/1570385642981512/?type=3
##    story comments_count shares_count love_count haha_count wow_count
## 11  <NA>              2            2          2          0         0
##    sad_count angry_count
## 11         0           0

We can also subset by date. For example, imagine we want to get all the posts from early November 2012 on Barack Obama’s Facebook page:

page <- getPage("barackobama", token=fb_oauth, n=100,
    since='2012/11/01', until='2012/11/10')
## 25 posts 29 posts
page[which.max(page$likes_count),]
##      from_id    from_name          message             created_time  type
## 4 6815841748 Barack Obama Four more years. 2012-11-07T04:15:08+0000 photo
##                                                                                                   link
## 4 https://www.facebook.com/barackobama/photos/a.53081056748.66806.6815841748/10151255420886749/?type=3
##                             id story likes_count comments_count
## 4 6815841748_10151255420886749  <NA>     4826415         218939
##   shares_count
## 4       658914
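
The since and until arguments filter on the server side. If the posts are already downloaded, a similar filter can be applied locally by parsing created_time. The sketch below uses made-up timestamps in the same format as the output above; the half-open window (start included, end excluded) is an assumption chosen for the example, not a statement about how the API treats until.

```r
# Made-up timestamps in the format shown in the created_time column.
created <- c("2012-11-01T12:00:00+0000",
             "2012-11-07T04:15:08+0000",
             "2012-11-12T09:30:00+0000")

# Parse the ISO-style strings; %z reads the +0000 UTC offset.
parsed <- as.POSIXct(created, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")

# Keep only timestamps inside the window [start, end).
start <- as.POSIXct("2012-11-01", tz = "UTC")
end <- as.POSIXct("2012-11-10", tz = "UTC")
in_window <- parsed >= start & parsed < end
print(created[in_window])
```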

And if we need to, we can also extract the specific comments from each post.

post_id <- page$id[which.max(page$likes_count)]
post <- getPost(post_id, token=fb_oauth, n.comments=1000, likes=FALSE)

This is how you can view those comments:

comments <- post$comments
head(comments)
##             from_id       from_name   message             created_time
## 1   509226872540260  Jesse Talafili   OBAMA ! 2012-11-07T04:15:16+0000
## 2   485613484893917 Zain Ahmed Turk      yayy 2012-11-07T04:15:17+0000
## 3      675870897427   Gary D Ploski        <3 2012-11-07T04:15:17+0000
## 4   802034289809838     David Furka       YES 2012-11-07T04:15:18+0000
## 5 10201918108506766      Pinky Keys        :X 2012-11-07T04:15:18+0000
## 6 10102278537299904     Zac Bowling Hell yes! 2012-11-07T04:15:19+0000
##   likes_count comments_count                         id
## 1          18              0 10151255420886749_11954305
## 2           3              0 10151255420886749_11954306
## 3           2              0 10151255420886749_11954307
## 4           5              0 10151255420886749_11954309
## 5           1              0 10151255420886749_11954311
## 6           9              0 10151255420886749_11954315

Also, note that users can like comments! What is the comment that got the most likes?

comments[which.max(comments$likes_count),]
##           from_id      from_name message             created_time
## 1 509226872540260 Jesse Talafili OBAMA ! 2012-11-07T04:15:16+0000
##   likes_count comments_count                         id
## 1          18              0 10151255420886749_11954305
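
which.max returns only a single row. To rank comments instead, order can be used. The sketch below rebuilds the messages and like counts of the six comments printed by head(comments) above as a small stand-alone data frame (comments_demo is a hypothetical name), so it runs without a token.

```r
# Messages and like counts copied from the head(comments) output above.
comments_demo <- data.frame(
  message = c("OBAMA !", "yayy", "<3", "YES", ":X", "Hell yes!"),
  likes_count = c(18, 3, 2, 5, 1, 9),
  stringsAsFactors = FALSE
)

# Sort by likes in decreasing order and keep the top three.
ranked <- comments_demo[order(comments_demo$likes_count, decreasing = TRUE), ]
top3 <- head(ranked, 3)
print(top3)
```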

This is how you get nested comments:

page <- getPage("barackobama", token=fb_oauth, n=1)
## 1 posts
post <- getPost(page$id, token=fb_oauth, comments=TRUE, n=100, likes=FALSE)
comment <- getCommentReplies(post$comments$id[1],
                             token=fb_oauth, n=500, likes=TRUE)

If we want to scrape an entire page that contains many posts, it is a good idea to call the function inside a loop and collect the data in smaller date windows (here, three months at a time), because the API can sometimes return an error.

# list of dates to sample
dates <- seq(as.Date("2011/01/01"), as.Date("2017/08/01"), by="3 months")
n <- length(dates)-1
df <- list()
# loop over three-month windows
for (i in seq_len(n)){
    message(as.character(dates[i]))
    df[[i]] <- getPage("GameOfThrones", token=fb_oauth, n=1000, since=dates[i],
        until=dates[i+1], verbose=FALSE)
    Sys.sleep(0.5)
}
df <- do.call(rbind, df)
write.csv(df, file="../data/gameofthrones.csv", row.names=FALSE)
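
The loop above stops at the first failed request. One way to make it more robust is to wrap each call in tryCatch and retry a few times. In the sketch below, fetch_with_retry and flaky_fetch are hypothetical names; flaky_fetch simulates an API call that fails twice before succeeding, so the example runs without a token.

```r
# Hypothetical helper: call fetch() up to max_tries times and return the
# first successful result, or NULL if every attempt raises an error.
fetch_with_retry <- function(fetch, max_tries = 3, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch(), error = function(e) {
      message("Attempt ", attempt, " failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(result)) return(result)
    Sys.sleep(wait)  # pause before the next attempt
  }
  NULL
}

# Stand-in for an API call: fails on the first two calls, then succeeds.
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated API error")
  data.frame(id = "demo_post", likes_count = 1)
}

result <- fetch_with_retry(flaky_fetch)
```

Inside the loop, the getPage call would be passed in as an anonymous function, for example fetch_with_retry(function() getPage(...)) with the same arguments as before and a non-zero wait.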

And we can then look at the popularity over time:

library(netdemR)
## Loading required package: ROAuth
## Loading required package: jsonlite
## 
## Attaching package: 'jsonlite'
## The following objects are masked from 'package:rjson':
## 
##     fromJSON, toJSON
## ##
## ## netdemR: tools for analysis of Twitter data
## ## Networked Democracy Lab at USC
## ## netdem.org
## ##
## 
## Attaching package: 'netdemR'
## The following objects are masked from 'package:Rfacebook':
## 
##     getFriends, getUsers
library(stringr)
library(reshape2)
df <- read.csv("../data/gameofthrones.csv", stringsAsFactors=FALSE)
# parse date into month
df$month <- df$created_time %>% str_sub(1, 7) %>% paste0("-01") %>% as.Date()
# computing average by month
metrics <- aggregate(cbind(likes_count, comments_count, shares_count) ~ month,
          data=df, FUN=mean)
# reshaping into long format
metrics <- melt(metrics, id.vars="month")
# visualize evolution in metric
library(ggplot2)
library(scales)
ggplot(metrics, aes(x = month, y = value, group = variable)) + 
  geom_line(aes(color = variable)) + 
  scale_x_date(date_breaks = "years", labels = date_format("%Y")) + 
  scale_y_log10("Average count per post", 
    breaks = c(10, 100, 1000, 10000, 100000, 200000), labels=scales::comma) + 
  theme_bw() + theme(axis.title.x = element_blank())
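
To make the month parsing and averaging steps above concrete, here is a minimal base R version on made-up data. df_demo and its values are invented for illustration; the logic mirrors the str_sub and aggregate calls in the code above, using substr instead of stringr.

```r
# Made-up posts: two in July 2017 and one in August 2017.
df_demo <- data.frame(
  created_time = c("2017-07-16T21:00:00+0000",
                   "2017-07-30T21:00:00+0000",
                   "2017-08-06T21:00:00+0000"),
  likes_count = c(100, 300, 50),
  stringsAsFactors = FALSE
)

# Keep the "YYYY-MM" prefix and append "-01" to get the first day of the month.
df_demo$month <- as.Date(paste0(substr(df_demo$created_time, 1, 7), "-01"))

# Average likes per month.
monthly <- aggregate(likes_count ~ month, data = df_demo, FUN = mean)
print(monthly)
```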

As with public Facebook pages, data from public groups can be downloaded with the getGroup function. Note that this only works for groups that the authenticated user is a member of.

group <- getGroup("150048245063649", token=fb_oauth, n=50)
## 25 posts 50 posts