Scraping web data from Facebook

To scrape data from Facebook’s API, we’ll use the Rfacebook package.

library(Rfacebook)
## Loading required package: httr
## Loading required package: rjson
## Loading required package: httpuv
## 
## Attaching package: 'Rfacebook'
## The following object is masked from 'package:methods':
## 
##     getGroup

To get access to the Facebook API, you need an OAuth code. You can get yours by going to the following URL: https://developers.facebook.com/tools/explorer

Once you’re there:
1. Click on “Get Access Token”
2. Copy the long code (“Access Token”) and paste it below, replacing the fake one I wrote:

fb_oauth = 'EAACEdEose0cBADADWb2qgzCcTM3gUxZCiDn4Lrz3hZCK9Bug7GzqgujhcfoAYjdbavRZCmRN94BdZCFFDOF0ROKkChCp5ZAHPkGsj51gBYzz2FwSBcBvS5ltnZAYAkRZAsIa2g9VDWuRKEdqsUIHD285qWJq3wBJFBZB4ZB59ZAWvGb4pL3cYwb85lCmb11SqHj9oZD'
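Pasting a token directly into a script makes it easy to share by accident. One possible alternative, sketched below, is to keep the token in an environment variable and read it at run time. The variable name FB_OAUTH_TOKEN and the helper read_token() are hypothetical, chosen only for this illustration; they are not part of Rfacebook.

```r
# Read the token from an environment variable instead of hard-coding it.
# FB_OAUTH_TOKEN is a hypothetical name; set it in your shell or in ~/.Renviron.
read_token <- function(var = "FB_OAUTH_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set")
  }
  token
}

# Usage (uncomment once the variable is set):
# fb_oauth <- read_token()
```

The function stops with an error when the variable is missing, so a forgotten token is caught before any API call is made.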

Now try running the following line:

getUsers("me", token=fb_oauth, private_info=TRUE)
##          id          name username first_name middle_name last_name gender
## 1 557698085 Pablo Barberá       NA      Pablo          NA   Barberá   male
##   locale likes picture   birthday                location       hometown
## 1  en_US    NA      NA 01/12/1986 Los Angeles, California Cáceres, Spain
##   relationship_status
## 1                  NA

Does it return your Facebook public information? Yes? Then we’re ready to go. See also ?fbOAuth for information on how to get a long-lived OAuth token.

At the moment, the only information that can be scraped from Facebook is the content of public pages.

The following line downloads the 20 most recent posts on the Facebook page of Donald Trump:

page <- getPage("DonaldTrump", token=fb_oauth, n=20, reactions=TRUE, api="v2.9") 
## 20 posts

What information is available for each of these posts?

page[1,]
##                               id likes_count      from_id       from_name
## 1 153080620724_10159357439260725       45099 153080620724 Donald J. Trump
##                                                                                                                                                                               message
## 1 We're keeping our promises. We're not going to let the same failed and tired voices in Washington keep us from delivering the CHANGE you voted for and the change that YOU DESERVE!
##               created_time  type
## 1 2017-06-24T18:00:00+0000 video
##                                                             link story
## 1 https://www.facebook.com/DonaldTrump/videos/10159357439260725/  <NA>
##   comments_count shares_count love_count haha_count wow_count sad_count
## 1           5551         4215       5859        846        88        31
##   angry_count
## 1         213
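Printing a full row can be hard to read because the message column is so wide. A quick way to see which fields exist is to list the column names and their classes. The sketch below uses a small mock data frame (page_example, a hypothetical name) holding a subset of the columns, with values copied from the output above, so that it runs without an API token.

```r
# Mock data frame with a subset of the columns printed above
# (values copied from that output; not a live API result).
page_example <- data.frame(
  id = "153080620724_10159357439260725",
  likes_count = 45099,
  type = "video",
  comments_count = 5551,
  shares_count = 4215,
  stringsAsFactors = FALSE
)

# Column names, then the class of each column
names(page_example)
sapply(page_example, class)
```

The same two calls should work on the real page object returned by getPage.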

Which post got the most likes, the most comments, and the most shares?

page[which.max(page$likes_count),]
##                               id likes_count      from_id       from_name
## 6 153080620724_10159373535885725      180600 153080620724 Donald J. Trump
##                                                                                                                                                                                        message
## 6 Very grateful for the 9-O decision from the U. S. Supreme Court. We must keep America SAFE!\n\nFull statement: whitehouse.gov/the-press-office/2017/06/26/statement-president-donald-j-trump
##               created_time  type
## 6 2017-06-26T18:49:01+0000 photo
##                                                                                                       link
## 6 https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10159373535885725/?type=3
##   story comments_count shares_count love_count haha_count wow_count
## 6  <NA>           9295        20940      22651       1275       465
##   sad_count angry_count
## 6       105         454
page[which.max(page$comments_count),]
##                                id likes_count      from_id       from_name
## 17 153080620724_10159387848545725       40222 153080620724 Donald J. Trump
##    message
## 17 JOIN US LIVE from the Cabinet Room- \n\nIMMIGRATION ROUNDTABLE....\n\nWe are here today to discuss two crucial votes taking place in Congress tomorrow on vital public safety and national security legislation.\n\nWe are joined by the Chairman of the House Judiciary Committee, Bob Goodlatte.  Bob is one of the most skilled legislators in Congress, and he has worked with law enforcement to write a series of critical immigration bills that will close dangerous loopholes exploited by criminals, gang members, drug dealers, killers, and terrorists. \n\nAlso with us today are Congressmen Peter King, Lou Barletta, and David Young. \n\nTomorrow, the House will vote on the No Sanctuary for Criminals Act, which will cut federal grant money to cities that shield dangerous criminal aliens from being turned over to federal law enforcement.\n\nThe House will also vote on Kate’s Law – named for Kate Steinle, who was killed by an illegal immigrant who had been deported five times. This law will enhance criminal penalties for those who repeatedly re-enter the country illegally.\n\nCountless innocent Americans – including the loved ones of many families in the room with us today – have been killed by illegal immigrants with multiple deportations.\n\nI am especially honored to be here with so many courageousfamilies whom I got to know so well during the campaign. \n\nYou lost the people you loved because our government refused to enforce our nation’s immigration laws. \n\nFor years, the pundits, journalists and politicians in Washington refused to hear your voices.  But, on Election Day 2016, your voices were heard across THE ENTIRE WORLD.\n\nChairman Goodlatte has produced a package of truly key immigration enforcement bills.  This package includes the Davis-Oliver Act – whose passage I called for nearly a year ago at my immigration speech in Phoenix, Arizona. \n\nThe Davis-Oliver Act was named for Detective Michael Davis and Deputy Sheriff Danny Oliver, who were gunned down in the line of duty by an illegal immigrant with a criminal record and two prior deportations. \n\nTheir incredibly brave widows honored us with their presence at my address to Congress.  Today we are privileged to be joined by Melissa Oliver, the daughter of Deputy Sheriff Oliver. Melissa, your father was a HERO and we will never forget him.\n\nWe are calling on ALL members of Congress to honor grieving American Families by passing these life-savingmeasures in the House, IN THE SENATE, and sending them to my desk for signature.  It is time to support our police, to protect our families, and to SAVE AMERICAN LIVES.\n\nSo with that, I’d like to ask each of the families and invited guests to say a few words and to share your stories with the American People, beginning with my good friend Jamiel Shaw.
##                created_time  type
## 17 2017-06-28T19:07:07+0000 video
##                                                              link
## 17 https://www.facebook.com/DonaldTrump/videos/10159387848545725/
##                                             story comments_count
## 17 Donald J. Trump was live — at The White House.          24561
##    shares_count love_count haha_count wow_count sad_count angry_count
## 17         7542      13231        624       319       672        1203
page[which.max(page$shares_count),]
##                                id likes_count      from_id       from_name
## 12 153080620724_10159379423530725      135714 153080620724 Donald J. Trump
##                                                                                                                                                                                                                  message
## 12 Wow, CNN had to retract big story, with 3 employees forced to resign. What about all the other phony stories they do? What about NBC, CBS, & ABC? What about the failing New York Times & Washington Post? FAKE NEWS!
##                created_time  type
## 12 2017-06-27T19:45:00+0000 video
##                                                              link story
## 12 https://www.facebook.com/DonaldTrump/videos/10159379423530725/  <NA>
##    comments_count shares_count love_count haha_count wow_count sad_count
## 12          14977        36858      11610      16964      1494       154
##    angry_count
## 12         765

What about other reactions?

page[which.max(page$love_count),]
##                               id likes_count      from_id       from_name
## 6 153080620724_10159373535885725      180600 153080620724 Donald J. Trump
##                                                                                                                                                                                        message
## 6 Very grateful for the 9-O decision from the U. S. Supreme Court. We must keep America SAFE!\n\nFull statement: whitehouse.gov/the-press-office/2017/06/26/statement-president-donald-j-trump
##               created_time  type
## 6 2017-06-26T18:49:01+0000 photo
##                                                                                                       link
## 6 https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10159373535885725/?type=3
##   story comments_count shares_count love_count haha_count wow_count
## 6  <NA>           9295        20940      22651       1275       465
##   sad_count angry_count
## 6       105         454
page[which.max(page$haha_count),]
##                                id likes_count      from_id       from_name
## 12 153080620724_10159379423530725      135714 153080620724 Donald J. Trump
##                                                                                                                                                                                                                  message
## 12 Wow, CNN had to retract big story, with 3 employees forced to resign. What about all the other phony stories they do? What about NBC, CBS, & ABC? What about the failing New York Times & Washington Post? FAKE NEWS!
##                created_time  type
## 12 2017-06-27T19:45:00+0000 video
##                                                              link story
## 12 https://www.facebook.com/DonaldTrump/videos/10159379423530725/  <NA>
##    comments_count shares_count love_count haha_count wow_count sad_count
## 12          14977        36858      11610      16964      1494       154
##    angry_count
## 12         765
page[which.max(page$wow_count),]
##                                id likes_count      from_id       from_name
## 12 153080620724_10159379423530725      135714 153080620724 Donald J. Trump
##                                                                                                                                                                                                                  message
## 12 Wow, CNN had to retract big story, with 3 employees forced to resign. What about all the other phony stories they do? What about NBC, CBS, & ABC? What about the failing New York Times & Washington Post? FAKE NEWS!
##                created_time  type
## 12 2017-06-27T19:45:00+0000 video
##                                                              link story
## 12 https://www.facebook.com/DonaldTrump/videos/10159379423530725/  <NA>
##    comments_count shares_count love_count haha_count wow_count sad_count
## 12          14977        36858      11610      16964      1494       154
##    angry_count
## 12         765
page[which.max(page$sad_count),]
##                                id likes_count      from_id       from_name
## 17 153080620724_10159387848545725       40222 153080620724 Donald J. Trump
##    message
## 17 JOIN US LIVE from the Cabinet Room- \n\nIMMIGRATION ROUNDTABLE....\n\nWe are here today to discuss two crucial votes taking place in Congress tomorrow on vital public safety and national security legislation.\n\nWe are joined by the Chairman of the House Judiciary Committee, Bob Goodlatte.  Bob is one of the most skilled legislators in Congress, and he has worked with law enforcement to write a series of critical immigration bills that will close dangerous loopholes exploited by criminals, gang members, drug dealers, killers, and terrorists. \n\nAlso with us today are Congressmen Peter King, Lou Barletta, and David Young. \n\nTomorrow, the House will vote on the No Sanctuary for Criminals Act, which will cut federal grant money to cities that shield dangerous criminal aliens from being turned over to federal law enforcement.\n\nThe House will also vote on Kate’s Law – named for Kate Steinle, who was killed by an illegal immigrant who had been deported five times. This law will enhance criminal penalties for those who repeatedly re-enter the country illegally.\n\nCountless innocent Americans – including the loved ones of many families in the room with us today – have been killed by illegal immigrants with multiple deportations.\n\nI am especially honored to be here with so many courageousfamilies whom I got to know so well during the campaign. \n\nYou lost the people you loved because our government refused to enforce our nation’s immigration laws. \n\nFor years, the pundits, journalists and politicians in Washington refused to hear your voices.  But, on Election Day 2016, your voices were heard across THE ENTIRE WORLD.\n\nChairman Goodlatte has produced a package of truly key immigration enforcement bills.  This package includes the Davis-Oliver Act – whose passage I called for nearly a year ago at my immigration speech in Phoenix, Arizona. \n\nThe Davis-Oliver Act was named for Detective Michael Davis and Deputy Sheriff Danny Oliver, who were gunned down in the line of duty by an illegal immigrant with a criminal record and two prior deportations. \n\nTheir incredibly brave widows honored us with their presence at my address to Congress.  Today we are privileged to be joined by Melissa Oliver, the daughter of Deputy Sheriff Oliver. Melissa, your father was a HERO and we will never forget him.\n\nWe are calling on ALL members of Congress to honor grieving American Families by passing these life-savingmeasures in the House, IN THE SENATE, and sending them to my desk for signature.  It is time to support our police, to protect our families, and to SAVE AMERICAN LIVES.\n\nSo with that, I’d like to ask each of the families and invited guests to say a few words and to share your stories with the American People, beginning with my good friend Jamiel Shaw.
##                created_time  type
## 17 2017-06-28T19:07:07+0000 video
##                                                              link
## 17 https://www.facebook.com/DonaldTrump/videos/10159387848545725/
##                                             story comments_count
## 17 Donald J. Trump was live — at The White House.          24561
##    shares_count love_count haha_count wow_count sad_count angry_count
## 17         7542      13231        624       319       672        1203
page[which.max(page$angry_count),]
##                                id likes_count      from_id       from_name
## 17 153080620724_10159387848545725       40222 153080620724 Donald J. Trump
##    message
## 17 JOIN US LIVE from the Cabinet Room- \n\nIMMIGRATION ROUNDTABLE....\n\nWe are here today to discuss two crucial votes taking place in Congress tomorrow on vital public safety and national security legislation.\n\nWe are joined by the Chairman of the House Judiciary Committee, Bob Goodlatte.  Bob is one of the most skilled legislators in Congress, and he has worked with law enforcement to write a series of critical immigration bills that will close dangerous loopholes exploited by criminals, gang members, drug dealers, killers, and terrorists. \n\nAlso with us today are Congressmen Peter King, Lou Barletta, and David Young. \n\nTomorrow, the House will vote on the No Sanctuary for Criminals Act, which will cut federal grant money to cities that shield dangerous criminal aliens from being turned over to federal law enforcement.\n\nThe House will also vote on Kate’s Law – named for Kate Steinle, who was killed by an illegal immigrant who had been deported five times. This law will enhance criminal penalties for those who repeatedly re-enter the country illegally.\n\nCountless innocent Americans – including the loved ones of many families in the room with us today – have been killed by illegal immigrants with multiple deportations.\n\nI am especially honored to be here with so many courageousfamilies whom I got to know so well during the campaign. \n\nYou lost the people you loved because our government refused to enforce our nation’s immigration laws. \n\nFor years, the pundits, journalists and politicians in Washington refused to hear your voices.  But, on Election Day 2016, your voices were heard across THE ENTIRE WORLD.\n\nChairman Goodlatte has produced a package of truly key immigration enforcement bills.  This package includes the Davis-Oliver Act – whose passage I called for nearly a year ago at my immigration speech in Phoenix, Arizona. \n\nThe Davis-Oliver Act was named for Detective Michael Davis and Deputy Sheriff Danny Oliver, who were gunned down in the line of duty by an illegal immigrant with a criminal record and two prior deportations. \n\nTheir incredibly brave widows honored us with their presence at my address to Congress.  Today we are privileged to be joined by Melissa Oliver, the daughter of Deputy Sheriff Oliver. Melissa, your father was a HERO and we will never forget him.\n\nWe are calling on ALL members of Congress to honor grieving American Families by passing these life-savingmeasures in the House, IN THE SENATE, and sending them to my desk for signature.  It is time to support our police, to protect our families, and to SAVE AMERICAN LIVES.\n\nSo with that, I’d like to ask each of the families and invited guests to say a few words and to share your stories with the American People, beginning with my good friend Jamiel Shaw.
##                created_time  type
## 17 2017-06-28T19:07:07+0000 video
##                                                              link
## 17 https://www.facebook.com/DonaldTrump/videos/10159387848545725/
##                                             story comments_count
## 17 Donald J. Trump was live — at The White House.          24561
##    shares_count love_count haha_count wow_count sad_count angry_count
## 17         7542      13231        624       319       672        1203
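Repeating which.max for every metric gets verbose. As a rough sketch, the same lookup can be done in one pass over every column whose name ends in _count. The mock data frame below (posts, a hypothetical name) contains only the three posts that appear in the outputs above, with counts copied from those outputs.

```r
# Mock data: the three posts shown above (rows 6, 12 and 17), counts copied
# from the printed output. This is not a live API result.
posts <- data.frame(
  id = c("153080620724_10159373535885725",
         "153080620724_10159379423530725",
         "153080620724_10159387848545725"),
  likes_count    = c(180600, 135714, 40222),
  comments_count = c(9295, 14977, 24561),
  shares_count   = c(20940, 36858, 7542),
  love_count     = c(22651, 11610, 13231),
  haha_count     = c(1275, 16964, 624),
  wow_count      = c(465, 1494, 319),
  sad_count      = c(105, 154, 672),
  angry_count    = c(454, 765, 1203),
  stringsAsFactors = FALSE
)

# All metric columns, then the id of the top post for each metric
count_cols <- grep("_count$", names(posts), value = TRUE)
top_ids <- sapply(count_cols, function(col) posts$id[which.max(posts[[col]])])
top_ids
```

The result is a named character vector, one post id per metric, which makes it easy to see that a handful of posts dominate several reactions at once.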

We can also subset by date. For example, imagine we want to get all the posts from early November 2012 on Barack Obama’s Facebook page:

page <- getPage("barackobama", token=fb_oauth, n=100,
    since='2012/11/01', until='2012/11/10')
## 23 posts
page[which.max(page$likes_count),]
##      from_id    from_name          message             created_time  type
## 5 6815841748 Barack Obama Four more years. 2012-11-07T04:15:08+0000 photo
##                                                                                                   link
## 5 https://www.facebook.com/barackobama/photos/a.53081056748.66806.6815841748/10151255420886749/?type=3
##                             id story likes_count comments_count
## 5 6815841748_10151255420886749  <NA>     4833722         219350
##   shares_count
## 5       660562
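If you already have a larger set of posts downloaded, the created_time strings can also be parsed and filtered locally. This is a minimal sketch using base R only; the two timestamps are copied from outputs shown in this tutorial, and the window boundaries are chosen here just to mirror the since and until values above.

```r
# Two created_time values copied from outputs in this tutorial
created <- c("2012-11-07T04:15:08+0000", "2017-06-24T18:00:00+0000")

# Parse the ISO-style timestamps; %z reads the +0000 UTC offset
parsed <- as.POSIXct(created, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")

# Keep only timestamps inside the chosen window
in_window <- parsed >= as.POSIXct("2012-11-01", tz = "UTC") &
             parsed <  as.POSIXct("2012-11-10", tz = "UTC")
created[in_window]
```

Once the column is a proper date-time, it can also be used for sorting or for grouping posts by day.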

And if we need to, we can also extract the specific comments from each post.

post_id <- page$id[which.max(page$likes_count)]
post <- getPost(post_id, token=fb_oauth, n.comments=1000, likes=FALSE)

This is how you can view those comments:

comments <- post$comments
head(comments)
##             from_id       from_name   message             created_time
## 1   509226872540260  Jesse Talafili   OBAMA ! 2012-11-07T04:15:16+0000
## 2   485613484893917 Zain Ahmed Turk      yayy 2012-11-07T04:15:17+0000
## 3      675870897427   Gary D Ploski        <3 2012-11-07T04:15:17+0000
## 4   802034289809838     David Furka       YES 2012-11-07T04:15:18+0000
## 5 10201918108506766      Pinky Keys        :X 2012-11-07T04:15:18+0000
## 6 10102278537299904     Zac Bowling Hell yes! 2012-11-07T04:15:19+0000
##   likes_count comments_count                         id
## 1          18              0 10151255420886749_11954305
## 2           3              0 10151255420886749_11954306
## 3           2              0 10151255420886749_11954307
## 4           5              0 10151255420886749_11954309
## 5           1              0 10151255420886749_11954311
## 6           9              0 10151255420886749_11954315

Also, note that users can like comments! What is the comment that got the most likes?

comments[which.max(comments$likes_count),]
##           from_id      from_name message             created_time
## 1 509226872540260 Jesse Talafili OBAMA ! 2012-11-07T04:15:16+0000
##   likes_count comments_count                         id
## 1          18              0 10151255420886749_11954305
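which.max returns only the single top row. To see a ranking instead, the comments can be sorted with order(). The sketch below rebuilds the six comments shown by head(comments) above as a mock data frame (comments_example, a hypothetical name) so that it runs without a token.

```r
# Mock data: the six comments printed by head(comments) above
comments_example <- data.frame(
  from_name = c("Jesse Talafili", "Zain Ahmed Turk", "Gary D Ploski",
                "David Furka", "Pinky Keys", "Zac Bowling"),
  message = c("OBAMA !", "yayy", "<3", "YES", ":X", "Hell yes!"),
  likes_count = c(18, 3, 2, 5, 1, 9),
  stringsAsFactors = FALSE
)

# Sort by likes, most liked first, and show the top three
ranked <- comments_example[order(comments_example$likes_count, decreasing = TRUE), ]
head(ranked, 3)
```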

This is how you get nested comments:

page <- getPage("barackobama", token=fb_oauth, n=1)
## 1 posts
post <- getPost(page$id, token=fb_oauth, comments=TRUE, n=100, likes=FALSE)
comment <- getCommentReplies(post$comments$id[1],
                             token=fb_oauth, n=500, likes=TRUE)