Querying multiple SQL tables

Now’s where we get to see the magic of SQL. It makes it really easy to combine multiple SQL tables using JOIN. We just select columns from different tables and then specify what key should be used to match them using ON.

The example belows shows an inner join, but there are other types (left vs right outer joins, cross joins…) that we will not cover in this workshop.

library(DBI)
db <- dbConnect(RSQLite::SQLite(), "facebook-db.sqlite")

dbGetQuery(db, 
  "SELECT posts.likes_count, congress.party, posts.date
  FROM posts JOIN congress
    ON congress.screen_name = posts.screen_name
  LIMIT 10")
##    likes_count      party       date
## 1          201 Republican 2017-01-03
## 2          220 Republican 2017-01-04
## 3          116 Republican 2017-01-10
## 4          235 Republican 2017-01-10
## 5          571 Republican 2017-01-10
## 6          122 Republican 2017-01-13
## 7           40 Republican 2017-01-17
## 8           43 Republican 2017-01-17
## 9           46 Republican 2017-01-18
## 10         101 Republican 2017-01-20

Once we know how to merge tables, we can use columns from both tables. For example, let’s say we want to find the most shared post by a Democrat, or by a Republican. Here’s how we would do it:

dbGetQuery(db, 
  "SELECT posts.from_name, posts.message, posts.shares_count, congress.party 
  FROM posts JOIN congress
    ON congress.screen_name = posts.screen_name
  WHERE party = 'Democrat'
  ORDER BY shares_count DESC
  LIMIT 1")
##                 from_name
## 1 Congressman Mark Takano
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           message
## 1 This remarkable line of questioning from Congresswoman Suzan DelBene demonstrates just a few of the ways that the GOP tax plan treats corporations better than people.   Under the Republican plan, corporations are still allowed to deduct state and local taxes. Workers are not. Corporations are still allowed to deduct business expenses. Teachers are not.  Corporations are still allowed to deduct more than $10,000 in property taxes. Homeowners are not. Corporations are still allowed to deduct moving expenses. Families are not.   And this is on top of a $1.5 trillion corporate tax cut.   Let's be clear, this is not a "middle-class tax cut." Working families get the crumbs and the super-wealthy get everything else.
##   shares_count    party
## 1       341039 Democrat
dbGetQuery(db, 
  "SELECT posts.from_name, posts.message, posts.shares_count, congress.party 
  FROM posts JOIN congress
    ON congress.screen_name = posts.screen_name
  WHERE party = 'Republican'
  ORDER BY shares_count DESC
  LIMIT 1")
##     from_name
## 1 John McCain
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            message
## 1 Our government has a responsibility to defend our borders, but we must do so in a way that makes us safer and upholds all that is decent and exceptional about our nation.   It is clear from the confusion at our airports across the nation that President Trump’s executive order was not properly vetted. We are particularly concerned by reports that this order went into effect with little to no consultation with the Departments of State, Defense, Justice, and Homeland Security.   Such a hasty process risks harmful results. We should not stop green-card holders from returning to the country they call home. We should not stop those who have served as interpreters for our military and diplomats from seeking refuge in the country they risked their lives to help. And we should not turn our backs on those refugees who have been shown through extensive vetting to pose no demonstrable threat to our nation, and who have suffered unspeakable horrors, most of them women and children.  Ultimately, we fear this executive order will become a self-inflicted wound in the fight against terrorism. At this very moment, American troops are fighting side-by-side with our Iraqi partners to defeat ISIL. But this executive order bans Iraqi pilots from coming to military bases in Arizona to fight our common enemies. Our most important allies in the fight against ISIL are the vast majority of Muslims who reject its apocalyptic ideology of hatred. This executive order sends a signal, intended or not, that America does not want Muslims coming into our country. That is why we fear this executive order may do more to help terrorist recruitment than improve our security.
##   shares_count      party
## 1       100376 Republican

Grouping and aggregating

SQL makes it really easy to compute aggregate statistics of one column based on values of a different column, with a series of built-in functions (COUNT, AVG, MAX, MIN…). Let’s now see a few examples of this type of queries:

# use COUNT with * to return the total number of rows
dbGetQuery(db, 'SELECT COUNT(*) FROM posts')
##   COUNT(*)
## 1   175411
dbGetQuery(db, 'SELECT COUNT(*) FROM congress')
##   COUNT(*)
## 1      518

Combining everything we’ve learned, we can for example count the total number of posts by party, or by party and gender.

dbGetQuery(db, 
  "SELECT congress.party, COUNT(*) AS total_posts
  FROM posts JOIN congress
    ON congress.screen_name = posts.screen_name
  GROUP BY congress.party")
##         party total_posts
## 1    Democrat       85129
## 2 Independent         893
## 3  Republican       89389
dbGetQuery(db, 
  "SELECT congress.party, congress.gender, COUNT(*) AS total_posts
  FROM posts JOIN congress
    ON congress.screen_name = posts.screen_name
  GROUP BY congress.party, congress.gender")
##         party gender total_posts
## 1    Democrat      F       30454
## 2    Democrat      M       54675
## 3 Independent      M         893
## 4  Republican      F        9893
## 5  Republican      M       79496

Here’s another example. Let’s say we want to look at what domains Republicans and Democrats share more frequently on their Facebook pages. How would we do this?

dbGetQuery(db, 
  "SELECT congress.party, domain, COUNT(*) AS domain_count
  FROM posts JOIN Congress
    ON congress.screen_name = posts.screen_name
  WHERE congress.party = 'Democrat'
  GROUP BY domain
  ORDER BY domain_count DESC
  LIMIT 20")
##       party             domain domain_count
## 1  Democrat               <NA>        53929
## 2  Democrat        nytimes.com         2159
## 3  Democrat             bit.ly         2043
## 4  Democrat washingtonpost.com         1945
## 5  Democrat        thehill.com          899
## 6  Democrat            cnn.com          647
## 7  Democrat       politico.com          594
## 8  Democrat               t.co          445
## 9  Democrat huffingtonpost.com          382
## 10 Democrat            vox.com          364
## 11 Democrat         medium.com          340
## 12 Democrat        latimes.com          324
## 13 Democrat            npr.org          308
## 14 Democrat             goo.gl          295
## 15 Democrat        nbcnews.com          278
## 16 Democrat        twitter.com          276
## 17 Democrat         c-span.org          268
## 18 Democrat             apo.st          261
## 19 Democrat       usatoday.com          260
## 20 Democrat            nyti.ms          256
dbGetQuery(db, 
  "SELECT congress.party, domain, COUNT(*) AS domain_count
  FROM posts JOIN Congress
    ON congress.screen_name = posts.screen_name
  WHERE congress.party = 'Republican'
  GROUP BY domain
  ORDER BY domain_count DESC
  LIMIT 20")
##         party                   domain domain_count
## 1  Republican                     <NA>        59195
## 2  Republican                   bit.ly         1723
## 3  Republican              foxnews.com          708
## 4  Republican              thehill.com          664
## 5  Republican   washingtonexaminer.com          464
## 6  Republican       washingtonpost.com          426
## 7  Republican                     t.co          405
## 8  Republican              twitter.com          376
## 9  Republican                   goo.gl          293
## 10 Republican                  wsj.com          249
## 11 Republican        fairandsimple.gop          243
## 12 Republican             usatoday.com          240
## 13 Republican               medium.com          205
## 14 Republican                  cnn.com          199
## 15 Republican               c-span.org          195
## 16 Republican             facebook.com          185
## 17 Republican iqconnect.lmhostediq.com          180
## 18 Republican             politico.com          180
## 19 Republican        cotton.senate.gov          163
## 20 Republican      washingtontimes.com          147

There are other functions besides just COUNT. We can also compute averages, the maximum and minimu of a column, the sum… We can also functions like SUBSTR to extract parts of a character string.

# average number of posts by party
dbGetQuery(db, 
  "SELECT congress.party, AVG(posts.likes_count), COUNT(*) 
  FROM posts JOIN congress
    ON congress.screen_name = posts.screen_name
  GROUP BY congress.party")
##         party AVG(posts.likes_count) COUNT(*)
## 1    Democrat               404.5807    85129
## 2 Independent             17207.3303      893
## 3  Republican               171.2773    89389
# we can also compute it in a different way...
dbGetQuery(db, 
  "SELECT congress.party, SUM(posts.likes_count)/COUNT(*) AS average 
  FROM posts JOIN congress
    ON congress.screen_name = posts.screen_name
  GROUP BY congress.party")
##         party average
## 1    Democrat     404
## 2 Independent   17207
## 3  Republican     171
# most popular post by party
dbGetQuery(db, 
  "SELECT from_name, message, congress.party, MAX(posts.likes_count), COUNT(*) 
  FROM posts JOIN congress
    ON congress.screen_name = posts.screen_name
  GROUP BY congress.party")
##                       from_name
## 1 U.S. Senator Elizabeth Warren
## 2   U.S. Senator Bernie Sanders
## 3                   John McCain
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            message
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Tonight we fight for American values at airports all across this country. I'm at Logan Airport tonight to say: our diversity makes our country strong. We will not allow a Muslim ban in the United States of America. #NoBanNoWall #nomuslimban
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Today is going to be a tough day for millions of Americans, including myself. But we cannot throw up our hands in despair. We have to fight back as effectively and as vigorously as we can. We are not giving up.
## 3 Our government has a responsibility to defend our borders, but we must do so in a way that makes us safer and upholds all that is decent and exceptional about our nation.   It is clear from the confusion at our airports across the nation that President Trump’s executive order was not properly vetted. We are particularly concerned by reports that this order went into effect with little to no consultation with the Departments of State, Defense, Justice, and Homeland Security.   Such a hasty process risks harmful results. We should not stop green-card holders from returning to the country they call home. We should not stop those who have served as interpreters for our military and diplomats from seeking refuge in the country they risked their lives to help. And we should not turn our backs on those refugees who have been shown through extensive vetting to pose no demonstrable threat to our nation, and who have suffered unspeakable horrors, most of them women and children.  Ultimately, we fear this executive order will become a self-inflicted wound in the fight against terrorism. At this very moment, American troops are fighting side-by-side with our Iraqi partners to defeat ISIL. But this executive order bans Iraqi pilots from coming to military bases in Arizona to fight our common enemies. Our most important allies in the fight against ISIL are the vast majority of Muslims who reject its apocalyptic ideology of hatred. This executive order sends a signal, intended or not, that America does not want Muslims coming into our country. That is why we fear this executive order may do more to help terrorist recruitment than improve our security.
##         party MAX(posts.likes_count) COUNT(*)
## 1    Democrat                 421064    85129
## 2 Independent                 335572      893
## 3  Republican                 288231    89389
# number of posts by day of the month
dbGetQuery(db, 
  "SELECT SUBSTR(date, 9, 10) AS day_of_month, COUNT(*) as post_count
  FROM posts
  GROUP BY day_of_month")
##    day_of_month post_count
## 1            01       6198
## 2            02       5491
## 3            03       5493
## 4            04       5702
## 5            05       5306
## 6            06       6495
## 7            07       6331
## 8            08       5651
## 9            09       5137
## 10           10       5511
## 11           11       5806
## 12           12       6061
## 13           13       6722
## 14           14       6282
## 15           15       5876
## 16           16       5933
## 17           17       5684
## 18           18       5732
## 19           19       5508
## 20           20       6123
## 21           21       5747
## 22           22       5328
## 23           23       5192
## 24           24       5754
## 25           25       5431
## 26           26       5430
## 27           27       6014
## 28           28       6307
## 29           29       5091
## 30           30       4527
## 31           31       3548