For this first part of the class, we’ll be working with a SQLite database, which is self-contained in a file within our hard drive, without any need to set up a server. The dataset we will work with is all Facebook posts by Members of the U.S. Congress in 2017, as collected from the public Pages API earlier this year.
We’ll start by connecting to the database and adding the first table, with information at the member of congress level.
library(DBI)
# create database: this will create a file in our hard drive
db <- dbConnect(RSQLite::SQLite(), "facebook-db.sqlite")
# reading the first file
congress <- read.csv("../data/congress-facebook-2017.csv",
stringsAsFactors=F)
# adding first table: user-level data
dbWriteTable(db, "congress", congress)
# testing that it works with a simple query
dbListFields(db, "congress")
## [1] "bioid" "screen_name" "name" "gender" "type"
## [6] "party"
dbGetQuery(db, 'SELECT * FROM congress LIMIT 5')
## bioid screen_name name gender type party
## 1 A000055 RobertAderholt Robert B. Aderholt M rep Republican
## 2 A000360 senatorlamaralexander Lamar Alexander M sen Republican
## 3 A000367 repjustinamash Justin Amash M rep Republican
## 4 A000369 MarkAmodeiNV2 Mark E. Amodei M rep Republican
## 5 A000370 CongresswomanAdams Alma S. Adams F rep Democrat
Note that we have used dbWriteTable
to add the data frame to our database, and then check that it all worked with dbListFields
and dbGetQuery
. As we will see in a second, this second function is what we will use to query the database, and it will always return a table.
We now switch to the second table. However, the files are too big to open them all in memory. Instead, we will open them one by one, and then append them to the table. Let’s see how that works…
fls <- list.files("../data/posts", full.names=TRUE)
for (f in fls){
message(f)
# read file into memory
fb <- read.csv(f, stringsAsFactors=F)
# adding to table in SQL database
dbWriteTable(db, "posts", fb, append=TRUE)
}
## ../data/posts/106631626049851.csv
## ../data/posts/109135405838588.csv
## ../data/posts/113303673339.csv
## ../data/posts/115356957005.csv
## ../data/posts/116058275133542.csv
## ../data/posts/118514606128.csv
## ../data/posts/15083070102.csv
## ../data/posts/152569121550.csv
## ../data/posts/153423912663.csv
## ../data/posts/191056827594903.csv
## ../data/posts/214258646163.csv
## ../data/posts/301936109927957.csv
## ../data/posts/326420614138023.csv
## ../data/posts/395759603917487.csv
## ../data/posts/50375006903.csv
## ../data/posts/63158229861.csv
## ../data/posts/70063393423.csv
## ../data/posts/8037068318.csv
## ../data/posts/81058818750.csv
## ../data/posts/81125319109.csv
## ../data/posts/95696782238.csv
## ../data/posts/96007744606.csv
## ../data/posts/AndyHarrisMD.csv
## ../data/posts/aumuaamata.csv
## ../data/posts/betoorourketx16.csv
## ../data/posts/BlaineLuetkemeyer.csv
## ../data/posts/BlakeFarenthold.csv
## ../data/posts/bobcorker.csv
## ../data/posts/BobGoodlatte.csv
## ../data/posts/boblatta.csv
## ../data/posts/ChelliePingree.csv
## ../data/posts/chrismurphyct.csv
## ../data/posts/chrisvanhollen.csv
## ../data/posts/CongressmanAlexMooney.csv
## ../data/posts/CongressmanAndreCarson.csv
## ../data/posts/CongressmanBennieGThompson.csv
## ../data/posts/CongressmanBillFoster.csv
## ../data/posts/congressmanbobbyrush.csv
## ../data/posts/CongressmanBoyle.csv
## ../data/posts/CongressmanBradSchneider.csv
## ../data/posts/CongressmanBuchanan.csv
## ../data/posts/congressmanbuddycarter.csv
## ../data/posts/CongressmanClayHiggins.csv
## ../data/posts/congressmancomer.csv
## ../data/posts/CongressmanCulberson.csv
## ../data/posts/CongressmanDarrenSoto.csv
## ../data/posts/CongressmanDavidCicilline.csv
## ../data/posts/CongressmanDavidValadao.csv
## ../data/posts/CongressmanDennyHeck.csv
## ../data/posts/CongressmanDent.csv
## ../data/posts/CongressmanDKDavis.csv
## ../data/posts/CongressmanDougLamborn.csv
## ../data/posts/CongressmanDuncan.csv
## ../data/posts/CongressmanEricSwalwell.csv
## ../data/posts/CongressmanErikPaulsen.csv
## ../data/posts/CongressmanGarretGraves.csv
## ../data/posts/CongressmanGaryPalmer.csv
## ../data/posts/CongressmanGeorgeHolding.csv
## ../data/posts/CongressmanGerryConnolly.csv
## ../data/posts/congressmangkbutterfield.csv
## ../data/posts/CongressmanGT.csv
## ../data/posts/CongressmanGuthrie.csv
## ../data/posts/CongressmanHalRogers.csv
## ../data/posts/CongressmanJimBridenstine.csv
## ../data/posts/CongressmanJimHimes.csv
## ../data/posts/CongressmanJimLangevin.csv
## ../data/posts/CongressmanJodyHice.csv
## ../data/posts/congressmanjohndelaney.csv
## ../data/posts/CongressmanKevinCramer.csv
## ../data/posts/CongressmanKevinYoder.csv
## ../data/posts/CongressmanLance.csv
## ../data/posts/CongressmanLouBarletta.csv
## ../data/posts/CongressmanMarcVeasey.csv
## ../data/posts/CongressmanMarino.csv
## ../data/posts/CongressmanMattCartwright.csv
## ../data/posts/CongressmanMattGaetz.csv
## ../data/posts/CongressmanMcHenry.csv
## ../data/posts/CongressmanMikeDRogers.csv
## ../data/posts/CongressmanNadler.csv
## ../data/posts/CongressmanPatrickMeehan.csv
## ../data/posts/congressmanpittenger.csv
## ../data/posts/congressmanraja.csv
## ../data/posts/CongressmanRalphAbraham.csv
## ../data/posts/CongressmanRaulRuizMD.csv
## ../data/posts/CongressmanRickAllen.csv
## ../data/posts/congressmanrodblum.csv
## ../data/posts/CongressmanRyanCostello.csv
## ../data/posts/CongressmanScottTipton.csv
## ../data/posts/CongressmanSethMoulton.csv
## ../data/posts/CongressmanSteveCohen.csv
## ../data/posts/CongressmanTedDeutch.csv
## ../data/posts/CongressmanTedYoho.csv
## ../data/posts/CongressmanTomMacArthur.csv
## ../data/posts/CongressmanWarrenDavidson.csv
## ../data/posts/Congresswoman.Hartzler.csv
## ../data/posts/CongresswomanAdams.csv
## ../data/posts/CongresswomanAnnieKuster.csv
## ../data/posts/CongresswomanBarragan.csv
## ../data/posts/congresswomanbonamici.csv
## ../data/posts/CongresswomanClark.csv
## ../data/posts/CongresswomanEBJtx30.csv
## ../data/posts/CongresswomanNorton.csv
## ../data/posts/CongresswomanRosaDeLauro.csv
## ../data/posts/CongresswomanSheilaJacksonLee.csv
## ../data/posts/congresswomansusanwbrooks.csv
## ../data/posts/CongresswomanTitus.csv
## ../data/posts/darrellissa.csv
## ../data/posts/DaveLoebsack.csv
## ../data/posts/dennis.ross.376.csv
## ../data/posts/derek.kilmer.csv
## ../data/posts/DianaDeGette.csv
## ../data/posts/DianeBlackTN06.csv
## ../data/posts/DonaldNorcrossNJ.csv
## ../data/posts/DonaldPayneJr.csv
## ../data/posts/doris.matsui.csv
## ../data/posts/drnealdunnfl2.csv
## ../data/posts/DrPhilRoe.csv
## ../data/posts/DuncanHunter.csv
## ../data/posts/EdJMarkey.csv
## ../data/posts/EdRoyce.csv
## ../data/posts/elijahcummings.csv
## ../data/posts/emanuelcleaverii.csv
## ../data/posts/grassley.csv
## ../data/posts/GreggHarper.csv
## ../data/posts/GusBilirakis.csv
## ../data/posts/GwenSMoore.csv
## ../data/posts/herrerabeutler.csv
## ../data/posts/HurdOnTheHill.csv
## ../data/posts/iroslehtinen.csv
## ../data/posts/JackieSpeier.csv
## ../data/posts/jameseclyburn.csv
## ../data/posts/janschakowsky.csv
## ../data/posts/jaredpolis.csv
## ../data/posts/jefffortenberry.csv
## ../data/posts/jeffmerkley.csv
## ../data/posts/jerrymcnerney.csv
## ../data/posts/jerrymoran.csv
## ../data/posts/JimCooper.csv
## ../data/posts/jiminhofe.csv
## ../data/posts/JodeyArrington.csv
## ../data/posts/joecourtney.csv
## ../data/posts/JoeManchinIII.csv
## ../data/posts/JoeWilson.csv
## ../data/posts/johnbarrasso.csv
## ../data/posts/JohnBoozman.csv
## ../data/posts/JohnKennedyLouisiana.csv
## ../data/posts/johnmccain.csv
## ../data/posts/judgecarter.csv
## ../data/posts/Keith.Ellison.csv
## ../data/posts/keithrothfus.csv
## ../data/posts/kevinbrady.csv
## ../data/posts/LamarSmithTX21.csv
## ../data/posts/lloyddoggett.csv
## ../data/posts/madeleine.bordallo.csv
## ../data/posts/MarkAmodeiNV2.csv
## ../data/posts/MarkRWarner.csv
## ../data/posts/marshablackburn.csv
## ../data/posts/MartinHeinrich.csv
## ../data/posts/MaxineWaters.csv
## ../data/posts/mcmorrisrodgers.csv
## ../data/posts/mdiazbalart.csv
## ../data/posts/michaelcburgess.csv
## ../data/posts/michaeltmccaul.csv
## ../data/posts/mike.conaway.csv
## ../data/posts/mikecrapo.csv
## ../data/posts/mikeenzi.csv
## ../data/posts/mitchmcconnell.csv
## ../data/posts/NancyPelosi.csv
## ../data/posts/pascrell.csv
## ../data/posts/PeterRoskam.csv
## ../data/posts/PeterWelch.csv
## ../data/posts/petesessions.csv
## ../data/posts/raul.r.labrador.csv
## ../data/posts/Rep.Billy.Long.csv
## ../data/posts/Rep.BluntRochester.csv
## ../data/posts/Rep.Grijalva.csv
## ../data/posts/Rep.PeteOlson.csv
## ../data/posts/Rep.Shuster.csv
## ../data/posts/RepAdamSchiff.csv
## ../data/posts/RepAdamSmith.csv
## ../data/posts/repalgreen.csv
## ../data/posts/RepAlLawsonJr.csv
## ../data/posts/RepAmiBera.csv
## ../data/posts/RepAndyBarr.csv
## ../data/posts/RepAndyBiggs.csv
## ../data/posts/RepAnnaEshoo.csv
## ../data/posts/RepAnnWagner.csv
## ../data/posts/RepAnthonyBrown.csv
## ../data/posts/RepAustinScott.csv
## ../data/posts/RepBarbaraComstock.csv
## ../data/posts/RepBarbaraLee.csv
## ../data/posts/RepBenRayLujan.csv
## ../data/posts/repbettymccollum.csv
## ../data/posts/RepBillFlores.csv
## ../data/posts/RepBillJohnson.csv
## ../data/posts/RepBobbyScott.csv
## ../data/posts/RepBobGibbs.csv
## ../data/posts/RepBonnie.csv
## ../data/posts/RepBost.csv
## ../data/posts/RepBradWenstrup.csv
## ../data/posts/RepBrianBabin.csv
## ../data/posts/repbrianfitzpatrick.csv
## ../data/posts/RepBrianHiggins.csv
## ../data/posts/repbrianmast.csv
## ../data/posts/RepByrne.csv
## ../data/posts/repcardenas.csv
## ../data/posts/RepCarolynMaloney.csv
## ../data/posts/RepCharlieCrist.csv
## ../data/posts/RepCheri.csv
## ../data/posts/RepChrisCollins.csv
## ../data/posts/RepChrisSmith.csv
## ../data/posts/RepChrisStewart.csv
## ../data/posts/repchuck.csv
## ../data/posts/RepClaudiaTenney.csv
## ../data/posts/RepColleenHanabusa.csv
## ../data/posts/repcurbelo.csv
## ../data/posts/RepDanDonovan.csv
## ../data/posts/RepDanKildee.csv
## ../data/posts/repdanlipinski.csv
## ../data/posts/RepDaveBrat.csv
## ../data/posts/RepDaveJoyce.csv
## ../data/posts/repdavereichert.csv
## ../data/posts/RepDaveTrott.csv
## ../data/posts/RepDavidEPrice.csv
## ../data/posts/RepDavidKustoff.csv
## ../data/posts/repdavidschweikert.csv
## ../data/posts/RepDavidYoung.csv
## ../data/posts/RepDebbieDingell.csv
## ../data/posts/RepDelBene.csv
## ../data/posts/RepDeSantis.csv
## ../data/posts/RepDonBacon.csv
## ../data/posts/repdonbeyer.csv
## ../data/posts/RepDonYoung.csv
## ../data/posts/RepDrewFerguson.csv
## ../data/posts/RepDwightEvans.csv
## ../data/posts/RepDWS.csv
## ../data/posts/RepEliotLEngel.csv
## ../data/posts/RepEliseStefanik.csv
## ../data/posts/RepEspaillat.csv
## ../data/posts/RepEsty.csv
## ../data/posts/RepEvanJenkins.csv
## ../data/posts/RepFrankLucas.csv
## ../data/posts/repfrankpallone.csv
## ../data/posts/RepFredUpton.csv
## ../data/posts/RepFrenchHill.csv
## ../data/posts/repgaramendi.csv
## ../data/posts/RepGeneGreen.csv
## ../data/posts/repgosar.csv
## ../data/posts/repgracemeng.csv
## ../data/posts/RepGraceNapolitano.csv
## ../data/posts/repgregwalden.csv
## ../data/posts/RepGrothman.csv
## ../data/posts/RepGutierrez.csv
## ../data/posts/RepHakeemJeffries.csv
## ../data/posts/RepHensarling.csv
## ../data/posts/RepHuffman.csv
## ../data/posts/RepHuizenga.csv
## ../data/posts/RepHultgren.csv
## ../data/posts/RepJackBergman.csv
## ../data/posts/RepJackieWalorski.csv
## ../data/posts/RepJackyRosen.csv
## ../data/posts/RepJasonLewis.csv
## ../data/posts/repjasonsmith.csv
## ../data/posts/RepJayapal.csv
## ../data/posts/RepJeffDenham.csv
## ../data/posts/RepJeffDuncan.csv
## ../data/posts/RepJimBanks.csv
## ../data/posts/RepJimCosta.csv
## ../data/posts/repjimjordan.csv
## ../data/posts/RepJimMcGovern.csv
## ../data/posts/RepJimmyGomez.csv
## ../data/posts/RepJimmyPanetta.csv
## ../data/posts/repjimrenacci.csv
## ../data/posts/RepJoeBarton.csv
## ../data/posts/repjoecrowley.csv
## ../data/posts/RepJohnFaso.csv
## ../data/posts/RepJohnKatko.csv
## ../data/posts/RepJohnLarson.csv
## ../data/posts/RepJohnLewis.csv
## ../data/posts/RepJoseSerrano.csv
## ../data/posts/RepJoshG.csv
## ../data/posts/RepJoyceBeatty.csv
## ../data/posts/RepJuanVargas.csv
## ../data/posts/RepJudyChu.csv
## ../data/posts/RepJuliaBrownley.csv
## ../data/posts/RepKarenBass.csv
## ../data/posts/RepKathleenRice.csv
## ../data/posts/RepKayGranger.csv
## ../data/posts/repkenbuck.csv
## ../data/posts/RepKennyMarchant.csv
## ../data/posts/RepKihuen.csv
## ../data/posts/RepKinzinger.csv
## ../data/posts/replahood.csv
## ../data/posts/RepLaMalfa.csv
## ../data/posts/RepLarryBucshon.csv
## ../data/posts/RepLeeZeldin.csv
## ../data/posts/replizcheney.csv
## ../data/posts/RepLoisFrankel.csv
## ../data/posts/RepLouCorrea.csv
## ../data/posts/reploudermilk.csv
## ../data/posts/RepLouiseSlaughter.csv
## ../data/posts/RepLowenthal.csv
## ../data/posts/RepLowey.csv
## ../data/posts/RepLujanGrisham.csv
## ../data/posts/RepLukeMesser.csv
## ../data/posts/replynnjenkins.csv
## ../data/posts/repmacthornberry.csv
## ../data/posts/RepMarciaLFudge.csv
## ../data/posts/RepMarkDeSaulnier.csv
## ../data/posts/Repmarkmeadows.csv
## ../data/posts/repmarkpocan.csv
## ../data/posts/RepMarkTakano.csv
## ../data/posts/RepMarkWalker.csv
## ../data/posts/RepMcEachin.csv
## ../data/posts/RepMcKinley.csv
## ../data/posts/RepMcSally.csv
## ../data/posts/repmialove.csv
## ../data/posts/RepMichaelCapuano.csv
## ../data/posts/RepMikeBishop.csv
## ../data/posts/repmikecoffman.csv
## ../data/posts/RepMikeGallagher.csv
## ../data/posts/RepMikeJohnson.csv
## ../data/posts/repmikequigley.csv
## ../data/posts/RepMikeThompson.csv
## ../data/posts/RepMikeTurner.csv
## ../data/posts/RepMimiWalters.csv
## ../data/posts/RepMoBrooks.csv
## ../data/posts/RepMoolenaar.csv
## ../data/posts/RepMorganGriffith.csv
## ../data/posts/RepMullin.csv
## ../data/posts/RepNewhouse.csv
## ../data/posts/RepNormaTorres.csv
## ../data/posts/repohalleran.csv
## ../data/posts/RepPatTiberi.csv
## ../data/posts/RepPaulCook.csv
## ../data/posts/reppaulmitchell.csv
## ../data/posts/reppaultonko.csv
## ../data/posts/RepPerlmutter.csv
## ../data/posts/reppeteaguilar.csv
## ../data/posts/RepPeterDeFazio.csv
## ../data/posts/RepPoliquin.csv
## ../data/posts/RepRalphNorman.csv
## ../data/posts/repraskin.csv
## ../data/posts/RepRatcliffe.csv
## ../data/posts/Representative.Martha.Roby.csv
## ../data/posts/RepresentativeDougCollins.csv
## ../data/posts/RepresentativeMarcyKaptur.csv
## ../data/posts/RepresentativeSteveKnight.csv
## ../data/posts/RepresentativeValDemings.csv
## ../data/posts/reprichardneal.csv
## ../data/posts/RepRichHudson.csv
## ../data/posts/RepRichmond.csv
## ../data/posts/RepRickCrawford.csv
## ../data/posts/RepRickLarsen.csv
## ../data/posts/RepRobBishop.csv
## ../data/posts/RepRobertBrady.csv
## ../data/posts/reprobinkelly.csv
## ../data/posts/RepRobWittman.csv
## ../data/posts/RepRobWoodall.csv
## ../data/posts/RepRodneyDavis.csv
## ../data/posts/RepRogerWilliams.csv
## ../data/posts/RepRoKhanna.csv
## ../data/posts/RepRonEstes.csv
## ../data/posts/repronkind.csv
## ../data/posts/RepRooney.csv
## ../data/posts/RepRouzer.csv
## ../data/posts/RepRoybalAllard.csv
## ../data/posts/RepRubenGallego.csv
## ../data/posts/RepRutherfordFL.csv
## ../data/posts/repsaludcarbajal.csv
## ../data/posts/RepSamJohnson.csv
## ../data/posts/RepSandyLevin.csv
## ../data/posts/RepSanfordSC.csv
## ../data/posts/RepSarbanes.csv
## ../data/posts/repschrader.csv
## ../data/posts/repscottperry.csv
## ../data/posts/RepScottPeters.csv
## ../data/posts/RepScottTaylor.csv
## ../data/posts/RepSeanDuffy.csv
## ../data/posts/RepSeanMaloney.csv
## ../data/posts/RepSewell.csv
## ../data/posts/repsheaporter.csv
## ../data/posts/repshimkus.csv
## ../data/posts/RepSmucker.csv
## ../data/posts/repstaceyplaskett.csv
## ../data/posts/repstephenlynch.csv
## ../data/posts/RepStephMurphy.csv
## ../data/posts/RepSteveChabot.csv
## ../data/posts/RepStevePearce.csv
## ../data/posts/RepSteveScalise.csv
## ../data/posts/RepSusanDavis.csv
## ../data/posts/RepTedBudd.csv
## ../data/posts/RepTedLieu.csv
## ../data/posts/RepThomasMassie.csv
## ../data/posts/RepToddRokita.csv
## ../data/posts/reptomemmer.csv
## ../data/posts/reptomgraves.csv
## ../data/posts/RepTomReed.csv
## ../data/posts/reptomrice.csv
## ../data/posts/reptomrooney.csv
## ../data/posts/RepTomSuozzi.csv
## ../data/posts/reptrentkelly.csv
## ../data/posts/reptrey.csv
## ../data/posts/RepTreyGowdy.csv
## ../data/posts/RepTsongas.csv
## ../data/posts/RepTulsiGabbard.csv
## ../data/posts/repvisclosky.csv
## ../data/posts/RepWalberg.csv
## ../data/posts/RepWebster.csv
## ../data/posts/RepWesterman.csv
## ../data/posts/RepWilson.csv
## ../data/posts/repyvettedclarke.csv
## ../data/posts/rfrelinghuysen.csv
## ../data/posts/RichardShelby.csv
## ../data/posts/RobertAderholt.csv
## ../data/posts/rogermarshallmd.csv
## ../data/posts/sanfordbishop.csv
## ../data/posts/ScottDesJarlaisTN04.csv
## ../data/posts/sen.johncornyn.csv
## ../data/posts/senatoralfranken.csv
## ../data/posts/SenatorAngusSKingJr.csv
## ../data/posts/senatorbencardin.csv
## ../data/posts/SenatorBlunt.csv
## ../data/posts/SenatorBobCasey.csv
## ../data/posts/senatorcantwell.csv
## ../data/posts/senatorchriscoons.csv
## ../data/posts/SenatorCortezMasto.csv
## ../data/posts/SenatorDavidPerdue.csv
## ../data/posts/senatordebfischer.csv
## ../data/posts/senatordonnelly.csv
## ../data/posts/SenatorDurbin.csv
## ../data/posts/senatorelizabethwarren.csv
## ../data/posts/senatorfeinstein.csv
## ../data/posts/SenatorHassan.csv
## ../data/posts/SenatorHeidiHeitkamp.csv
## ../data/posts/senatorhirono.csv
## ../data/posts/senatorjeffflake.csv
## ../data/posts/SenatorJohnHoeven.csv
## ../data/posts/SenatorKaine.csv
## ../data/posts/SenatorKamalaHarris.csv
## ../data/posts/senatorlamaralexander.csv
## ../data/posts/SenatorLankford.csv
## ../data/posts/SenatorLutherStrange.csv
## ../data/posts/SenatorMarcoRubio.csv
## ../data/posts/senatormccaskill.csv
## ../data/posts/senatormenendez.csv
## ../data/posts/senatormikelee.csv
## ../data/posts/SenatorMikeRounds.csv
## ../data/posts/senatororrinhatch.csv
## ../data/posts/SenatorPatrickLeahy.csv
## ../data/posts/SenatorRandPaul.csv
## ../data/posts/SenatorRichardBurr.csv
## ../data/posts/senatorsanders.csv
## ../data/posts/SenatorSasse.csv
## ../data/posts/SenatorShaheen.csv
## ../data/posts/SenatorSherrodBrown.csv
## ../data/posts/SenatorStabenow.csv
## ../data/posts/senatortammybaldwin.csv
## ../data/posts/SenatorTedCruz.csv
## ../data/posts/senatortester.csv
## ../data/posts/SenatorThomTillis.csv
## ../data/posts/SenatorTimScott.csv
## ../data/posts/SenatorToddYoung.csv
## ../data/posts/SenatorTomCotton.csv
## ../data/posts/senatortomudall.csv
## ../data/posts/senatortoomey.csv
## ../data/posts/SenatorWhitehouse.csv
## ../data/posts/SenatorWicker.csv
## ../data/posts/senbennetco.csv
## ../data/posts/SenBlumenthal.csv
## ../data/posts/SenBrianSchatz.csv
## ../data/posts/SenCoryGardner.csv
## ../data/posts/SenDanSullivan.csv
## ../data/posts/SenDeanHeller.csv
## ../data/posts/SenDuckworth.csv
## ../data/posts/SenGaryPeters.csv
## ../data/posts/SenJackReed.csv
## ../data/posts/senjoniernst.csv
## ../data/posts/SenKirstenGillibrand.csv
## ../data/posts/SenLisaMurkowski.csv
## ../data/posts/SenPatRoberts.csv
## ../data/posts/senrobportman.csv
## ../data/posts/senronjohnson.csv
## ../data/posts/senschumer.csv
## ../data/posts/senshelley.csv
## ../data/posts/speakerryan.csv
## ../data/posts/SteveDainesMT.csv
## ../data/posts/stevekingia.csv
## ../data/posts/stevenpalazzo.csv
## ../data/posts/susancollins.csv
## ../data/posts/timryan.csv
## ../data/posts/tomcarper.csv
## ../data/posts/TomColeOK04.csv
## ../data/posts/TXRandy14.csv
## ../data/posts/USCongressmanFilemonVela.csv
## ../data/posts/USCongressmanVicenteGonzalez.csv
## ../data/posts/USRepKathyCastor.csv
## ../data/posts/usrepmikedoyle.csv
## ../data/posts/UsRepRickNolan.csv
## ../data/posts/USSenatorLindseyGraham.csv
## ../data/posts/WhipHoyer.csv
## ../data/posts/zoelofgren.csv
# testing that it works
dbListFields(db, "posts")
## [1] "screen_name" "id" "from_name" "date"
## [5] "datetime" "message" "type" "link"
## [9] "domain" "likes_count" "comments_count" "shares_count"
## [13] "love_count" "haha_count" "wow_count" "sad_count"
## [17] "angry_count"
dbGetQuery(db, 'SELECT * FROM posts LIMIT 5')
## screen_name id from_name date
## 1 1.066316e+14 106631626049851_1273804329332569 Ted Poe 2017-01-03
## 2 1.066316e+14 106631626049851_1274017475977921 Ted Poe 2017-01-03
## 3 1.066316e+14 106631626049851_1275444719168530 Ted Poe 2017-01-05
## 4 1.066316e+14 106631626049851_1275484255831243 Ted Poe 2017-01-05
## 5 1.066316e+14 106631626049851_1276290242417311 Ted Poe 2017-01-06
## datetime
## 1 2017-01-03 17:16:40
## 2 2017-01-03 23:11:15
## 3 2017-01-05 14:36:24
## 4 2017-01-05 15:38:22
## 5 2017-01-06 14:31:19
## message
## 1 It’s great to be back in Washington for the start of the 115th Congress! Enjoyed seeing the Adams Family from Kingwood this morning. Click below to watch the Swearing-In Ceremony and Floor Proceedings.
## 2 For years, Washington bureaucrats have repeatedly preached that the border is secure. The reality is that the majority of the southern border territory is controlled by someone other than the United States. Border patrol and local law enforcement do the best they can with the resources they currently have, but they need more help from the federal government.
## 3 Today, I introduced H.R. 241, the Timely Repatriation Act of 2017, to restrict diplomatic visas to countries that deny or unreasonably delay the repatriation of a national ordered removed from the U.N. The United States can no longer afford to give convicted foreign criminals a get-out-of-jail-free card. The safety of our own citizens is at stake. The issue is that we cannot permanently detain criminals and their countries of origin refuse to take them back. It is time we make them. Read the entire release here: http://poe.house.gov/2017/1/poe-introduces-the-timely-repatriation-act-of-2017
## 4 "Being sworn in is always an honor and a privilege, but this year is even more special for me. I am happy to also announce today that I am in remission from Leukemia. My fight is not yet over, but the progress that has been made in the last six months has been nothing short of miraculous, thanks to the Good Lord, the experts at MD Anderson Cancer Center, and the support and prayers of my family, friends, colleagues, staff and all of you. Since remission does not mean “cancer free”, I will continue to receive treatment both in Washington and Houston. After a challenging six months, I am grateful to be able to be here in Washington doing what I love to do: fight for Texas"
## 5 TERRORISM UPDATE: ·ISIS fighters attacked an Iraqi army outpost and a police station near the city of Tikrit on Friday, killing at least four soldiers and wounding 12 others, military and police sources said. The terrorists used a car bomb and two suicide attackers in their assault shortly after midnight on the army outpost in the town of al-Dour on Tikrit's outskirts, killing two officers and two soldiers, the sources said. Gunmen separately attacked the police station a short distance away and set fire to the building before fleeing the area. There were no casualties from that attack. ·Bangladeshi police shot dead two Islamist terrorists on Friday in a gunfight in Dhaka, including a prime suspect in the killing of 20 hostages, mostly foreigners, in a cafe in the capital last year. Nurul Islam Marjan, a commander of a splinter group of the Jamaat-ul-Mujahideen Bangladesh (JMB), was killed along with another suspected terrorist, Saddam Hossain. ·Elite Iraqi troops bridged a river under the cover of darkness to seize a district of Mosul in an unprecedented night raid early on Friday, part of a new phase in the battle to capture the city from ISIS after weeks of slowed momentum. Since last week, Iraqi forces have launched a renewed push to seize ground in the nearly three month-old offensive to capture the city, after progress stalled last month because of a need to slow the advance to protect civilians. ·US-backed militias in Syria have captured an ancient citadel from ISIS in a strategically significant advance against the jihadist group in its stronghold of Raqqa province, a spokesman said on Friday. The Jabar citadel on the banks of Lake Assad was taken by the Syrian Democratic Forces alliance on Thursday, militia spokesman Talal Silo said. It is located near a dam on the Euphrates River that the US-backed alliance also aims to capture in the current phase of its campaign. ·The Department of Defense announced yesterday that multiple airstrikes killed an estimated 20 or more al Qaeda “militants” in northern Syria at the beginning of the month. The results of the bombings are still being “assessed,” but US forces “struck two al Qaeda vehicles that had departed a large al Qaeda headquarters near Sarmada,” which is in the northern Idlib province, on January 1st. Then, on January 3rd, US forces “struck the headquarters compound itself, including multiple vehicles and structures.” Although the Defense Department doesn’t say it, the airstrikes were likely among the most significant carried out against al Qaeda facilities in Syria to date. ·The State Department announced on Thursday that Osama bin Laden’s son, Hamza, has been added to the US government’s list of designated terrorists. State cites Hamza’s appearances in al Qaeda’s propaganda since Aug. 2015 as evidence of his important role in the organization. The junior bin Laden has called for terror attacks throughout the West, including inside the US, and he also encouraged “Saudi Arabian-based tribes to unite with” Al Qaeda in the Arabian Peninsula (AQAP) “to wage war against the Kingdom of Saudi Arabia.” Al Qaeda has released several messages from Hamza since mid-2015, but they have all been audio speeches. The terror organization has not published photos or video of Hamza as an adult, likely for security reasons. ·Two car bombs in Baghdad claimed by ISIS killed at least 14 people on Thursday, police and medics said, part of a surge in violence across the capital. The first blast shook Baghdad's eastern al-Obeidi area during the morning rush, killing six and wounding 15. The second explosion hit the central district of Bab al-Moadham near a security checkpoint, killing eight. ·The US has killed four mid-level ISIS commanders involved in military, suicide, chemical, and financial operations in airstrikes in both Iraq and Syria since the beginning of December 2016, the military announced over the past week.
## type
## 1 link
## 2 link
## 3 photo
## 4 link
## 5 link
## link
## 1 https://www.c-span.org/video/?420804-1%2Fus-house-meets-elect-speaker-swear-members&live
## 2 http://poe.house.gov/2017/1/congressman-poe-introduces-the-smart-border-act
## 3 https://www.facebook.com/JudgeTedPoe/photos/a.137207746325572.19816.106631626049851/1275444719168530/?type=3
## 4 http://poe.house.gov/press-releases?ID=443ADE64-03E2-461A-BA56-FFB1489E9402
## 5 http://poe.house.gov/index.cfm?p=terrorism-update
## domain likes_count comments_count shares_count love_count
## 1 c-span.org 121 15 6 5
## 2 poe.house.gov 182 16 31 8
## 3 <NA> 137 13 25 9
## 4 poe.house.gov 1125 166 170 244
## 5 poe.house.gov 24 8 12 1
## haha_count wow_count sad_count angry_count
## 1 0 0 0 0
## 2 0 0 1 2
## 3 0 0 0 0
## 4 0 10 0 0
## 5 0 5 1 3
# what if we make a mistake and want to remove the table?
# dbRemoveTable(db, "posts")
# and we close the connection for now
dbDisconnect(db)
Now that we have our tables in the database, let’s see how we can query them. First we connect using dbConnect
and then we type our query with dbGetQuery
. Generally, we will write the query clauses in capital letters and the column names in lowercase.
db <- dbConnect(RSQLite::SQLite(), "facebook-db.sqlite")
test <- dbGetQuery(db, 'SELECT * FROM congress LIMIT 5')
str(test)
## 'data.frame': 5 obs. of 6 variables:
## $ bioid : chr "A000055" "A000360" "A000367" "A000369" ...
## $ screen_name: chr "RobertAderholt" "senatorlamaralexander" "repjustinamash" "MarkAmodeiNV2" ...
## $ name : chr "Robert B. Aderholt" "Lamar Alexander" "Justin Amash" "Mark E. Amodei" ...
## $ gender : chr "M" "M" "M" "M" ...
## $ type : chr "rep" "sen" "rep" "rep" ...
## $ party : chr "Republican" "Republican" "Republican" "Republican" ...
RMarkdown actually allows us to write the SQL query directly as a chunk of code:
SELECT * FROM congress LIMIT 5
bioid | screen_name | name | gender | type | party |
---|---|---|---|---|---|
A000055 | RobertAderholt | Robert B. Aderholt | M | rep | Republican |
A000360 | senatorlamaralexander | Lamar Alexander | M | sen | Republican |
A000367 | repjustinamash | Justin Amash | M | rep | Republican |
A000369 | MarkAmodeiNV2 | Mark E. Amodei | M | rep | Republican |
A000370 | CongresswomanAdams | Alma S. Adams | F | rep | Democrat |
This is the list of query clauses that we will use to work with the database table:
SELECT: determines which columns to include in the query’s result set
FROM: identifies the tables from which to draw data and how tables should be joined
LIMIT: in combination with SELECT, limits the number of records returned
WHERE: filter out unwanted data
GROUP BY: groups rows together by common column values
ORDER BY: sorts the rows in the final result set by one or more columns
Let’s start with some examples of SELECT:
# querying just one column
dbGetQuery(db, "SELECT name FROM congress LIMIT 10")
## name
## 1 Robert B. Aderholt
## 2 Lamar Alexander
## 3 Justin Amash
## 4 Mark E. Amodei
## 5 Alma S. Adams
## 6 Pete Aguilar
## 7 Rick W. Allen
## 8 Ralph Lee Abraham
## 9 Jodey C. Arrington
## 10 Joe Barton
# multiple columns
dbGetQuery(db, "SELECT name, party FROM congress LIMIT 10")
## name party
## 1 Robert B. Aderholt Republican
## 2 Lamar Alexander Republican
## 3 Justin Amash Republican
## 4 Mark E. Amodei Republican
## 5 Alma S. Adams Democrat
## 6 Pete Aguilar Democrat
## 7 Rick W. Allen Republican
## 8 Ralph Lee Abraham Republican
## 9 Jodey C. Arrington Republican
## 10 Joe Barton Republican
# adding expressions
dbGetQuery(db, "SELECT from_name, likes_count/comments_count, UPPER(type) FROM posts LIMIT 10")
## from_name likes_count/comments_count UPPER(type)
## 1 Ted Poe 8 LINK
## 2 Ted Poe 11 LINK
## 3 Ted Poe 10 PHOTO
## 4 Ted Poe 6 LINK
## 5 Ted Poe 3 LINK
## 6 Ted Poe 11 VIDEO
## 7 Ted Poe 17 PHOTO
## 8 Ted Poe 4 LINK
## 9 Ted Poe 10 PHOTO
## 10 Ted Poe 11 PHOTO
# adding aliases to the new columns
dbGetQuery(db, "SELECT from_name, likes_count/comments_count AS like_ratio FROM posts LIMIT 10")
## from_name like_ratio
## 1 Ted Poe 8
## 2 Ted Poe 11
## 3 Ted Poe 10
## 4 Ted Poe 6
## 5 Ted Poe 3
## 6 Ted Poe 11
## 7 Ted Poe 17
## 8 Ted Poe 4
## 9 Ted Poe 10
## 10 Ted Poe 11
# it's often a good idea to write a query in multiple lines so that it's easier to read it:
dbGetQuery(db, "SELECT from_name,
likes_count/comments_count AS like_ratio
FROM posts
LIMIT 10")
## from_name like_ratio
## 1 Ted Poe 8
## 2 Ted Poe 11
## 3 Ted Poe 10
## 4 Ted Poe 6
## 5 Ted Poe 3
## 6 Ted Poe 11
## 7 Ted Poe 17
## 8 Ted Poe 4
## 9 Ted Poe 10
## 10 Ted Poe 11
# selecting unique values from a column
dbGetQuery(db, "SELECT DISTINCT from_name
FROM posts
LIMIT 10")
## from_name
## 1 Ted Poe
## 2 Congressman Wm. Lacy Clay
## 3 Congressman David Scott
## 4 Congressman Hank Johnson
## 5 Rep. Steve Stivers
## 6 Sam Graves
## 7 Walter Jones
## 8 U.S. Congressman Henry Cuellar (TX-28)
## 9 Gregorio Kilili Camacho Sablan
## 10 Representative Mike Kelly
Now some examples of how we can use WHERE to subset the database based on values of specific columns:
# selecting based on values of a column
dbGetQuery(db, "SELECT name, party
FROM congress
WHERE party = 'Republican'
LIMIT 10")
## name party
## 1 Robert B. Aderholt Republican
## 2 Lamar Alexander Republican
## 3 Justin Amash Republican
## 4 Mark E. Amodei Republican
## 5 Rick W. Allen Republican
## 6 Ralph Lee Abraham Republican
## 7 Jodey C. Arrington Republican
## 8 Joe Barton Republican
## 9 Roy Blunt Republican
## 10 Kevin Brady Republican
# greater than, working with dates
dbGetQuery(db, "SELECT from_name, type, date
FROM posts
WHERE date > '2017-01-01'
LIMIT 10")
## from_name type date
## 1 Ted Poe link 2017-01-03
## 2 Ted Poe link 2017-01-03
## 3 Ted Poe photo 2017-01-05
## 4 Ted Poe link 2017-01-05
## 5 Ted Poe link 2017-01-06
## 6 Ted Poe video 2017-01-06
## 7 Ted Poe photo 2017-01-09
## 8 Ted Poe link 2017-01-13
## 9 Ted Poe photo 2017-01-18
## 10 Ted Poe photo 2017-01-18
# AND operator
dbGetQuery(db, "SELECT from_name, type, date, likes_count
FROM posts
WHERE date > '2017-06-01' AND type != 'photo'
AND likes_count > 500
LIMIT 10")
## from_name type date likes_count
## 1 Ted Poe status 2017-11-07 1101
## 2 Rep. Steve Stivers status 2017-08-15 1019
## 3 Sam Graves link 2017-09-25 1105
## 4 Walter Jones status 2017-08-22 526
## 5 Walter Jones link 2017-09-14 617
## 6 Gregorio Kilili Camacho Sablan link 2017-06-30 767
## 7 Representative Mike Kelly link 2017-09-24 1659
## 8 Congressman John Yarmuth (KY-3) status 2017-06-08 1246
## 9 Congressman John Yarmuth (KY-3) video 2017-06-10 509
## 10 Congressman John Yarmuth (KY-3) video 2017-06-22 729
# OR operator
dbGetQuery(db, "SELECT from_name, type, date, comments_count
FROM posts
WHERE comments_count>100 AND (type = 'photo' OR type = 'video')
LIMIT 10")
## from_name type date comments_count
## 1 Ted Poe video 2017-10-03 123
## 2 Ted Poe photo 2017-11-24 176
## 3 Ted Poe photo 2017-12-19 164
## 4 Congressman Wm. Lacy Clay photo 2017-01-04 317
## 5 Congressman Wm. Lacy Clay video 2017-01-12 163
## 6 Congressman David Scott video 2017-11-16 108
## 7 Congressman David Scott photo 2017-12-19 158
## 8 Rep. Steve Stivers photo 2017-05-03 132
## 9 Rep. Steve Stivers photo 2017-12-19 121
## 10 Sam Graves photo 2017-02-07 112
# membership, IN
dbGetQuery(db, "SELECT from_name, type, date, comments_count
FROM posts
WHERE type IN ('video', 'event')
LIMIT 10")
## from_name type date comments_count
## 1 Ted Poe video 2017-01-06 7
## 2 Ted Poe video 2017-01-24 23
## 3 Ted Poe video 2017-01-26 14
## 4 Ted Poe video 2017-01-31 19
## 5 Ted Poe video 2017-02-02 44
## 6 Ted Poe video 2017-02-06 44
## 7 Ted Poe video 2017-02-15 22
## 8 Ted Poe video 2017-02-23 47
## 9 Ted Poe video 2017-02-24 27
## 10 Ted Poe video 2017-03-01 23
# matching conditions:
# _ matches exactly one character
# % matches any number of characters
dbGetQuery(db, "SELECT from_name, type, date, comments_count
FROM posts
WHERE date LIKE '2017-01-__'
LIMIT 10")
## from_name type date comments_count
## 1 Ted Poe link 2017-01-03 15
## 2 Ted Poe link 2017-01-03 16
## 3 Ted Poe photo 2017-01-05 13
## 4 Ted Poe link 2017-01-05 166
## 5 Ted Poe link 2017-01-06 8
## 6 Ted Poe video 2017-01-06 7
## 7 Ted Poe photo 2017-01-09 13
## 8 Ted Poe link 2017-01-13 10
## 9 Ted Poe photo 2017-01-18 10
## 10 Ted Poe photo 2017-01-18 8
dbGetQuery(db, "SELECT from_name, type, date, comments_count
FROM posts
WHERE date LIKE '2017-03%'
LIMIT 10")
## from_name type date comments_count
## 1 Ted Poe photo 2017-03-01 1
## 2 Ted Poe video 2017-03-01 23
## 3 Ted Poe photo 2017-03-02 12
## 4 Ted Poe link 2017-03-03 27
## 5 Ted Poe photo 2017-03-07 7
## 6 Ted Poe link 2017-03-08 315
## 7 Ted Poe link 2017-03-10 34
## 8 Ted Poe video 2017-03-15 27
## 9 Ted Poe link 2017-03-17 142
## 10 Ted Poe photo 2017-03-29 39
# SQLite does not have regular expressions, but we can get creative...
dbGetQuery(db, "SELECT from_name, message, date
FROM posts
WHERE message LIKE '%london%'
LIMIT 1")
## from_name
## 1 Rep. Steve Stivers
## message
## 1 Inspiring story from London, OH in The Columbus Dispatch. It’s amazing to see how a community comes together to support one another, and I am excited for all the children who will be able to enjoy Noah’s Playground. http://bit.ly/2lN7aIp
## date
## 1 2017-03-02
Very useful for us – GROUP BY allows us to aggregate results based on values of a specific variables. We will come back to this later, so don’t worry about it for now.
dbGetQuery(db,
"SELECT from_name, COUNT(type) AS type_count
FROM posts
GROUP BY type")
## from_name type_count
## 1 US Rep Rick Nolan 1334
## 2 Representative Zoe Lofgren 61512
## 3 Senator Tammy Duckworth 21
## 4 Senator Brian Schatz 23
## 5 Representative Zoe Lofgren 67798
## 6 Representative Zoe Lofgren 14795
## 7 Representative Zoe Lofgren 29928
And we can then order the results using ORDER BY. The default is alphabetical order for the first column, but we can change that:
# sort by type_count
dbGetQuery(db,
"SELECT type, COUNT(type) AS type_count
FROM posts
GROUP BY type
ORDER BY type_count")
## type type_count
## 1 music 21
## 2 note 23
## 3 event 1334
## 4 status 14795
## 5 video 29928
## 6 link 61512
## 7 photo 67798
# now in descending orders
dbGetQuery(db,
"SELECT type, COUNT(type) AS type_count
FROM posts
GROUP BY type
ORDER BY type_count DESC")
## type type_count
## 1 photo 67798
## 2 link 61512
## 3 video 29928
## 4 status 14795
## 5 event 1334
## 6 note 23
## 7 music 21
# which was the most popular post?
dbGetQuery(db,
"SELECT from_name, message, likes_count, datetime
FROM posts
ORDER BY likes_count DESC
LIMIT 1")
## from_name
## 1 U.S. Senator Elizabeth Warren
## message
## 1 Tonight we fight for American values at airports all across this country. I'm at Logan Airport tonight to say: our diversity makes our country strong. We will not allow a Muslim ban in the United States of America. #NoBanNoWall #nomuslimban
## likes_count datetime
## 1 421064 2017-01-29 01:59:12
# You can also specify the column number instead of the name
dbGetQuery(db,
"SELECT from_name, message, likes_count, datetime
FROM posts
ORDER BY 3 DESC
LIMIT 1")
## from_name
## 1 U.S. Senator Elizabeth Warren
## message
## 1 Tonight we fight for American values at airports all across this country. I'm at Logan Airport tonight to say: our diversity makes our country strong. We will not allow a Muslim ban in the United States of America. #NoBanNoWall #nomuslimban
## likes_count datetime
## 1 421064 2017-01-29 01:59:12
# what was the post with the highest comment to like ratio? We subset only posts with 1000 likes or more to avoid outliers.
dbGetQuery(db,
"SELECT from_name, message, likes_count, comments_count, date,
comments_count/likes_count AS comment_like_ratio
FROM posts
WHERE likes_count > 1000
ORDER BY comment_like_ratio DESC
LIMIT 5")
## from_name
## 1 U.S. Senator Susan Collins
## 2 Senator Cory Gardner
## 3 Senator Cory Gardner
## 4 John Cornyn
## 5 Senator Roy Blunt
## message
## 1 After securing significant changes in the bill, as well as commitments to pass legislation to help lower health insurance premiums, I will cast my vote in support of the Senate tax reform bill. As revised, this bill will provide much-needed tax relief and simplification for lower- and middle-income families, while spurring the creation of good jobs and greater economic growth. Click below to read my full statement:
## 2 The teachers in our lives play an outsized role as we grow into adults and throughout our adult life - and for good reason. They taught me that we must work together in common cause for a greater school, community, and nation. I look forward to the great work ahead of us not as parents divided, but as people committed, in spite of these differences, to building a stronger education for our children and grandchildren.
## 3 Today marks a historic day for our country. It has been over 30 years since Congress has passed major tax reform and I’m excited to bring tax relief to millions of Coloradans. Read more here: https://www.gardner.senate.gov/newsroom/press-releases/gardner-statement-on-passage-of-historic-tax-relief-legislation
## 4 This week, I’ll be voting to confirm Betsy DeVos as Secretary of Education. Power over education should be handed back down to the states, so that parents and teachers can choose how best to accomplish our universal goal of making sure every child has a good education. Ms. DeVos will lead this effort, so that parents, teachers and our local school districts will have a greater role in these important decisions.
## 5 I believe Betsy DeVos understands that decisions about education need to be made much closer to where kids are. I look forward to working with her to find ways to get those decisions back to local school boards, and moms and dads. As Chairman of the Appropriations Subcommittee for the Department of Education, I will be working hard to ensure that public education continues to be appreciated as the cornerstone of opportunity in America.
## likes_count comments_count date comment_like_ratio
## 1 1070 14974 2017-12-01 13
## 2 1135 14298 2017-02-07 12
## 3 1499 15271 2017-12-02 10
## 4 3762 36849 2017-02-06 9
## 5 1595 15167 2017-02-01 9