The purpose of this application is to analyze the Twitter network of the Members of the U.S. Congress.

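In the data folder, you will find two data files: congress-twitter-network-nodes.csv, with one row per node, and congress-twitter-network-edges.csv, with one row per edge.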

The first step will be to read these two datasets into R and construct the igraph object. How many nodes and edges does this network have?

nodes <- read.csv("data/congress-twitter-network-nodes.csv")
edges <- read.csv("data/congress-twitter-network-edges.csv")

library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
g <- graph_from_data_frame(d=edges, vertices=nodes, directed=TRUE)
g
## IGRAPH DN-- 517 61766 -- 
## + attr: name (v/c), twitter (v/c), bioid (v/c), gender (v/c),
## | chamber (v/c), party (v/c), followers_count (v/n)
## + edges (vertex names):
##  [1] Ander Crenshaw->Cathy McMorris Rodgers
##  [2] Ander Crenshaw->Bill Posey            
##  [3] Ander Crenshaw->Darrell E. Issa       
##  [4] Ander Crenshaw->Kevin McCarthy        
##  [5] Ander Crenshaw->John Boozman          
##  [6] Ander Crenshaw->Mario Diaz-Balart     
##  [7] Ander Crenshaw->Patrick T. McHenry    
## + ... omitted several edges
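The two numbers in the header line of the printed object (517 and 61766) are the node and edge counts. They can also be retrieved programmatically with `vcount()` and `ecount()`. The sketch below uses a small made-up data set (the names `toy_nodes`, `toy_edges` and `toy` are hypothetical, not part of the exercise data) so that it runs without the CSV files.

```r
library(igraph)

# Toy data (hypothetical), standing in for the nodes and edges CSV files
toy_nodes <- data.frame(name  = c("A", "B", "C", "D"),
                        party = c("Democrat", "Republican",
                                  "Democrat", "Republican"))
toy_edges <- data.frame(from = c("A", "A", "B", "C"),
                        to   = c("B", "C", "C", "A"))

# Same construction as above: first two columns of the edge list are
# matched against the first column of the vertex data frame
toy <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes,
                             directed = TRUE)

vcount(toy)  # number of nodes, including the isolated node D
ecount(toy)  # number of edges
```

Applied to the real object, `vcount(g)` and `ecount(g)` should match the values shown in the printed header.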

This network is too large to visualize directly in R, so let’s try to learn more about it using the tools we have covered so far.

How many components does this network have? As you will see, in this particular case it makes sense to work only with the giant component.

components(g)[c("csize", "no")]
## $csize
## [1] 514   1   1   1
## 
## $no
## [1] 4
g <- decompose(g)[[1]]
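Taking the first element of `decompose()` relies on the giant component being the first one returned. As a possible alternative, the sketch below selects the largest component by size, so it does not depend on ordering. The toy graph and the names `toy`, `comp` and `giant` are hypothetical and only serve the illustration.

```r
library(igraph)

# Toy directed graph (hypothetical): node 1 is isolated,
# nodes 2, 3 and 4 form a cycle
toy <- make_graph(c(2, 3,  3, 4,  4, 2), n = 4, directed = TRUE)

comp <- components(toy)   # weakly connected components by default
comp$csize                # size of each component

# Keep the vertices whose membership matches the largest component
keep  <- which(comp$membership == which.max(comp$csize))
giant <- induced_subgraph(toy, keep)

vcount(giant)
ecount(giant)
```

If several components tie for the largest size, `which.max()` picks the first of them.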

Who are the most relevant Members of Congress, according to different measures of centrality? Note that this is a directed network, which means there is a difference between indegree and outdegree.

tail(sort(degree(g, mode="in")))
##         Steny H. Hoyer           Nancy Pelosi Cathy McMorris Rodgers 
##                    227                    236                    238 
##        Darrell E. Issa         Kevin McCarthy              Paul Ryan 
##                    250                    279                    292
tail(sort(betweenness(g)))
##   Kevin McCarthy   Glenn Thompson Yvette D. Clarke  Darrell E. Issa 
##         3558.377         3815.390         4044.581         4049.154 
##   Kyrsten Sinema      Dean Heller 
##         4570.111         8115.120
tail(sort(page_rank(g)$vector))
##    John Cornyn Steny H. Hoyer    John McCain   Nancy Pelosi Kevin McCarthy 
##    0.004848300    0.004885626    0.005163731    0.005257823    0.005644866 
##      Paul Ryan 
##    0.007113986
tail(sort(authority_score(g)$vector))
##             Fred Upton         Jason Chaffetz Cathy McMorris Rodgers 
##              0.8698984              0.8777856              0.9228824 
##        Darrell E. Issa         Kevin McCarthy              Paul Ryan 
##              0.9516859              0.9807008              1.0000000
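To make the indegree/outdegree distinction concrete, here is a minimal sketch on a made-up network. It assumes that an edge from X to Y means "X follows Y", which the data description above does not state explicitly. The object name `toy` and the node labels are hypothetical.

```r
library(igraph)

# Toy "follow" network (hypothetical): B, C and D each follow A,
# and A follows B
toy <- graph_from_data_frame(
  data.frame(from = c("B", "C", "D", "A"),
             to   = c("A", "A", "A", "B")),
  directed = TRUE
)

# Under the assumption above, indegree counts followers inside the network
degree(toy, mode = "in")

# ...and outdegree counts how many other members a node follows
degree(toy, mode = "out")
```

Here A has the highest indegree, while every node has the same outdegree, so the two measures can rank nodes very differently.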

What communities can you find in the network? Use the additional node-level variables to try to identify whether these communities overlap with any of these other attributes. Try different community detection algorithms to see if you get different answers.

comm <- cluster_walktrap(g)
V(g)$comm <- membership(comm)
table(V(g)$comm, V(g)$party)
##    
##     Democrat Independent Republican
##   1      224           1          0
##   2        0           0        289
comm <- cluster_infomap(g)
V(g)$comm <- membership(comm)
table(V(g)$comm, V(g)$party, V(g)$chamber)
## , ,  = rep
## 
##    
##     Democrat Independent Republican
##   1        0           0        239
##   2      183           0          0
##   3        0           0          0
## 
## , ,  = sen
## 
##    
##     Democrat Independent Republican
##   1        0           0          4
##   2        0           0          0
##   3       41           1         46
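One way to check whether two algorithms "give different answers" is to cross-tabulate their memberships and compute an agreement score. The sketch below does this on a small made-up graph using `compare()` with normalized mutual information. The graph and the names `toy`, `wt` and `im` are hypothetical, and NMI is just one of several possible agreement measures.

```r
library(igraph)

# Toy undirected graph (hypothetical): two 5-node cliques
# joined by a single bridging edge
toy <- disjoint_union(make_full_graph(5), make_full_graph(5))
toy <- add_edges(toy, c(1, 6))

set.seed(123)              # infomap involves random trials
wt <- cluster_walktrap(toy)
im <- cluster_infomap(toy)

# Cross-tabulate the two partitions: rows are walktrap communities,
# columns are infomap communities
table(walktrap = as.vector(membership(wt)),
      infomap  = as.vector(membership(im)))

# Normalized mutual information: 1 means the partitions are identical
compare(wt, im, method = "nmi")

# Modularity of each partition, as a rough quality score
modularity(wt)
modularity(im)
```

The same calls could be applied to the walktrap and infomap results on `g`, provided the two community objects are stored under different names instead of both overwriting `comm`.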

Finally, we’ll try to visualize part of the network as well: only the Senators. Note also that instead of plotting it in the Viewer window, we’ll write directly to a PDF file. I have added a few options here for you so that it’s faster, but note that this will probably take 1-2 minutes.

sen <- induced_subgraph(g, V(g)$chamber=="sen")

set.seed(123)
fr <- layout_with_fr(sen, niter=1000)
V(sen)$color <- ifelse(V(sen)$party=="Republican", "red", "blue") # red for Republicans, blue for everyone else
V(sen)$label <- NA
V(sen)$size <- authority_score(sen)$vector * 5

pdf("congress-network.pdf")
par(mar=c(0,0,0,0))
plot(sen, layout=fr, edge.curved=.25, edge.width=.05, edge.arrow.mode=0) # use the layout computed above
dev.off()