The purpose of this application is to analyze the Twitter network of the Members of the U.S. Congress.
In the data folder, you will find two data files:
congress-twitter-network-edges.csv contains the edges of this network. Note that these edges are directed: each row indicates that the legislator in the source column follows the legislator in the target column.
congress-twitter-network-nodes.csv contains information about each of the nodes. The only variables we will use here are id_str (the unique Twitter ID for each legislator; same as in the edge list), name (full name of each legislator), party (Republican, Democrat or Independent), and chamber (rep for the House of Representatives, sen for the Senate).
The first step will be to read these two datasets into R and construct the igraph object. How many nodes and edges does this network have?
nodes <- read.csv("data/congress-twitter-network-nodes.csv")
edges <- read.csv("data/congress-twitter-network-edges.csv")
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
g <- graph_from_data_frame(d=edges, vertices=nodes, directed=TRUE)
g
## IGRAPH DN-- 517 61766 --
## + attr: name (v/c), twitter (v/c), bioid (v/c), gender (v/c),
## | chamber (v/c), party (v/c), followers_count (v/n)
## + edges (vertex names):
## [1] Ander Crenshaw->Cathy McMorris Rodgers
## [2] Ander Crenshaw->Bill Posey
## [3] Ander Crenshaw->Darrell E. Issa
## [4] Ander Crenshaw->Kevin McCarthy
## [5] Ander Crenshaw->John Boozman
## [6] Ander Crenshaw->Mario Diaz-Balart
## [7] Ander Crenshaw->Patrick T. McHenry
## + ... omitted several edges
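The node and edge counts appear in the first line of the printed summary (517 and 61766), but you can also extract them programmatically. Here is a minimal sketch on a made-up three-node network; toy_nodes, toy_edges and toy are hypothetical names used only for illustration and are not part of the Congress data.

```r
library(igraph)

# Hypothetical toy data (not the Congress files): three accounts, three follows
toy_nodes <- data.frame(id = c("1", "2", "3"),
                        party = c("Democrat", "Republican", "Republican"))
toy_edges <- data.frame(source = c("1", "1", "2"),
                        target = c("2", "3", "3"))

# The first column of `vertices` must match the IDs used in the edge list
toy <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes, directed = TRUE)

n_nodes <- vcount(toy)  # number of nodes
n_edges <- ecount(toy)  # number of edges
n_nodes
n_edges
```

The same two calls, vcount(g) and ecount(g), should return the counts for the Congress network.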
This network is too large for us to visualize it directly with R, so let’s try to learn more about it using what we have learned so far.
How many components does this network have? As you will see, in this particular case it makes sense that we work only with the giant component.
components(g)[c("csize", "no")]
## $csize
## [1] 514 1 1 1
##
## $no
## [1] 4
# keep only the giant component (the first one listed in csize above)
g <- decompose(g)[[1]]
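Taking the first element works here because the giant component happens to be listed first. If you are not sure of the ordering, one possible alternative is to select the largest component by size. The sketch below uses a hypothetical toy graph (toy, comp and giant are illustrative names) in which the largest component is deliberately not the first one.

```r
library(igraph)

# Hypothetical toy graph: an isolate (F), a pair (D, E) and a triangle (A, B, C)
toy_edges <- data.frame(from = c("A", "B", "C", "D"),
                        to   = c("B", "C", "A", "E"))
toy_nodes <- data.frame(name = c("F", "D", "E", "A", "B", "C"))
toy <- graph_from_data_frame(toy_edges, vertices = toy_nodes, directed = TRUE)

# Weakly connected components, ignoring edge direction
comp <- components(toy, mode = "weak")

# Keep the vertices that belong to the largest component
giant_id <- which.max(comp$csize)
giant <- induced_subgraph(toy, which(comp$membership == giant_id))

sort(V(giant)$name)
```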
Who are the most relevant Members of Congress, according to different measures of centrality? Note that this is a directed network, which means there is a difference between indegree and outdegree.
tail(sort(degree(g, mode="in")))
## Steny H. Hoyer Nancy Pelosi Cathy McMorris Rodgers
## 227 236 238
## Darrell E. Issa Kevin McCarthy Paul Ryan
## 250 279 292
tail(sort(betweenness(g)))
## Kevin McCarthy Glenn Thompson Yvette D. Clarke Darrell E. Issa
## 3558.377 3815.390 4044.581 4049.154
## Kyrsten Sinema Dean Heller
## 4570.111 8115.120
tail(sort(page_rank(g)$vector))
## John Cornyn Steny H. Hoyer John McCain Nancy Pelosi Kevin McCarthy
## 0.004848300 0.004885626 0.005163731 0.005257823 0.005644866
## Paul Ryan
## 0.007113986
tail(sort(authority_score(g)$vector))
## Fred Upton Jason Chaffetz Cathy McMorris Rodgers
## 0.8698984 0.8777856 0.9228824
## Darrell E. Issa Kevin McCarthy Paul Ryan
## 0.9516859 0.9807008 1.0000000
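To see why indegree and outdegree can tell different stories in a follower network, consider this small sketch. The toy graph is hypothetical: A is followed by three accounts but follows only one.

```r
library(igraph)

# Hypothetical toy follower network: B, C and D follow A; A follows B
toy_edges <- data.frame(source = c("B", "C", "D", "A"),
                        target = c("A", "A", "A", "B"))
toy <- graph_from_data_frame(toy_edges, directed = TRUE)

indeg  <- degree(toy, mode = "in")   # how many accounts follow each node
outdeg <- degree(toy, mode = "out")  # how many accounts each node follows

indeg["A"]   # 3
outdeg["A"]  # 1
```

In the Congress network, running degree(g, mode="out") in place of mode="in" would rank legislators by how many colleagues they follow rather than by how many colleagues follow them.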
What communities can you find in the network? Use the additional node-level variables to try to identify whether these communities overlap with any of these other attributes. Try different community detection algorithms to see if you get different answers.
comm <- cluster_walktrap(g)
V(g)$comm <- membership(comm)
table(V(g)$comm, V(g)$party)
##
## Democrat Independent Republican
## 1 224 1 0
## 2 0 0 289
comm <- cluster_infomap(g)
V(g)$comm <- membership(comm)
table(V(g)$comm, V(g)$party, V(g)$chamber)
## , , = rep
##
##
## Democrat Independent Republican
## 1 0 0 239
## 2 183 0 0
## 3 0 0 0
##
## , , = sen
##
##
## Democrat Independent Republican
## 1 0 0 4
## 2 0 0 0
## 3 41 1 46
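Besides cross-tabulating against party and chamber, you can compare two partitions with each other directly. The following is only a sketch on a hypothetical toy graph (two five-node cliques joined by one edge), using igraph's compare() with normalized mutual information; values close to 1 mean the two algorithms largely agree.

```r
library(igraph)

set.seed(123)  # cluster_infomap() involves randomness

# Hypothetical toy graph: two cliques of five nodes joined by a single edge
toy <- disjoint_union(make_full_graph(5), make_full_graph(5))
toy <- add_edges(toy, c(5, 6))

wt <- cluster_walktrap(toy)
im <- cluster_infomap(toy)

# Cross-tabulate the two membership vectors
tab <- table(walktrap = as.vector(membership(wt)),
             infomap  = as.vector(membership(im)))
tab

# Normalized mutual information between the two partitions
nmi <- compare(wt, im, method = "nmi")
nmi
```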
Finally, we’ll try to visualize part of the network as well: only the Senators. Note also that instead of plotting it in the Viewer window, we’ll write directly to a PDF file. I have added a few options here for you so that it’s faster, but note that this will probably take 1-2 minutes.
sen <- induced_subgraph(g, V(g)$chamber=="sen")
set.seed(123)
fr <- layout_with_fr(sen, niter=1000)
V(sen)$color <- ifelse(V(sen)$party=="Republican", "red", "blue") # Republicans in red; Democrats and the Independent in blue
V(sen)$label <- NA
V(sen)$size <- authority_score(sen)$vector * 5
pdf("congress-network.pdf")
par(mar=c(0,0,0,0))
plot(sen, layout=fr, edge.curved=.25, edge.width=.05, edge.arrow.mode=0)
dev.off()