Importing network data into R

To familiarize ourselves with social network analysis before we turn to social media, we will looking at a network from the book ``A Storm of Words’’, the third book in the Song of Ice and Fire series by George R.R. Martin.

The source of this dataset is this blog post. Each character in the book will be a different nodes. Each edge between two characters indicates their names appeared within 15 words of one another in the text of the book.

The first step is to read the list of edges in this network:

##   Source  Target Weight
## 1  Aemon   Grenn      5
## 2  Aemon Samwell     31
## 3  Aerys   Jaime     18
## 4  Aerys  Robert      6
## 5  Aerys  Tyrion      5
## 6  Aerys   Tywin      8

How do we convert this dataset into a network object in R? There are multiple packages to work with networks, but the most popular is igraph because it’s very flexible and easy to do, and in my experience it’s much faster and scales well to very large networks. Other packages that you may want to explore are sna and networks.

Now, how do we create the igraph object? We can use the graph_from_data_frame function:

g <- graph_from_data_frame(d=edges, directed=FALSE)
g
## IGRAPH 30c7828 UN-- 107 352 -- 
## + attr: name (v/c), Weight (e/n)
## + edges from 30c7828 (vertex names):
##  [1] Aemon  --Grenn     Aemon  --Samwell   Aerys  --Jaime    
##  [4] Aerys  --Robert    Aerys  --Tyrion    Aerys  --Tywin    
##  [7] Alliser--Mance     Amory  --Oberyn    Arya   --Anguy    
## [10] Arya   --Beric     Arya   --Bran      Arya   --Brynden  
## [13] Arya   --Cersei    Arya   --Gendry    Arya   --Gregor   
## [16] Arya   --Jaime     Arya   --Joffrey   Arya   --Jon      
## [19] Arya   --Rickon    Arya   --Robert    Arya   --Roose    
## [22] Arya   --Sandor    Arya   --Thoros    Arya   --Tyrion   
## + ... omitted several edges

What does it mean? - U means undirected
- N means named graph
- 107 is the number of nodes
- 352 is the number of edges
- name (v/c) means name is a node attribute and it’s a character
- weight (e/n) means weight is an edge attribute and it’s numeric

This is how you access specific elements within the igraph object:

V(g) # nodes
## + 107/107 vertices, named, from 30c7828:
##   [1] Aemon        Aerys        Alliser      Amory        Arya        
##   [6] Balon        Belwas       Beric        Bran         Brienne     
##  [11] Bronn        Brynden      Catelyn      Cersei       Craster     
##  [16] Daario       Daenerys     Davos        Eddard       Eddison     
##  [21] Edmure       Gendry       Gilly        Gregor       Hodor       
##  [26] Hoster       Irri         Jaime        Janos        Joffrey     
##  [31] Jojen        Jon          Jon Arryn    Jorah        Kevan       
##  [36] Loras        Lothar       Luwin        Lysa         Mance       
##  [41] Meera        Melisandre   Meryn        Missandei    Myrcella    
##  [46] Oberyn       Podrick      Rattleshirt  Renly        Rhaegar     
## + ... omitted several vertices
V(g)$name # names of each node
##   [1] "Aemon"        "Aerys"        "Alliser"      "Amory"       
##   [5] "Arya"         "Balon"        "Belwas"       "Beric"       
##   [9] "Bran"         "Brienne"      "Bronn"        "Brynden"     
##  [13] "Catelyn"      "Cersei"       "Craster"      "Daario"      
##  [17] "Daenerys"     "Davos"        "Eddard"       "Eddison"     
##  [21] "Edmure"       "Gendry"       "Gilly"        "Gregor"      
##  [25] "Hodor"        "Hoster"       "Irri"         "Jaime"       
##  [29] "Janos"        "Joffrey"      "Jojen"        "Jon"         
##  [33] "Jon Arryn"    "Jorah"        "Kevan"        "Loras"       
##  [37] "Lothar"       "Luwin"        "Lysa"         "Mance"       
##  [41] "Meera"        "Melisandre"   "Meryn"        "Missandei"   
##  [45] "Myrcella"     "Oberyn"       "Podrick"      "Rattleshirt" 
##  [49] "Renly"        "Rhaegar"      "Rickard"      "Rickon"      
##  [53] "Robb"         "Robert"       "Robert Arryn" "Roose"       
##  [57] "Samwell"      "Sandor"       "Sansa"        "Shae"        
##  [61] "Shireen"      "Stannis"      "Tommen"       "Tyrion"      
##  [65] "Tywin"        "Val"          "Varys"        "Viserys"     
##  [69] "Walder"       "Walton"       "Ygritte"      "Grenn"       
##  [73] "Anguy"        "Thoros"       "Barristan"    "Illyrio"     
##  [77] "Nan"          "Theon"        "Jeyne"        "Petyr"       
##  [81] "Roslin"       "Elia"         "Ilyn"         "Pycelle"     
##  [85] "Karl"         "Drogo"        "Aegon"        "Kraznys"     
##  [89] "Rakharo"      "Worm"         "Cressen"      "Salladhor"   
##  [93] "Qyburn"       "Bowen"        "Margaery"     "Dalla"       
##  [97] "Orell"        "Qhorin"       "Styr"         "Lancel"      
## [101] "Olenna"       "Marillion"    "Ellaria"      "Mace"        
## [105] "Ramsay"       "Chataya"      "Doran"
E(g) # edges
## + 352/352 edges from 30c7828 (vertex names):
##  [1] Aemon  --Grenn     Aemon  --Samwell   Aerys  --Jaime    
##  [4] Aerys  --Robert    Aerys  --Tyrion    Aerys  --Tywin    
##  [7] Alliser--Mance     Amory  --Oberyn    Arya   --Anguy    
## [10] Arya   --Beric     Arya   --Bran      Arya   --Brynden  
## [13] Arya   --Cersei    Arya   --Gendry    Arya   --Gregor   
## [16] Arya   --Jaime     Arya   --Joffrey   Arya   --Jon      
## [19] Arya   --Rickon    Arya   --Robert    Arya   --Roose    
## [22] Arya   --Sandor    Arya   --Thoros    Arya   --Tyrion   
## [25] Balon  --Loras     Belwas --Barristan Belwas --Illyrio  
## [28] Beric  --Anguy     Beric  --Gendry    Beric  --Thoros   
## + ... omitted several edges
g[1:10, 1:10] # adjacency matrix
## 10 x 10 sparse Matrix of class "dgCMatrix"
##    [[ suppressing 10 column names 'Aemon', 'Aerys', 'Alliser' ... ]]
##                            
## Aemon   . . . . . . . . . .
## Aerys   . . . . . . . . . .
## Alliser . . . . . . . . . .
## Amory   . . . . . . . . . .
## Arya    . . . . . . . 1 1 .
## Balon   . . . . . . . . . .
## Belwas  . . . . . . . . . .
## Beric   . . . . 1 . . . . .
## Bran    . . . . 1 . . . . .
## Brienne . . . . . . . . . .
g[1,1:20] # first row of adjacency matrix
##    Aemon    Aerys  Alliser    Amory     Arya    Balon   Belwas    Beric 
##        0        0        0        0        0        0        0        0 
##     Bran  Brienne    Bronn  Brynden  Catelyn   Cersei  Craster   Daario 
##        0        0        0        0        0        0        0        0 
## Daenerys    Davos   Eddard  Eddison 
##        0        0        0        0

Measuring node importance

What are the most important nodes in a network? We can answer this question computing a metric of centrality.

The most basic measure is degree, the number of adjacent edges to each node. It is often considered a measure of direct influence. In this network, it will be the total number of times each user co-appears with someone else.

sort(degree(g))
##        Amory      Shireen       Walton      Illyrio         Karl 
##            1            1            1            1            1 
##        Aegon      Kraznys      Rakharo         Worm      Cressen 
##            1            1            1            1            1 
##    Salladhor       Qyburn        Orell       Lancel       Ramsay 
##            1            1            1            1            1 
##        Doran    Jon Arryn        Luwin    Missandei      Rickard 
##            1            2            2            2            2 
##        Anguy          Nan        Jeyne        Bowen         Styr 
##            2            2            2            2            2 
##       Olenna      Ellaria      Chataya      Alliser      Eddison 
##            2            2            2            3            3 
##       Hoster Robert Arryn      Viserys        Dalla    Marillion 
##            3            3            3            3            3 
##         Mace        Aerys       Belwas        Bronn       Daario 
##            3            4            4            4            4 
##       Gendry        Gilly        Hodor         Irri        Jojen 
##            4            4            4            4            4 
##   Melisandre     Myrcella  Rattleshirt        Roose          Val 
##            4            4            4            4            4 
##      Ygritte        Grenn        Theon       Roslin      Pycelle 
##            4            4            4            4            4 
##        Drogo        Aemon      Craster        Davos       Lothar 
##            4            5            5            5            5 
##        Meera      Podrick         Shae       Tommen       Thoros 
##            5            5            5            5            5 
##         Elia       Qhorin        Balon        Beric        Janos 
##            5            5            6            6            6 
##        Jorah        Kevan      Rhaegar       Rickon    Barristan 
##            6            6            6            6            6 
##         Ilyn      Brienne        Meryn       Oberyn        Varys 
##            6            7            7            7            7 
##        Petyr     Margaery      Brynden       Edmure        Renly 
##            7            7            8            8            8 
##       Walder        Loras         Lysa       Eddard       Gregor 
##            8            9           10           12           12 
##        Mance       Sandor         Bran     Daenerys      Stannis 
##           12           13           14           14           14 
##      Samwell      Catelyn      Joffrey       Robert         Arya 
##           15           18           18           18           19 
##       Cersei        Tywin        Jaime         Robb          Jon 
##           20           22           24           25           26 
##        Sansa       Tyrion 
##           26           36

In directed graphs, there are three types of degree: indegree (incoming edges), outdegree (outgoing edges), and total degree. You can compute these using mode="in" or mode="out" or mode="total".

tail(sort(degree(g, mode="in")))
##  Tywin  Jaime   Robb    Jon  Sansa Tyrion 
##     22     24     25     26     26     36
tail(sort(degree(g, mode="out")))
##  Tywin  Jaime   Robb    Jon  Sansa Tyrion 
##     22     24     25     26     26     36

Here they will be identical because the network is undirected.

Closeness measures how many steps are required to access every other node from a given node. It’s a measure of how long information takes to arrive (who hears news first?), or how easily a node can reach other nodes. Higher values mean less centrality.

tail(sort(closeness(g, normalized=TRUE)))
##   Stannis      Arya      Robb    Robert     Sansa    Tyrion 
## 0.4796380 0.4862385 0.4884793 0.5000000 0.5096154 0.5120773

Betweenness measures brokerage or gatekeeping potential. It is (approximately) the number of shortest paths between nodes that pass through a particular node. It defines the importance of a node is in terms of how frequently it connects other nodes.

tail(sort(betweenness(g)))
##     Sansa      Robb  Daenerys    Tyrion    Robert       Jon 
##  705.1986  706.5573  874.8372 1101.3850 1165.6025 1279.7534

Network properties

Let’s now try to describe what a network looks like as a whole. An important measure is edge_density – the proportion of edges in the network over all possible edges that could exist.

edge_density(g)
## [1] 0.06207018

reciprocity measures the propensity of each edge to be a mutual edge; that is, the probability that if i is connected to j, j is also connected to i.

reciprocity(g)
## [1] 1

Why is it 1?

transitivity, also known as clustering coefficient, measures that probability that adjacent nodes of a network are connected. In other words, if i is connected to j, and j is connected to k, what is the probability that i is also connected to k?

transitivity(g)
## [1] 0.3286615

Network communities

Networks often have different clusters or communities of nodes that are more densely connected to each other than to the rest of the network. The goal of community detection algorithms is to identify these subsets.

There are a few different algorithms, each following a different logic: walktrap algorith, infomap, fast and greedy, edge-betweenness, label propagation… We’ll get to them a bit later.

My experience is that infomap tends to work better in most social science examples (websites, social media, classrooms, etc), so we’ll go along with it for now just to take a look at the output

comm <- cluster_infomap(g)
# let's look at some of the small ones first...
comm[[4]]
##  [1] "Belwas"    "Daario"    "Daenerys"  "Irri"      "Jorah"    
##  [6] "Missandei" "Rhaegar"   "Viserys"   "Barristan" "Illyrio"  
## [11] "Drogo"     "Aegon"     "Kraznys"   "Rakharo"   "Worm"
comm[[5]]
## [1] "Arya"   "Beric"  "Eddard" "Gendry" "Sandor" "Anguy"  "Thoros"
comm[[6]]
## [1] "Bran"   "Hodor"  "Jojen"  "Luwin"  "Meera"  "Rickon" "Nan"    "Theon"
comm[[7]]
## [1] "Davos"      "Melisandre" "Shireen"    "Cressen"    "Salladhor"
# now back to the large ones
comm[[1]]
##  [1] "Aerys"    "Amory"    "Balon"    "Bronn"    "Cersei"   "Gregor"  
##  [7] "Jaime"    "Joffrey"  "Kevan"    "Loras"    "Meryn"    "Myrcella"
## [13] "Oberyn"   "Podrick"  "Renly"    "Robert"   "Sansa"    "Shae"    
## [19] "Stannis"  "Tommen"   "Tyrion"   "Tywin"    "Varys"    "Walton"  
## [25] "Elia"     "Ilyn"     "Pycelle"  "Qyburn"   "Margaery" "Lancel"  
## [31] "Olenna"   "Ellaria"  "Mace"     "Chataya"  "Doran"
comm[[2]]
##  [1] "Aemon"       "Alliser"     "Craster"     "Eddison"     "Gilly"      
##  [6] "Janos"       "Jon"         "Mance"       "Rattleshirt" "Samwell"    
## [11] "Val"         "Ygritte"     "Grenn"       "Karl"        "Bowen"      
## [16] "Dalla"       "Orell"       "Qhorin"      "Styr"
comm[[3]]
##  [1] "Brienne" "Brynden" "Catelyn" "Edmure"  "Hoster"  "Lothar"  "Rickard"
##  [8] "Robb"    "Roose"   "Walder"  "Jeyne"   "Petyr"   "Roslin"  "Ramsay"

The final way in which we can think about network communities is in terms of hierarchy or structure. We’ll discuss one of these methods.

K-core decomposition allows us to identify the core and the periphery of the network. A k-core is a maximal subnet of a network such that all nodes have at least degree K.

coreness(g)
##        Aemon        Aerys      Alliser        Amory         Arya 
##            4            4            3            1            7 
##        Balon       Belwas        Beric         Bran      Brienne 
##            6            3            4            6            6 
##        Bronn      Brynden      Catelyn       Cersei      Craster 
##            4            5            7            7            4 
##       Daario     Daenerys        Davos       Eddard      Eddison 
##            3            3            2            7            3 
##       Edmure       Gendry        Gilly       Gregor        Hodor 
##            5            4            4            7            4 
##       Hoster         Irri        Jaime        Janos      Joffrey 
##            3            3            7            4            7 
##        Jojen          Jon    Jon Arryn        Jorah        Kevan 
##            4            6            2            3            5 
##        Loras       Lothar        Luwin         Lysa        Mance 
##            6            4            2            6            4 
##        Meera   Melisandre        Meryn    Missandei     Myrcella 
##            4            3            6            2            4 
##       Oberyn      Podrick  Rattleshirt        Renly      Rhaegar 
##            4            4            4            6            3 
##      Rickard       Rickon         Robb       Robert Robert Arryn 
##            2            5            7            7            3 
##        Roose      Samwell       Sandor        Sansa         Shae 
##            4            4            7            7            4 
##      Shireen      Stannis       Tommen       Tyrion        Tywin 
##            1            7            4            7            7 
##          Val        Varys      Viserys       Walder       Walton 
##            3            5            3            5            1 
##      Ygritte        Grenn        Anguy       Thoros    Barristan 
##            4            3            2            4            3 
##      Illyrio          Nan        Theon        Jeyne        Petyr 
##            1            2            4            2            6 
##       Roslin         Elia         Ilyn      Pycelle         Karl 
##            4            4            6            4            1 
##        Drogo        Aegon      Kraznys      Rakharo         Worm 
##            3            1            1            1            1 
##      Cressen    Salladhor       Qyburn        Bowen     Margaery 
##            1            1            1            2            5 
##        Dalla        Orell       Qhorin         Styr       Lancel 
##            3            1            4            2            1 
##       Olenna    Marillion      Ellaria         Mace       Ramsay 
##            2            3            2            3            1 
##      Chataya        Doran 
##            2            1
table(coreness(g))
## 
##  1  2  3  4  5  6  7 
## 16 13 19 28  7 10 14
which(coreness(g)==7) # what is the core of the network?
##    Arya Catelyn  Cersei  Eddard  Gregor   Jaime Joffrey    Robb  Robert 
##       5      13      14      19      24      28      30      53      54 
##  Sandor   Sansa Stannis  Tyrion   Tywin 
##      58      59      62      64      65
# let's plot the core of the network...
core <- induced_subgraph(g, v=which(coreness(g)==7))
plot(core)