Importing network data into R

To familiarize ourselves with social network analysis before we turn to social media, we will look at a network from the book “A Storm of Swords”, the third book in the Song of Ice and Fire series by George R.R. Martin.

The source of this dataset is this blog post. Each character in the book is a node. An edge between two characters indicates that their names appeared within 15 words of one another in the text of the book.

The first step is to read the list of edges in this network:

##   Source  Target Weight
## 1  Aemon   Grenn      5
## 2  Aemon Samwell     31
## 3  Aerys   Jaime     18
## 4  Aerys  Robert      6
## 5  Aerys  Tyrion      5
## 6  Aerys   Tywin      8
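The chunk that produced this output is not shown here. As a minimal sketch, the six rows printed above can be rebuilt by hand as follows. The object name `edges_head` is made up for this illustration; with the full dataset you would instead read the file (for example with `read.csv()`), using whatever path you saved it under.

```r
# Rebuild the six rows printed above by hand (illustration only;
# the full edge list is much longer).
edges_head <- data.frame(
  Source = c("Aemon", "Aemon", "Aerys", "Aerys", "Aerys", "Aerys"),
  Target = c("Grenn", "Samwell", "Jaime", "Robert", "Tyrion", "Tywin"),
  Weight = c(5, 31, 18, 6, 5, 8),
  stringsAsFactors = FALSE
)

# Same three columns as the printed output: two endpoints and a weight.
head(edges_head)
```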

How do we convert this dataset into a network object in R? There are multiple packages for working with networks, but the most popular is igraph: it’s very flexible and easy to use, and in my experience it’s much faster and scales well to very large networks. Other packages that you may want to explore are sna and network.

Now, how do we create the igraph object? We can use the graph_from_data_frame function:

g <- graph_from_data_frame(d=edges, directed=FALSE)
g
## IGRAPH d1dd47e UNW- 107 352 -- 
## + attr: name (v/c), weight (e/n)
## + edges from d1dd47e (vertex names):
##  [1] Aemon  --Grenn     Aemon  --Samwell   Aerys  --Jaime     Aerys  --Robert   
##  [5] Aerys  --Tyrion    Aerys  --Tywin     Alliser--Mance     Amory  --Oberyn   
##  [9] Arya   --Anguy     Arya   --Beric     Arya   --Bran      Arya   --Brynden  
## [13] Arya   --Cersei    Arya   --Gendry    Arya   --Gregor    Arya   --Jaime    
## [17] Arya   --Joffrey   Arya   --Jon       Arya   --Rickon    Arya   --Robert   
## [21] Arya   --Roose     Arya   --Sandor    Arya   --Thoros    Arya   --Tyrion   
## [25] Balon  --Loras     Belwas --Barristan Belwas --Illyrio   Beric  --Anguy    
## [29] Beric  --Gendry    Beric  --Thoros    Bran   --Hodor     Bran   --Jojen    
## + ... omitted several edges

What does it mean?

- U means undirected
- N means named graph
- W means weighted graph (the edges have a weight attribute)
- 107 is the number of nodes
- 352 is the number of edges
- name (v/c) means name is a node attribute and it’s a character
- weight (e/n) means weight is an edge attribute and it’s numeric
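The same information can be queried directly instead of being read off the summary line. The sketch below uses a three-edge toy graph; `toy` and `g_toy` are names invented for the illustration. It assumes the third column is called `weight` in lowercase, because igraph only treats an edge attribute with exactly that name as the edge weight.

```r
library(igraph)

# Toy edge list built from three of the rows shown earlier.
toy <- data.frame(
  from   = c("Aemon", "Aemon", "Aerys"),
  to     = c("Grenn", "Samwell", "Jaime"),
  weight = c(5, 31, 18)
)
g_toy <- graph_from_data_frame(d = toy, directed = FALSE)

is_directed(g_toy)   # FALSE corresponds to "U" in the summary line
is_named(g_toy)      # TRUE corresponds to "N"
is_weighted(g_toy)   # TRUE corresponds to "W"
vcount(g_toy)        # number of nodes
ecount(g_toy)        # number of edges
```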

This is how you access specific elements within the igraph object:

V(g) # nodes
## + 107/107 vertices, named, from d1dd47e:
##   [1] Aemon        Aerys        Alliser      Amory        Arya        
##   [6] Balon        Belwas       Beric        Bran         Brienne     
##  [11] Bronn        Brynden      Catelyn      Cersei       Craster     
##  [16] Daario       Daenerys     Davos        Eddard       Eddison     
##  [21] Edmure       Gendry       Gilly        Gregor       Hodor       
##  [26] Hoster       Irri         Jaime        Janos        Joffrey     
##  [31] Jojen        Jon          Jon Arryn    Jorah        Kevan       
##  [36] Loras        Lothar       Luwin        Lysa         Mance       
##  [41] Meera        Melisandre   Meryn        Missandei    Myrcella    
##  [46] Oberyn       Podrick      Rattleshirt  Renly        Rhaegar     
## + ... omitted several vertices
V(g)$name # names of each node
##   [1] "Aemon"        "Aerys"        "Alliser"      "Amory"        "Arya"        
##   [6] "Balon"        "Belwas"       "Beric"        "Bran"         "Brienne"     
##  [11] "Bronn"        "Brynden"      "Catelyn"      "Cersei"       "Craster"     
##  [16] "Daario"       "Daenerys"     "Davos"        "Eddard"       "Eddison"     
##  [21] "Edmure"       "Gendry"       "Gilly"        "Gregor"       "Hodor"       
##  [26] "Hoster"       "Irri"         "Jaime"        "Janos"        "Joffrey"     
##  [31] "Jojen"        "Jon"          "Jon Arryn"    "Jorah"        "Kevan"       
##  [36] "Loras"        "Lothar"       "Luwin"        "Lysa"         "Mance"       
##  [41] "Meera"        "Melisandre"   "Meryn"        "Missandei"    "Myrcella"    
##  [46] "Oberyn"       "Podrick"      "Rattleshirt"  "Renly"        "Rhaegar"     
##  [51] "Rickard"      "Rickon"       "Robb"         "Robert"       "Robert Arryn"
##  [56] "Roose"        "Samwell"      "Sandor"       "Sansa"        "Shae"        
##  [61] "Shireen"      "Stannis"      "Tommen"       "Tyrion"       "Tywin"       
##  [66] "Val"          "Varys"        "Viserys"      "Walder"       "Walton"      
##  [71] "Ygritte"      "Grenn"        "Anguy"        "Thoros"       "Barristan"   
##  [76] "Illyrio"      "Nan"          "Theon"        "Jeyne"        "Petyr"       
##  [81] "Roslin"       "Elia"         "Ilyn"         "Pycelle"      "Karl"        
##  [86] "Drogo"        "Aegon"        "Kraznys"      "Rakharo"      "Worm"        
##  [91] "Cressen"      "Salladhor"    "Qyburn"       "Bowen"        "Margaery"    
##  [96] "Dalla"        "Orell"        "Qhorin"       "Styr"         "Lancel"      
## [101] "Olenna"       "Marillion"    "Ellaria"      "Mace"         "Ramsay"      
## [106] "Chataya"      "Doran"
E(g) # edges
## + 352/352 edges from d1dd47e (vertex names):
##  [1] Aemon  --Grenn     Aemon  --Samwell   Aerys  --Jaime     Aerys  --Robert   
##  [5] Aerys  --Tyrion    Aerys  --Tywin     Alliser--Mance     Amory  --Oberyn   
##  [9] Arya   --Anguy     Arya   --Beric     Arya   --Bran      Arya   --Brynden  
## [13] Arya   --Cersei    Arya   --Gendry    Arya   --Gregor    Arya   --Jaime    
## [17] Arya   --Joffrey   Arya   --Jon       Arya   --Rickon    Arya   --Robert   
## [21] Arya   --Roose     Arya   --Sandor    Arya   --Thoros    Arya   --Tyrion   
## [25] Balon  --Loras     Belwas --Barristan Belwas --Illyrio   Beric  --Anguy    
## [29] Beric  --Gendry    Beric  --Thoros    Bran   --Hodor     Bran   --Jojen    
## [33] Bran   --Jon       Bran   --Luwin     Bran   --Meera     Bran   --Nan      
## [37] Bran   --Rickon    Bran   --Samwell   Bran   --Theon     Brienne--Loras    
## + ... omitted several edges
g[1:10, 1:10] # adjacency matrix
## 10 x 10 sparse Matrix of class "dgCMatrix"
##    [[ suppressing 10 column names 'Aemon', 'Aerys', 'Alliser' ... ]]
##                              
## Aemon   . . . .  . . .  . . .
## Aerys   . . . .  . . .  . . .
## Alliser . . . .  . . .  . . .
## Amory   . . . .  . . .  . . .
## Arya    . . . .  . . . 23 9 .
## Balon   . . . .  . . .  . . .
## Belwas  . . . .  . . .  . . .
## Beric   . . . . 23 . .  . . .
## Bran    . . . .  9 . .  . . .
## Brienne . . . .  . . .  . . .
g[1,1:20] # first row of adjacency matrix
##    Aemon    Aerys  Alliser    Amory     Arya    Balon   Belwas    Beric 
##        0        0        0        0        0        0        0        0 
##     Bran  Brienne    Bronn  Brynden  Catelyn   Cersei  Craster   Daario 
##        0        0        0        0        0        0        0        0 
## Daenerys    Davos   Eddard  Eddison 
##        0        0        0        0
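Attributes and adjacency entries can also be looked up by name rather than by position. The following is a small sketch on a toy graph (the names `toy`, `g_toy` and `adj` are illustrative, not part of the dataset); it uses `as_adjacency_matrix()` with `attr = "weight"` to get a dense matrix whose cells hold the edge weights.

```r
library(igraph)

toy <- data.frame(
  from   = c("Aemon", "Aemon", "Aerys"),
  to     = c("Grenn", "Samwell", "Jaime"),
  weight = c(5, 31, 18)
)
g_toy <- graph_from_data_frame(d = toy, directed = FALSE)

# Edge attribute as a plain vector, in edge order.
E(g_toy)$weight

# Dense adjacency matrix; cells contain the weight, 0 means no edge.
adj <- as_adjacency_matrix(g_toy, attr = "weight", sparse = FALSE)
adj["Aemon", "Samwell"]   # weight of the Aemon--Samwell edge
adj["Aemon", "Jaime"]     # no edge between these two in the toy graph
```

Because the graph is undirected, the matrix is symmetric: `adj["Samwell", "Aemon"]` holds the same value as `adj["Aemon", "Samwell"]`.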

Measuring node importance

What are the most important nodes in a network? We can answer this question by computing a measure of centrality.

The most basic measure is degree, the number of edges adjacent to each node. It is often considered a measure of direct influence. In this network, it is the number of distinct characters with whom each character co-appears. For example, Tyrion co-appears at least once with 36 other characters.

sort(degree(g))
##        Amory      Shireen       Walton      Illyrio         Karl        Aegon 
##            1            1            1            1            1            1 
##      Kraznys      Rakharo         Worm      Cressen    Salladhor       Qyburn 
##            1            1            1            1            1            1 
##        Orell       Lancel       Ramsay        Doran    Jon Arryn        Luwin 
##            1            1            1            1            2            2 
##    Missandei      Rickard        Anguy          Nan        Jeyne        Bowen 
##            2            2            2            2            2            2 
##         Styr       Olenna      Ellaria      Chataya      Alliser      Eddison 
##            2            2            2            2            3            3 
##       Hoster Robert Arryn      Viserys        Dalla    Marillion         Mace 
##            3            3            3            3            3            3 
##        Aerys       Belwas        Bronn       Daario       Gendry        Gilly 
##            4            4            4            4            4            4 
##        Hodor         Irri        Jojen   Melisandre     Myrcella  Rattleshirt 
##            4            4            4            4            4            4 
##        Roose          Val      Ygritte        Grenn        Theon       Roslin 
##            4            4            4            4            4            4 
##      Pycelle        Drogo        Aemon      Craster        Davos       Lothar 
##            4            4            5            5            5            5 
##        Meera      Podrick         Shae       Tommen       Thoros         Elia 
##            5            5            5            5            5            5 
##       Qhorin        Balon        Beric        Janos        Jorah        Kevan 
##            5            6            6            6            6            6 
##      Rhaegar       Rickon    Barristan         Ilyn      Brienne        Meryn 
##            6            6            6            6            7            7 
##       Oberyn        Varys        Petyr     Margaery      Brynden       Edmure 
##            7            7            7            7            8            8 
##        Renly       Walder        Loras         Lysa       Eddard       Gregor 
##            8            8            9           10           12           12 
##        Mance       Sandor         Bran     Daenerys      Stannis      Samwell 
##           12           13           14           14           14           15 
##      Catelyn      Joffrey       Robert         Arya       Cersei        Tywin 
##           18           18           18           19           20           22 
##        Jaime         Robb          Jon        Sansa       Tyrion 
##           24           25           26           26           36

In directed graphs, there are three types of degree: indegree (incoming edges), outdegree (outgoing edges), and total degree. You can compute these using mode="in" or mode="out" or mode="total".

tail(sort(degree(g, mode="in")))
##  Tywin  Jaime   Robb    Jon  Sansa Tyrion 
##     22     24     25     26     26     36
tail(sort(degree(g, mode="out")))
##  Tywin  Jaime   Robb    Jon  Sansa Tyrion 
##     22     24     25     26     26     36

Here they will be identical because the network is undirected.
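To see the three modes actually differ, here is a minimal sketch on a made-up directed graph with three nodes (the node labels and object names are hypothetical and unrelated to the book data).

```r
library(igraph)

# Directed toy graph: A -> B, A -> C, B -> C
d <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"))
g_dir <- graph_from_data_frame(d, directed = TRUE)

deg_in    <- degree(g_dir, mode = "in")     # incoming edges per node
deg_out   <- degree(g_dir, mode = "out")    # outgoing edges per node
deg_total <- degree(g_dir, mode = "total")  # in + out

rbind(deg_in, deg_out, deg_total)
```

Here C only receives edges and A only sends them, so indegree and outdegree rank the nodes in opposite order even though every node has the same total degree.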

When edges have weights, use the strength function to compute weighted degree, that is, the sum of the weights of the edges adjacent to each node:

sort(strength(g))
##      Cressen       Ramsay        Amory      Shireen        Doran         Karl 
##            4            4            5            5            5            6 
##        Orell      Rakharo       Lancel        Luwin        Aegon      Chataya 
##            6            7            7            8            8            9 
##       Walton      Illyrio      Kraznys      Ellaria    Jon Arryn      Rickard 
##           10           10           10           10           11           11 
##       Qyburn        Bowen       Olenna         Worm        Anguy    Salladhor 
##           11           11           12           14           15           16 
##        Roose     Myrcella          Nan Robert Arryn      Viserys         Mace 
##           17           18           18           19           19           20 
##        Dalla         Styr    Marillion      Eddison       Hoster      Pycelle 
##           21           23           23           24           24           24 
##        Jeyne      Alliser        Balon         Elia       Daario    Missandei 
##           28           29           29           29           30           30 
##       Tommen          Val       Roslin         Ilyn         Irri       Lothar 
##           31           31           32           32           33           34 
##        Drogo        Aerys        Janos        Theon      Rhaegar  Rattleshirt 
##           35           37           37           38           42           44 
##         Shae        Meryn        Varys        Kevan      Brynden        Renly 
##           45           47           49           50           55           55 
##        Bronn       Gendry       Qhorin       Thoros   Melisandre    Barristan 
##           59           59           59           60           62           63 
##      Podrick       Belwas        Gilly        Aemon        Beric      Craster 
##           64           67           69           74           75           75 
##        Loras       Oberyn       Rickon        Grenn      Ygritte        Davos 
##           76           76           81           81           82           87 
##       Walder        Jorah        Petyr     Margaery       Edmure       Eddard 
##           87           89           89           96           98          108 
##         Lysa       Gregor      Brienne        Jojen       Robert       Sandor 
##          108          117          122          125          128          137 
##        Meera      Stannis        Mance        Hodor      Catelyn        Tywin 
##          139          146          160          177          184          204 
##       Cersei     Daenerys      Joffrey         Arya      Samwell         Robb 
##          226          232          255          269          282          342 
##         Bran        Jaime        Sansa          Jon       Tyrion 
##          344          372          383          442          551
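The difference between degree and strength is easy to check by hand on a small example. The sketch below uses only the six edges printed at the start of this section, with the weight column named `weight` in lowercase so that igraph picks it up; the object names are invented for the illustration. Because only six edges are included, the values for some nodes (Aemon, for instance) are smaller than in the full network.

```r
library(igraph)

six <- data.frame(
  from   = c("Aemon", "Aemon", "Aerys", "Aerys", "Aerys", "Aerys"),
  to     = c("Grenn", "Samwell", "Jaime", "Robert", "Tyrion", "Tywin"),
  weight = c(5, 31, 18, 6, 5, 8)
)
g_six <- graph_from_data_frame(six, directed = FALSE)

degree(g_six)["Aerys"]    # number of edges touching Aerys
strength(g_six)["Aerys"]  # sum of their weights: 18 + 6 + 5 + 8
```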

Closeness measures how many steps are required to reach every other node from a given node. It’s a measure of how long information takes to arrive (who hears news first?), or how easily a node can reach other nodes. igraph reports it as the inverse of the average shortest-path distance to all other nodes, so higher values mean more centrality.

tail(sort(closeness(g, normalized=TRUE)))
##    Catelyn      Sansa       Arya    Stannis     Robert     Tyrion 
## 0.06579764 0.06708861 0.06799230 0.06821107 0.06978275 0.07061959
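One caveat worth checking in your own analysis: when a graph has a `weight` edge attribute, `closeness()` uses it as the length of each edge by default, so heavily weighted edges count as long rather than strong connections. The sketch below, on a made-up three-node path, compares three options: the default, ignoring weights with `weights = NA`, and using inverse weights as distances. Which one is appropriate depends on what the weights mean in your data; treating co-occurrence counts as inverse distances is an assumption, not something stated by the dataset.

```r
library(igraph)

# Path A -- B -- C with a heavy A--B edge and a light B--C edge.
p <- graph_from_data_frame(
  data.frame(from = c("A", "B"), to = c("B", "C"), weight = c(10, 1)),
  directed = FALSE
)

# Default: weights are treated as distances (A--B is "10 steps" long).
cl_weighted <- closeness(p, normalized = TRUE)

# Ignore weights: every edge counts as one step.
cl_unweighted <- closeness(p, weights = NA, normalized = TRUE)

# Inverse weights: a heavy edge becomes a short distance.
cl_inverse <- closeness(p, weights = 1 / E(p)$weight, normalized = TRUE)

rbind(cl_weighted, cl_unweighted, cl_inverse)
```

In all three variants B, the middle node, has the highest closeness; what changes is the size of the values and the ordering of the two end nodes.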

Betweenness measures brokerage or gatekeeping potential. It is (approximately) the number of shortest paths between pairs of other nodes that pass through a particular node. It defines the importance of a node in terms of how frequently it lies on the paths that connect other nodes.

tail(sort(betweenness(g)))
##   Catelyn     Sansa   Stannis       Jon    Tyrion    Robert 
##  598.0000  683.5667  696.5167  921.6000 1163.7833 1166.1500
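A tiny example makes the definition concrete. In the made-up star-shaped graph below (hypothetical labels, no weights), every shortest path between two outer nodes has to pass through B, so B acts as the only broker.

```r
library(igraph)

# Star: B is connected to A, C and D; the outer nodes are not connected.
star <- graph_from_data_frame(
  data.frame(from = c("A", "B", "B"), to = c("B", "C", "D")),
  directed = FALSE
)

bt <- betweenness(star)
bt
```

There are three pairs of outer nodes (A-C, A-D, C-D), each with a single shortest path through B, so B has betweenness 3 and the outer nodes have betweenness 0.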

Network properties

Let’s now try to describe what a network looks like as a whole. An important measure is edge_density – the proportion of edges in the network over all possible edges that could exist.

edge_density(g)
## [1] 0.06207018
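This value can be reproduced by hand from the node and edge counts reported earlier. In an undirected network without self-loops, n nodes allow n(n-1)/2 possible edges, so the density is the observed number of edges divided by that quantity. The variable names below are chosen for the illustration.

```r
n_nodes <- 107   # number of nodes reported in the graph summary
n_edges <- 352   # number of edges reported in the graph summary

# Possible undirected edges among n nodes, excluding self-loops.
possible_edges <- n_nodes * (n_nodes - 1) / 2

density_by_hand <- n_edges / possible_edges
density_by_hand
```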

reciprocity measures the propensity of each edge to be a mutual edge; that is, the probability that if i is connected to j, j is also connected to i.

reciprocity(g)
## [1] 1

Why is it 1?
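One way to explore the question is to compare a directed and an undirected toy graph. In the sketch below (made-up nodes and object names), the directed graph has a mutual pair A <-> B plus a one-way edge A -> C, while the undirected graph has no notion of direction at all.

```r
library(igraph)

# Directed: A -> B and B -> A are mutual; A -> C is not reciprocated.
g_dir <- graph_from_data_frame(
  data.frame(from = c("A", "B", "A"), to = c("B", "A", "C")),
  directed = TRUE
)
rec_directed <- reciprocity(g_dir)   # 2 of the 3 edges are reciprocated

# Undirected: an edge between i and j is by definition also an edge
# between j and i.
g_und <- graph_from_data_frame(
  data.frame(from = c("A", "A"), to = c("B", "C")),
  directed = FALSE
)
rec_undirected <- reciprocity(g_und)

c(directed = rec_directed, undirected = rec_undirected)
```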

transitivity, also known as the clustering coefficient, measures the probability that the adjacent nodes of a node are connected. In other words, if i is connected to j, and j is connected to k, what is the probability that i is also connected to k?

transitivity(g)
## [1] 0.3286615
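As a rough check of the definition, the global value can be computed by hand on a small made-up graph: a triangle A-B-C with one extra node D attached to C. It contains one triangle and five connected triples (one centred on A, one on B, three on C), so the ratio is 3 x 1 / 5 = 0.6.

```r
library(igraph)

# Triangle A-B-C plus a pendant node D attached to C.
tri <- graph_from_data_frame(
  data.frame(from = c("A", "B", "A", "C"), to = c("B", "C", "C", "D")),
  directed = FALSE
)

trans_global <- transitivity(tri)   # global clustering coefficient
trans_global
```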