To familiarize ourselves with social network analysis before we turn to social media, we will look at a network from the book “A Storm of Swords”, the third book in the Song of Ice and Fire series by George R.R. Martin.
The source of this dataset is this blog post. Each character in the book will be a different node. Each edge between two characters indicates that their names appeared within 15 words of one another in the text of the book.
The first step is to read the list of edges in this network:
## Source Target Weight
## 1 Aemon Grenn 5
## 2 Aemon Samwell 31
## 3 Aerys Jaime 18
## 4 Aerys Robert 6
## 5 Aerys Tyrion 5
## 6 Aerys Tywin 8
How do we convert this dataset into a network object in R? There are
multiple packages to work with networks, but the most popular is
igraph because it’s very flexible and easy to use, and in my
experience it’s much faster and scales well to very large networks.
Other packages that you may want to explore are sna and network.
Now, how do we create the igraph object? We can use the
graph_from_data_frame function:
g <- graph_from_data_frame(d=edges, directed=FALSE)
g
## IGRAPH d1dd47e UNW- 107 352 --
## + attr: name (v/c), weight (e/n)
## + edges from d1dd47e (vertex names):
## [1] Aemon --Grenn Aemon --Samwell Aerys --Jaime Aerys --Robert
## [5] Aerys --Tyrion Aerys --Tywin Alliser--Mance Amory --Oberyn
## [9] Arya --Anguy Arya --Beric Arya --Bran Arya --Brynden
## [13] Arya --Cersei Arya --Gendry Arya --Gregor Arya --Jaime
## [17] Arya --Joffrey Arya --Jon Arya --Rickon Arya --Robert
## [21] Arya --Roose Arya --Sandor Arya --Thoros Arya --Tyrion
## [25] Balon --Loras Belwas --Barristan Belwas --Illyrio Beric --Anguy
## [29] Beric --Gendry Beric --Thoros Bran --Hodor Bran --Jojen
## + ... omitted several edges
What does it mean?
- U means undirected
- N means named graph
- W means weighted graph
- 107 is the number of nodes
- 352 is the number of edges
- name (v/c) means name is a node attribute and it’s a character
- weight (e/n) means weight is an edge attribute and it’s numeric
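To make the summary line concrete, here is a minimal sketch that rebuilds a tiny graph from only the six rows printed earlier and checks each property directly. The object name toy is hypothetical, and the weight column is deliberately written in lowercase because, as far as I know, igraph only treats a graph as weighted when the edge attribute is named exactly weight.

```r
library(igraph)

# Toy subset: only the six edges shown in the printed head of the edge list
edges <- data.frame(
  source = c("Aemon", "Aemon", "Aerys", "Aerys", "Aerys", "Aerys"),
  target = c("Grenn", "Samwell", "Jaime", "Robert", "Tyrion", "Tywin"),
  weight = c(5, 31, 18, 6, 5, 8)
)

# The first two columns are read as the endpoints; remaining columns become edge attributes
toy <- graph_from_data_frame(d = edges, directed = FALSE)

is_directed(toy)  # FALSE, the "U" flag
is_named(toy)     # TRUE, the "N" flag
is_weighted(toy)  # TRUE, the "W" flag
vcount(toy)       # 8 nodes
ecount(toy)       # 6 edges
```

These helper functions return the same information as the printed header, which is convenient when you need it inside a script rather than on screen.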
This is how you access specific elements within the igraph object:
V(g) # nodes
## + 107/107 vertices, named, from d1dd47e:
## [1] Aemon Aerys Alliser Amory Arya
## [6] Balon Belwas Beric Bran Brienne
## [11] Bronn Brynden Catelyn Cersei Craster
## [16] Daario Daenerys Davos Eddard Eddison
## [21] Edmure Gendry Gilly Gregor Hodor
## [26] Hoster Irri Jaime Janos Joffrey
## [31] Jojen Jon Jon Arryn Jorah Kevan
## [36] Loras Lothar Luwin Lysa Mance
## [41] Meera Melisandre Meryn Missandei Myrcella
## [46] Oberyn Podrick Rattleshirt Renly Rhaegar
## + ... omitted several vertices
V(g)$name # names of each node
## [1] "Aemon" "Aerys" "Alliser" "Amory" "Arya"
## [6] "Balon" "Belwas" "Beric" "Bran" "Brienne"
## [11] "Bronn" "Brynden" "Catelyn" "Cersei" "Craster"
## [16] "Daario" "Daenerys" "Davos" "Eddard" "Eddison"
## [21] "Edmure" "Gendry" "Gilly" "Gregor" "Hodor"
## [26] "Hoster" "Irri" "Jaime" "Janos" "Joffrey"
## [31] "Jojen" "Jon" "Jon Arryn" "Jorah" "Kevan"
## [36] "Loras" "Lothar" "Luwin" "Lysa" "Mance"
## [41] "Meera" "Melisandre" "Meryn" "Missandei" "Myrcella"
## [46] "Oberyn" "Podrick" "Rattleshirt" "Renly" "Rhaegar"
## [51] "Rickard" "Rickon" "Robb" "Robert" "Robert Arryn"
## [56] "Roose" "Samwell" "Sandor" "Sansa" "Shae"
## [61] "Shireen" "Stannis" "Tommen" "Tyrion" "Tywin"
## [66] "Val" "Varys" "Viserys" "Walder" "Walton"
## [71] "Ygritte" "Grenn" "Anguy" "Thoros" "Barristan"
## [76] "Illyrio" "Nan" "Theon" "Jeyne" "Petyr"
## [81] "Roslin" "Elia" "Ilyn" "Pycelle" "Karl"
## [86] "Drogo" "Aegon" "Kraznys" "Rakharo" "Worm"
## [91] "Cressen" "Salladhor" "Qyburn" "Bowen" "Margaery"
## [96] "Dalla" "Orell" "Qhorin" "Styr" "Lancel"
## [101] "Olenna" "Marillion" "Ellaria" "Mace" "Ramsay"
## [106] "Chataya" "Doran"
E(g) # edges
## + 352/352 edges from d1dd47e (vertex names):
## [1] Aemon --Grenn Aemon --Samwell Aerys --Jaime Aerys --Robert
## [5] Aerys --Tyrion Aerys --Tywin Alliser--Mance Amory --Oberyn
## [9] Arya --Anguy Arya --Beric Arya --Bran Arya --Brynden
## [13] Arya --Cersei Arya --Gendry Arya --Gregor Arya --Jaime
## [17] Arya --Joffrey Arya --Jon Arya --Rickon Arya --Robert
## [21] Arya --Roose Arya --Sandor Arya --Thoros Arya --Tyrion
## [25] Balon --Loras Belwas --Barristan Belwas --Illyrio Beric --Anguy
## [29] Beric --Gendry Beric --Thoros Bran --Hodor Bran --Jojen
## [33] Bran --Jon Bran --Luwin Bran --Meera Bran --Nan
## [37] Bran --Rickon Bran --Samwell Bran --Theon Brienne--Loras
## + ... omitted several edges
g[1:10, 1:10] # adjacency matrix
## 10 x 10 sparse Matrix of class "dgCMatrix"
## [[ suppressing 10 column names 'Aemon', 'Aerys', 'Alliser' ... ]]
##
## Aemon . . . . . . . . . .
## Aerys . . . . . . . . . .
## Alliser . . . . . . . . . .
## Amory . . . . . . . . . .
## Arya . . . . . . . 23 9 .
## Balon . . . . . . . . . .
## Belwas . . . . . . . . . .
## Beric . . . . 23 . . . . .
## Bran . . . . 9 . . . . .
## Brienne . . . . . . . . . .
g[1,1:20] # first row of adjacency matrix
## Aemon Aerys Alliser Amory Arya Balon Belwas Beric
## 0 0 0 0 0 0 0 0
## Bran Brienne Bronn Brynden Catelyn Cersei Craster Daario
## 0 0 0 0 0 0 0 0
## Daenerys Davos Eddard Eddison
## 0 0 0 0
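If you prefer an ordinary matrix over the bracket syntax, the following sketch extracts a weighted adjacency matrix with as_adjacency_matrix. It uses a toy graph built from the six edges printed at the start (the names toy and adj are hypothetical), not the full network.

```r
library(igraph)

# Toy subset: the six edges shown in the printed head of the edge list
edges <- data.frame(
  source = c("Aemon", "Aemon", "Aerys", "Aerys", "Aerys", "Aerys"),
  target = c("Grenn", "Samwell", "Jaime", "Robert", "Tyrion", "Tywin"),
  weight = c(5, 31, 18, 6, 5, 8)
)
toy <- graph_from_data_frame(d = edges, directed = FALSE)

# Dense matrix whose cells hold the edge weight (0 where there is no edge)
adj <- as_adjacency_matrix(toy, attr = "weight", sparse = FALSE)

adj["Aemon", "Samwell"]  # 31
adj["Samwell", "Aemon"]  # 31, the matrix is symmetric for an undirected graph
adj["Aemon", "Aerys"]    # 0, these two are not connected in the toy data
```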
What are the most important nodes in a network? We can answer this question by computing a measure of centrality.
The most basic measure is degree, the number of edges adjacent to each node. It is often considered a measure of direct influence. In this network, it is the number of distinct characters each character co-appears with. For example, Tyrion co-appears at least once with 36 other characters.
sort(degree(g))
## Amory Shireen Walton Illyrio Karl Aegon
## 1 1 1 1 1 1
## Kraznys Rakharo Worm Cressen Salladhor Qyburn
## 1 1 1 1 1 1
## Orell Lancel Ramsay Doran Jon Arryn Luwin
## 1 1 1 1 2 2
## Missandei Rickard Anguy Nan Jeyne Bowen
## 2 2 2 2 2 2
## Styr Olenna Ellaria Chataya Alliser Eddison
## 2 2 2 2 3 3
## Hoster Robert Arryn Viserys Dalla Marillion Mace
## 3 3 3 3 3 3
## Aerys Belwas Bronn Daario Gendry Gilly
## 4 4 4 4 4 4
## Hodor Irri Jojen Melisandre Myrcella Rattleshirt
## 4 4 4 4 4 4
## Roose Val Ygritte Grenn Theon Roslin
## 4 4 4 4 4 4
## Pycelle Drogo Aemon Craster Davos Lothar
## 4 4 5 5 5 5
## Meera Podrick Shae Tommen Thoros Elia
## 5 5 5 5 5 5
## Qhorin Balon Beric Janos Jorah Kevan
## 5 6 6 6 6 6
## Rhaegar Rickon Barristan Ilyn Brienne Meryn
## 6 6 6 6 7 7
## Oberyn Varys Petyr Margaery Brynden Edmure
## 7 7 7 7 8 8
## Renly Walder Loras Lysa Eddard Gregor
## 8 8 9 10 12 12
## Mance Sandor Bran Daenerys Stannis Samwell
## 12 13 14 14 14 15
## Catelyn Joffrey Robert Arya Cersei Tywin
## 18 18 18 19 20 22
## Jaime Robb Jon Sansa Tyrion
## 24 25 26 26 36
In directed graphs, there are three types of degree: indegree
(incoming edges), outdegree (outgoing edges), and total degree. You can
compute these using mode="in", mode="out", or mode="total".
tail(sort(degree(g, mode="in")))
## Tywin Jaime Robb Jon Sansa Tyrion
## 22 24 25 26 26 36
tail(sort(degree(g, mode="out")))
## Tywin Jaime Robb Jon Sansa Tyrion
## 22 24 25 26 26 36
Here they will be identical because the network is undirected.
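To see the three modes actually differ, here is a small sketch that builds the same four edges once as a directed and once as an undirected graph. The data are a toy subset (the four Aerys edges from the printed edge list), and treating the first column as the sender is purely an assumption for illustration.

```r
library(igraph)

# Four edges, all starting at Aerys (direction is an illustrative assumption)
edges <- data.frame(
  from = c("Aerys", "Aerys", "Aerys", "Aerys"),
  to   = c("Jaime", "Robert", "Tyrion", "Tywin")
)
directed_toy   <- graph_from_data_frame(edges, directed = TRUE)
undirected_toy <- graph_from_data_frame(edges, directed = FALSE)

degree(directed_toy, mode = "in")["Aerys"]     # 0, nothing points at Aerys
degree(directed_toy, mode = "out")["Aerys"]    # 4
degree(directed_toy, mode = "total")["Aerys"]  # 4

# In the undirected version the mode argument makes no difference
degree(undirected_toy, mode = "in")["Aerys"]   # 4
degree(undirected_toy, mode = "out")["Aerys"]  # 4
```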
When edges have weights and you want to compute weighted degree, the
correct function is strength:
sort(strength(g))
## Cressen Ramsay Amory Shireen Doran Karl
## 4 4 5 5 5 6
## Orell Rakharo Lancel Luwin Aegon Chataya
## 6 7 7 8 8 9
## Walton Illyrio Kraznys Ellaria Jon Arryn Rickard
## 10 10 10 10 11 11
## Qyburn Bowen Olenna Worm Anguy Salladhor
## 11 11 12 14 15 16
## Roose Myrcella Nan Robert Arryn Viserys Mace
## 17 18 18 19 19 20
## Dalla Styr Marillion Eddison Hoster Pycelle
## 21 23 23 24 24 24
## Jeyne Alliser Balon Elia Daario Missandei
## 28 29 29 29 30 30
## Tommen Val Roslin Ilyn Irri Lothar
## 31 31 32 32 33 34
## Drogo Aerys Janos Theon Rhaegar Rattleshirt
## 35 37 37 38 42 44
## Shae Meryn Varys Kevan Brynden Renly
## 45 47 49 50 55 55
## Bronn Gendry Qhorin Thoros Melisandre Barristan
## 59 59 59 60 62 63
## Podrick Belwas Gilly Aemon Beric Craster
## 64 67 69 74 75 75
## Loras Oberyn Rickon Grenn Ygritte Davos
## 76 76 81 81 82 87
## Walder Jorah Petyr Margaery Edmure Eddard
## 87 89 89 96 98 108
## Lysa Gregor Brienne Jojen Robert Sandor
## 108 117 122 125 128 137
## Meera Stannis Mance Hodor Catelyn Tywin
## 139 146 160 177 184 204
## Cersei Daenerys Joffrey Arya Samwell Robb
## 226 232 255 269 282 342
## Bran Jaime Sansa Jon Tyrion
## 344 372 383 442 551
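The relationship between degree and strength is easy to verify by hand. This sketch uses the six edges printed at the start (a toy subset, with the hypothetical object name toy): Aerys has four edges with weights 18, 6, 5 and 8, so his degree is 4 and his strength is their sum, 37, which agrees with the values for Aerys in the full outputs above.

```r
library(igraph)

# Toy subset: the six edges shown in the printed head of the edge list
edges <- data.frame(
  source = c("Aemon", "Aemon", "Aerys", "Aerys", "Aerys", "Aerys"),
  target = c("Grenn", "Samwell", "Jaime", "Robert", "Tyrion", "Tywin"),
  weight = c(5, 31, 18, 6, 5, 8)
)
toy <- graph_from_data_frame(d = edges, directed = FALSE)

degree(toy)["Aerys"]    # 4 edges
strength(toy)["Aerys"]  # 18 + 6 + 5 + 8 = 37

# Aemon only has two edges in this subset, so his values differ from the full network
strength(toy)["Aemon"]  # 5 + 31 = 36
```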
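Closeness measures how many steps are required to reach every other node from a given node. It’s a measure of how long information takes to arrive (who hears news first?), or how easily a node can reach other nodes. The closeness function returns the inverse of the total distance to all other nodes (the inverse of the average distance when normalized=TRUE), so higher values mean more centrality.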
tail(sort(closeness(g, normalized=TRUE)))
## Catelyn Sansa Arya Stannis Robert Tyrion
## 0.06579764 0.06708861 0.06799230 0.06821107 0.06978275 0.07061959
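A toy example helps to check how closeness is computed. The sketch below uses a hypothetical three-node path a, b, c. It also illustrates a point worth keeping in mind: to my understanding of the igraph documentation, when the graph has a weight edge attribute, closeness uses it by default and interprets it as a distance, and passing weights=NA turns this off. Whether that interpretation suits co-occurrence counts is a modelling decision.

```r
library(igraph)

# Unweighted path: a - b - c
path <- graph_from_data_frame(
  data.frame(from = c("a", "b"), to = c("b", "c")),
  directed = FALSE
)
closeness(path)                     # a: 1/(1+2), b: 1/(1+1), c: 1/(2+1)
closeness(path, normalized = TRUE)  # multiplied by n - 1 = 2, so b becomes 1

# Same path with a weight attribute: weights are read as distances
wpath <- graph_from_data_frame(
  data.frame(from = c("a", "b"), to = c("b", "c"), weight = c(5, 1)),
  directed = FALSE
)
closeness(wpath)["b"]                # 1 / (5 + 1)
closeness(wpath, weights = NA)["b"]  # 1 / (1 + 1), weights ignored
```

In both versions the middle node b has the highest value, consistent with higher closeness meaning a more central position.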
Betweenness measures brokerage or gatekeeping potential. It is (approximately) the number of shortest paths between nodes that pass through a particular node. It defines the importance of a node in terms of how frequently it connects other nodes.
tail(sort(betweenness(g)))
## Catelyn Sansa Stannis Jon Tyrion Robert
## 598.0000 683.5667 696.5167 921.6000 1163.7833 1166.1500
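The idea of counting shortest paths is easiest to see on a star. In this hypothetical toy graph, b is connected to a, c and d, which have no other ties, so every shortest path between two of the outer nodes must pass through b.

```r
library(igraph)

# Star with b in the centre: a - b, b - c, b - d (no edge weights)
star <- graph_from_data_frame(
  data.frame(from = c("a", "b", "b"), to = c("b", "c", "d")),
  directed = FALSE
)

# Three pairs of outer nodes (a-c, a-d, c-d), each with one shortest path via b
betweenness(star)  # a = 0, b = 3, c = 0, d = 0
```

As with closeness, my understanding is that betweenness uses a weight edge attribute as distances when one is present, so the values for the full network depend on that choice.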
Let’s now try to describe what a network looks like as a whole. An
important measure is edge_density: the proportion of edges
in the network over all possible edges that could exist.
edge_density(g)
## [1] 0.06207018
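For an undirected graph without self-loops, the number of possible edges is n(n-1)/2, so the density can be reproduced by hand. The sketch below first does the arithmetic for the 107 nodes and 352 edges reported above, and then checks the same formula against edge_density on a small hypothetical graph (a triangle a, b, c with an extra node d attached to c).

```r
library(igraph)

# By hand for the network above: 352 edges out of 107 * 106 / 2 possible ones
n <- 107
m <- 352
manual_density <- m / (n * (n - 1) / 2)
manual_density  # approximately 0.0621

# Toy graph: triangle a-b-c plus a pendant node d attached to c
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "a", "c"), to = c("b", "c", "c", "d")),
  directed = FALSE
)
edge_density(toy)  # 4 edges out of 4 * 3 / 2 = 6 possible, i.e. 2/3
```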
reciprocity measures the propensity of each edge to be a
mutual edge; that is, the probability that if i is
connected to j, j is also connected to i.
reciprocity(g)
## [1] 1
Why is it 1?
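Reciprocity is more informative on a directed graph. As a point of comparison, this sketch builds a hypothetical directed toy graph in which a and b point at each other while a also points at c without a reply, and contrasts it with the undirected version of the same edge list.

```r
library(igraph)

# Directed toy graph: a -> b, b -> a (mutual), a -> c (not returned)
edges <- data.frame(from = c("a", "b", "a"), to = c("b", "a", "c"))
directed_toy <- graph_from_data_frame(edges, directed = TRUE)

# Two of the three directed edges are part of a mutual pair
reciprocity(directed_toy)  # 2/3

# The same edge list read as undirected
undirected_toy <- graph_from_data_frame(edges, directed = FALSE)
reciprocity(undirected_toy)  # 1
```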
transitivity, also known as the clustering coefficient,
measures the probability that the adjacent nodes of a node are
connected. In other words, if i is connected to j, and j is
connected to k, what is the probability that i is also
connected to k?
transitivity(g)
## [1] 0.3286615
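The global version of this measure can be written as three times the number of triangles divided by the number of connected triples (paths of length two). Here is a minimal sketch on a hypothetical toy graph, a triangle a, b, c with one extra node d attached to c, where both counts are easy to do by hand.

```r
library(igraph)

# Toy graph: triangle a-b-c plus a pendant node d attached to c
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "a", "c"), to = c("b", "c", "c", "d")),
  directed = FALSE
)

# Triangles: 1 (a, b, c)
# Connected triples centred on each node: a = 1, b = 1, c = 3, d = 0, total 5
# Global transitivity: 3 * 1 / 5 = 0.6
transitivity(toy)

# The local version reports one value per node instead
transitivity(toy, type = "local")
```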