What are the most important nodes in a network? What is the propensity of two nodes that are connected to be both connected to a third node? What are the different hidden communities in a network? These are some of the descriptive questions that we will adress now.
This time, we will analyze a network from a different book – ``A Storm of Words’’, the third book in the Song of Ice and Fire series by George R.R. Martin. The source of this dataset is this blog post. Each character in the book will be a different nodes. Each edge between two characters indicates their names appeared within 15 words of one another in the text of the book.
We’ll start with descriptive statistics at the node level. All of these are in some way measures of importance or centrality.
The most basic measure is degree, the number of adjacent edges to each node. It is often considered a measure of direct influence. In this network, it will be the total number of times each user co-appears with someone else.
sort(degree(g))
## Amory Shireen Walton Illyrio Karl
## 1 1 1 1 1
## Aegon Kraznys Rakharo Worm Cressen
## 1 1 1 1 1
## Salladhor Qyburn Orell Lancel Ramsay
## 1 1 1 1 1
## Doran Jon Arryn Luwin Missandei Rickard
## 1 2 2 2 2
## Anguy Nan Jeyne Bowen Styr
## 2 2 2 2 2
## Olenna Ellaria Chataya Alliser Eddison
## 2 2 2 3 3
## Hoster Robert Arryn Viserys Dalla Marillion
## 3 3 3 3 3
## Mace Aerys Belwas Bronn Daario
## 3 4 4 4 4
## Gendry Gilly Hodor Irri Jojen
## 4 4 4 4 4
## Melisandre Myrcella Rattleshirt Roose Val
## 4 4 4 4 4
## Ygritte Grenn Theon Roslin Pycelle
## 4 4 4 4 4
## Drogo Aemon Craster Davos Lothar
## 4 5 5 5 5
## Meera Podrick Shae Tommen Thoros
## 5 5 5 5 5
## Elia Qhorin Balon Beric Janos
## 5 5 6 6 6
## Jorah Kevan Rhaegar Rickon Barristan
## 6 6 6 6 6
## Ilyn Brienne Meryn Oberyn Varys
## 6 7 7 7 7
## Petyr Margaery Brynden Edmure Renly
## 7 7 8 8 8
## Walder Loras Lysa Eddard Gregor
## 8 9 10 12 12
## Mance Sandor Bran Daenerys Stannis
## 12 13 14 14 14
## Samwell Catelyn Joffrey Robert Arya
## 15 18 18 18 19
## Cersei Tywin Jaime Robb Jon
## 20 22 24 25 26
## Sansa Tyrion
## 26 36
In directed graphs, there are three types of degree: indegree (incoming edges), outdegree (outgoing edges), and total degree. You can compute these using mode="in"
or mode="out"
or mode="total"
.
tail(sort(degree(g, mode="in")))
## Tywin Jaime Robb Jon Sansa Tyrion
## 22 24 25 26 26 36
tail(sort(degree(g, mode="out")))
## Tywin Jaime Robb Jon Sansa Tyrion
## 22 24 25 26 26 36
Here they will be identical because the network is undirected.
Closeness measures how many steps are required to access every other node from a given node. It’s a measure of how long information takes to arrive (who hears news first?), or how easily a node can reach other nodes. Higher values mean less centrality.
head(sort(closeness(g, normalized=TRUE)))
## Illyrio Shireen Karl Cressen Salladhor Amory
## 0.2226891 0.2500000 0.2500000 0.2500000 0.2500000 0.2617284
Betweenness measures brokerage or gatekeeping potential. It is (approximately) the number of shortest paths between nodes that pass through a particular node. It defines the importance of a node is in terms of how frequently it connects other nodes.
tail(sort(betweenness(g)))
## Sansa Robb Daenerys Tyrion Robert Jon
## 705.1986 706.5573 874.8372 1101.3850 1165.6025 1279.7534
Let’s now try to describe what a network looks like as a whole. An important measure is edge_density
– the proportion of edges in the network over all possible edges that could exist.
edge_density(g)
## [1] 0.06207018
reciprocity
measures the propensity of each edge to be a mutual edge; that is, the probability that if i
is connected to j
, j
is also connected to i
.
reciprocity(g)
## [1] 1
Why is it 1?
transitivity
, also known as clustering coefficient, measures that probability that adjacent nodes of a network are connected. In other words, if i
is connected to j
, and j
is connected to k
, what is the probability that i
is also connected to k
?
transitivity(g)
## [1] 0.3286615
Networks often have different clusters or communities of nodes that are more densely connected to each other than to the rest of the network. The goal of community detection algorithms is to identify these subsets.
There are a few different algorithms, each following a different logic: walktrap algorith, infomap, fast and greedy, edge-betweenness, label propagation… Exploring each of these goes beyond the scope of this course, but you can read the documentation for more details.
My experience is that infomap tends to work better in most social science examples (websites, social media, classrooms, etc), so we’ll go along with it just to take a look at the output
comm <- cluster_infomap(g)
# let's look at some of the small ones first...
comm[[4]]
## [1] "Belwas" "Daario" "Daenerys" "Irri" "Jorah"
## [6] "Missandei" "Rhaegar" "Viserys" "Barristan" "Illyrio"
## [11] "Drogo" "Aegon" "Kraznys" "Rakharo" "Worm"
comm[[5]]
## [1] "Arya" "Beric" "Eddard" "Gendry" "Sandor" "Anguy" "Thoros"
comm[[6]]
## [1] "Bran" "Hodor" "Jojen" "Luwin" "Meera" "Rickon" "Nan" "Theon"
comm[[7]]
## [1] "Davos" "Melisandre" "Shireen" "Cressen" "Salladhor"
# now back to the large ones
comm[[1]]
## [1] "Aerys" "Amory" "Balon" "Bronn" "Cersei" "Gregor"
## [7] "Jaime" "Joffrey" "Kevan" "Loras" "Meryn" "Myrcella"
## [13] "Oberyn" "Podrick" "Renly" "Robert" "Sansa" "Shae"
## [19] "Stannis" "Tommen" "Tyrion" "Tywin" "Varys" "Walton"
## [25] "Elia" "Ilyn" "Pycelle" "Qyburn" "Margaery" "Lancel"
## [31] "Olenna" "Ellaria" "Mace" "Chataya" "Doran"
comm[[2]]
## [1] "Aemon" "Alliser" "Craster" "Eddison" "Gilly"
## [6] "Janos" "Jon" "Mance" "Rattleshirt" "Samwell"
## [11] "Val" "Ygritte" "Grenn" "Karl" "Bowen"
## [16] "Dalla" "Orell" "Qhorin" "Styr"
comm[[3]]
## [1] "Brienne" "Brynden" "Catelyn" "Edmure" "Hoster" "Lothar" "Rickard"
## [8] "Robb" "Roose" "Walder" "Jeyne" "Petyr" "Roslin" "Ramsay"
The final way in which we can think about network communities is in terms of hierarchy or structure. We’ll discuss one of these methods.
K-core decomposition allows us to identify the core and the periphery of the network. A k-core is a maximal subnet of a network such that all nodes have at least degree K.
coreness(g)
## Aemon Aerys Alliser Amory Arya
## 4 4 3 1 7
## Balon Belwas Beric Bran Brienne
## 6 3 4 6 6
## Bronn Brynden Catelyn Cersei Craster
## 4 5 7 7 4
## Daario Daenerys Davos Eddard Eddison
## 3 3 2 7 3
## Edmure Gendry Gilly Gregor Hodor
## 5 4 4 7 4
## Hoster Irri Jaime Janos Joffrey
## 3 3 7 4 7
## Jojen Jon Jon Arryn Jorah Kevan
## 4 6 2 3 5
## Loras Lothar Luwin Lysa Mance
## 6 4 2 6 4
## Meera Melisandre Meryn Missandei Myrcella
## 4 3 6 2 4
## Oberyn Podrick Rattleshirt Renly Rhaegar
## 4 4 4 6 3
## Rickard Rickon Robb Robert Robert Arryn
## 2 5 7 7 3
## Roose Samwell Sandor Sansa Shae
## 4 4 7 7 4
## Shireen Stannis Tommen Tyrion Tywin
## 1 7 4 7 7
## Val Varys Viserys Walder Walton
## 3 5 3 5 1
## Ygritte Grenn Anguy Thoros Barristan
## 4 3 2 4 3
## Illyrio Nan Theon Jeyne Petyr
## 1 2 4 2 6
## Roslin Elia Ilyn Pycelle Karl
## 4 4 6 4 1
## Drogo Aegon Kraznys Rakharo Worm
## 3 1 1 1 1
## Cressen Salladhor Qyburn Bowen Margaery
## 1 1 1 2 5
## Dalla Orell Qhorin Styr Lancel
## 3 1 4 2 1
## Olenna Marillion Ellaria Mace Ramsay
## 2 3 2 3 1
## Chataya Doran
## 2 1
table(coreness(g))
##
## 1 2 3 4 5 6 7
## 16 13 19 28 7 10 14
which(coreness(g)==7) # what is the core of the network?
## Arya Catelyn Cersei Eddard Gregor Jaime Joffrey Robb Robert
## 5 13 14 19 24 28 30 53 54
## Sandor Sansa Stannis Tyrion Tywin
## 58 59 62 64 65
# let's plot the core of the network...
core <- induced_subgraph(g, v=which(coreness(g)==7))
plot(core)