01. Degree and Closeness Centrality
Node Importance
Different ways of thinking about "importance".
- Degree: number of friends
- Average proximity to other nodes.
- Fraction of shortest paths that pass through node.
Network Centrality
Centrality measures identify the most important nodes in a network.
Centrality Measures
- Degree centrality
- Closeness centrality
- Betweenness centrality
- Load centrality
- Page Rank
- Katz centrality
- Percolation centrality
Degree Centrality
Assumption: important nodes have many connections.
The most basic measure of centrality: number of neighbors.
Undirected Networks
C_deg(v) = d_v / (|N| − 1), where N is the set of nodes in the network and d_v is the degree of node v.
nx.degree_centrality(G)
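A minimal sketch on the karate club example graph that ships with NetworkX (the graph and node 33 are chosen only for illustration), confirming that the returned value is d_v / (|N| − 1):
import networkx as nx

G = nx.karate_club_graph()  # example graph bundled with NetworkX
deg_cent = nx.degree_centrality(G)

# The value for any node v equals its degree divided by |N| - 1
v = 33
assert abs(deg_cent[v] - G.degree(v) / (len(G) - 1)) < 1e-12
print(deg_cent[v])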
Directed Networks
nx.in_degree_centrality(G)
nx.out_degree_centrality(G)
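A small sketch on a hypothetical directed graph; in- and out-degree centrality are normalized by |N| − 1 in the same way as the undirected case:
import networkx as nx

D = nx.DiGraph([('A', 'B'), ('C', 'B'), ('B', 'D')])  # hypothetical 4-node digraph

print(nx.in_degree_centrality(D))   # 'B' has in-degree 2, so its value is 2/3
print(nx.out_degree_centrality(D))  # 'A', 'B', 'C' each have out-degree 1, so 1/3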
Closeness Centrality
Assumption: important nodes are close to other nodes. The closeness centrality of node v is C_close(v) = (|N| − 1) / Σ_{u ≠ v} d(v, u), where d(v, u) is the length of the shortest path from v to u.
nx.closeness_centrality(G)
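A quick check (again on the karate club graph, which is connected, as an illustrative assumption) that the closeness centrality of a node equals (|N| − 1) divided by the sum of its shortest-path distances to all other nodes:
import networkx as nx

G = nx.karate_club_graph()
close_cent = nx.closeness_centrality(G)

v = 32  # illustrative node
dists = nx.shortest_path_length(G, v)        # dict: node -> shortest-path distance from v
manual = (len(G) - 1) / sum(dists.values())  # (|N| - 1) / sum of distances
assert abs(close_cent[v] - manual) < 1e-12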
Disconnected Nodes
If a node can't reach all other nodes:
Option 1: Consider only the nodes that v can reach: C_close(v) = |R(v)| / Σ_{u ∈ R(v)} d(v, u), where R(v) is the set of nodes v can reach.
Option 2: Consider only the nodes that v can reach, and normalize by the fraction of nodes v can reach: C_close(v) = (|R(v)| / (|N| − 1)) · (|R(v)| / Σ_{u ∈ R(v)} d(v, u))
nx.closeness_centrality(G, normalized=False)
nx.closeness_centrality(G, normalized=True) # default: normalized=True
02. Betweenness Centrality
Assumption
Important nodes connect other nodes. The betweenness centrality of node v is C_btw(v) = Σ_{s,t ∈ N} σ_{s,t}(v) / σ_{s,t}, where σ_{s,t} is the number of shortest paths between nodes s and t, and σ_{s,t}(v) is the number of those paths that pass through v.
Endpoints: we can either include or exclude node v as the endpoints s and t in the computation of its betweenness centrality:
- include node v
- exclude node v
Disconnected Nodes: we only consider pairs of nodes s, t such that there is at least one path between them.
Normalization
Betweenness centrality values will be larger in graphs with many nodes.
To control for this, we divide centrality values by the number of pairs of nodes in the graph (excluding v):
- (1/2)(|N| − 1)(|N| − 2) in undirected graphs
- (|N| − 1)(|N| − 2) in directed graphs
In a directed graph, the two nodes of a pair can be connected by paths in two different directions, so there are twice as many ordered pairs s, t to consider, and the normalization constant is twice as large.
# default: normalized=True, endpoints=False
nx.betweenness_centrality(G, normalized=True, endpoints=False)
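The call returns a dictionary mapping each node to its centrality, so the most central nodes can be read off with a sort. A minimal sketch, again assuming the karate club graph:
import networkx as nx

G = nx.karate_club_graph()
btw = nx.betweenness_centrality(G, normalized=True, endpoints=False)

# Five nodes with the highest betweenness centrality
top5 = sorted(btw.items(), key=lambda item: item[1], reverse=True)[:5]
print(top5)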
Approximation
Complexity: depending on the algorithm, computing the betweenness centrality of all nodes can take up to O(|N|³) time.
Approximation: rather than computing betweenness centrality based on all pairs of nodes s, t, we can approximate it based on a sample of nodes.
nx.betweenness_centrality(G, normalized=True, endpoints=False, k=10)
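With k=10, NetworkX samples 10 source nodes instead of using all of them; every node still gets a value, but the values vary between runs. A sketch comparing exact and approximate results (the graph is illustrative, and seed is only there to make the sample reproducible):
import networkx as nx

G = nx.karate_club_graph()
exact = nx.betweenness_centrality(G, normalized=True, endpoints=False)
approx = nx.betweenness_centrality(G, normalized=True, endpoints=False, k=10, seed=42)

print(exact[0], approx[0])  # the sampled value is close to, but not exactly, the exact one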
Subsets
To find the most important nodes on the paths between a group of source nodes and a group of target nodes:
nx.betweenness_centrality_subset(G, [34, 33, 21], [1, 4, 13, 11])
Edge Betweenness Centrality
To find important edges instead of nodes:
nx.edge_betweenness_centrality(G, normalized=True)
nx.edge_betweenness_centrality_subset(G, [34, 33, 21], [1, 4, 13, 11], normalized=True)
03. Basic Page Rank
Developed by Google founders to measure the importance of webpages from the hyperlink network structure.
PageRank assigns a score of importance to each node. Important nodes are those with many in-links from other important nodes.
PageRank is mainly useful for directed networks.
Steps of Basic PageRank
n = number of nodes in the network
k = number of steps
- All nodes start with PageRank of 1/n.
- Perform the Basic PageRank Update Rule k times.
Basic PageRank Update Rule
Each node gives an equal share of its current PageRank to all the nodes it links to.
The new PageRank of each node is the sum of all the PageRank it received from other nodes.
For most networks, PageRank values converge as k gets larger (k → ∞).
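A minimal sketch of the Basic PageRank Update Rule on a small hypothetical directed graph (the graph, the function name, and k = 10 are illustrative, not from the lecture):
import networkx as nx

D = nx.DiGraph([('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'A')])  # hypothetical digraph

def basic_pagerank(G, k):
    n = G.number_of_nodes()
    pr = {node: 1 / n for node in G}          # every node starts with PageRank 1/n
    for _ in range(k):                        # apply the update rule k times
        new_pr = {node: 0.0 for node in G}
        for node in G:
            out_deg = G.out_degree(node)
            if out_deg == 0:
                continue                      # a node with no out-links passes nothing on
            share = pr[node] / out_deg        # equal share to every node it links to
            for succ in G.successors(node):
                new_pr[succ] += share
        pr = new_pr
    return pr

print(basic_pagerank(D, k=10))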
04. Scaled Page Rank
Interpreting PageRank
The PageRank of a node at step k is the probability that a random walker lands on the node after taking k steps.
Random walk of k steps:
- Start on a random node.
- Then choose an outgoing edge at random and follow it to the next node. Repeat k times.
PageRank Problem
In some networks, a few nodes can "suck up" all the PageRank of the network (for example, a pair of nodes that only link to each other traps the random walker once it arrives).
Random walk of k steps with damping parameter α:
Start on a random node.
Then:
- With probability α: choose an outgoing edge at random and follow it to the next node.
- With probability 1 - α: choose a node at random and go to it.
Repeat k times.
Scaled PageRank
The Scaled PageRank of k steps and damping factor α of a node n is the probability that a random walk with damping factor α lands on node n after k steps.
For most networks, as k gets larger, Scaled PageRank converges to a unique value, which depends on α.
Typically we use α between 0.8 and 0.9.
The damping factor works better in very large networks such as the web or large social networks.
nx.pagerank(G, alpha=0.8)
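A sketch of the same idea written out as a Scaled PageRank update, compared against nx.pagerank (the graph and α = 0.8 are illustrative; the sketch also glosses over dangling nodes, whose mass nx.pagerank redistributes):
import networkx as nx

D = nx.DiGraph([('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'A')])  # hypothetical digraph

def scaled_pagerank(G, alpha=0.8, k=100):
    n = G.number_of_nodes()
    pr = {node: 1 / n for node in G}
    for _ in range(k):
        new_pr = {node: (1 - alpha) / n for node in G}        # random-jump part (probability 1 - alpha)
        for node in G:
            out_deg = G.out_degree(node)
            if out_deg == 0:
                continue                                      # dangling nodes ignored in this sketch
            for succ in G.successors(node):
                new_pr[succ] += alpha * pr[node] / out_deg    # follow-an-edge part (probability alpha)
        pr = new_pr
    return pr

print(scaled_pagerank(D, alpha=0.8))
print(nx.pagerank(D, alpha=0.8))  # should agree closely on this graph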