01. Preferential Attachment Model
Degree Distributions
The probability distribution of the degrees over the entire network.
G.degree()
In-Degree Distributions
G.in_degree()
Degree Distributions in Real Networks
Degree distribution look like a straight line when on a log-log scale.
Power law:
Preferential Attachment Model
Start with two nodes connected by an edge.
At each time step, add a new node with an edge connecting it to an existing node.
Choose the node to connect to at random with probability proportional to each node's degree.
The probability of connecting to a node u of degree
is
nx.barabasi_albert_graph(n, m)
# returns a network with n nodes. Each new node attches to m existing nodes according to the Preferential Attachment Model.
02. Small World Networks
Real social networks appear to have small shortest paths between nodes and high clustering coefficient.
The preferential attachment model produces networks with small shortest paths but very small clustering coefficient.
Small World Model
- Start with a ring of n nodes, where each node is connected to its k nearest neighbors.
- Fix a parameter
- Consider each edge (u, v). With probability p, select a node w at random and rewire the edge (u, v) so it becomes (u, w).
Regular Lattice (p=0): no edge is rewired.
Random Network (p=1): all edges are rewired.
Small World Network (0<p<1): Some edges are rewired. Network conserves some local structure but has some randomness.
As p increases:
- average shortest path decreases rapidly.
- average clustering coefficient decreases slowly.
For small values of p, small world networks have small average shortest path and high clustering coefficient, matching what we observe in real networks.
However, the degree distribution of small world networks is not a power law.
nx.watts_strogatz_graph(n, k, p)
# returns a small world network
Variants
Small world network can be disconnected,** which is sometimes undesirable.
nx.connected_watts_strogatz_graph(n, k, p, t)
# runs nx.watts_strogatz_graph(n, k, p) up to t times until it returns a connected small world network.
nx.newman_watts_strogatz_graph(n, k, p)
# runs a model simalar to the small world model, but rather than rewiring edges, new edges are added with probability p.
03. Link Prediction
Link prediction problem: Given a network, predict which edges will be formed in the future.
5 basic measures:
- Number of Common Neighbors
- Jaccard Coefficient
- Resource Allocation Index
- Adamic-Adar Index
- Preferential Attachment Score
2 measures that require community information:
- Common Neighbor Soundarajan-Hopcroft Score
- Resource Allocation Soundarajan-Hopcroft Score
Use these measures as features to train a classifier in order to make the prediction.
Measure 1: Common Neighbors
Triadic closure: the tendency for people who share connections in a social network to become connected.
The number of common neighbors of nodes X and Y is
where N(X) is the set of neighbors of node X.
# nx.non_edges(graph): Returns the non-existent edges in the graph.
common_neigh = [
(e[0], e[1], len(list(nx.common_neighbors(G, e[0], e[1]))))
for e in nx.non_edges(G)
]
Measure 2: Jaccard Coefficient
Number of common neighbors normalized by the total number of neighbors.
The Jaccard coefficient of nodes X and Y is
nx.jaccard_coefficient(G)
Measure 3: Resource Allocation
Fraction of a "resource" that a node can send to another through their common neighbors.
The Resource Allocation index of nodes X and Y is
nx.resource_allocation_index(G)
Measure 4: Adamic-Adar Index
Similar to resource allocation index, but with log in the denominator.
The Adamic-Adar index of nodes X and Y is
nx.adamic_adar_index(G)
Measure 5: Pref. Attachment
In the preferential attachment model, nodes with high degree get more neighbors.
Product of the nodes' degree
The preferential attachment score of nodes X and Y is
nx.preferential_attachment(G)
Community Structure
Assume the nodes in this network belong to different communities (sets of nodes).
Pairs of nodes who belong to the same community and have many common neighbors in their community are likely to form an edge.
Measure 6: Community Common Neighbors
Number of common neighbors with bonus for neighbors in same community.
The common Neighbor Soundarajan-Hopcroft score of nodes X and Y is:
# Assign nodes to communities with attribute node "community"
G.node['A']['community'] = 0
G.node['B']['community'] = 0
...
G.node['E']['community'] = 1
G.node['F']['community'] = 1
...
#
nx.cn_soundarajan_hopcroft(G)
Measure 7: Community Resource Allocation
Similar to resource allocation index, but only considering nodes in the same community.
The Resource Allocation Soundarajan-Hopcroft score of nodes X and Y is:
nx.ra_index_soundarajan_hopcroft(G)