Preamble: Run the cells below to import the necessary Python packages
import numpy as np
# Wipe all outputs from this notebook
from IPython.display import Image, clear_output, display
clear_output(True)
# Import local plotting functions and in-notebook display functions
import matplotlib.pyplot as plt
%matplotlib inline
Revisiting the coauthorship graph¶
- Coauthorship among physicists, based on arXiv postings in astro-ph
- The graph contains nodes, which correspond to unique authors observed over the period 1993 -- 2003
- If Author i and Author j coauthored a paper during that period, the nodes are connected
- In order to analyze this large graph, we will downsample it to a smaller graph with nodes representing the most highly-connected authors
- This dataset is from the Stanford SNAP database
import networkx as nx
## Load the full coauthorship network
fpath = "../resources/ca-AstroPh.txt.gz"
# fpath = "../resources/ca-CondMat.txt.gz"
g = nx.read_edgelist(fpath)
## Create a subgraph of the 4000 most well-connected authors
subgraph = sorted(g.degree, key=lambda x: x[1], reverse=True)[:4000]
subgraph = [x[0] for x in subgraph]
g2 = g.subgraph(subgraph)
# rename nodes to sequential integers as they would appear in an adjacency matrix
g2 = nx.convert_node_labels_to_integers(g2, first_label=0)
pos = nx.spring_layout(g2)
# pos = nx.kamada_kawai_layout(g2)
# nx.draw_spring(g2, pos=pos, node_size=10, node_color='black', edge_color='gray', width=0.5)
nx.draw(g2, pos=pos, node_size=5, node_color='black', edge_color='gray', width=0.5, alpha=0.5)
plt.show()


Estimating degree distributions, centrality, and the PageRank algorithm¶
We previously saw that the degree distribution determines how often a random walk on the graph will visit a given node. This can be seen as an indicator of how “important” a given node is to the structure of the graph. If we wanted to rank the authors purely based on their degree distribution, we could write this ranking as

$$\mathbf{r} = A\, \mathbf{1}$$

where $\mathbf{1}$ is a vector of ones, $A$ is our adjacency matrix, and $\mathbf{r}$ is our desired ranking. Notice how the matrix product with the ones vector allows us to compute the rowwise sum. This makes sense, because we expect that this operation should take $\mathcal{O}(N^2)$ operations, and matrix-vector products are $\mathcal{O}(N^2)$.
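As a quick check of this identity (a minimal sketch; the small matrix B below is a hypothetical example, while the real adjacency matrix A is built in the next cell), multiplying by a vector of ones is the same as summing each row:
# Hypothetical 3-node example: the ones-vector product equals a rowwise sum
B = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
print(B @ np.ones(3))   # [2. 1. 1.]
print(B.sum(axis=1))    # same values: the degree of each node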
A = nx.adjacency_matrix(g2).todense() # get the binary adjacency matrix of the graph
rank_degree = A @ np.ones(A.shape[0])
plt.figure(figsize=(7, 7))
nx.draw(g2, pos=pos, node_size=200, node_color=np.log10(rank_degree), edge_color='gray', width=0.01, cmap='viridis', alpha=0.2)
plt.title(f"Authors ranked by degree")
plt.show()
plt.figure(figsize=(7, 2))
plt.hist(rank_degree)
plt.ylabel("Number of Authors")
plt.xlabel("Degree")


How else might we group authors?¶
The degree ranking simply tells us who the most well-connected authors are: the individuals who directly coauthored papers with the largest number of distinct colleagues at any point during the period over which the arXiv dataset was collected.
Recall that we can think of the degree distribution as the long-time limit of a random walker that “hops” among the neighboring nodes in a graph.
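To make this picture concrete, here is a minimal random-walk sketch (the starting node, number of steps, and random seed are arbitrary illustrative choices): a walker repeatedly hops to a uniformly random neighbor, and its long-run visit counts should correlate with node degree.
## Illustrative random walk on g2: visit frequency should track node degree
rng = np.random.default_rng(0)
start = max(g2.degree, key=lambda kv: kv[1])[0]   # start at the best-connected node
visits = np.zeros(g2.number_of_nodes())
node = start
for _ in range(20000):
    node = rng.choice(list(g2.neighbors(node)))   # hop to a uniformly random neighbor
    visits[node] += 1
degrees = np.array([g2.degree(n) for n in range(g2.number_of_nodes())])
print(np.corrcoef(visits, degrees)[0, 1])         # should be positive, reflecting the degree bias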
What if we add stochasticity? What if we allow the random walker to sometimes jump to any new node, not connected to the current node? This will cause the walker to explore the graph more broadly, and discover authors who are less connected to the dominant central cluster.
We will draw a loose analogy with the Ornstein-Uhlenbeck process. The spring constant pulls a random walker back to the origin, but the random noise allows the walker to explore the space more broadly. Likewise, the degree distribution pulls the random walker to the most connected nodes, but random jumps allow the walker to explore the graph more broadly.
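As a visual aside, here is a minimal Euler-Maruyama sketch of an Ornstein-Uhlenbeck process; the spring constant k, noise strength sigma, and timestep dt below are arbitrary illustrative values, not parameters from the analysis above.
rng = np.random.default_rng(1)
k, sigma, dt = 1.0, 0.5, 1e-2          # illustrative spring constant, noise strength, timestep
x = np.zeros(5000)
for t in range(1, len(x)):
    # the drift term pulls the walker back toward the origin; the noise term lets it explore
    x[t] = x[t - 1] - k * x[t - 1] * dt + sigma * np.sqrt(dt) * rng.normal()
plt.figure(figsize=(7, 2))
plt.plot(x)
plt.xlabel("Time step")
plt.ylabel("Position")
plt.show()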
This is the idea behind the PageRank algorithm, which is how Google originally ranked web pages. In Google’s case, the entire internet is too large to know a priori the adjacency matrix of all web pages. Instead, they use a random walk algorithm to estimate the degree distribution of the web pages. Random hops allow the algorithm to more easily find smaller subnetworks, so that it doesn’t keep revisiting dominant nodes like Wikipedia or Facebook.
Implementing the PageRank algorithm¶
We are now going to implement the PageRank algorithm on the coauthorship graph. The PageRank algorithm is a variant of the random walk algorithm that allows the random walker to sometimes jump to a random new node with probability $1 - \alpha$.
If we interpret our desired ranking $\mathbf{r}$ as a probability distribution, then we can write a master equation describing the dynamics of the probability distribution of many random walkers,

$$\mathbf{r}_{t+1} = \alpha A\, \mathbf{r}_t + (1 - \alpha)\, \mathbf{1}$$

where $\mathbf{1}$ is a vector of ones, $A$ is the adjacency matrix of the graph, and $\alpha$ is a parameter that controls the probability of the random surfer jumping to a random node. The first term on the right-hand side represents the process of the walker moving to a neighboring node, while the second term represents the process of jumping to a new node.
Solving this equation for the steady-state distribution ($\mathbf{r}_{t+1} = \mathbf{r}_t \equiv \mathbf{r}$) results in the PageRank equation

$$\mathbf{r} = \alpha A\, \mathbf{r} + (1 - \alpha)\, \mathbf{1}$$

This equation can be rewritten as an equation for $\mathbf{r}$,

$$(I - \alpha A)\, \mathbf{r} = (1 - \alpha)\, \mathbf{1}$$

where $I$ is the identity matrix. We write these equations as a single matrix equation

$$M\, \mathbf{r} = \mathbf{1}$$

where $M = (1 - \alpha)\,(I - \alpha A)$, matching the matrix constructed in the code below; the scalar prefactor only rescales $\mathbf{r}$ and so does not affect the ranking. If we set $\alpha = 1$ (no random jumps), then this equation becomes $\mathbf{r} = A\, \mathbf{r}$, the steady-state condition for a walker that only ever hops along edges.
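As a quick consistency check before building our own solver (a sketch using numpy's built-in solver; alpha_demo is simply the typical damping value used in the next cell), the solution of the linear system should be a fixed point of the update rule above:
alpha_demo = 0.85
ones = np.ones(A.shape[0])
r = np.linalg.solve(np.eye(A.shape[0]) - alpha_demo * A, (1 - alpha_demo) * ones)
## the steady-state solution should satisfy r = alpha * A r + (1 - alpha) * 1
resid = np.linalg.norm(r - (alpha_demo * (A @ r) + (1 - alpha_demo) * ones)) / np.linalg.norm(r)
print(resid)   # should be close to machine precision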
alpha = 0.85 # a typical damping factor used in PageRank
M = (1 - alpha) * (np.eye(A.shape[0]) - alpha * A)
print(M.shape)
plt.imshow(M[:100, :100])
(4000, 4000)

Inverting a matrix¶
We have taken for granted, so far, that we can invert a large matrix $M$. Given a linear problem

$$A\, \mathbf{x} = \mathbf{b}$$

with $A \in \mathbb{R}^{N \times N}$, $\mathbf{x} \in \mathbb{R}^{N}$, and $\mathbf{b} \in \mathbb{R}^{N}$, we can solve for $\mathbf{x}$ by inverting $A$:

$$\mathbf{x} = A^{-1}\, \mathbf{b}$$
How would we do this by hand?¶
Gauss-Jordan elimination: perform a series of operations on both the left and the right hand sides of our equation (changing both $A$ and $\mathbf{b}$), until the left side of the equation is triangular in form. Because Gauss-Jordan elimination consists of a series of row-wise addition and subtraction operations, we can think of the process as writing each row vector in terms of some linear combination of row vectors, making the whole process expressible as a matrix $G$,

$$G A\, \mathbf{x} = G\, \mathbf{b} \quad\Longrightarrow\quad U\, \mathbf{x} = G\, \mathbf{b}$$

where we have defined the upper-triangular matrix $U \equiv G A$. This confirms our intuition that the time complexity of Gauss-Jordan elimination is $\mathcal{O}(N^3)$, since that's the time complexity of multiplying two $N \times N$ matrices.
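To see the “series of row operations as a matrix” idea concretely, here is a minimal sketch in which a single elementary elimination matrix zeroes out the entry below the first pivot (the small matrices A_small, b_small, and E are arbitrary illustrative examples):
A_small = np.array([[2.0, 1.0],
                    [4.0, 3.0]])
b_small = np.array([1.0, 2.0])
## subtract 2x row 0 from row 1, expressed as a left-multiplication by E
E = np.array([[1.0, 0.0],
              [-A_small[1, 0] / A_small[0, 0], 1.0]])
print(E @ A_small)   # upper triangular
print(E @ b_small)   # the right-hand side transformed consistently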
Why do we want triangular matrices?¶
If we can reduce our matrix to triangular form, then we can quickly solve for $\mathbf{x}$ using substitution (forward substitution for a lower-triangular matrix, back substitution for an upper-triangular one). This should take $\mathcal{O}(N^2)$ operations if $A$ is an $N \times N$ matrix.

def solve_tril(a, b):
"""
    Given a triangular matrix, solve using forward substitution
Args:
a (np.ndarray): A lower triangular matrix
b (np.ndarray): A vector
Returns:
x (np.ndarray): The solution to the system
"""
#a = a.T # make it lower triangular for cleaner notation
n = a.shape[0]
x = np.zeros(n)
for i in range(n):
x[i] = (b[i] - np.dot(a[i, :i], x[:i])) / a[i, i]
return x
# A random lower triangular matrix
a = np.tril(np.random.random((10, 10)))
b = np.random.random(10)
print(np.linalg.solve(a, b))
print(solve_tril(a, b))
# Check that the numpy and our implementation give the same result
print(np.allclose(np.linalg.solve(a, b), solve_tril(a, b)))
# print(np.all(np.linalg.solve(a, b) == solve_tril(a, b)))
# np.sum(np.abs(np.linalg.solve(a, b) - solve_tril(a, b))) < 1e-14
[ 2.79707624 -2.18975392 53.23153183 -45.2687191 30.5757885
-655.20868552 537.77417227 188.26440703 -566.85239614 -94.10449712]
[ 2.79707624 -2.18975392 53.23153183 -45.2687191 30.5757885
-655.20868552 537.77417227 188.26440703 -566.85239614 -94.10449712]
True
import timeit
all_times1, all_times2 = list(), list()
nvals = np.arange(10, 500)
for n in nvals:
    ## Lower triangular solve
a = np.tril(np.random.random((n, n)))
b = np.random.random(n)
all_reps = [timeit.timeit("solve_tril(a, b)", globals=globals(), number=10) for _ in range(10)]
all_times1.append(np.mean(all_reps))
## Full solve
a = np.random.random((n, n))
b = np.random.random(n)
all_reps = [timeit.timeit("np.linalg.solve(a, b)", globals=globals(), number=10) for _ in range(10)]
all_times2.append(np.mean(all_reps))
plt.loglog(nvals, all_times1, label="Triangular solve")
plt.loglog(nvals, all_times2, label="Full solve")
plt.xlabel("Matrix size")
plt.ylabel("Time (s)")
plt.legend()
plt.show()
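We can also estimate the empirical scaling exponents by fitting a line to the log-log timing data (a rough sketch; Python-loop and library overheads mean the fitted slopes only approach the theoretical values of 2 and 3 for larger matrices):
slope_tri = np.polyfit(np.log(nvals), np.log(all_times1), 1)[0]
slope_full = np.polyfit(np.log(nvals), np.log(all_times2), 1)[0]
print(f"Triangular solve empirical exponent: {slope_tri:.2f}")
print(f"Full solve empirical exponent: {slope_full:.2f}")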

To invert a matrix, we just need to reach triangular form¶
We know that we want to modify our full matrix equation

$$A\, \mathbf{x} = \mathbf{b}$$

using a Gauss-Jordan elimination procedure, which we know has the form of a matrix multiplication

$$G A\, \mathbf{x} = G\, \mathbf{b}$$

We define $U \equiv G A$, and we know that we want to reduce $A$ to upper-triangular form.
But what is the form of $G$?
Our key insight is that the matrix $G$ turns out to be a (lower) triangular matrix as well.¶
Moreover, because the inverse of a triangular matrix is also a triangular matrix (this can be proven by writing out the terms in the matrix multiplication algebra for $G\, G^{-1} = I$), we therefore write a transformed version of our problem

$$A = G^{-1}\, U$$

Defining $L \equiv G^{-1}$, we arrive at the celebrated LU matrix factorization

$$A = L\, U$$

where $L$ is a lower triangular matrix, and $U$ is an upper triangular matrix.
We can therefore solve for $\mathbf{x}$ by introducing an intermediate variable $\mathbf{y}$, and then writing

$$A\, \mathbf{x} = L\, U\, \mathbf{x} = L\, \mathbf{y} = \mathbf{b}$$

where $\mathbf{y} \equiv U\, \mathbf{x}$.
If we can find the LU form, exactly solving the linear problem consists of the following steps (see the sketch after this list):
- Solve the equation $L\, \mathbf{y} = \mathbf{b}$ for $\mathbf{y}$. Since $L$ is a triangular matrix, this can be performed using substitution
- Solve the equation $U\, \mathbf{x} = \mathbf{y}$ for $\mathbf{x}$. Since $U$ is also triangular, this is also fast
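If SciPy is available, we can see these two steps explicitly with its built-in routines (a sketch, not part of our own implementation below; note that scipy.linalg.lu also applies row pivoting and returns a permutation matrix P):
from scipy.linalg import lu, solve_triangular   # assumes SciPy is installed

A_demo = np.random.random((5, 5))
b_demo = np.random.random(5)
P, L, U = lu(A_demo)                                 # A_demo = P @ L @ U
y = solve_triangular(L, P.T @ b_demo, lower=True)    # step 1: forward substitution
x = solve_triangular(U, y, lower=False)              # step 2: back substitution
print(np.allclose(A_demo @ x, b_demo))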
About LU factorization¶
Our iterative implementation uses Gauss-Jordan elimination in a principled order. We first eliminate the first column of the matrix, then the second column, and so on.
A more sophisticated algorithm, Crout's method with pivoting, chooses the order of elimination based on the specific values present in a given row or column, which improves numerical stability
Decomposition into $L$ and $U$ is $\mathcal{O}(N^3)$ for an $N \times N$ matrix
Given a matrix $A$ and target $\mathbf{b}$, what is the overall runtime to find $\mathbf{x}$?
The LU decomposition algorithm¶
Set $L = I$ and $U = A$ (the input matrix)
For each column $i$ in $U$:
- Choose the pivot element $U_{ii}$ in the $i$th column
- For each row $j$ below the pivot:
  - Find the multiplier $\ell_{ji} = U_{ji} / U_{ii}$
  - Subtract $\ell_{ji}$ times the $i$th row from the $j$th row in $U$
  - Store $\ell_{ji}$ in the $j$th row of $L$
In the context of the LU decomposition algorithm, the ‘pivot’ refers to the diagonal element in the current column, denoted as $U_{ii}$, that we are processing in the outer loop. The pivot serves as a divisor to find the multipliers, which are then used to eliminate the elements below the pivot, making them zero to carve out an upper triangular matrix $U$.
Simultaneously, these multipliers are stored in the lower triangular matrix $L$. Essentially, we are utilizing the pivot to find scalar values that help perform row operations to systematically zero out elements below the diagonal in $U$, while building up the $L$ matrix.
We can see that there are three nested loops (the two explicit loops above, plus the elementwise row update), which means that the time complexity of the LU decomposition algorithm is $\mathcal{O}(N^3)$.
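As a concrete trace of a single pass of the outer loop (an illustrative sketch with arbitrary numbers; the full implementation follows below):
U_demo = np.array([[2.0, 1.0, 1.0],
                   [4.0, 3.0, 3.0],
                   [8.0, 7.0, 9.0]])
L_demo = np.identity(3)
pivot = U_demo[0, 0]
multipliers = U_demo[1:, 0] / pivot              # multipliers for the rows below the pivot
L_demo[1:, 0] = multipliers                      # store them in the first column of L
U_demo[1:] -= multipliers[:, None] * U_demo[0]   # zero out the entries below the pivot
print(L_demo)
print(U_demo)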
class LinearSolver:
"""
Solve a linear matrix equation via LU decomposition (naive algorithm)
"""
def __init__(self):
# Run a small test upon construction
self.test_lu()
def lu(self, a):
"""Perform LU factorization of a matrix"""
n = a.shape[0]
L, U = np.identity(n), np.copy(a)
for i in range(n):
factor = U[i+1:, i] / U[i, i]
L[i + 1:, i] = factor
U[i + 1:] -= factor[:, None] * U[i]
return L, U
# A unit test of a single class method
def test_lu(self):
"""A small test method that the factorization is correct"""
X = np.random.random((10, 10))
L, U = self.lu(X)
assert np.allclose(X, L @ U), "LU decomposition failed"
def forward_substitution(self, L, b):
"""Solve a lower triangular matrix equality of the form Lx = b for x"""
n = L.shape[0]
y = np.zeros(n)
y[0] = b[0] / L[0, 0]
for i in range(1, n):
y[i] = (b[i] - np.dot(L[i,:i], y[:i])) / L[i,i]
return y
def backward_substitution(self, U, b):
"""Solve an upper triangular matrix equality of teh form Ux = b for x"""
n = U.shape[0]
y = np.zeros(n)
y[-1] = b[-1] / U[-1, -1]
for i in range(n-2, -1, -1):
y[i] = (b[i] - np.dot(U[i,i+1:], y[i+1:])) / U[i,i]
return y
def solve(self, X, b):
L, U = self.lu(X)
self.L, self.U = L, U
# Intermediate variable
h = self.forward_substitution(L, b)
return self.backward_substitution(U, h)
A = np.random.rand(4, 4)
b = np.random.rand(4)
model = LinearSolver()
print(model.solve(A, b))
# Using the numpy built-in solver
print(np.linalg.solve(A, b))
print(np.allclose(model.solve(A, b), np.linalg.solve(A, b) ))
[ 1.13327806 -1.41619997 0.5899517 0.65301225]
[ 1.13327806 -1.41619997 0.5899517 0.65301225]
True
Applying LU decomposition to orthogonal basis sets¶
A Hadamard matrix is an $n \times n$ square matrix comprising a complete collection of $n$ mutually-orthogonal length-$n$ vectors, with all elements equal to either $+1$ or $-1$
We can construct such a complete orthogonal basis by ordering the vectors systematically and building the matrix with a recursive rule (Sylvester's construction), tiling each larger matrix from signed copies of the previous one
def hadamard(n):
"""
Create a Hadamard matrix of size n
"""
if n == 1:
return np.array([[1]])
else:
H = hadamard(n // 2)
return np.block([[H, H], [H, -H]])
plt.figure(figsize=(10, 10))
a = hadamard(2**8).astype(float)
plt.imshow(a, cmap='gray')
plt.axis('off')
(-0.5, 255.5, 255.5, -0.5)

ll, uu = model.lu(a)
plt.figure(figsize=(9, 4.5))
plt.subplot(121)
plt.imshow(ll, cmap='gray')
plt.axis('off')
plt.subplot(122)
plt.imshow(uu.astype(bool), cmap='gray')
plt.axis('off')
## show plots closer together
plt.subplots_adjust(wspace=0.1)
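As a quick check that the factorization we just plotted is valid, the two factors should be triangular and should multiply back to the original Hadamard matrix:
print(np.allclose(ll, np.tril(ll)), np.allclose(uu, np.triu(uu)))   # L is lower, U is upper triangular
print(np.allclose(ll @ uu, a))                                      # the product recovers the original matrix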

Applying PageRank to the coauthorship graph¶
We can now apply the PageRank algorithm to the coauthorship graph.
A = nx.adjacency_matrix(g2).todense() # get the binary adjacency matrix of the graph
def find_pagerank(A, alpha=0.85, verbose=False):
"""
Find the PageRank of a graph using matrix inversion
Args:
A (np.ndarray): The adjacency matrix of the graph
        alpha (float): The damping factor. The default value is 0.85
        verbose (bool): Whether to print the condition number of the linear system
Returns:
page_rank (np.ndarray): The PageRank of each node
"""
M = (1 - alpha) * (np.eye(A.shape[0]) - alpha * A)
if verbose:
print(f"Condition number of M: {np.linalg.cond(M)}", flush=True)
    ## use our LU factorization, then invert the triangular factors to obtain M^{-1}
# page_rank = np.linalg.inv(M) @ np.ones(M.shape[0])
l, u = LinearSolver().lu(M)
Minv = np.linalg.inv(u) @ np.linalg.inv(l)
page_rank = Minv @ np.ones(M.shape[0])
return page_rank
plt.figure(figsize=(8, 8))
nx.draw(g2, pos=pos, node_size=200, node_color=np.log10(rank_degree),
edge_color='gray', width=0.02, cmap='viridis', alpha=0.2)
plt.title(f"Degree Ranking")
plt.show()
page_rank1 = find_pagerank(A, alpha=0.05, verbose=True)
plt.figure(figsize=(8, 8))
nx.draw(g2, pos=pos, node_size=200, node_color=np.log10(page_rank1),
edge_color='gray', width=0.02, cmap='viridis', alpha=0.2)
plt.title(f"PageRank centrality with alpha=0.05")
plt.show()
page_rank2 = find_pagerank(A, alpha=0.99, verbose=True)
plt.figure(figsize=(8, 8))
nx.draw(g2, pos=pos, node_size=200, node_color=np.log10(page_rank2),
edge_color='gray', width=0.02, cmap='viridis', alpha=0.2)
plt.title(f"PageRank centrality with alpha=0.95")
plt.show()

Condition number of M: 3808.383285542044
/var/folders/xt/9wdl4pmx26gf_qytq8_d528c0000gq/T/ipykernel_30515/3448364456.py:36: RuntimeWarning: invalid value encountered in log10
nx.draw(g2, pos=pos, node_size=200, node_color=np.log10(page_rank1),

Condition number of M: 112286.22458188192
/var/folders/xt/9wdl4pmx26gf_qytq8_d528c0000gq/T/ipykernel_30515/3448364456.py:44: RuntimeWarning: invalid value encountered in log10
nx.draw(g2, pos=pos, node_size=200, node_color=np.log10(page_rank2),

We can see that high values of $\alpha$ cause the PageRank algorithm to look more like the degree distribution. This emphasizes larger subnetworks and more highly-connected nodes. Conversely, low values of $\alpha$ cause the PageRank algorithm to converge to something closer to a uniform distribution.
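For comparison, networkx ships its own PageRank implementation (a sketch for cross-checking; nx.pagerank works with a degree-normalized transition matrix rather than the raw adjacency matrix used above, so the numerical values differ even though the qualitative picture is similar):
## Cross-check against networkx's built-in PageRank
pr = nx.pagerank(g2, alpha=0.85)
pr_values = np.array([pr[n] for n in range(g2.number_of_nodes())])
plt.figure(figsize=(8, 8))
nx.draw(g2, pos=pos, node_size=200, node_color=np.log10(pr_values),
        edge_color='gray', width=0.02, cmap='viridis', alpha=0.2)
plt.title("PageRank centrality (networkx built-in, alpha=0.85)")
plt.show()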