Preamble: Run the cells below to import the necessary Python packages
This notebook was created by William Gilpin. Consult the course website for all content, and the GitHub repository for raw files and runnable online code.
import numpy as np
from IPython.display import Image, display
import matplotlib.pyplot as plt
%matplotlib inline
Motivation: eigenvector alignment and dimensionality¶
np.random.seed(0) # ensures same every time
A = np.random.random((10, 10))
A = A.T + A # Hermitian matrix
b = np.random.random((10,))
n = 2
A, b = A[:n, :n], b[:n]
eig_vals, eig_vecs = np.linalg.eigh(A)
eig_vec_leading = eig_vecs[:, np.argmax(np.abs(eig_vals))]  # eigenvectors are the columns of eig_vecs
plt.plot([0, eig_vec_leading[0]], [0, eig_vec_leading[1]], 'r-', label='leading eigenvector')
y = A @ b
y /= np.linalg.norm(y)
plt.plot([0, y[0]], [0, y[1]], 'b-', label='A @ b')
plt.legend()
plt.title('Leading eigenvector and first iterate A b')
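As a sketch of where this is going (reusing the `A`, `b`, and `eig_vec_leading` defined above), repeatedly applying `A` and renormalizing rotates the iterate toward the leading eigenvector:
# Track alignment between the normalized iterates A^k b and the leading eigenvector
v = b / np.linalg.norm(b)
alignments = []
for k in range(10):
    v = A @ v
    v /= np.linalg.norm(v)
    alignments.append(np.abs(v @ eig_vec_leading))  # |cos(angle)| between iterate and eigenvector

plt.figure()
plt.plot(alignments, '.-k')
plt.xlabel('Iteration k')
plt.ylabel('|cos(angle to leading eigenvector)|')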

Krylov subspace methods¶
- We’ve seen several iterative algorithms that involve repeatedly multiplying a matrix by a vector (QR algorithm, power method, etc.)
- Our choice of target vector $\mathbf{b}$ and our operator $A$ jointly define the Krylov subspace of the pair: $\mathcal{K}_k(A, \mathbf{b}) = \operatorname{span}\{\mathbf{b},\, A\mathbf{b},\, A^2\mathbf{b},\, \ldots,\, A^{k-1}\mathbf{b}\}$
It’s called a subspace because it is often a proper subspace of the column space of $A$, with a dimension that depends on $k$. We can choose any value of $k$ up to a maximum of $n$ (the full column space)
Properties of the Krylov subspace¶
- The matrix $A$ performs a shift on the Krylov subspace: $A\,\mathcal{K}_k(A, \mathbf{b}) \subseteq \mathcal{K}_{k+1}(A, \mathbf{b})$
- If $k > n$, then the Krylov matrix $K_k = [\mathbf{b} \;\; A\mathbf{b} \;\; \cdots \;\; A^{k-1}\mathbf{b}]$ is rank-deficient (its columns are linearly dependent)
- If $A$ is a full-rank matrix, then $K_n$ is also (for a generic choice of $\mathbf{b}$) a full-rank matrix (see the quick numerical check below)
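As a quick numerical check of the last claim (a minimal sketch; the small symmetric matrix `A_test` and vector `b_test` below are just illustrative), we can build the Krylov matrix column by column and inspect its rank:
# Build the Krylov matrix K_n = [b, Ab, A^2 b, ...] for a small random symmetric matrix
rng = np.random.default_rng(0)
n_test = 5
A_test = rng.random((n_test, n_test))
A_test = A_test + A_test.T
b_test = rng.random(n_test)

cols = [b_test]
for _ in range(n_test - 1):
    cols.append(A_test @ cols[-1])
K_test = np.stack(cols, axis=1)  # n x n Krylov matrix

print("rank(A):  ", np.linalg.matrix_rank(A_test))
print("rank(K_n):", np.linalg.matrix_rank(K_test))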
Why use Krylov subspace methods?¶
The main reason: we can work only with matrix-vector products, and never need to store or factor $A$ directly, which could be a very large matrix.
Suppose a linear equation $A\mathbf{x} = \mathbf{b}$ has a residual vector $\mathbf{r} = \mathbf{b} - A\mathbf{x}$
For example, if we are performing linear regression, then the residual $\mathbf{r}$ corresponds to our error terms
Different Krylov subspace methods boil down to choosing an approximate solution vector $\mathbf{x}_k$, and thus a subspace $\mathcal{K}_k(A, \mathbf{b})$ from which it is drawn, for which the residual $\mathbf{r}_k = \mathbf{b} - A\mathbf{x}_k$ satisfies certain desiderata:
- Minimize $\lVert \mathbf{r}_k \rVert_2$. Iterative least-squares via GMRES, MINRES
- Require $\mathbf{r}_k \perp \mathcal{K}_k$ (find an orthogonal residual). Conjugate gradient method
where $\mathcal{K}_k$ is shorthand for $\mathcal{K}_k(A, \mathbf{b})$. In these iterative algorithms, we seek to minimize the desired cost function at each step $k$, corresponding to a single additional power iteration of the matrix. A brute-force version of the first criterion is sketched below.
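To make the first desideratum concrete, here is a minimal sketch (not GMRES itself, which would use the stable Arnoldi basis introduced below; the names `A_demo`, `b_demo` are just illustrative) that minimizes the residual norm over a Krylov subspace by a dense least-squares solve:
# Minimize ||b - A x|| over x restricted to the span of the first k Krylov vectors
rng = np.random.default_rng(0)
n, k = 50, 5
A_demo = rng.random((n, n))
A_demo = A_demo + A_demo.T
b_demo = rng.random(n)

# Normalized Krylov basis [b, Ab, ..., A^(k-1) b]; normalizing each column
# does not change the span but keeps the least-squares problem well scaled
cols = [b_demo / np.linalg.norm(b_demo)]
for _ in range(k - 1):
    v = A_demo @ cols[-1]
    cols.append(v / np.linalg.norm(v))
K = np.stack(cols, axis=1)

# Write x = K c and choose the coefficients c to minimize ||b - A K c||
c, *_ = np.linalg.lstsq(A_demo @ K, b_demo, rcond=None)
x_k = K @ c
print("Residual norm with x in K_5:", np.linalg.norm(b_demo - A_demo @ x_k))
print("Residual norm with x = 0:   ", np.linalg.norm(b_demo))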
Arnoldi iteration¶
- An eigenvalue calculation algorithm
- Similar to QR factorization, but allows us to stop after the first $k$ terms of the factorization of the full $n \times n$ matrix
- The full factorization of this matrix would have the form $A = Q H Q^\top$, where $Q$ is orthogonal and $H$ is upper Hessenberg
- But if we just want to compute the first $k$ eigenvalues of a matrix, we want a lower-rank approximation of this factorization
- The partial matrix $Q_k$, containing only the first $k$ orthonormal columns of $Q$, has the property $A Q_k = Q_{k+1} \tilde{H}_k$
where $\tilde{H}_k$ is the block formed by the first $k+1$ rows and $k$ columns of the full Hessenberg matrix $H$.
Arnoldi iteration algorithm¶
- Choose a random starting vector $\mathbf{b}$ of length $n$ and normalize it to have unit length: $\mathbf{q}_1 = \mathbf{b} / \lVert \mathbf{b} \rVert$
- For $j = 1, 2, \ldots, k$:
    - Compute $\mathbf{v} = A \mathbf{q}_j$.
    - For $i = 1, 2, \ldots, j$:
        - Compute $h_{ij} = \mathbf{q}_i^\top \mathbf{v}$
        - Compute $\mathbf{v} \leftarrow \mathbf{v} - h_{ij}\,\mathbf{q}_i$
    - Compute $h_{j+1,\,j} = \lVert \mathbf{v} \rVert$
    - Compute $\mathbf{q}_{j+1} = \mathbf{v} / h_{j+1,\,j}$
- Return $Q_{k+1} = [\mathbf{q}_1 \;\; \mathbf{q}_2 \;\; \cdots \;\; \mathbf{q}_{k+1}]$ and $\tilde{H}_k = (h_{ij})$
What is going on here?¶
This is almost the same algorithm as modified Gram-Schmidt, which we saw before
The difference is that we apply the matrix $A$ as an operator to the current basis vector $\mathbf{q}_j$ in each iteration, and thus switch to a new column vector in our Krylov subspace
Our output is the reduced matrix $Q_{k+1}$, whose orthonormal columns span the Krylov subspace
Each iteration finds a new basis vector $\mathbf{q}_{j+1}$ for the corresponding rank-$j$ Krylov subspace
After stopping at the $k$ of interest, we can find the reduced Hessenberg matrix $H_k = Q_k^\top A Q_k$, which is just the first $k$ rows of $\tilde{H}_k$ (so $H_k \in \mathbb{R}^{k \times k}$). We can then solve for the eigenvalues of this small reduced matrix, which approximate the extremal eigenvalues of $A$
class ArnoldiIteration:
    """
    Perform the Arnoldi iteration on a matrix A and a vector b.
    """
    def __init__(self, k):
        self.k = k  # default number of Arnoldi steps / eigenvalues to estimate

    def arnoldi_iteration(self, A, b, k):
        n = A.shape[0]
        Q, H = np.zeros((n, k + 1)), np.zeros((k + 1, k))
        Q[:, 0] = b / np.linalg.norm(b)
        for j in range(k):
            v = A @ Q[:, j]  # the main difference between the QR and Arnoldi iteration
            for i in range(j + 1):
                H[i, j] = Q[:, i] @ v
                v -= H[i, j] * Q[:, i]
            H[j + 1, j] = np.linalg.norm(v)
            Q[:, j + 1] = v / H[j + 1, j]
        return Q, H

    def eig_k(self, A, k=None):
        """
        Estimate the k largest eigenvalues of A from the reduced Hessenberg matrix.
        """
        if k is None:
            k = self.k
        Q, H = self.arnoldi_iteration(A, np.random.rand(A.shape[0]), k)
        # H[:-1] is the square k x k reduced Hessenberg matrix H_k; sort its
        # eigenvalues in descending order for comparison with numpy
        return np.sort(np.linalg.eigvals(H[:-1]).real)[::-1]
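As a sanity check (a minimal sketch on a small random symmetric matrix; `A_small` and `b_small` are just illustrative), we can verify the defining Arnoldi relation $A Q_k = Q_{k+1} \tilde{H}_k$ using the class above:
# Verify A @ Q_k == Q_{k+1} @ H_tilde_k and the orthonormality of the basis
rng = np.random.default_rng(0)
A_small = rng.random((50, 50))
A_small = A_small + A_small.T
b_small = rng.random(50)

k = 8
Q, H = ArnoldiIteration(k).arnoldi_iteration(A_small, b_small, k)
print("A Q_k equals Q_{k+1} H_k~ :", np.allclose(A_small @ Q[:, :k], Q @ H))
print("Q has orthonormal columns :", np.allclose(Q.T @ Q, np.eye(k + 1)))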
a_big = np.random.random((3000, 3000))
a_big = a_big + a_big.T  # symmetrize
eig_topk_np = np.sort(np.linalg.eigvalsh(a_big))[::-1][:10]  # exact top-10 eigenvalues
model = ArnoldiIteration(10)
eig_topk_arnoldi = model.eig_k(a_big, 10)
plt.plot(eig_topk_np, label='numpy (exact)')
plt.plot(eig_topk_arnoldi, label='Arnoldi, k=10')
plt.legend()

Conjugate gradient methods¶
Iterative method of solving the linear problem: find $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{b}$
We saw that iterative methods are useful for eigenvalue problems, and so it’s useful to have a similar approach for linear solving.
The basic version only holds for symmetric, positive-definite matrices $A$ (like the Hessians of potential functions)
Dynamical systems view: the solution $\mathbf{x}_* = A^{-1}\mathbf{b}$ is the fixed point of the dynamics of an iterative algorithm, and we want to approach it as fast as possible.
The CG iterate $\mathbf{x}_k$ is the unique member of $\mathcal{K}_k(A, \mathbf{b})$ which minimizes the $A$-norm distance $\lVert \mathbf{x}_k - \mathbf{x}_* \rVert_A$ to the “true” solution $\mathbf{x}_*$ (see the quadratic-form view below).
Why not use Gauss-Jordan elimination, or LU factorization, which give us exact answers after finitely many steps?
- If an iterative algorithm converges sooner, then we can save time, and we only ever need matrix-vector products with $A$
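To make the minimization view concrete: for a symmetric positive-definite $A$, solving $A\mathbf{x} = \mathbf{b}$ is equivalent to minimizing a quadratic energy function whose gradient is minus the residual,

$$
\phi(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x}^\top A \mathbf{x} - \mathbf{b}^\top \mathbf{x},
\qquad
\nabla \phi(\mathbf{x}) = A\mathbf{x} - \mathbf{b} = -\mathbf{r},
$$

so the unique minimizer of $\phi$ is exactly the solution $\mathbf{x}_* = A^{-1}\mathbf{b}$, and stepping along the residual means following the negative gradient of $\phi$.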
Conjugate gradient algorithm¶
- Choose a random starting vector $\mathbf{x}_0$ of length $n$ and set the initial residual $\mathbf{r}_0 = \mathbf{b} - A\mathbf{x}_0$
- For $k = 0, 1, 2, \ldots$:
    - Compute $\alpha_k = \dfrac{\mathbf{r}_k^\top \mathbf{r}_k}{\mathbf{r}_k^\top A\, \mathbf{r}_k}$
    - Compute $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\, \mathbf{r}_k$
    - Compute $\mathbf{r}_{k+1} = \mathbf{r}_k - \alpha_k\, A\, \mathbf{r}_k$
- Return $\mathbf{x}$ (a minimal numerical sketch of this iteration follows below)
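A minimal sketch of this basic, momentum-free iteration, assuming a small random symmetric positive-definite matrix (the names `a_sd`, `b_sd` are just illustrative); the full class below implements the momentum version:
# Momentum-free residual-descent iteration (steepest descent on the quadratic)
rng = np.random.default_rng(0)
a_sd = rng.random((10, 10))
a_sd = a_sd.T @ a_sd + np.eye(10)  # symmetric positive definite
b_sd = rng.random(10)

x = np.zeros(10)
r = b_sd - a_sd @ x
for _ in range(200):
    alpha = (r @ r) / (r @ a_sd @ r)
    x = x + alpha * r
    r = r - alpha * (a_sd @ r)

print("Error vs np.linalg.solve after 200 steps:",
      np.linalg.norm(x - np.linalg.solve(a_sd, b_sd)))
Convergence here is relatively slow; the momentum (conjugate-direction) version below reaches the exact solution in at most $n$ steps in exact arithmetic.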
We can stabilize and accelerate the algorithm by including a momentum term in the update rule: instead of stepping along the residual $\mathbf{r}$, we step along a search direction $\mathbf{p}$ that mixes the new residual with the previous direction. This choice of momentum coefficient is called the Fletcher-Reeves method.
Conjugate gradient algorithm (modified)¶
- Choose a random starting vector $\mathbf{x}_0$ of length $n$, set the initial residual $\mathbf{r}_0 = \mathbf{b} - A\mathbf{x}_0$, and set the initial search direction $\mathbf{p}_0 = \mathbf{r}_0$
- For $k = 0, 1, 2, \ldots$:
    - Compute $\alpha_k = \dfrac{\mathbf{r}_k^\top \mathbf{r}_k}{\mathbf{p}_k^\top A\, \mathbf{p}_k}$
    - Compute $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\, \mathbf{p}_k$
    - Compute $\mathbf{r}_{k+1} = \mathbf{r}_k - \alpha_k\, A\, \mathbf{p}_k$
    - Compute $\beta_k = \dfrac{\mathbf{r}_{k+1}^\top \mathbf{r}_{k+1}}{\mathbf{r}_k^\top \mathbf{r}_k}$
    - Compute $\mathbf{p}_{k+1} = \mathbf{r}_{k+1} + \beta_k\, \mathbf{p}_k$
- Return $\mathbf{x}$
What’s going on here?¶
- $\mathbf{r}_k = \mathbf{b} - A\mathbf{x}_k$ is the vector residual between $A$ times our current guess and the target $\mathbf{b}$ at the $k^{\text{th}}$ step
- $\mathbf{p}_k$ is conjugate ($A$-orthogonal) to the previous search direction (the last step looks like a Gram-Schmidt update). It tells us which direction to step in
- $\alpha_k$ is a learning rate for $\mathbf{x}$ that encourages larger steps when the residual is large, but smaller steps when the curvature of the quadratic along the search direction ($\mathbf{p}_k^\top A\, \mathbf{p}_k$) is large
- $\beta_k$ is a learning rate for $\mathbf{p}$ that encourages larger steps when the residual is large, but smaller steps when the residual is changing quickly (this avoids oscillations)
- At each step, we update $\mathbf{p}_{k+1}$ to be conjugate to $\mathbf{p}_k$. In fact, $\mathbf{p}_{k+1}$ is conjugate not only to $\mathbf{p}_k$, but also to all previous directions $\mathbf{p}_j$, $j < k$ (the numerical check below makes this concrete)
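A quick numerical check of these orthogonality claims (a minimal sketch with a hand-rolled CG loop on a small random SPD system; `a_chk`, `b_chk` are just illustrative): successive residuals are mutually orthogonal, and search directions are mutually $A$-conjugate.
# Run a few CG steps and check r_i . r_j ~ 0 and p_i^T A p_j ~ 0 for i != j
rng = np.random.default_rng(0)
a_chk = rng.random((8, 8))
a_chk = a_chk.T @ a_chk + np.eye(8)  # symmetric positive definite
b_chk = rng.random(8)

x = np.zeros(8)
r = b_chk - a_chk @ x
p = r.copy()
residuals, directions = [r.copy()], [p.copy()]
for _ in range(5):
    rs_old = r @ r
    alpha = rs_old / (p @ a_chk @ p)
    x = x + alpha * p
    r = r - alpha * (a_chk @ p)
    beta = (r @ r) / rs_old
    p = r + beta * p
    residuals.append(r.copy())
    directions.append(p.copy())

R_mat = np.stack(residuals, axis=1)
P_mat = np.stack(directions, axis=1)
print("max |r_i . r_j|, i != j:  ", np.abs(np.triu(R_mat.T @ R_mat, k=1)).max())
print("max |p_i^T A p_j|, i != j:", np.abs(np.triu(P_mat.T @ a_chk @ P_mat, k=1)).max())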
class ConjugateGradientMethod:
    """
    Solve a linear system Ax = b using the conjugate gradient method. A must be a
    symmetric, positive definite matrix.

    Parameters
        max_iter (int): Maximum number of iterations to perform.
        store_history (bool): If True, store the history of the solution at each
            iteration.
    """
    def __init__(self, max_iter=1000, store_history=False):
        self.max_iter = max_iter
        self.store_history = store_history

    def solve(self, A, b):
        if self.store_history:
            self.history = list()
        x = np.zeros_like(b)
        r = b  # residual of the initial guess x = 0
        p = r  # the first search direction is the residual itself
        for i in range(self.max_iter):
            rs_old = r.T @ r
            alpha = rs_old / (p.T @ A @ p)
            x = x + alpha * p
            r = r - alpha * (A @ p)
            # if np.linalg.norm(r) < 1e-12:
            #     break
            beta = (r.T @ r) / rs_old  # Fletcher-Reeves momentum coefficient
            p = r + beta * p
            if self.store_history:
                self.history.append(x.copy())
        return x

    def preconditioner(self, A):
        """
        Return a Jacobi (diagonal) preconditioner for the matrix A.
        """
        return np.diag(1 / np.diag(A))

    def solve_with_preconditioner(self, A, b):
        if self.store_history:
            self.history = list()
        x = np.zeros_like(b)
        r = b
        M = self.preconditioner(A)
        z = M @ r  # preconditioned residual
        p = z      # the first search direction is the preconditioned residual
        for i in range(self.max_iter):
            rz_old = r.T @ z
            alpha = rz_old / (p.T @ A @ p)
            x = x + alpha * p
            r = r - alpha * (A @ p)
            z = M @ r
            beta = (r.T @ z) / rz_old
            p = z + beta * p
            if self.store_history:
                self.history.append(x.copy())
        return x
np.random.seed(0)
a = np.random.rand(10, 10)
a = a.T @ a  # A^T A is symmetric and (for full-rank A) positive definite
b = np.random.rand(10, 1)
model = ConjugateGradientMethod(max_iter=2000, store_history=True)
sol_cg = model.solve(a, b).squeeze()
sol_np = np.linalg.solve(a, b).squeeze()
plt.figure(figsize=(6, 6))
plt.plot(sol_np, sol_cg, '.k', markersize=10)
plt.xlabel("Numpy solution")
plt.ylabel("Conjugate gradient solution")

plt.figure(figsize=(6, 6))
all_norms = np.linalg.norm(np.array(model.history).squeeze() - sol_np, axis=1)
plt.semilogy(all_norms, 'k', markersize=10)
plt.xlabel("Iteration")
plt.ylabel("Error norm $\\|x_k - x_*\\|$")


from matplotlib.animation import FuncAnimation
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import HTML

# Animate the stored iterates: each frame plots the current CG guess against
# the numpy reference solution
data = model.history[::3][:200]

fig, ax = plt.subplots()
# Initialize the plot with the first frame's data
ax.plot(sol_np, model.history[0], '.k', markersize=20)
ax.set_xlabel("Numpy solution")
ax.set_ylabel("Conjugate gradient solution")

def update(frame):
    ax.clear()
    ax.plot(sol_np, frame, '.k', markersize=20)
    ax.set_xlabel("Numpy solution")
    ax.set_ylabel("Conjugate gradient solution")

ani = FuncAnimation(fig, update, frames=data, interval=40)
HTML(ani.to_jshtml())
Appendix and other functions¶
## Make an interactive video
from ipywidgets import interact, interactive, fixed, interact_manual, Layout
import ipywidgets as widgets

def plotter(i):
    plt.figure(figsize=(6, 6))
    plt.plot(sol_np, model.history[i], '.k', markersize=20)
    plt.xlabel("Numpy solution")
    plt.ylabel("Conjugate gradient solution")
    plt.show()

interact(
    plotter,
    i=widgets.IntSlider(0, 0, len(model.history) - 1, 1, layout=Layout(width='800px'))
)
class Householder:
    # NOTE: despite the name, this class (like Givens and QR below) builds the
    # QR factorization via classical Gram-Schmidt orthogonalization rather than
    # Householder reflections.
    def __init__(self, A):
        self.A = A
        self.Q = np.zeros_like(A, dtype=float)
        self.R = np.zeros_like(A, dtype=float)
        self._decompose()

    def _decompose(self):
        for i in range(self.A.shape[1]):
            self.Q[:, i] = self.A[:, i]
            for j in range(i):
                self.R[j, i] = np.dot(self.Q[:, j], self.A[:, i])
                self.Q[:, i] -= self.R[j, i] * self.Q[:, j]
            self.R[i, i] = np.linalg.norm(self.Q[:, i])
            self.Q[:, i] /= self.R[i, i]

    def solve(self, b):
        # Solve Ax = b via A = QR: first y = Q^T b, then R x = y
        y = np.dot(self.Q.T, b)
        return np.linalg.solve(self.R, y)

    def inverse(self):
        # A^{-1} = R^{-1} Q^T
        return np.linalg.solve(self.R, self.Q.T)

    def __str__(self):
        return "Q:\n{}\nR:\n{}".format(self.Q, self.R)

    def __repr__(self):
        return self.__str__()
class Givens:
    # NOTE: despite the name, this class also uses classical Gram-Schmidt
    # orthogonalization rather than Givens rotations.
    def __init__(self, A):
        self.A = A
        self.Q = np.zeros_like(A, dtype=float)
        self.R = np.zeros_like(A, dtype=float)
        self._decompose()

    def _decompose(self):
        for i in range(self.A.shape[1]):
            self.Q[:, i] = self.A[:, i]
            for j in range(i):
                self.R[j, i] = np.dot(self.Q[:, j], self.A[:, i])
                self.Q[:, i] -= self.R[j, i] * self.Q[:, j]
            self.R[i, i] = np.linalg.norm(self.Q[:, i])
            self.Q[:, i] /= self.R[i, i]

    def solve(self, b):
        y = np.dot(self.Q.T, b)
        return np.linalg.solve(self.R, y)

    def inverse(self):
        # A^{-1} = R^{-1} Q^T
        return np.linalg.solve(self.R, self.Q.T)

    def __str__(self):
        return "Q:\n{}\nR:\n{}".format(self.Q, self.R)

    def __repr__(self):
        return self.__str__()
class QR:
    # QR factorization via classical Gram-Schmidt orthogonalization
    def __init__(self, A):
        self.A = A
        self.Q = np.zeros_like(A, dtype=float)
        self.R = np.zeros_like(A, dtype=float)
        self._decompose()

    def _decompose(self):
        for i in range(self.A.shape[1]):
            self.Q[:, i] = self.A[:, i]
            for j in range(i):
                self.R[j, i] = np.dot(self.Q[:, j], self.A[:, i])
                self.Q[:, i] -= self.R[j, i] * self.Q[:, j]
            self.R[i, i] = np.linalg.norm(self.Q[:, i])
            self.Q[:, i] /= self.R[i, i]

    def solve(self, b):
        y = np.dot(self.Q.T, b)
        return np.linalg.solve(self.R, y)

    def inverse(self):
        # A^{-1} = R^{-1} Q^T
        return np.linalg.solve(self.R, self.Q.T)

    def __str__(self):
        return "Q:\n{}\nR:\n{}".format(self.Q, self.R)

    def __repr__(self):
        return self.__str__()
class Cholesky:
    def __init__(self, A):
        self.A = A
        self.L = np.zeros_like(A, dtype=float)
        self._decompose()

    def _decompose(self):
        # Standard Cholesky recurrence; requires A to be symmetric positive definite
        for i in range(self.A.shape[0]):
            for j in range(i):
                self.L[i, j] = (self.A[i, j] - np.dot(self.L[i, :j], self.L[j, :j])) / self.L[j, j]
            self.L[i, i] = np.sqrt(self.A[i, i] - np.dot(self.L[i, :i], self.L[i, :i]))

    def solve(self, b):
        # Solve A x = b via forward/back substitution with L and L^T
        y = np.linalg.solve(self.L, b)
        return np.linalg.solve(self.L.T, y)

    def inverse(self):
        return np.linalg.solve(self.L.T, np.linalg.solve(self.L, np.eye(self.A.shape[0])))

    def __str__(self):
        return "L:\n{}\nL.T:\n{}".format(self.L, self.L.T)

    def __repr__(self):
        return self.__str__()
def test():
    A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
    b = np.array([3, 3, 4])
    print("A:\n{}\nb:\n{}".format(A, b))
    print("QR decomposition")
    qr = QR(A)
    print(qr)
    print("QR solution")
    print(qr.solve(b))
    print("QR inverse")
    print(qr.inverse())
    # Cholesky requires a symmetric positive-definite matrix, so use A A^T here
    A_spd = A @ A.T
    print("Cholesky decomposition (of A A^T)")
    cholesky = Cholesky(A_spd)
    print(cholesky)
    print("Cholesky solution (of A A^T x = b)")
    print(cholesky.solve(b))
    print("Cholesky inverse")
    print(cholesky.inverse())