import numpy as np
# Wipe all outputs from this notebook
from IPython.display import Image, clear_output, display
clear_output(True)
# Import local plotting functions and in-notebook display functions
import matplotlib.pyplot as plt
%matplotlib inline
Eigensystems as dominant “stretching” directions¶
Spectral theorem: If a matrix is Hermitian (including real-symmetric), then (1) its eigenvectors form an orthogonal basis, and (2) its eigenvalues are real
Consistent with our intuition from the dynamical systems view: A “blob” of initial conditions gets anisotropically stretched at different rates along different directions.
Eigenvalues are invariant to rotations of the coordinate system (i.e., an orthogonal change of basis $A \to Q A Q^\top$); see the numerical check below
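As a quick numerical check (a minimal NumPy sketch, not part of the original notes), we can verify that a random symmetrized matrix has orthonormal eigenvectors, real eigenvalues, and a spectrum that is unchanged by an orthogonal change of basis:
# Numerical check of the spectral theorem and rotation invariance
M = np.random.randn(4, 4)
S = M + M.T  # symmetrize to make the matrix Hermitian
vals, vecs = np.linalg.eigh(S)  # eigh assumes a Hermitian/symmetric input
print(np.allclose(vecs.T @ vecs, np.eye(4)))  # eigenvectors are orthonormal
print(np.all(np.isreal(vals)))  # eigenvalues are real

Q, _ = np.linalg.qr(np.random.randn(4, 4))  # random orthogonal matrix
print(np.allclose(np.sort(vals), np.sort(np.linalg.eigvalsh(Q @ S @ Q.T))))  # same spectrum after rotation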
Why are Hermitian matrices so important?¶
Many physical systems arise from the minimization of a scalar potential $V(\mathbf{x})$, such as a free energy or loss function. The dynamics of such systems are described by the gradient of the potential,
$$\dot{\mathbf{x}} = -\nabla V(\mathbf{x})$$
The Jacobian matrix of partial derivatives, $J_{ij} = \partial \dot{x}_i / \partial x_j = -\partial^2 V / \partial x_i \partial x_j$, determines the local geometry and concavity of this scalar field, and thus stability and acceleration along various directions.
Dynamical systems (continuous or discrete) always have a Jacobian, but they do not necessarily have a corresponding scalar potential, which requires that the Jacobian be Hermitian.
The loss function in machine learning. Often in machine learning, we need to compare two datapoints, or to compare a high-dimensional model prediction to some ground-truth value. If we have two vectors $\hat{\mathbf{y}}, \mathbf{y} \in \mathbb{R}^N$, a loss function or distance function assigns a scalar value to the pair, $\mathcal{L}(\hat{\mathbf{y}}, \mathbf{y})$.
The gradient of the loss function with respect to the prediction vector $\hat{\mathbf{y}}$ describes how much the loss function changes as we make small changes to the prediction along each dimension.
For example, we often seek to minimize the mean squared error or Euclidean distance between a vector of predictions $\hat{\mathbf{y}}$ and a vector of ground truth values $\mathbf{y}$,
$$\mathcal{L}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$
The gradient of the loss function with respect to $\hat{\mathbf{y}}$ is then the signed element-wise difference between the two vectors,
$$\nabla_{\hat{\mathbf{y}}} \mathcal{L} = \frac{2}{N} (\hat{\mathbf{y}} - \mathbf{y})$$
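As a small sketch (with hypothetical prediction and target vectors, using the notation above), we can confirm that this analytic gradient matches a finite-difference estimate:
# Check the MSE gradient against a central finite-difference estimate
y_hat = np.random.randn(5)  # hypothetical predictions
y_true = np.random.randn(5)  # hypothetical ground truth
mse = lambda yh: np.mean((yh - y_true)**2)

grad_analytic = 2 * (y_hat - y_true) / len(y_true)  # signed elementwise difference
eps = 1e-6
grad_numeric = np.array([
    (mse(y_hat + eps * np.eye(5)[i]) - mse(y_hat - eps * np.eye(5)[i])) / (2 * eps)
    for i in range(5)
])
print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))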
def random_points_disk(n, r=1):
    """Generate n random points within a disk of radius r"""
    theta = np.random.uniform(0, 2 * np.pi, n)
    r = np.sqrt(np.random.uniform(0, r**2, n))  # notice the sqrt: gives uniform density in area
    x, y = r * np.cos(theta), r * np.sin(theta)
    return np.vstack((x, y)).T
pts = np.array(random_points_disk(1000))
plt.figure(figsize=(5, 5))
plt.scatter(pts[:,0], pts[:,1])
plt.axis('equal')
plt.title("Points before linear transformation")
# Define a random linear transformation
np.random.seed(0)
A = np.random.uniform(-1, 1, size=(2, 2)) # random 2x2 matrix with entries in [-1, 1]
A = A + A.T # symmetrize matrix to make it Hermitian
# Apply the transformation
pts = pts @ A
plt.figure(figsize=(5,5))
plt.scatter(pts[:,0], pts[:,1])
plt.axis('equal')
plt.title("Points after linear transformation - iteration 1")
# Apply the transformation again
pts = pts @ A
plt.figure(figsize=(5,5))
plt.scatter(pts[:,0], pts[:,1])
plt.axis('equal')
plt.title("Points after linear transformation - iteration 2")



Now let’s repeat that iteration many times¶
pts = np.array(random_points_disk(1000))
all_pts = [np.copy(pts)]
for _ in range(10):
    pts = pts @ A
    all_pts.append(np.copy(pts))
eig_vals, eig_vecs = np.linalg.eig(A)
total_scale = eig_vals[0]**10
plt.figure(figsize=(5,5))
plt.plot(pts[:, 0] / total_scale, pts[:, 1] / total_scale, '.')
plt.axis('equal')
eig_vec1 = eig_vecs[:, 0]
eig_vec2 = eig_vecs[:, 1]
plt.plot([0, eig_vec1[0]], [0, eig_vec1[1]], 'r', lw=2, label="Eigenvector 1")
plt.plot([0, eig_vec2[0]], [0, eig_vec2[1]], 'r', lw=2, label="Eigenvector 2")
plt.legend()
plt.title("Points after many linear transformations, scaled")

## Make an interactive video
from ipywidgets import interact, interactive, fixed, interact_manual, Layout
import ipywidgets as widgets
def plotter(i):
    total_scale = eig_vals[0]**i
    pts = all_pts[i]
    plt.figure(figsize=(5, 5))
    plt.plot(pts[:, 0] / total_scale, pts[:, 1] / total_scale, '.')
    plt.plot([0, eig_vec1[0]], [0, eig_vec1[1]], 'r', lw=2)
    plt.plot([0, eig_vec2[0]], [0, eig_vec2[1]], 'r', lw=2)
    plt.show()

interact(
    plotter,
    i=widgets.IntSlider(0, 0, 10, 1, layout=Layout(width='800px'))
)
plt.figure(figsize=(8, 6))
plt.plot(np.max(np.array(all_pts) @ eig_vec1, axis=-1), '.r', label="Eigenvector 1")
plt.plot(np.max(np.array(all_pts) @ eig_vec2, axis=-1), '.b', label="Eigenvector 2")
plt.legend()
plt.title("Max span of cloud along eigenvector directions vs. iteration")
plt.plot(np.exp(np.log(eig_vals[0]) * np.arange(11)), color='r', label="Eigenvalue geometric series 1")
plt.plot(np.exp(np.log(-eig_vals[1]) * np.arange(11)), color='b', label="Eigenvalue geometric series 2")
plt.legend()
plt.xlabel("Iteration")
plt.ylabel("Max span")

What am I leaving out?¶
- Non-Hermitian matrices rotate their input space, in addition to anisotropically squeezing it (see the short example below)
- A hot topic of research in stat mech and active matter theory; these describe non-reciprocal interactions (broken detailed balance if the matrix describes kinetics, "odd" elasticity, etc.)
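As a short numerical example (a sketch, not from the original notes): a non-symmetric real matrix can have complex eigenvalues, the signature of a rotational component, while its symmetrized version always has a real spectrum.
# Non-Hermitian matrices can rotate their input: complex eigenvalues appear
R = np.array([[0.0, -1.0],
              [1.0, 0.0]])  # generator of a 90-degree rotation
print(np.linalg.eigvals(R))  # purely imaginary pair +/- 1j

B = np.random.randn(2, 2)  # generic non-symmetric matrix
print(np.linalg.eigvals(B))  # may be complex: rotation plus stretch
print(np.linalg.eigvals(B + B.T))  # symmetrized version: always real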
What about non-square matrices? Singular Value Decomposition¶
Many datasets and linear operators are not square matrices: $A \in \mathbb{R}^{N \times M}$ with $N \neq M$.
- For example, if we have a video of a fluid flow or sensor readings, we might have $N$ frames from $M$ sensors/pixels
$A$ can be overdetermined ($N > M$) or underdetermined ($N < M$).
However, we can instead calculate the eigenspectrum of the square matrices $A^\top A$ or $A A^\top$. We refer to their eigenvectors as the "right" or "left" singular vectors, respectively, and the square roots of their associated eigenvalues are called the singular values.
- Note: $A^\top A$ and $A A^\top$ share the same non-zero eigenvalues. Let $\mathbf{v}$ be an eigenvector of $A^\top A$ with corresponding eigenvalue $\lambda$:
$$A^\top A \mathbf{v} = \lambda \mathbf{v} \quad \Rightarrow \quad A A^\top (A \mathbf{v}) = \lambda (A \mathbf{v})$$
- Consequently, $\lambda$ is also an eigenvalue of $A A^\top$, with corresponding eigenvector $A \mathbf{v}$.
We can now factorize $A$ as:
$$A = U \Sigma V^\top$$
where the columns of $V$ are the right singular vectors, the columns of $U$ are the left singular vectors, and $\Sigma$ is a diagonal matrix. The non-zero entries of $\Sigma$ are the square roots of the non-zero eigenvalues of $A^\top A$ or $A A^\top$. Since the eigenvalues and eigenvectors can be listed in any order, by convention we normally list the elements in descending order of eigenvalue magnitude.
Decompose a dataset into characteristic directions in its row space and its column space (the right and left singular vectors, respectively)
Unlike traditional eigenvalue decomposition, SVD applies to non-square matrices. We can also apply SVD to complex matrices!
Different interpretations: (1) Geometric, (2) Dynamical, (3) Data compression, pattern discovery
See image below from Chris Rycroft’s AM205 class

From this diagram, we can define several quantities:
The singular values $\sigma_i$ are the square roots of the eigenvalues of either $A^\top A$ or $A A^\top$. They are always real and non-negative, and are usually indexed so that they are listed in descending order.
The left singular vectors $\mathbf{u}_i$ are the eigenvectors of $A A^\top$. They are orthogonal unit vectors that point in the directions of the principal axes of the ellipsoid produced by applying $A$ to the unit sphere.
The right singular vectors $\mathbf{v}_i$ are the eigenvectors of $A^\top A$. They are preimages of the $\mathbf{u}_i$ vectors, so that $A \mathbf{v}_i = \sigma_i \mathbf{u}_i$.
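We can verify these relationships numerically (a brief sketch with a random rectangular matrix): the squared singular values equal the eigenvalues of $A^\top A$, and each right singular vector is mapped by $A$ onto the corresponding left singular vector scaled by $\sigma_i$.
# Relate the SVD of a random rectangular matrix to the eigendecomposition of A^T A
A_test = np.random.randn(6, 4)
U_t, s_t, Vt_t = np.linalg.svd(A_test, full_matrices=False)

evals = np.sort(np.linalg.eigvalsh(A_test.T @ A_test))[::-1]  # eigenvalues, descending
print(np.allclose(s_t**2, evals))  # sigma_i^2 are the eigenvalues of A^T A

print(np.allclose(A_test @ Vt_t.T, U_t * s_t))  # A v_i = sigma_i u_i for each i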
Truncated versus full SVD¶
In full SVD, we calculate all singular values and vectors, and the matrices $U$ and $V$ are square with dimensions $N \times N$ and $M \times M$, respectively, while the matrix $\Sigma$ is rectangular with dimensions $N \times M$.
In truncated SVD, we calculate only the first $k$ singular values and vectors, associated with the $k$ largest singular values,
$$A \approx A_k = U_k \Sigma_k V_k^\top$$
We define the truncated matrices $U_k$ of size $N \times k$ and $V_k$ of size $M \times k$, while $\Sigma_k$ has size $k \times k$. The approximation $A_k$ still has size $N \times M$.
The truncated SVD often approximates the original matrix well, making it useful for data compression and dimensionality reduction.
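A brief aside, not in the original notes: by the Eckart–Young theorem, the truncated SVD is the best rank-$k$ approximation, and its error in the spectral norm equals the first discarded singular value $\sigma_{k+1}$. A minimal sketch with a random matrix:
# Truncated SVD error equals the first discarded singular value (Eckart-Young)
A_demo = np.random.randn(8, 5)
U_d, s_d, Vt_d = np.linalg.svd(A_demo, full_matrices=False)

k = 2
A_k = U_d[:, :k] @ np.diag(s_d[:k]) @ Vt_d[:k, :]  # rank-k truncation
print(np.linalg.norm(A_demo - A_k, ord=2), s_d[k])  # both equal sigma_{k+1}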
- See image below for a visual representation of the truncated SVD from Chris Rycroft’s AM205 class

Dynamical systems interpretation of SVD¶
In dynamical systems, we might have skew-product structure, where the dynamics of some variables depend only on a subset of the others
Examples: $z_t$ is a time-like variable; $z_t$ is a random number generator (like we saw for the Ornstein-Uhlenbeck simulation); $z_t$ is a periodic driver.
We can write this system with rectangular structure,
$$\begin{bmatrix} x_{t+1} \\ y_{t+1} \end{bmatrix} = A \begin{bmatrix} x_t \\ y_t \\ z_t \end{bmatrix}$$
where the driver $z_t$ evolves independently of the response variables $x_t$ and $y_t$.
We can't do any spectral analysis anymore, because our coefficient matrix $A$ is not square. However, we can instead analyze the two square matrices $A A^\top$ and $A^\top A$. This is like analyzing two iterations of a square dynamical system.
a_response = np.array([[4, 1], [1, 2]]).T
print("Two iterations of square response subsystem without driving:")
print(a_response @ a_response.T, end='\n\n')
# print(a_response.T @ a_response, end='\n\n')
# Decoupled limit
a_full = np.array([[4, 1, 0], [1, 2, 0]]).T
print("Left matrix product with zero driving:")
print(a_full @ a_full.T, end='\n\n')
# print(a_full.T @ a_full, end='\n\n')
# Coupled limit
print("Left matrix product with driving:")
a_full = np.array([[4, 1, 3], [1, 2, 2]]).T
print(a_full @ a_full.T, end='\n\n')
Two iterations of square response subsystem without driving:
[[17 6]
[ 6 5]]
Left matrix product with zero driving:
[[17 6 0]
[ 6 5 0]
[ 0 0 0]]
Left matrix product with driving:
[[17 6 14]
[ 6 5 7]
[14 7 13]]
Loosely, the left matrix $A A^\top$ describes the propagation of an impulse across the response subsystem (including the coupling of the dynamical variables to the driver). It tells us about the dependence of the response variables on their earlier values.
a_response = np.array([[4, 1], [1, 2]]).T
print("Two iterations of transpose of square response subsystem without driving:")
print(a_response.T @ a_response, end='\n\n')
# print(a_response.T @ a_response, end='\n\n')
# Decoupled limit
a_full = np.array([[4, 1, 0], [1, 2, 0]]).T
print("Right matrix product with zero driving:")
print(a_full.T @ a_full, end='\n\n')
# print(a_full.T @ a_full, end='\n\n')
# Coupled limit
print("Right matrix product with driving:")
a_full = np.array([[4, 1, 3], [1, 2, 2]]).T
print(a_full.T @ a_full, end='\n\n')
Two iterations of transpose of square response subsystem without driving:
[[17 6]
[ 6 5]]
Right matrix product with zero driving:
[[17 6]
[ 6 5]]
Right matrix product with driving:
[[26 12]
[12 9]]
Loosely, the right matrix $A^\top A$ tells us about the driver system. If the driver couplings have the form $\mathbf{b}$ (here $\mathbf{b} = [3, 2]^\top$), then the right matrix is perturbed by elementwise-adding the rank-1 matrix $\mathbf{b} \mathbf{b}^\top$ to the right matrix of the undriven system. The diagonal elements tell us something about how strongly the different dynamical variables are affected by the driver.
- This approach might remind you of density matrices
bias = np.array([3, 2])[:, None]  # the driver coupling column of a_full
print(bias @ bias.T, end='\n\n')
[[9 6]
 [6 4]]
print(a_full.T @ a_full - a_response.T @ a_response, end='\n\n')
[[9 6]
[6 4]]
Data representation and compression views¶
- $A \in \mathbb{R}^{N \times M}$ is a data matrix, where we have $N$ observations of an $M$-dimensional process. Examples include a video ($N$ is the number of frames, $M$ is the number of pixels), or a collection of $N$ points in $M$-dimensional space
- The left singular vectors give characteristic "modes" in the data; they have the same dimensionality as a single column of the data matrix. The right singular vectors give us the characteristic "loadings" of each mode
- SVD can be used to reduce the dimensionality of a problem by truncating to the largest singular values; this is a basic compression method for high-dimensional data
- The singular value spectrum provides a clue about the dimensionality of the underlying data-generating process (see the sketch below)
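As a concrete sketch of the pattern-discovery view (with a made-up dataset of noisy points scattered near a line), the SVD of a mean-centered data matrix recovers the dominant direction and how much variance it captures:
# Pattern discovery with SVD: noisy 2D points clustered around a line
n_obs = 500
t_vals = np.random.randn(n_obs)
X = np.stack([t_vals, 0.3 * t_vals + 0.05 * np.random.randn(n_obs)], axis=1)

Xc = X - X.mean(axis=0)  # mean-center each column (feature)
U_x, s_x, Vt_x = np.linalg.svd(Xc, full_matrices=False)

print("Dominant direction (first right singular vector):", Vt_x[0])
print("Fraction of variance captured:", s_x[0]**2 / np.sum(s_x**2))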
Application: Image compression¶
Important: SVD applies to any rectangular matrix. We can reconstruct individual images by treating them as collections of columns, and reconstructing by adding together subsets of characteristic columns,
$$A = \sum_i \sigma_i \mathbf{u}_i \mathbf{v}_i^\top$$
If the singular values decay rapidly, then we can approximate the matrix by truncating this sum to the first $k$ terms
from PIL import Image as img
## Local import to open the image
# im = img.open("../resources/hank.png")
## Fetch image from online hosting
import requests
from io import BytesIO
url = "https://raw.githubusercontent.com/williamgilpin/cphy/main/resources/hank.png"
response = requests.get(url)
im = img.open(BytesIO(response.content))
# convert to a numpy image using just the red channel
im = np.array(im)[..., 0]
print("Image shape is", im.shape)
plt.imshow(im, cmap='gray')
plt.axis('off');
Image shape is (372, 248)

# Perform SVD on the image
# Note: np.linalg.svd returns the right singular vectors already transposed,
# so the rows of V below are the right singular vectors
U, s, V = np.linalg.svd(im)
sigma = s * np.eye(len(s))
# im = U[:, :len(s)] @ sigma @ V
# im = U[:, :len(s)] @ np.diag(s) @ V
print("U shape is", U.shape)
print("s shape is", s.shape)
print("V shape is", V.shape)
plt.figure(figsize=(8, 5))
plt.semilogy(s)
plt.xlabel("Singular value index");
plt.ylabel("Singular magnitude");
plt.title("Singular values of Hank Hill");
U shape is (372, 372)
s shape is (248,)
V shape is (248, 248)

Us = U[:, :20]
ss = np.diag(s[:20])
Vs = V[:20, :]
print("Us shape is", Us.shape)
print("ss shape is", ss.shape)
print("Vs shape is", Vs.shape)
im_approx = Us @ ss @ Vs
print("Approximation shape is", im_approx.shape)
plt.imshow(im_approx, cmap='gray')
plt.axis('off');
Us shape is (372, 20)
ss shape is (20, 20)
Vs shape is (20, 248)
Approximation shape is (372, 248)

import numpy as np
import matplotlib.pyplot as plt
### make a grid of 4 x 4 subplots
fig, axs = plt.subplots(4, 4, figsize=(10, 10))
for i in range(16):
    im_approx = U[:, :i+1] @ np.diag(s[:i+1]) @ V[:i+1, :]
    axs[i//4, i%4].imshow(im_approx, cmap='gray')
    axs[i//4, i%4].set_title(f"k = {i+1}")
    axs[i//4, i%4].set_xticks([])
    axs[i//4, i%4].set_yticks([])
plt.tight_layout()
plt.show()

from matplotlib.animation import FuncAnimation
from IPython.display import HTML
# Assuming frames is a numpy array with shape (num_frames, height, width)
# frames = np.array(model.history).copy()
fig = plt.figure(figsize=(6, 6))
img = plt.imshow(U[:, :1] @ np.diag(s[:1]) @ V[:1, :], vmin=np.min(im), vmax=np.max(im), cmap="gray");
plt.xticks([]); plt.yticks([])
# tight margins
plt.margins(0,0)
plt.gca().xaxis.set_major_locator(plt.NullLocator())
frames = [U[:, :i] @ np.diag(s[:i]) @ V[:i, :] for i in range(1, 60)]
def update(frame):
    img.set_array(frame)
ani = FuncAnimation(fig, update, frames=frames, interval=50)
HTML(ani.to_jshtml())
Density matrices and product states¶
From our image compression example, we can see that we can approximate a matrix by a sum over rank-1 (outer product) matrices,
$$A \approx \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^\top$$
For a Hermitian matrix, $A = A^\dagger$, we can instead write this as a sum over rank-1 matrices with the same vector in both slots,
$$A = \sum_i \lambda_i \mathbf{v}_i \mathbf{v}_i^\dagger$$
where the $\mathbf{v}_i$ are orthonormal eigenvectors and the $\lambda_i$ are the corresponding eigenvalues.
Recall that, in quantum mechanics, if we have a system with "pure" eigenstates $|\psi_i\rangle$, then we can define the density matrix for a given mixed state as
$$\rho = \sum_i p_i |\psi_i\rangle \langle \psi_i|$$
We can see that truncated SVD is equivalent to approximating $\rho$ by only including the most-probable terms with the largest $p_i$, thus removing rare states.
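A small sketch of this analogy (with made-up probabilities $p_i$ and random orthonormal states, not a physical example): build $\rho = \sum_i p_i \mathbf{v}_i \mathbf{v}_i^\top$, keep only the most probable states, and renormalize the trace.
# Build a toy density matrix and truncate it to its most probable states
d = 4
p = np.sort(np.random.rand(d))[::-1]
p = p / p.sum()  # probabilities of each pure state, in descending order
Q_states, _ = np.linalg.qr(np.random.randn(d, d))  # columns are random orthonormal states

rho = sum(p[i] * np.outer(Q_states[:, i], Q_states[:, i]) for i in range(d))
print(np.trace(rho))  # trace is 1 by construction

k_keep = 2  # keep only the most probable states
rho_k = sum(p[i] * np.outer(Q_states[:, i], Q_states[:, i]) for i in range(k_keep))
rho_k = rho_k / np.trace(rho_k)  # renormalize after truncation
print(np.linalg.norm(rho - rho_k))  # small when the discarded p_i are small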
The Moore-Penrose pseudoinverse¶
Suppose we have an overdetermined system of equations
$$A \mathbf{x} = \mathbf{b}$$
where $A \in \mathbb{R}^{N \times M}$, $\mathbf{x} \in \mathbb{R}^{M}$, and $\mathbf{b} \in \mathbb{R}^{N}$. If $N > M$, then we have more equations than unknowns, and the system is overdetermined.
Deriving the pseudoinverse¶
We write the rectangular matrix $A$ in terms of its singular value decomposition
$$A = U \Sigma V^\top$$
where $U \in \mathbb{R}^{N \times N}$, $\Sigma \in \mathbb{R}^{N \times M}$, and $V \in \mathbb{R}^{M \times M}$. We multiply both sides of $A \mathbf{x} = \mathbf{b}$ by $U^\top$ and note that $U^\top U = I$, producing the equation
$$\Sigma V^\top \mathbf{x} = U^\top \mathbf{b}$$
We next define the pseudoinverse of $\Sigma$, denoted by $\Sigma^+$, as the matrix with the same dimensions as $\Sigma^\top$ that has the reciprocal of each non-zero diagonal element of $\Sigma$ along its diagonal, and zeros everywhere else. If $\Sigma$ is square and invertible, then the pseudoinverse is the true inverse.
We can then multiply both sides by $\Sigma^+$ to get
$$V^\top \mathbf{x} = \Sigma^+ U^\top \mathbf{b}$$
We can then multiply both sides by $V$ to get
$$\mathbf{x} = V \Sigma^+ U^\top \mathbf{b}$$
We can then define the pseudoinverse of a rectangular matrix $A$ as
$$A^+ \equiv V \Sigma^+ U^\top$$
What does the pseudoinverse mean?¶
The pseudoinverse is a generalization of the inverse to non-square matrices. In cases where no $\mathbf{x}$ exists that exactly satisfies $A \mathbf{x} = \mathbf{b}$, the pseudoinverse gives us the "best" solution in the sense that it minimizes the squared error $\|A \mathbf{x} - \mathbf{b}\|^2$.
# 15 samples, 10 features
# 15 equations/constraints, 10 unknowns (overdetermined)
A = np.random.normal(size=(15, 10))
b = np.random.normal(size=(15, 1))
## Compute the Least Squares solution
def moore_penrose(A):
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    S_inv = np.zeros(S.shape)
    # Calculate reciprocal of non-zero singular values
    non_zero = S > 1e-15  # Identify non-zero singular values
    S_inv[non_zero] = 1 / S[non_zero]
    # Compute pseudoinverse
    A_pseudo = Vt.T @ np.diag(S_inv) @ U.T
    return A_pseudo
# A @ x = b
# A @ x - b
x = moore_penrose(A) @ b
plt.imshow(moore_penrose(A) @ A)

plt.plot(b, A @ x, '.') # compare the targets b to the least-squares predictions A @ x

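As a sanity check (not part of the notes above), the hand-rolled pseudoinverse should agree with NumPy's built-in np.linalg.pinv, and the resulting solution with np.linalg.lstsq:
# Compare the hand-rolled pseudoinverse against NumPy's built-in routines
print(np.allclose(moore_penrose(A), np.linalg.pinv(A)))

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))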
Appendix¶
## Make an interactive video of Hank's face
from ipywidgets import interact, interactive, fixed, interact_manual, Layout
import ipywidgets as widgets
def plotter(i):
    Us = U[:, :i]
    ss = np.diag(s[:i])
    Vs = V[:i, :]
    im_approx = Us @ ss @ Vs
    plt.imshow(im_approx, cmap='gray')
    plt.show()

interact(
    plotter,
    i=widgets.IntSlider(1, 1, V.shape[0], 1, layout=Layout(width='800px'))
)