
Monte-Carlo methods for Hard Sphere packings

The University of Texas at Austin

Preamble: Run the cells below to import the necessary Python packages

This notebook was created by William Gilpin. Consult the course website for all content, and the GitHub repository for raw files and runnable online code.

import numpy as np
# from IPython.display import Image, display

import sys
sys.path.append('../') # Add the course directory to the Python path
import cphy.plotting as cplot

import matplotlib.pyplot as plt
%matplotlib inline

Gradient-free optimization

  • We saw earlier that many optimization methods require the locally computed gradient of the potential, either exactly or via a finite-difference approximation.

  • For problems where the gradient is impossible to compute (e.g., discrete systems like cellular automata), we used genetic algorithms to optimize systems. But what about problems that are nearly continuous and differentiable (in the sense of having a meaningful notion of distance among configurations), but which are very high-dimensional?

  • We saw that approximating the gradient by finite differences costs $O(D)$ evaluations of the potential, where $D$ is the dimensionality of the system. If the potential is expensive to evaluate, this can be prohibitive (a short sketch follows this list).

  • Monte Carlo methods can optimize a potential without computing its gradient: they exploit local evaluations of the potential to guide the search.
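To make the finite-difference cost concrete, here is a minimal forward-difference gradient sketch (not part of the original notebook); the quadratic test function, dimensionality, and step size are arbitrary illustrative choices. Each gradient estimate requires $D + 1$ evaluations of the potential.

def finite_difference_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with forward differences.

    Requires D + 1 evaluations of f, where D = len(x).
    """
    f0 = f(x)
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        xh = x.copy()
        xh[i] += h
        grad[i] = (f(xh) - f0) / h
    return grad

# Example: a quadratic bowl in D = 50 dimensions; the true gradient is 2x
f = lambda x: np.sum(x**2)
print(finite_difference_gradient(f, np.ones(50))[:3])  # approximately [2, 2, 2]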

Monte Carlo and Metropolis Sampling

The total energy of an $N$-particle system with pairwise interactions is given by

$$E = \frac{1}{2} \sum_{i=1}^N \sum_{j \neq i}^N V_{ij}$$

where the interaction energy $V_{ij}$ includes interactions like $V(\mathbf{r}_i - \mathbf{r}_j)$ that depend only on the separation between particles $i$ and $j$.

In traditional Metropolis sampling, we propose a new configuration of the system by moving a single particle by a random amount. We compute the energy change between the original and proposed configurations,

$$\Delta E = E_\text{new} - E_\text{old}$$

If $\Delta E < 0$, we accept the new configuration. If $\Delta E > 0$, we accept the new configuration with probability $\exp(-\beta \Delta E)$, where $\beta = 1/k_B T$ is the inverse temperature of the sampling process. In a machine learning context, we would identify $\beta$ as a hyperparameter that controls properties of the optimization.
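Before the vectorized implementation below, here is a minimal, non-vectorized sketch of a single Metropolis update (not part of the original notebook); the double-well energy and parameter values are placeholder choices for illustration.

def metropolis_step(x, energy, beta=1.0, amplitude=0.1):
    """Perform one Metropolis update of the state x."""
    x_new = x + amplitude * np.random.normal(size=np.shape(x))
    delta_e = energy(x_new) - energy(x)
    # Always accept downhill moves; accept uphill moves with probability exp(-beta * dE)
    if delta_e < 0 or np.random.random() < np.exp(-beta * delta_e):
        return x_new
    return x  # reject: keep the old configuration

# Example: sample a double-well energy E(x) = (x^2 - 1)^2, starting far from the minima
energy = lambda x: (x**2 - 1)**2
x = 2.0
for _ in range(1000):
    x = metropolis_step(x, energy, beta=5.0)
print(x)  # typically ends up near one of the minima at x = +1 or x = -1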

Our approach

  • We will draw a random number, $\rho$, from a uniform distribution between 0 and 1. If $\rho < \exp(-\beta \Delta E)$, we accept the higher-energy configuration. Otherwise, we reject the new configuration and keep the old one.

  • We will vectorize our implementation so that we can optimize from multiple initial positions simultaneously.

Why the probabilistic acceptance rule?

You might think that the fastest convergence would occur if we always accepted updates that lower the energy and always rejected updates that increase it (the zero-temperature limit, $\beta \to \infty$). The probabilistic update rule instead gives the optimizer a way to escape local minima.


class MetropolisMonteCarlo:
    """
    Metropolis algorithm for optimizing a loss function.

    Attributes:
        energy (function): The loss function to be minimized
        beta (float): The inverse temperature
        amplitude (float): The amplitude of the perturbation applied to the state
        max_iter (int): The maximum number of iterations
        random_state (int): The random seed
        store_history (bool): Whether to store the history of the state
    """
    def __init__(
            self, 
            loss, 
            beta=1.0, 
            amplitude=1.0, 
            max_iter=500,
            random_state=None, 
            store_history=False
        ):
        self.energy = loss
        self.beta = beta
        self.amplitude = amplitude
        self.n_iter = max_iter
        self.store_history = store_history
        self.random_state = random_state

    def perturb(self, X):
        """
        Randomly perturb the state of the system.
        
        Args:
            X (ndarray): The current state of the system

        Returns:
            Xp (ndarray): The perturbed state of the system
        """
        Xp = X + self.amplitude * np.random.normal(size=X.shape)
        return Xp

    def update(self, X):
        """
        Update the state of the system.
        
        Args:
            X (ndarray): The current state of the system

        Returns:
            X (ndarray): The updated state of the system
        """
        rho = np.random.random(X.shape[0])
        Xp = self.perturb(X)
        e_delta = self.energy(Xp) - self.energy(X)

        ## Accept moves that lower the energy; accept uphill moves with probability
        ## exp(-beta * dE). (The overflow warning for large beta is benign: exp
        ## evaluates to inf and the downhill move is accepted anyway.)
        do_update = np.zeros(X.shape[0])
        do_update[e_delta < 0] = 1
        do_update[rho < np.exp(-self.beta * e_delta)] = 1

        X_out = X.copy()
        X_out[do_update == 1] = Xp[do_update == 1]
        return X_out

    def fit(self, X):
        """
        Run the Monte Carlo simulation.
        
        Args:
            X (ndarray): The initial state of the system

        Returns:
            X (ndarray): The final state of the system
        """
        np.random.seed(self.random_state)
        if self.store_history:
            self.history = [X]
        for _ in range(self.n_iter):
            X = self.update(X)
            if self.store_history:
                self.history.append(X)
        return X
from cphy.losses import RandomGaussianLandscape

# Instantiate a random loss
loss = RandomGaussianLandscape(random_state=0, n_wells=8)
loss.plot()
plt.axis('off')
[Figure: contour plot of the random Gaussian loss landscape]

# Initialize optimizer
# optimizer = MetropolisMonteCarlo(loss, beta=1e2, amplitude=0.1, random_state=0, store_history=True)
optimizer = MetropolisMonteCarlo(loss, beta=1e6, amplitude=0.1, random_state=0, store_history=True)
# optimizer = MetropolisMonteCarlo(loss, beta=1e-2, amplitude=0.1, random_state=0, store_history=True)

# Initialize starting point
np.random.seed(0)
X0 = np.random.normal(loc=(0, 0.5), size=(100, 2))

# Fit optimizer
Xopt = optimizer.fit(X0.copy()).squeeze()
/var/folders/xt/9wdl4pmx26gf_qytq8_d528c0000gq/T/ipykernel_39660/818174836.py:60: RuntimeWarning: overflow encountered in exp
  do_update[rho < np.exp(-self.beta * e_delta)] = 1


Xs = np.array(optimizer.history).squeeze()
loss.plot()
plt.plot(Xs[0, :, 0], Xs[0, :, 1], '.k');
plt.plot(Xs[:, :, 0], Xs[:, :, 1], 'r');
plt.plot(Xs[-1, :, 0], Xs[-1, :, 1], '.b');
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.axis('off')
[Figure: optimization trajectories of the walkers overlaid on the loss landscape]
## Make an interactive video
from ipywidgets import interact, interactive, fixed, interact_manual, Layout
import ipywidgets as widgets

Xs = np.array(optimizer.history).squeeze()

def plotter(i):
    loss.plot()
    plt.plot(Xs[0, :, 0], Xs[0, :, 1], '.k');
    plt.plot(Xs[:i+1, :, 0], Xs[:i+1, :, 1], 'r');
    plt.plot(Xs[i, :, 0], Xs[i, :, 1], '.b');
    plt.xlim([-3, 3])
    plt.ylim([-3, 3])
    plt.axis('off')
    plt.show()

interact(
    plotter, 
    i=widgets.IntSlider(0, 0, Xs.shape[0] - 1, 1, layout=Layout(width='800px'))
)

Hard sphere packing with the Metropolis algorithm

A basic problem in materials science, statistical physics, and the theory of disordered systems is to find the densest possible packing of hard spheres in a finite volume.

We define a hard sphere as a sphere of radius $R$ that cannot overlap with any other sphere. The packing fraction $\phi$ is the fraction of the volume that is occupied by the spheres.

The interaction potential between two spheres is given by

$$V(r) = \begin{cases} \infty & r < 2R \\ 0 & r > 2R \end{cases}$$

where $r$ is the distance between the centers of the spheres. Note that, because we have an infinite potential, the Metropolis update step simplifies dramatically. Recall that, in traditional Metropolis, we calculate the ratio of the Boltzmann factors for the current and proposed configurations:

$$\frac{e^{-\beta E_{\text{new}}}}{e^{-\beta E_{\text{old}}}} = e^{-\beta (E_{\text{new}} - E_{\text{old}})}$$

In our case, the energy difference is either zero or infinity, so the ratio is either 1 or 0. This means that the acceptance probability is either 1 or 0, and we can simply accept or reject the proposed move based on whether the proposal causes an overlap between two spheres.
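As a sketch of what this simplified acceptance looks like in code (the `overlaps` argument is a stand-in for an overlap test such as the `is_overlap` method defined in the class below; boundary conditions are omitted for brevity):

def hard_sphere_step(coords, i, zeta, overlaps):
    """Propose a random move of sphere i and accept it only if no overlap results."""
    proposal = coords.copy()
    proposal[i] = proposal[i] + zeta * (2 * np.random.rand(coords.shape[1]) - 1)
    if overlaps(proposal, i):
        return coords    # Delta E = infinity, acceptance probability 0
    return proposal      # Delta E = 0, acceptance probability 1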

Our implementation

  • We will use periodic boundary conditions, so that the system behaves as if it were infinite in all directions. This means that we can place spheres anywhere in the simulation box without worrying about its edges.

  • Rather than parameterize the system in terms of the particle radius $R$, we will parameterize it in terms of the packing fraction $\phi$, which equals the volume of the spheres divided by the volume of the simulation box. This is a more natural parameterization for the problem, because it is dimensionless (a quick conversion check follows this list).

  • At low $\phi$, the spheres will be able to move around freely, and the system will be in a fluid state. At high $\phi$, the spheres will be jammed together, and the system will reach a solid state.
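As a quick check of this parameterization (mirroring the radius attribute computed in the HardSpherePacking class below): in a unit box, $\phi = N \frac{4}{3} \pi R^3$, so the contact distance between sphere centers is $2R = \left(\frac{6\phi}{\pi N}\right)^{1/3}$.

def contact_distance(phi, n):
    """Minimum allowed center-to-center distance (2R) for n spheres at packing
    fraction phi in a unit box, from phi = n * (4/3) * pi * R**3."""
    return (6 * phi / (np.pi * n)) ** (1 / 3)

print(contact_distance(0.4, 100))  # about 0.197 for the simulation run below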

References

  • Random Heterogeneous Materials: Microstructure and Macroscopic Properties, by Sal Torquato
from scipy.spatial.distance import pdist
import itertools

## We define this helper function externally so we can use it outside of the class
def augment_lattice(coords):
    """
    Given a set of 3D coordinates, build an augmented lattice of coordinates by 
    shifting the full set by -1, 0, or +1 along each axis (27 copies in total).

    Args:
        coords (np.ndarray): coordinates of the particles of shape (N, 3)

    Returns:
        np.ndarray: augmented coordinates of shape (27 * N, 3)
    """
    coords_augmented = list()
    # there are 27 possible shifts to fully surround a simulation box with 
    # replicates to create boundary conditions
    for shift in itertools.product([-1., 0., 1.], repeat=3):
        coords_augmented.append(coords.copy() + np.array(shift))
    return np.vstack(coords_augmented).copy()

class HardSpherePacking:
    """
    Simulate hard sphere packing using the Metropolis algorithm. The simulation is
    specified by the number of spheres and the desired volume fraction. The radius 
    of the spheres is calculated to achieve the target volume fraction. 

    Args:
        n (int): number of particles
        phi (float): packing fraction
        d (int): spatial dimension (the implementation assumes d = 3)
        zeta (float): maximum displacement per trial move
        store_history (bool): whether to store the coordinate history

    Attributes:
        coords (np.ndarray): coordinates of sphere centers
        radius (float): minimum allowed distance between sphere centers
            (the sphere diameter for the target volume fraction)
    """

    def __init__(self, n, phi, d=3, zeta=0.005, store_history=False):
        self.N = n
        self.phi = phi 
        self.zeta = zeta
        self.d = d
        
        self.coords = self.initialize_lattice()
        # Minimum center-to-center distance (the sphere diameter 2R) needed to reach
        # packing fraction phi in a unit box: phi = n * (4/3) * pi * R**3
        self.radius = (6 * phi/(np.pi * n))**(1/3)
        # self.radius = (phi * d / n)**(1 / d)
        self.store_history = store_history

        self.acceptance_rate = None
        if store_history:
            self.history = list()

    def initialize_lattice(self):
        """
        Generate a cubic initial conditions mesh within the unit cube

        Returns:
            np.ndarray: coordinates of particles in a three-dimensional mesh
        """
        nx = ny = nz = int(np.ceil(self.N**(1/3)))
        x = np.linspace(0, 1, nx + 1)[:-1]
        y = np.linspace(0, 1, ny + 1)[:-1]
        z = np.linspace(0, 1, nz + 1)[:-1]
        xx, yy, zz = np.meshgrid(x, y, z)
        coords = np.vstack((xx.flatten(), yy.flatten(), zz.flatten())).T

        #coords = np.random.choice(coords, size=self.N, replace=False)
        return coords[:self.N]

    def is_overlap(self, coords, index):
        """
        Given a set of coordinates, check whether the sphere at the specified index 
        overlaps with any other sphere.

        Args:
            coords (np.ndarray): coordinates of the particles
            index (int): index of the particle to check

        Returns:
            bool: whether the particle overlaps with any other particle
        """
        # apply periodic boundary conditions
        coos = augment_lattice(coords) 

        # compute distances between all particles and indexed particle
        dists = np.sqrt(np.sum((coords[index] - coos)**2, axis=1))

        dists = dists[dists > 0] # remove self-overlap term
        return np.any(dists < self.radius)

    def fit(self, num_iter=1000, verbose=False):
        """
        Run the Metropolis algorithm for the specified number of iterations. Can be 
        restarted to continue the simulation.

        Args:
            num_iter (int): number of iterations
            verbose (bool): whether to print progress

        Returns:
            self
        """
        accepts, rejects = 0, 0
        if self.store_history:
            self.history.append(self.coords.copy())
        for i in range(num_iter):

            for j in range(self.N):
                newcoords = self.coords.copy()
                x0 = self.coords[j]
                pert = self.zeta * (2 * np.random.rand(3) - 1)
                xp = x0 + pert

                ## Boundary conditions in applied displacements
                xp = xp - np.floor(xp + 0.5)

                newcoords[j] = xp

                ## Update if no overlap, and tally accept/reject
                if not self.is_overlap(newcoords, j):
                    self.coords[j] = xp
                    accepts += 1
                else:
                    rejects += 1

            ## Report progress and update history      
            if verbose:
                if i % 50 == 0: print(f"Step {i}", flush=True)
            if self.store_history:
                self.history.append(self.coords.copy())

        self.acceptance_rate = accepts / (accepts + rejects)
        return self
## Run simulation

hsp = HardSpherePacking(100, 0.4, store_history=True)

hsp.fit(num_iter=500, verbose=True)
## Save the history because this simulation is demanding
# np.savez('../private_resources/monte_carlo_metropolis.npz', np.array(hsp.history))

print(f"Terminated Metropolis with average acceptance rate {hsp.acceptance_rate}")
Step 0
Step 50
Step 100
Step 150
Step 200
Step 250
Step 300
Step 350
Step 400
Step 450
Terminated Metropolis with average acceptance rate 0.56476
## Make an interactive video
from ipywidgets import interact
import ipywidgets as widgets


def plotter(i):
    coords = hsp.history[i]
    fig = plt.figure(figsize=(8,8))
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2], c='b', marker='o', s=2000, alpha=0.2)
    ax.set_zlim(-0.5, 0.5)
    plt.xlim([-0.5, 0.5])
    plt.ylim([-0.5, 0.5])
    
    plt.show()

interact(
    plotter, 
    i=widgets.IntSlider(min=5, max=len(hsp.history)-1, step=1, value=0)
)

Loading...

What happens as we vary the packing fraction?

phi_range = np.linspace(0.01, 0.5, 10)

final_coords = list()
for phi in phi_range:
    hsp = HardSpherePacking(100, phi, store_history=True)
    hsp.fit(num_iter=500, verbose=False)
    print(f"At packing fraction {phi}, Metropolis had average acceptance rate {hsp.acceptance_rate}")

    coords = hsp.history[-1]
    final_coords.append(coords.copy())
At packing fraction 0.01, Metropolis had average acceptance rate 0.99776
At packing fraction 0.06444444444444444, Metropolis had average acceptance rate 0.99104
At packing fraction 0.11888888888888888, Metropolis had average acceptance rate 0.9721
At packing fraction 0.17333333333333334, Metropolis had average acceptance rate 0.95844
At packing fraction 0.22777777777777777, Metropolis had average acceptance rate 0.9261
At packing fraction 0.2822222222222222, Metropolis had average acceptance rate 0.88954
At packing fraction 0.33666666666666667, Metropolis had average acceptance rate 0.79408
At packing fraction 0.3911111111111111, Metropolis had average acceptance rate 0.64774
At packing fraction 0.44555555555555554, Metropolis had average acceptance rate 0.0
At packing fraction 0.5, Metropolis had average acceptance rate 0.0

The pairwise correlation function

The pairwise correlation function is defined as the probability of finding a particle at a distance $r$ from another particle, given that the other particle is at the origin. It is given by

$$g_2(r) = \frac{1}{\rho N} \left\langle \sum_{i=1}^N \sum_{j \neq i} \delta(r - |\mathbf{r}_i - \mathbf{r}_j|) \right\rangle$$

where $\rho$ is the overall number density of particles. The sum over $j$ excludes the $i=j$ term, which would be infinite. The delta function ensures that the sum is only over pairs of particles that are separated by a distance $r$. The factor of $1/N$ ensures that the sum is normalized by the number of particles, while the factor of $1/\rho$ ensures that the sum is normalized by the average density of the system.

For finite-size systems, $g_2(r)$ as defined here appears "spiky," comprising a series of delta functions, one for each pair of particles. In the thermodynamic limit, however, the spikes merge and the function becomes smooth. In order to estimate $g_2(r)$ for a finite-size system, we must smooth the function by binning it. We will use a bin size of $\Delta r$. In this case, the correlation function is given by

$$g_2(r) = \frac{1}{\rho N} \left\langle \sum_{i=1}^N \sum_{j \neq i} \frac{1}{\Delta r}\, \Theta\!\left(\frac{\Delta r}{2} - \big|\, r - |\mathbf{r}_i - \mathbf{r}_j|\, \big|\right) \right\rangle$$

where $\Theta$ is the Heaviside step function; the term inside the sum counts pairs whose separation falls within a bin of width $\Delta r$ centered at $r$, and the factor of $1/\Delta r$ normalizes the count by the bin size.

from scipy.spatial.distance import pdist

def compute_g2(coords, nbins=180, rmax=0.5):
    """
    Given a set of coordinates, compute the pair correlation function g2(r)

    Args:
        coords (np.ndarray): coordinates of the particles
        nbins (int): number of bins to use for the histogram
        rmax (float): maximum distance to consider for the histogram

    Returns:
        bins (np.ndarray): bin edges
        g2 (np.ndarray): pair correlation function
    """
    D = pdist(coords) # computes pairwise distances among all particles
    bins = np.linspace(0, rmax, nbins)

    ## Bin the particles based on their distance
    hist, bins = np.histogram(D, bins=bins)
    
    ## divide by the volume of the shell
    g2 = hist / (len(coords) * 4*np.pi/3 * (bins[1:]**3 - bins[:-1]**3))

    ## normalize so that g2 -> 1 at large r, using the tail as the density reference
    #rho_ave = len(coords) / (4*np.pi/3 * rmax**3)
    g2 = g2 / g2[-3:].mean()
    return bins, g2
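As a quick sanity check (not part of the original notebook), uncorrelated random points behave like an ideal gas, for which $g_2(r) \approx 1$ at all separations; the point count below is arbitrary.

## Sanity check: uniformly random points should give g2(r) close to 1
np.random.seed(0)
random_coords = np.random.rand(200, 3) - 0.5  # uniform points in the unit box
_, g2_rand = compute_g2(augment_lattice(random_coords), nbins=150)
print(np.round(g2_rand[20:].mean(), 2))  # approximately 1, up to noise and boundary effects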
## Run a large-scale simulation

hsp = HardSpherePacking(1000, 0.45, store_history=True)

# hsp.fit(num_iter=5000, verbose=True)
# print(f"Terminated Metropolis with acceptance rate {hsp.acceptance_rate}")
## Save the history because this simulation is demanding
# np.savez('../private_resources/monte_carlo_metropolis.npz', np.array(hsp.history))

## Load the history to avoid re-running the simulation
hsp.history = np.load('../private_resources/monte_carlo_metropolis.npz', allow_pickle=True)['arr_0']
hsp.coords = hsp.history[-1]
from cphy.plotting import plt_sphere

## Load the history
# hsp.history = np.load('../private_resources/monte_carlo_metropolis.npz', allow_pickle=True)['arr_0']
# hsp.coords = hsp.history[-1]

bins, hist = compute_g2(augment_lattice(hsp.coords), nbins=150)
plt.figure()
plt.plot(bins[:-1], hist, 'k')
plt.xlabel(r'r')
plt.ylabel(r'$g_2(r)$')

fig = plt.figure()
plt_sphere(hsp.history[1].copy(), hsp.radius)
plt.xlim([-0.5, 0.5])
plt.ylim([-0.5, 0.5])
plt.gca().set_zlim([-0.5, 0.5])

fig = plt.figure()
plt_sphere(hsp.history[-1].copy(), hsp.radius)
plt.xlim([-0.5, 0.5])
plt.ylim([-0.5, 0.5])
plt.gca().set_zlim([-0.5, 0.5]) 
[Figures: g2(r) for the dense packing, and 3D renderings of the initial and final sphere configurations]

We can compare our results to the typical appearance of the correlation function for a dense packing of hard spheres. The correlation function is a smooth function that decays to one (the asymptotic average density) at long distances.

This figure from Wang et al. PRE 2010 shows the correlation function computed from molecular dynamics simulations, as well as an approximate analytic solution for g(r).

## Make an interactive video
from ipywidgets import interact
import ipywidgets as widgets


def plotter(i):
    coords = hsp.history[i]
    fig = plt.figure(figsize=(8,8))
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2], c='b', marker='o', s=100, alpha=0.2)
    ax.set_zlim(-0.5, 0.5)
    plt.xlim([-0.5, 0.5])
    plt.ylim([-0.5, 0.5])
    
    plt.show()

interact(
    plotter, 
    i=widgets.IntSlider(min=5, max=len(hsp.history)-1, step=1, value=0)
)


Questions

  • We have a double loop: an outer loop over all iterations, and an inner loop where we go over each individual particle, jostle it, and check for overlaps. Why can’t this inner loop be vectorized?

  • How would we generalize this approach to a soft-repulsive potential?

from IPython.display import Image
Image("../resources/hardsphere_diagram.png", width=500)
# Computer simulations of dense hard‐sphere systems, Rintoul & Torquato. JCP 1996

What about long-range interactions?

  • For the hard-sphere system, we needed to compute the energy change based on the potentially-overlapping neighbors of each particle.

  • Locality: When a single particle's position updates, we only need to check locally for new overlaps; we don't have to check particles on the far side of the simulation box.

  • For $N$ particles, each of volume $v$, in a total volume $V$, the number density is $\rho = N/V$. The average number of potentially overlapping neighbors is therefore $K = \rho v = N v / V$.

  • The energy update for the hard-sphere potential therefore takes $O(NK)$ time for a full sweep over the particles. Usually $K \ll N$, so we say the cost is $O(N)$. The same holds for soft spheres, and it is effectively true for short-range potentials like the Lennard-Jones potential (a sketch follows this list).

  • But what about gravitationally interacting systems? For every update, we have to check interactions with every other particle, so a full sweep costs $O(N^2)$. This is a major computational bottleneck for gravitational N-body simulations.

  • One solution involves using approximate force fields for long-range interactions. Examples of algorithms that do this include the Barnes-Hut algorithm, the Fast Multipole Method, and Ewald summation.
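To make the locality argument concrete, here is a rough sketch (not part of the original notebook) of a single-particle energy change computed only over nearby particles; the pair potential, cutoff, and helper name are placeholder choices, and periodic boundary conditions are ignored for brevity.

def local_energy_change(coords, i, x_new, pair_potential, cutoff):
    """Energy change from moving particle i to x_new, summing only over particles
    within `cutoff` of the old or new position. This naive version still computes
    all N distances; a cell or neighbor list would restrict the search to the
    O(K) relevant neighbors."""
    others = np.delete(coords, i, axis=0)
    d_old = np.linalg.norm(others - coords[i], axis=1)
    d_new = np.linalg.norm(others - x_new, axis=1)
    mask = (d_old < cutoff) | (d_new < cutoff)  # keep only local neighbors
    return np.sum(pair_potential(d_new[mask])) - np.sum(pair_potential(d_old[mask]))

# Example with a placeholder soft repulsion that vanishes beyond the cutoff
soft = lambda r: np.where(r < 0.1, (1 - r / 0.1) ** 2, 0.0)
coords = np.random.rand(1000, 3)
print(local_energy_change(coords, 0, coords[0] + 0.01, soft, cutoff=0.1))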

Further reading

A recent modification of the Metropolis algorithm by Müller et al. rearranges the acceptance step $\rho < e^{-\beta \Delta E}$ as the equivalent threshold condition

$$\Delta E < -\frac{\log \rho}{\beta} \equiv \Delta E_{th}$$

where $\rho$ is a random number drawn uniformly between 0 and 1. This approach therefore reframes the Metropolis update as determining whether the proposed energy change $\Delta E$ lies above or below the random threshold $\Delta E_{th}$.

The key insight of Müller et al. is that $\Delta E$ does not need to be known exactly. Rather, it is sufficient to gradually tighten a pair of bounds $\Delta E \in [\Delta E_{min}, \Delta E_{max}]$. If $\Delta E_{max} < \Delta E_{th}$, the update can be accepted immediately; if $\Delta E_{min} > \Delta E_{th}$, it can be rejected immediately.

Thus, rather than computing the energy change exactly, the authors use a tree-like decomposition of the domain to gradually tighten the bounds on the true energy change, until one of the stopping conditions above is triggered. Importantly, this is an exact algorithm: it does not introduce any approximations.
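Here is a schematic sketch of the threshold formulation (an illustration of the idea only, not the authors' implementation); the `refine_bounds` callback is a hypothetical stand-in for their tree-based bound tightening, returning progressively tighter bounds on $\Delta E$ each time it is called.

def threshold_metropolis_accept(refine_bounds, beta, max_refinements=100):
    """Accept or reject a move using the threshold form of the Metropolis rule."""
    rho = np.random.random()
    delta_e_th = -np.log(rho) / beta  # random acceptance threshold
    for _ in range(max_refinements):
        delta_e_min, delta_e_max = refine_bounds()
        if delta_e_max < delta_e_th:  # energy change is certainly below threshold
            return True               # accept
        if delta_e_min > delta_e_th:  # energy change is certainly above threshold
            return False              # reject
    return False  # bounds never resolved; fall back to rejecting (or compute dE exactly)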

References
  1. Müller, F., Christiansen, H., Schnabel, S., & Janke, W. (2023). Fast, Hierarchical, and Adaptive Algorithm for Metropolis Monte Carlo Simulations of Long-Range Interacting Systems. Physical Review X, 13(3). 10.1103/physrevx.13.031006