Vectorization and arrays: Period-doubling bifurcations, and the Mandelbrot Set¶
Preamble: Run the cells below to import the necessary Python packages
This notebook created by William Gilpin. Consult the course website for all content and GitHub repository for raw files and runnable online code.
# pre-installed library containing a grab bag of system functions
import sys
print(sys.version)
# Import some local code, which I'll use behind the scenes
# from ../resources import helper_functions
# Wipe all outputs from this notebook
from IPython.display import Image, clear_output
clear_output(True)
# Import local plotting functions and in-notebook display functions
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]
External libraries¶
A major benefit of the Python ecosystem is the ability to easily import external libraries. This is done with the import
command. For example, the following imports the numpy
library, which contains many useful functions for scientific computing.
import numpy as np
Common libraries you might use in the natural sciences include
- matplotlib (plotting)
- numpy (linear algebra; lots of MATLAB functions)
- scipy (common scientific algorithms)
- scikit-learn (machine learning)
- networkx (making graphs)
Major libraries you might use in data science include
- numba (just-in-time compilation of numpy-based code)
- pandas (large dataset organization; lots of R functionality)
- statsmodels (statistics; lots of R functionality)
- seaborn (plotting; alternative to matplotlib with API similar to R's ggplot)
- PyTorch (deep learning)
- JAX (deep learning)
Typical usage (in terminal): install
conda activate myenvironment
conda install pandas
When conda fails, or the package is not available, use pip
pip install pandas
Then either launch launch JupyterLab from terminal, or open VSCode app and select the appropriate environment in the upper-right hand corner
import matplotlib.pyplot as plt # Plotting library
import numpy as np # Linear algebra library
import scipy # General scientific computing library
Arrays, indexing, broadcasting, vectorization¶
Numpy is technically an external library, but we will use it so often in scientific computing that it's basically a built-in data type for our purposes.
Suite of support for the
np.ndarray
datatype, which represents tensors. Most MATLAB functions are within numpy
import numpy as np
a = np.arange(10)
print(a)
type(a)
[0 1 2 3 4 5 6 7 8 9]
numpy.ndarray
The logistic map¶
The logistic map is a polynomial mapping (equivalently, recurrence relation) of degree 2, often cited as an archetypal example of how complex, chaotic behaviour can arise from very simple non-linear dynamical equations.
The map is defined by:
$$x_{n+1} = r x_n (1 - x_n)$$
where the dynamical variable $x_n \in (0, 1)$ is a number between zero and one. The parameter $r$ represents a growth rate, and varies between zero and four. For $r$ values between zero and one, the population always dies out. However, as $r$ increases past one, a bifurcation occurs at $r=3$, where the population begins to oscillate between two values. As $r$ increases further, period doubling occurs, leading to four values, then eight, and so on, until finally chaos occurs, and the population takes on seemingly random values between zero and one.
# logistic map
import numpy as np
import matplotlib.pyplot as plt
def logistic_map(x, r):
"""
The logistic map, a discrete-time dynamical system.
"""
return r * x * (1 - x)
r = 3.2
x = [0.65]
for i in range(1, 100):
x.append(logistic_map(x[i-1], r))
plt.figure()
plt.plot(x, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
r = 3.8
x = [0.65]
for i in range(1, 100):
x.append(logistic_map(x[i-1], r))
plt.figure()
plt.plot(x, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
Text(0, 0.5, 'x')
Runtime complexity and dynamical systems¶
A common question we might ask for an algorithm is: How does the runtime scale with the input size? What about the memory usage?
These are sometimes called the asymptotic time and space complexities of an algorithm.
The asymptotic complexity of an algorithm is independent of the hardware and software environment. It is a representationn-invariant mathematical property of the algorithm.
import numpy as np
def simulate_logistic(n, r=3.8, x0=0.65):
"""
Simulate the logistic map for n iterations using numpy and preallocation
"""
x = np.zeros(n) # Preallocate an array of zeros
x[0] = x0
for i in range(1, n):
prev = x[i-1]
x[i] = r * prev * (1 - prev)
return x
# x = simulate_logistic(1000)
# plt.plot(x, 'k')
%timeit simulate_logistic(10000)
2.4 ms ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
import timeit
n_vals = np.arange(1, 6000, 5)
# n_vals = np.logspace(1, 5, 50, dtype=np.int32)
times = []
for n in n_vals:
times.append(timeit.timeit("simulate_logistic(n)", globals=globals(), number=10))
plt.figure()
plt.plot(n_vals, times)
plt.title("Sum Iterative runtime")
plt.xlabel("Number of iterations")
plt.ylabel("Runtime (s)")
Text(0, 0.5, 'Runtime (s)')
Notice that in the asymptotic regime the runtime scales linearly.
Why is the runtime $\sim N$? We only "touch" each element of our solution array twice---once to calculate it, and once to update the next state. We don't need to look back at all the previous states.
The memory usage is also $\sim N$ because we are storing the full, cumulative history in order to plot the trajectory. If we only wanted to plot the final state, we could get away with only storing the final state. In this case, the memory would be $\sim 1$.
Now that we have a fast implementation of the logistic map, we can explore its properties in more detail.
The logistic map follows the celebrated period-doubling route to chaos. As the parameter $r$ increases, the period of the oscillations doubles, until chaos occurs at $r \approx 3.57$.
We can see this effect by creating a bifurcation diagram, which shows the values of $x_n$ for each value of $r$.
def bifurcation_diagram(dmap, x0, pvals, n=1000, transient=0.5):
"""
Create a bifurcation diagram of a discrete map.
Args:
dmap (callable): A function that takes two arguments, x and p, and returns the next
value of x in the discrete map.
x0 (float): The initial condition of the system.
pvals (list): A list of parameter values.
n (int): The number of iterations of the map to run for each parameter value.
transient (float): The proportion of the initial iterations to discard as transient.
Returns:
xvals (list): A list of lists of x values.
"""
n_discard = int(transient * n)
all_xvals = []
for p in pvals:
vals = simulate_logistic(n, r=p, x0=x0)
all_xvals.append(vals[n_discard:])
return xvals
def bifurcation_diagram(dmap, x0, pvals, n=1000, transient=0.1):
"""
Create a bifurcation diagram of a discrete map.
Args:
dmap (callable): A function that takes two arguments, x and p, and returns the next
value of x in the discrete map.
x0 (float): The initial condition of the system.
pvals (list): A list of parameter values.
n (int): The number of iterations of the map to run for each parameter value.
transient (float): The proportion of the initial iterations to discard as transient.
Returns:
xvals (list): A list of lists of x values.
"""
xvals = list()
n_discard = int(transient * n)
for p in pvals:
xvals.append(simulate_logistic(n, r=p, x0=x0)[n_discard:])
return xvals
rvals = np.linspace(2.5, 4.0, 1000)
xvals = bifurcation_diagram(logistic_map, 0.65, rvals)
for r, x in zip(rvals, xvals):
plt.plot([r] * len(x), x, '.k', markersize=0.05)
plt.xlim(rvals[0], rvals[-1])
plt.ylim(0, 1)
plt.xlabel('r')
plt.ylabel('x')
Text(0, 0.5, 'x')
Notice how the spacing in $r$ between the period-doubling bifurcations decreases as $r$ increases. The geometric rate of decrease in the spacing between bifurcations is a constant value called the Feigenbaum constant.
Slicing and indexing numpy arrays¶
- These are extremely useful for manipulating arrays in Python
Slicing¶
traj = simulate_logistic(1000)
plt.figure()
plt.plot(traj[1:], traj[:-1], '.k', alpha=0.05)
plt.xlabel('Iteration')
plt.ylabel('Next Iteration')
Text(0, 0.5, 'Next Iteration')
Boolean Selection with numpy arrays¶
- Can pass boolean arrays of the same shape as an array
# simulate before and after period doubling
# Take last 50 iterations to remove transient
traj2 = simulate_logistic(500, r=3)[-50:]
plt.figure()
plt.plot(traj2, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
traj4 = simulate_logistic(500, r=3.5)[-50:]
plt.figure()
plt.plot(traj4, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
traj8 = simulate_logistic(1000, r=3.56)[-50:]
plt.figure()
plt.plot(traj8, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
traj_inf = simulate_logistic(500)[-50:]
plt.figure()
plt.plot(traj_inf, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
Text(0, 0.5, 'x')
We can use slicing to map the trajectories before and after each period doubling onto one another.
plt.figure(figsize=(6, 2))
plt.plot(traj2)
plt.figure(figsize=(6, 2))
plt.plot(traj4[::2])
plt.figure(figsize=(6, 2))
plt.plot(traj8[::4])
[<matplotlib.lines.Line2D at 0x2b30fdff0>]
select_inds = traj > 0.5
# print(select_inds)
# nvals = np.arange(len(traj))
plt.figure()
plt.plot(nvals, traj, 'k')
plt.xlabel('Iteration')
plt.ylabel('Next Iteration')
plt.plot(nvals[select_inds], traj[select_inds], '.r')
[<matplotlib.lines.Line2D at 0x2b324f1f0>]
a = np.random.random((5, 5))
plt.figure()
plt.imshow(a, cmap='Blues', vmin=0)
print((a < 0.5).shape)
a[a < 0.5] = 0
plt.figure()
plt.imshow(a, cmap='Blues', vmin=0)
(5, 5)
<matplotlib.image.AxesImage at 0x2b3301060>
Fancy indexing¶
- Can pass lists of indices to modify
a = np.array([1, 2, 3, 4, 5, 6, 7])
sel_inds = [0, 2, 4]
a[sel_inds] = 0
print(a)
[0 2 0 4 0 6 7]
Broadcasting¶
- We can slice numpy arrays just like lists, but in multiple dimensions
- singleton dimensions versus empty dimensions
- Lifting dimensionality makes linear algebra easier
positions = np.random.random(size=(1000,))
print("Empty dimensions", positions.shape) # Elementwise (Hadamard)
print((positions * positions.T).shape)
print((positions * positions.T).shape) # Implicit outer product
Empty dimensions (1000,) (1000,) (1000,)
positions = np.random.random(size=(1000,))
positions = positions[:, None]
print("Singleton dimensions", positions.shape)
print((positions * positions.T).shape) # Implicit outer product
Singleton dimensions (1000, 1) (1000, 1000)
Vectorization¶
- Does not change runtime scaling, but changes the prefactor
- Runtime at expense of memory
Let's try finding the pairwise distance matrix between two sets of points in two-dimensions. We will do this in two ways: a naive way, and a vectorized way.
# Random positions dataset
positions = np.random.random(size=(100, 2))
positions.shape
(100, 2)
# pairwise distance matrix will be 100 x 100
def find_distances_iterative(positions):
"""Find the pairwise distances between a set of points"""
list_of_distances = list()
for i in range(positions.shape[0]):
row_distances = list()
for j in range(positions.shape[0]):
row_distances.append(
np.sqrt(
np.sum((positions[i, :] - positions[j, :])**2)
)
)
list_of_distances.append(row_distances)
distance_matrix = np.array(list_of_distances)
return distance_matrix
%timeit find_distances_iterative(positions)
59.5 ms ± 378 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
def find_distances_preallocated(positions):
"""
Find distance matrix with preallocated memory and built-in elementwise numpy functions
"""
distances = np.zeros((positions.shape[0], positions.shape[0]))
for i in range(positions.shape[0]):
for j in range(positions.shape[0]):
distances[i, j] = np.linalg.norm(positions[i, :] - positions[j, :])
return distances
%timeit find_distances_preallocated(positions)
20.9 ms ± 166 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# pairwise distance matrix
def find_distances_vectorized(positions):
"""
Find distance matrix vectorized with broadcasting
"""
# (100, 2)
# (100, 1, 2)
# (1, 100, 2)
return np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
%timeit find_distances_vectorized(positions)
179 µs ± 2.45 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
import timeit
n_vals = np.arange(1, 100, 1)
all_times = []
for n in n_vals:
time1 = timeit.timeit("find_distances_iterative(np.random.random((n, 2)))", globals=globals(), number=n)
time2 = timeit.timeit("find_distances_preallocated(np.random.random((n, 2)))", globals=globals(), number=n)
time3 = timeit.timeit("find_distances_vectorized(np.random.random((n, 2)))", globals=globals(), number=n)
all_times.append([time1, time2, time3])
all_times = np.array(all_times)
plt.figure()
plt.plot(n_vals, all_times[:, 0])
plt.title("Sum Iterative runtime")
plt.xlabel("Number of iterations")
plt.ylabel("Runtime (s)")
plt.figure()
plt.plot(n_vals, all_times[:, 1])
plt.title("Numpy Iterative runtime")
plt.xlabel("Number of iterations")
plt.ylabel("Runtime (s)")
plt.figure()
plt.plot(n_vals, all_times[:, 2])
plt.title("Numpy Iterative runtime")
plt.xlabel("Number of iterations")
plt.ylabel("Runtime (s)")
Text(0, 0.5, 'Runtime (s)')
The basins of attraction of the Mandelbrot set are fractal¶
Creation of the fractal Mandelbrot set involves defining a 2D lattice, and then iterating the following function at each lattice point:
$$ z_{t + 1} = z_{t}^2 + c $$ where the complex number $z = x + i y$ encodes a 2D spatial position. This update rule is equivalent to the 2D update rule
$$ x_{t + 1} = x_t^2 - y_t^2 + c \\ y_{t + 1} = 2 x y $$
For the Mandelbrot set, we are interested in plotting, for each initial condition $z_0$, the asymptotic value $\lim_{n \rightarrow \infty} |z_n|$. For some points the set diverges, and for others it converges. The boundary between these two regions is the Mandelbrot set.
Adapted from Nicolas Rougier & Loïc Houpert's book
Let's start by plotting two trajectories of the Mandelbrot set that start from different initial conditions.
def mandelbrot_map(z0, c=-3/4, max_iters=100):
"""
Compute the Mandelbrot fractal using iteration and a vectorized approach
"""
z = [z0]
for i in range(max_iters):
z.append(z[-1]**2 + c)
return np.array(z)
z0 = 0.1 + 1j * 0.5
traj = mandelbrot_map(z0)
plt.figure(figsize=(12, 6))
plt.plot(np.real(traj))
plt.plot(np.imag(traj))
plt.legend(['Real', 'Imaginary'])
plt.xlabel('Iteration')
plt.ylabel('Value')
z0 = -0.5 + 1j * 0.5
traj = mandelbrot_map(z0)
plt.figure(figsize=(12, 6))
plt.plot(np.real(traj))
plt.plot(np.imag(traj))
plt.legend(['Real', 'Imaginary'])
plt.xlabel('Iteration')
plt.ylabel('Value')
Text(0, 0.5, 'Value')
We are interested in finding the basins of attraction, which the subregions of the complex plane that converge to the same asymptotic value. We can do this by computing trajectories of the Mandelbrot set for a grid of initial conditions.
In our implementation, we iterate over a grid of initial conditions and then calculate the trajectory under the Mandelbrot map. If the absolute value of the trajectory exceeds a threshold, we assume that means that those initial conditions diverged. We therefore assign the initial condition to the basin of attraction corresponding to the threshold that was exceeded.
def mandelbrot_iterative(z, maxiter=20, horizon=2.0, c=-3/4):
"""
An implementation of the Mandelbrot set using iterative loops
We use the built-in complex type, although we could have also split into separate
iterations for the real and imaginary parts
Args:
z (complex): A 2D array of complex numbers to iterate
maxiter (int): The maximum number of iterations to perform
horizon (float): The maximum value of the magnitude of the complex number
c (complex): The control parameter for the map
Returns:
z (complex): A 2D array of complex numbers, with values of 0 or 1 depending on
whether they diverge or not
"""
n, m = z.shape
for i in range(n):
for j in range(m):
for t in range(maxiter):
if abs(z[i, j]) >= horizon:
z[i, j] = horizon
else:
z[i, j] = z[i, j]**2 + c
return z
# define a list of the x, y meshpoints at which we will compute the mandelbrot set
xx, yy = np.meshgrid(np.linspace(-1.5, 0.5, 600), np.linspace(-1, 1, 600))
zz = xx + 1j * yy
horizon_vals = mandelbrot_iterative(zz)
plt.figure(figsize=(12, 6))
plt.imshow(np.real(horizon_vals), cmap="PuBu")
plt.axis('off'); plt.tight_layout(pad=0) # remove all axes, padding, and ticks
In our vectorized implementation, we will use the same algorithm, but we will iterate the entire initial condition array all at the same time. We will use Boolean indexing to assign different values, depending on whether a given array point is diverging or not.
def mandelbrot_numpy(z, maxiter=20, horizon=5.0, c=-3/4):
"""
A vectorized implementation of the Mandelbrot set using numpy
Args:
z (complex): A 2D array of complex numbers to iterate
maxiter (int): The maximum number of iterations to perform before giving up
horizon (float): The maximum value of the magnitude of the complex number
c (complex): The control parameter for the map
Returns:
z (complex): A 2D array of complex numbers, with values of 0 or 1 depending on
whether they diverge or not
"""
z = z.astype(np.complex64) # cast to numpy's own complex type
for t in range(maxiter):
z[np.abs(z) <= horizon] = z[np.abs(z) <= horizon]**2 + c
z[np.abs(z) > horizon] = horizon
return z
# define the x, y meshpoints at which we will compute the mandelbrot set
xx, yy = np.meshgrid(np.linspace(-1.5, 0.5, 600), np.linspace(-1, 1, 600))
zz = xx + 1j * yy
horizon_vals = mandelbrot_numpy(zz, maxiter=200)
plt.figure(figsize=(12, 6))
plt.imshow(np.abs(horizon_vals), cmap="PuBu")
plt.axis('off'); plt.tight_layout(pad=0) # remove all axes, padding, and ticks
plt.title("Mandelbrot Set")
Text(0.5, 1.0, 'Mandelbrot Set')
Let's do a timing experiment, where we see how the runtime of our two implementations changes as we increase the number of iterations.
import timeit
n_vals = np.arange(1, 100, 1)
# define the x, y meshpoints at which we will compute the mandelbrot set
xx, yy = np.meshgrid(np.linspace(-1.5, 0.5, 30), np.linspace(-1, 1, 30))
zz = xx + 1j * yy
all_times = []
for n in n_vals:
time1 = timeit.timeit(
"mandelbrot_iterative(zz, maxiter=n)", globals=globals(), number=n
)
time2 = timeit.timeit(
"mandelbrot_numpy(zz, maxiter=n)", globals=globals(), number=n
)
all_times.append([time1, time2])
all_times = np.array(all_times)
plt.figure()
plt.plot(n_vals, all_times[:, 0])
plt.title("Mandelbrot Iterative runtime")
plt.xlabel("Number of iterations")
plt.ylabel("Runtime (s)")
plt.figure()
plt.plot(n_vals, all_times[:, 1])
plt.title("Mandelbrot Numpy runtime")
plt.xlabel("Number of iterations")
plt.ylabel("Runtime (s)")
Text(0, 0.5, 'Runtime (s)')
Our numpy vectorized implementation is fast enough that we can zoom in and explore the fractal structure of the Mandelbrot set.
plt.figure(figsize=(12, 6))
plt.imshow(np.angle(horizon_vals), cmap="PuBu")
plt.axis('off'); plt.tight_layout(pad=0) # remove all axes, padding, and ticks
plt.title("Mandelbrot Set Complex Angle of Converging initial conditions")
# define the x, y meshpoints at which we will compute the mandelbrot set
xx, yy = np.meshgrid(np.linspace(-0.385, -0.355, 600), np.linspace(-0.52, -0.55, 600))
zz = xx + 1j * yy
horizon_vals = mandelbrot_numpy(zz, maxiter=10*100)
plt.figure(figsize=(6, 6))
plt.imshow(np.abs(horizon_vals), cmap="PuBu")
plt.axis('off'); plt.tight_layout(pad=0) # remove all axes, padding, and ticks
plt.title("Mandelbrot Set (Zoomed In)")
plt.figure(figsize=(6, 6))
plt.imshow(np.angle(horizon_vals), cmap="PuBu")
plt.axis('off'); plt.tight_layout(pad=0) # remove all axes, padding, and ticks
plt.title("Mandelbrot Set Angle (Zoomed In)")
Text(0.5, 1.0, 'Mandelbrot Set Angle (Zoomed In)')
Some other useful built-ins for iterables¶
- zip and enumerate are not strictly necessary (there are other ways to do the same thing), but they can make our syntax a bit cleaner and more concise. Functions like this are sometimes called "syntactic sugar"
# enumerate provies a tuple of index and value
diffusivities = [0.1, 0.3, 0.6, 0.9]
weiner_process = lambda n, signma: np.cumsum(np.random.normal(0, signma, n))
blue = np.array([0.372549, 0.596078, 1])
for i, sigma in enumerate(diffusivities):
plt.plot(weiner_process(500, sigma), color=blue * (1 - i / len(diffusivities)))
plt.title(f"Diffusivity {sigma}")
# zip pairs values together
diffusivities = [0.1, 0.3, 0.6, 0.9]
weiner_process = lambda n, signma: np.cumsum(np.random.normal(0, signma, n))
colors = np.array([0.372549, 0.596078, 1]) * (1 - np.arange(0, 1, 0.25)[:, None])
for color, sigma in zip(colors, diffusivities):
plt.plot(weiner_process(500, sigma), color=color)
plt.title(f"Diffusivity {sigma}")
Generators¶
- Generators are functions with an internal "state" that serves as a memory
- Conceptually, they are intermediate between functions and custom objects
- Generators can be useful when you don't want to store an entire array in memory
def weiner_process(x, max_iter=1000):
for i in range(100):
yield x + np.random.normal(0, 1)
# next causes the generator to run until the next yield
process = weiner_process(0)
for i in range(50):
print(next(process))