Vectorization and arrays: Period-doubling bifurcations, and the Mandelbrot Set
Preamble: Run the cells below to import the necessary Python packages
This notebook was created by William Gilpin. Consult the course website for all content and the GitHub repository for raw files and runnable online code.
# Import local plotting functions and in-notebook display functions
import matplotlib.pyplot as plt
%matplotlib inline
# Import numpy library for array manipulation
import numpy as np
External libraries
A major benefit of the Python ecosystem is the ability to easily import external libraries. This is done with the import command. For example, the following imports the numpy library, which contains many useful functions for scientific computing.
import numpy as np
Common libraries you might use in the natural sciences include
- matplotlib (plotting)
- numpy (linear algebra; lots of MATLAB functions)
- scipy (common scientific algorithms)
- scikit-learn (machine learning)
- networkx (making graphs)
Major libraries you might use in data science include
- numba (just-in-time compilation of numpy-based code)
- pandas (large dataset organization; lots of R functionality)
- statsmodels (statistics; lots of R functionality)
- seaborn (plotting; alternative to matplotlib with API similar to R's ggplot)
- PyTorch (deep learning)
- JAX (deep learning)
Typical usage (in the terminal): install the package into your environment
conda activate myenvironment
conda install pandas
When conda fails, or the package is not available, use pip
pip install pandas
Then either launch JupyterLab from the terminal, or open the VS Code app and select the appropriate environment in the upper right-hand corner
import matplotlib.pyplot as plt # Plotting library
import numpy as np # Linear algebra library
import scipy # General scientific computing library
Arrays, indexing, broadcasting, vectorization
Numpy is technically an external library, but we will use it so often in scientific computing that it's effectively built in for our purposes.
It provides the np.ndarray datatype, which represents multidimensional arrays (tensors), along with equivalents of most MATLAB functions.
import numpy as np
a = np.arange(10)
print(a)
type(a)
[0 1 2 3 4 5 6 7 8 9]
numpy.ndarray
a = np.random.random((2, 3))
print(a)
a = np.identity(7)
print(a)
[[1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 1.]]
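The heading above also mentions broadcasting: numpy automatically stretches arrays of compatible shapes so they can be combined without explicit loops. A minimal sketch (the values here are arbitrary):

row = np.arange(3)                 # shape (3,)
col = np.arange(4).reshape(4, 1)   # shape (4, 1)
table = 10 * col + row             # broadcast together to shape (4, 3)
print(table.shape)
print(table)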
The logistic map
The logistic map is a polynomial mapping (equivalently, recurrence relation) of degree 2, often cited as an archetypal example of how complex, chaotic behaviour can arise from very simple non-linear dynamical equations.
The map is defined by:
$$x_{n+1} = r x_n (1 - x_n)$$
where the dynamical variable $x_n \in (0, 1)$ is a number between zero and one. The parameter $r$ represents a growth rate and varies between zero and four. For $r$ between zero and one, the population always dies out. For $r$ between one and three, the population settles onto a single fixed value. At $r = 3$ a bifurcation occurs, and the population begins to oscillate between two values. As $r$ increases further, period doubling continues, leading to four values, then eight, and so on, until chaos sets in and the population takes on seemingly random values between zero and one.
# logistic map
import numpy as np
import matplotlib.pyplot as plt
def logistic_map(x, r=3.7):
"""
The logistic map, a discrete-time dynamical system.
Args:
x (float): a number in [0, 1] denoting the current state
r (float): a positive parameter that controls the chaoticity of the dynamics
"""
return r * x * (1 - x)
r = 3.2
x = [0.65]
for i in range(1, 100):
x.append(logistic_map(x[i-1], r))
plt.figure()
plt.plot(x, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
r = 3.8
x = [0.65]
for i in range(1, 100):
x.append(logistic_map(x[i-1], r))
plt.figure()
plt.plot(x, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
Runtime complexity and dynamical systems
A common question we might ask for an algorithm is: How does the runtime scale with the input size? What about the memory usage?
These are sometimes called the asymptotic time and space complexities of an algorithm.
The asymptotic complexity of an algorithm is independent of the hardware and software environment. It is a representation-invariant mathematical property of the algorithm.
import numpy as np
def simulate_logistic(n, r=3.8, x0=0.65):
"""
Simulate the logistic map for n iterations using numpy and preallocation
Args:
n (int): The number of iterations of the logistic map
"""
x = np.zeros(n) # Preallocate an array of zeros
x[0] = x0
for i in range(1, n):
prev = x[i - 1]
x[i] = r * prev * (1 - prev)
return x
# x = simulate_logistic(1000)
# plt.plot(x, 'k')
%timeit simulate_logistic(1000)
188 µs ± 3.88 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
import timeit
n_vals = np.arange(1, 6000, 5)
# n_vals = np.logspace(1, 5, 50, dtype=np.int32)
times = []
for n in n_vals:
times.append(timeit.timeit("simulate_logistic(n)", globals=globals(), number=10))
plt.figure()
plt.plot(n_vals, times)
plt.title("Sum Iterative runtime")
plt.xlabel("Number of iterations")
plt.ylabel("Runtime (s)")
Notice that in the asymptotic regime the runtime scales linearly.
Why is the runtime $\sim N$? We only "touch" each element of our solution array twice---once to calculate it, and once to update the next state. We don't need to look back at all the previous states.
The memory usage is also $\sim N$ because we are storing the full, cumulative history in order to plot the trajectory. If we only wanted to plot the final state, we could get away with only storing the final state. In this case, the memory would be $\sim 1$.
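As a minimal sketch of the constant-memory case (the helper name simulate_logistic_final is hypothetical and not used elsewhere in this notebook):

def simulate_logistic_final(n, r=3.8, x0=0.65):
    """Return only the final state after n iterations; memory usage is ~1."""
    x = x0
    for _ in range(n):
        x = r * x * (1 - x)
    return x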
Now that we have a fast implementation of the logistic map, we can explore its properties in more detail.
We will create a bifurcation diagram, which shows the values of $x_n$ for each value of $r$.
def bifurcation_diagram(dmap, x0, pvals, n=1000, transient=0.5):
"""
Create a bifurcation diagram of a discrete map.
Args:
dmap (callable): A function that takes two arguments, x and p, and returns the next
value of x in the discrete map.
x0 (float): The initial condition of the system.
pvals (list): A list of parameter values.
n (int): The number of iterations of the map to run for each parameter value.
transient (float): The proportion of the initial iterations to discard as transient.
Returns:
xvals (list): A list of lists of x values.
"""
    n_discard = int(transient * n)
    all_xvals = []
    for p in pvals:
        # Iterate the supplied map directly, so the function works for any
        # discrete map dmap(x, p), as described in the docstring
        x = np.zeros(n)
        x[0] = x0
        for i in range(1, n):
            x[i] = dmap(x[i - 1], p)
        all_xvals.append(x[n_discard:])
    return all_xvals
rvals = np.linspace(2.5, 4.0, 1000)
xvals = bifurcation_diagram(logistic_map, 0.65, rvals)
for r, x in zip(rvals, xvals):
plt.plot([r] * len(x), x, '.k', markersize=0.05)
plt.xlim(rvals[0], rvals[-1])
plt.ylim(0, 1)
plt.xlabel('r')
plt.ylabel('x')
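Because each parameter value evolves independently, the parameter sweep itself can be vectorized: keep the loop over time, but update the states for every value of $r$ at once with array operations. A minimal sketch of this alternative (not the version used for the figure above):

def bifurcation_diagram_vectorized(pvals, x0=0.65, n=1000, transient=0.5):
    """Iterate the logistic map for all parameter values simultaneously."""
    pvals = np.asarray(pvals)
    x = np.full(pvals.shape, x0)          # one state per parameter value
    history = np.zeros((n,) + pvals.shape)
    history[0] = x
    for i in range(1, n):
        x = pvals * x * (1 - x)           # vectorized update across all parameters
        history[i] = x
    return history[int(transient * n):]   # rows: iterations, columns: parameter values

The inner loop over time cannot be removed, because each state depends on the previous one, but replacing the Python loop over parameters with array operations typically gives a large speedup.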
The logistic map follows the celebrated period-doubling route to chaos: as the parameter $r$ increases, the period of the oscillations doubles repeatedly, until chaos sets in at $r \approx 3.57$.
Notice how the spacing in $r$ between successive period-doubling bifurcations shrinks as $r$ increases. The ratio of successive spacings approaches a universal constant called the Feigenbaum constant, $\delta \approx 4.669$.
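As a rough numerical check, we can take approximate values for the first few bifurcation points (these specific numbers are quoted from the standard literature, not computed in this notebook) and form ratios of successive spacings:

# Approximate parameter values of the first few period-doubling bifurcations
r_bif = np.array([3.0, 3.44949, 3.54409, 3.56441])
spacings = np.diff(r_bif)
print(spacings[:-1] / spacings[1:])   # ratios approach the Feigenbaum constant, about 4.669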
Slicing and indexing numpy arrays
- These provide extremely useful syntax for manipulating arrays in Python
Slicing
- We will use slicing to create a return map of the logistic map. This is a plot of $x_{n+1}$ vs. $x_n$.
traj = simulate_logistic(1000)
# plt.plot(traj)
plt.figure()
plt.plot(traj[:-1], traj[1:], '.k', alpha=0.05)
plt.xlabel('$x_n$')
plt.ylabel('$x_{n+1}$')
plt.figure()
plt.plot(traj[:-50], traj[50:], '.k', alpha=0.05)
plt.xlabel('$x_n$')
plt.ylabel('$x_{n+50}$')
Strided slicing
We can use strided slicing to take every $k$th element of an array.
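The syntax is array[start:stop:step]; a quick sketch on a throwaway array:

b = np.arange(10)
print(b[::2])    # every 2nd element: [0 2 4 6 8]
print(b[1::3])   # every 3rd element, starting at index 1: [1 4 7]
print(b[::-1])   # a negative step reverses the array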
To illustrate this, let's first simulate the logistic map for several values of the bifurcation parameter $r$, and store each of the resulting trajectories.
# simulate before and after period doubling
# Take last 50 iterations to remove transient
traj2 = simulate_logistic(500, r=3)[-50:]
plt.figure()
plt.plot(traj2, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
traj4 = simulate_logistic(500, r=3.5)[-50:]
plt.figure()
plt.plot(traj4, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
traj8 = simulate_logistic(1000, r=3.56)[-50:]
plt.figure()
plt.plot(traj8, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
traj_inf = simulate_logistic(500)[-50:]
plt.figure()
plt.plot(traj_inf, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
These trajectories look pretty different, but what happens if we resample them?
We will use strided slicing to take every $n$th element of each array.
# plt.figure(figsize=(6, 2))
# plt.plot(traj2)
# plt.figure(figsize=(6, 2))
# plt.plot(traj4[::1])
# plt.figure(figsize=(6, 2))
# plt.plot(traj8[::8])
plt.figure(figsize=(6, 2))
plt.plot(traj_inf[::4])
plt.figure(figsize=(6, 2))
plt.plot(traj_inf[::1])
plt.figure(figsize=(6, 2))
plt.plot(traj_inf[::-1])
Boolean Selection with numpy arrays
We can index into numpy arrays using a Boolean array of the same shape
This selects the elements of the outer array wherever the Boolean array is True
The resulting array has a length equal to the number of True elements in the Boolean array
a = np.random.random((5, 5))
plt.figure()
plt.imshow(a, cmap='Blues', vmin=0)
print(a)
# print((a < 0.5))
ind = (a < 0.5)
print(ind)
a[ind] = 0
plt.figure()
plt.imshow(a, cmap='Blues', vmin=0)
[[0.03706901 0.3868108  0.80144563 0.11931178 0.62020471]
 [0.29632615 0.47796229 0.70041659 0.85303676 0.83618991]
 [0.17743535 0.3720211  0.74583951 0.61744053 0.90010932]
 [0.39727078 0.21999045 0.53016913 0.94071539 0.91632947]
 [0.95495071 0.30877553 0.7893046  0.76895121 0.2782278 ]]
[[ True  True False  True False]
 [ True  True False False False]
 [ True  True False False False]
 [ True  True False False False]
 [False  True False False  True]]
select_inds = traj > 0.5
# print(select_inds)
# nvals = np.arange(len(traj))
nvals = np.arange(len(traj))
plt.figure()
plt.plot(nvals, traj, 'k')
plt.xlabel('Iteration')
plt.ylabel('x')
plt.plot(nvals[select_inds], traj[select_inds], '.r')
Now, let's use Boolean selection to select the points corresponding to when the logistic map crosses the point $x = 0.5$.
We will first threshold the trajectory into a series of 0s and 1s corresponding to whether the trajectory is above or below 0.5.
We then use np.diff, which calculates the difference between consecutive elements of an array, to find the places where the trajectory crosses the threshold.
## We want to perform a longer simulation
traj = simulate_logistic(5000)
where_big = (traj > 0.5).astype(int)
where_change = np.diff(where_big)
changepoints = (where_change != 0)
changepoint_inds = np.where(changepoints)[0]
time_between_changes = np.diff(changepoint_inds)
plt.figure()
plt.hist(time_between_changes, bins=20);
plt.xlabel('Time between changes')
plt.ylabel('Frequency')