# Random Walks in Higher Dimensions

So I was thinking about the dynamics of training, and looking for a way to model some phenomenon I had observed.

Initially, I had dismissed Brownian motion/random walk as an option because the trend was clearly linear — not square root. But then I was asked what Brownian motion looked like in higher dimensions and I panicked.

I had assumed that Brownian motion was universal — that dimensionality didn't enter into the equation (for average distance from the origin as a function of time). But now I was doubting myself.

So I did a quick derivation, which I'm sharing because it gave me a new way to interpret the chi-squared distribution. Here's the set-up: we have a walker on a $D$-dimensional (square) lattice. At every timestep, the walker chooses uniformly among the $2D$ available directions to take one step.

This means that every dimension is sampled on average once every $D$ timesteps, so we can treat a $D$-dimensional walker after $N$ timesteps as a collection of $D$ independent $1$-dimensional walkers after $N/D$ timesteps.

If the $1$-dimensional walker's probability distribution is just a Gaussian centered at $0$ with standard deviation $\sqrt N$, then the probability for the $D$-dimensional walker is the product of $D$ univariate gaussians centered at $0$ with standard deviation $\sqrt{{N}/{D}}$.

But that means that the expected distance traveled by the $D$-dimensional walker, $\sqrt{\mathbb E[|\vec X|^2]}$, is just the square root of the chi-squared distribution,

$Y=\sqrt {\sum_{i=1}^{D}X_{i}^{2}},$

scaled by a factor $\sqrt{N/D}$. (This is not the chi distribution, which we'd want if we were taking the square root before the expectation.)

In this context, $D$ is the number of degrees of freedom. After multiplying the mean of the chi distribution by our factor $\sqrt{N/D}$, we get

$\sqrt{\mathbb E[|\vec X|^{2}]}= \sqrt{D} \sqrt{\frac{N}{D}} = \sqrt{N}.$

It's independent of lattice dimensionality. Just as I originally thought before starting to doubt myself before going down this rabbit hole before reestablishing my confidence.

If that's not enough for you, here are the empirical results.

def random_hyperwalk(n_dims: int, n_samples: int = 1_000, n_timesteps: int = 10_000) -> np.ndarray:
state = np.zeros((n_samples, n_dims))
distances = np.zeros((n_samples, n_timesteps))

for t in tqdm(range(n_timesteps)):
step = np.random.choice(np.arange(n_dims), n_samples)
# print(step)
directions = np.random.choice([-1, 1], n_samples)
# print(directions)

# TODO: vectorize this
for i in range(n_samples):
state[i, step[i]] += directions[i]

# print(step, directions, state[:, step])
distances[:, t] = np.linalg.norm(state, axis=1)

return distances

def random_hyperwalk_scaling(n_dim_range: List[int], n_samples: int = 1_000, n_timesteps: int = 10_000):
timesteps = np.arange(n_timesteps)
predictions = np.sqrt(timesteps)
trajectories = np.zeros((len(n_dim_range), n_samples, n_timesteps))

fig, ax = plt.subplots(len(n_dim_range) // 2, 2, figsize=(10, 10))

for I, n_dims in enumerate(n_dim_range):
i, j = I // 2, I % 2

trajectories[I, :, :] = random_hyperwalk(n_dims, n_samples, n_timesteps)
mean_distances = np.mean(trajectories[I, :, :], axis=0)

for k in range(n_samples):
ax[i][j].plot(timesteps, trajectories[I, k, :], color="blue", alpha=0.01)

ax[i][j].set_title(f"{n_dims} dimensions")
ax[i][j].plot(timesteps, mean_distances, color="red")
ax[i][j].plot(timesteps, predictions, color="black")
ax[i][j].set_xlabel("Timesteps")
ax[i][j].set_ylabel("L2 distance from origin")

plt.show()

random_hyperwalk_scaling([5, 10, 20, 40, 80, 160], n_samples=1000, n_timesteps=1000)