Random Walks in Higher Dimensions
So I was thinking about the dynamics of training and looking for a way to model a phenomenon I had observed.
Initially, I had dismissed Brownian motion/random walk as an option because the trend was clearly linear — not square root. But then I was asked what Brownian motion looked like in higher dimensions and I panicked.
I had assumed that Brownian motion was universal — that dimensionality didn’t enter into the equation (for average distance from the origin as a function of time). But now I was doubting myself.
So I did a quick derivation, which I’m sharing because it gave me a new way to interpret the chi-squared distribution. Here’s the set-up: we have a walker on a $D$-dimensional (square) lattice. At every timestep, the walker chooses uniformly among the $2D$ available directions to take one step.
This means that every dimension is sampled on average once every $D$ timesteps, so we can treat a $D$-dimensional walker after $N$ timesteps as (approximately) a collection of $D$ independent $1$-dimensional walkers, each after $N/D$ timesteps.
If the $1$-dimensional walker’s probability distribution after $N$ timesteps is just a Gaussian centered at $0$ with standard deviation $\sqrt N$, then the probability distribution for the $D$-dimensional walker is the product of $D$ univariate Gaussians centered at $0$, each with standard deviation $\sqrt{N/D}$.
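To convince myself of that step, here’s a minimal Monte Carlo sketch (my addition, not part of the derivation; the parameters are arbitrary) checking that a single coordinate of the lattice walker has standard deviation close to $\sqrt{N/D}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_dims, n_timesteps, n_samples = 10, 10_000, 1_000

# For each sample walk, pick an axis and a direction at every timestep.
axes = rng.integers(0, n_dims, size=(n_samples, n_timesteps))
signs = rng.choice([-1, 1], size=(n_samples, n_timesteps))

# Net displacement along coordinate 0: sum the signed steps that hit it.
x0 = np.sum(signs * (axes == 0), axis=1)

print(x0.std())                       # empirical spread of one coordinate
print(np.sqrt(n_timesteps / n_dims))  # predicted sqrt(N / D) ~= 31.6
```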
But that means that the squared distance of the $D$-dimensional walker from the origin, once we standardize each coordinate, follows a chi-squared distribution: with $Z_i = X_i/\sqrt{N/D}$,
$$
Y=\sum_{i=1}^{D}Z_{i}^{2}\sim\chi^{2}_{D}.
$$
So the root-mean-square distance, $\sqrt{\mathbb E[|\vec X|^2]}$, is the square root of the chi-squared mean, scaled by a factor $\sqrt{N/D}$. (It is not the mean of the chi distribution, which is what we’d want if we took the square root before the expectation, i.e., for the mean distance $\mathbb E[|\vec X|]$; the two coincide only in the large-$D$ limit.)
In this context, $D$ is the number of degrees of freedom, and the mean of a chi-squared distribution is exactly $D$. Multiplying its square root, $\sqrt D$, by our factor $\sqrt{N/D}$, we get
$$
\sqrt{\mathbb E[|\vec X|^{2}]}= \sqrt{D} \sqrt{\frac{N}{D}} = \sqrt{N}.
$$
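If you’d rather check the algebra numerically, here’s a small sketch (again my own addition; the sample count and parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n_dims, n_timesteps = 25, 10_000

# D univariate Gaussians, each with standard deviation sqrt(N / D).
X = rng.normal(0.0, np.sqrt(n_timesteps / n_dims), size=(100_000, n_dims))

sq_dist = np.sum(X**2, axis=1)           # |X|^2: (N/D) times a chi-squared_D
print(np.sqrt(sq_dist.mean()))           # RMS distance, ~= sqrt(N) = 100
print(np.linalg.norm(X, axis=1).mean())  # mean distance E|X|, slightly below (~99)
```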
It’s independent of lattice dimensionality, just as I originally thought before I started doubting myself, went down this rabbit hole, and reestablished my confidence.
If that’s not enough for you, here are the empirical results:
```python
from typing import List

import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm


def random_hyperwalk(
    n_dims: int, n_samples: int = 1_000, n_timesteps: int = 10_000
) -> np.ndarray:
    """Simulate n_samples independent walkers on a D-dimensional lattice."""
    state = np.zeros((n_samples, n_dims))
    distances = np.zeros((n_samples, n_timesteps))

    for t in tqdm(range(n_timesteps)):
        # Each walker picks one of the D axes and a direction uniformly.
        step = np.random.choice(np.arange(n_dims), n_samples)
        directions = np.random.choice([-1, 1], n_samples)

        # Move every walker one step along its chosen axis.
        state[np.arange(n_samples), step] += directions

        distances[:, t] = np.linalg.norm(state, axis=1)

    return distances


def random_hyperwalk_scaling(
    n_dim_range: List[int], n_samples: int = 1_000, n_timesteps: int = 10_000
):
    timesteps = np.arange(n_timesteps)
    predictions = np.sqrt(timesteps)
    trajectories = np.zeros((len(n_dim_range), n_samples, n_timesteps))

    fig, ax = plt.subplots(len(n_dim_range) // 2, 2, figsize=(10, 10))
    fig.tight_layout(pad=5.0)

    for I, n_dims in enumerate(n_dim_range):
        i, j = I // 2, I % 2

        trajectories[I, :, :] = random_hyperwalk(n_dims, n_samples, n_timesteps)
        mean_distances = np.mean(trajectories[I, :, :], axis=0)

        # Individual trajectories (blue), empirical mean (red), sqrt(t) (black).
        for k in range(n_samples):
            ax[i][j].plot(timesteps, trajectories[I, k, :], color="blue", alpha=0.01)

        ax[i][j].set_title(f"{n_dims} dimensions")
        ax[i][j].plot(timesteps, mean_distances, color="red")
        ax[i][j].plot(timesteps, predictions, color="black")
        ax[i][j].set_xlabel("Timesteps")
        ax[i][j].set_ylabel("L2 distance from origin")

    plt.show()


random_hyperwalk_scaling([5, 10, 20, 40, 80, 160], n_samples=1_000, n_timesteps=1_000)
```
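Beyond eyeballing the plots, a quick quantitative check (my addition) is that the root-mean-square distance at the final timestep lands near $\sqrt N$ whatever the dimension:

```python
for n_dims in [5, 50, 500]:
    dists = random_hyperwalk(n_dims, n_samples=500, n_timesteps=1_000)
    rms = np.sqrt(np.mean(dists[:, -1] ** 2))
    print(n_dims, rms)  # each value should be close to sqrt(1000) ~= 31.6
```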