2023-Q1

The last half year has been one of the most turbulent periods of my life. It's also been one of the best.

I quit the start-up that was sucking out my soul and rotting my intellect (okay, maybe that's a tad melodramatic). I started working on a problem I care about and reviving my brain. I found the community, mentors, and projects I'd been looking for. I started doing original work and advocating for a neglected area of research (singular learning theory). It's been pretty great.

Which makes it a great time for reflection and looking forward. What's in store for the rest of the year?

The last six months

Six months ago, I got an FTX Future Fund grant to do some upskilling. One of the conditions for receiving that grant was to write a reflection after the grant period (six months) expired. So, yes, that's part of my motivation for writing this post. Even if FTX did implode in the interim, and even if there is likely no one to read this, it's better to be safe than sorry.

A quick summary:

Reading: Mathematics for Machine Learning, Bishop, Cracking the Coding Interview, Sutton & Barto, Russell & Norvig, Watanabe, and lots of miscellaneous articles, sequences, etc.
Courses: Fast.ai (which I quit early because it was too basic), OpenAI's Spinning Up (abandoned in favor of other RL material), and ARENA (modeled after MLAB).
SERI MATS: An unexpected development was that I ended up participating in SERI MATS. For two months, I was in Berkeley with a cohort of others in a similar position to mine (i.e., transitioning to technical AI safety research).
Output: singular learning theory sequence & classical learning theory sequence.

It's been quite a lot more productive than I anticipated, both in terms of input absorbed and output written. I also ended up with a position as a research assistant in David Krueger's lab.

The next six months

But we're not done yet. The next six months are shaping up to be the busiest of my life. Just as I like 'em.

Summit

I'm organizing a summit on SLT and alignment. My guess is that, looking back a few years from now, I will have accelerated this field by up to two years (compared to worlds in which I don't exist). The aim will be to foster research applying SLT within AI safety towards developing better interpretability tools, with specific attention given to detecting phase transitions.

Publications

So many projects. Unlike some, I think writing publications is actually a pretty decent goal to work towards. You need some kind of legible output to aim for, something that can serve as a finish line.

In order from most finished to least:

(SLT) The Shallow Reality of 'Deep Learning Theory': when I'm done writing the sequence on LessWrong, I'm going to work with Zach Furman and Mark Chiu Chong to turn this into something publishable.
Pattern-learning model: this is the project I'm currently working on with Lauro Langosco in the Krueger lab. The aim is to devise a simplified toy model of neural network training dynamics akin to Michaud et al.'s quantization model of neural scaling.
Neural (network) divergence: a project I'm working on with Samuel Knoche on reviewing and implementing the various ways people have come up with to compare different neural networks.
What are inductive biases, really?: a project I'm working on with Alexandra Bates to review all the existing literature on inductive biases and provide some much-needed formalization.
(SLT) Singularities and dynamics: the aim is to develop toy models of the loss landscape in which to investigate the role of singularities in training dynamics.
Path dependence in NNs: this is the project I started working on in SERI MATS. The idea is to study how small perturbations (to the weights or hyperparameters) grow over the course of training. There's a lot here, which is why it's taking quite some time to finish up.
(SLT) Phase detectors: a project I recently started during an Apart Hackathon, which explores how to detect "phase transitions" during training.

There's a lot here, which is why some of these projects (the last three) are currently parked.

(And to make it worse I've just accepted a part-time technical writing position.)

Career

What's next? After the summit? After wrapping up a few of these projects? After the research assistant position comes to a close (in the fall)?

Do I…

Get a job?
Start a PhD?
Start an organization?
Go for a century fellowship?

I'm leaning more and more to the last one (/two).

A job with Anthropic would be great, but I think I could accomplish more by pursuing a slightly different agenda, and with a bit more slack to invest in learning.

Meanwhile, I think a typical PhD is too much lock-in, especially in the US, where they might require me (with a physics background) to do an additional master's degree. As a century fellow, I'd be free to create my own custom PhD-like program. I'd spend some time in Australia with Daniel Murfet, in Boston with the Tegmark group, in New York with the Bowman lab, in London with Conjecture, and in the Bay Area with everyone.

I think it's very likely that I'll end up starting a research organization focused on bringing SLT to alignment. That's going to take a slightly atypical path.