Two months ago, I decided to quit my company and dedicate myself full-force to AI safety. The problems I had been working on were not inspiring me, and the actual work left me feeling like my brain was shrinking. Something had to change.
So far, this feels like one of the best decisions I've ever made.
I received an FTX Future Fund regrant for six months to transition to research. My plan for this period rests on three pillars: (1) technical upskilling in ML, (2) theoretical upskilling in AI safety, and (3) networking/community outreach.
Concretely, my plan is to (1) read lots of textbooks and follow online courses, (2) read lots of alignment forum and go through curricula (like Richard Ngo's AGI Safety Fundamentals and Dan Hendrycks's Intro to ML Safety), and (3) travel to events, apply to different fellowships, and complete small research projects.
A month and a half has gone by since I really started, which turns out to be quite a lot of time. Enough that it's a good moment for a progress report and forecast.
Technical Upskilling in ML
- Mathematics for Machine Learning by Deisenroth, Faisal, and Ong (2020).
- This is a wonderful book. Clear, concise writing. Excellent visuals (color-coded with the corresponding formulas!). It hints at what Chris Olah might be able to do with the textbook genre if he got his hands on it.
- I've completed up to chapter 9 (that's the first half plus one chapter of the second half). I'll finish the book this month.
- Pattern Recognition and Machine Learning by Bishop (2006).
- This book is… okay. Sometimes. It leaves much to be desired on the visualization front, and in retrospect, I probably wouldn't recommend it to others. But it does provide a strong probabilistic supplement to a wider ML curriculum.
- I've done up to chapter 5 and skipped ahead to do chapter 9. I plan to go through the rest of the book for completeness. Even if many methods are not immediately relevant to the DL paradigm, a broad basis in statistics and probability theory certainly is. I'm most looking forward to the chapters on graphical models (8), sampling techniques (11), and hidden Markov models (13). This should be done by mid-December.
- Cracking the Coding Interview by McDowell (2015).
- The widespread goodharting of leetcode is one of many reasons I'm afraid of AI. We just have to deal with it.
- I've completed chapters 1-7, with 10(-ish) to go. I'm aiming to be done with this by January.
I couldn't help myself and got some more textbooks. When I finish MML, I'll move on to Sutton and Barto's Reinforcement Learning. In December, I'll start on Russell and Norvig's Artificial Intelligence: A Modern Approach. Now that I think about it, I should probably throw Goodfellow's Deep Learning into the mix.
- Practical Deep Learning for Coders by fast.ai
- I began following this course but was disappointed by it: the level was too basic and the methods too applied for my goals, so I stopped.
- ARENA Virtual
- Two weeks ago, a friend introduced me to ARENA Virtual, and I jumped on the opportunity. This program follows a curriculum based on Jacob Hilton's curriculum, and it's much more my cup of tea. It assumes prior experience, goes much deeper, and is significantly faster-paced. It's also super motivating to work with others.
- This goes until late December.
Once ARENA is done, I might pick and choose from other online courses like OpenAI's Spinning Up, NYU's Deep Learning, etc. But I don't expect this to be necessary anymore, and it may even be counterproductive. ARENA + textbooks is likely to be enough to learn what I need. Any extra time can probably best go towards actual projects.
Theoretical Upskilling in AI Safety
- AGI Safety Fundamentals by Richard Ngo
- I'm going through this on my own and reading everything (the basics + supplementary material). I'm currently on week 7 of 8, so I'll finish this month.
- Intro to ML Safety by Dan Hendrycks
- As soon as I finish AGISF, I'll move on to this course.
Once I'm done with Intro to ML Safety, I'll go on to work through AGI Safety 201. In the meantime, I've also gone through lots of miscellaneous sequences: Value Learning, Embedded Agency, Iterated Amplification, Risks from Learned Optimization, Shard Theory, Intro to Brain-Like-AGI Safety, Basic Foundations for Agent Models, etc. I'm also working my way through AXRP and The Inside View for an informal understanding of various researchers.
Over the last two months, I've actually found myself becoming less of a doomer and developing longer timelines.1 In terms of where I see myself ending up: it's still interpretability, with an uptick in interest for brain-flavored approaches (Shard Theory, Steven Byrnes). I picked up Evolutionary Psychology by David Buss and might pick up a neuroscience textbook one of these days. My ideal fit is still probably Anthropic.
Networking & Outreach
- SERIMATS. The essay prompts were wonderful practice in honing my intuitions and clarifying my stance. I think my odds of getting in are good, and that this is the highest-value thing I can currently do to speed up my transition into AI safety. The main downside is that SERIMATS includes an in-person component that will be in the Bay starting in January. That's sooner than I would move in an ideal world. But then I guess an ideal world has solved alignment. 🤷‍♂️
- REMIX (by Redwood). I'll be applying this week. This seems as good an opportunity as SERIMATS.
I received the advice to apply more often, i.e., to send off applications to Anthropic, Redwood, etc. right away. I think the attitude is right, but my current approach is already sufficient. Let's check in when we hear back from these programs.
- I've also put together a research agenda (email me if you want the link). In it, I've begun dissecting how the research I did during my master's on toy models from theoretical neuroscience could inform novel research directions for interpretability and alignment. I'm starting a few smaller experiments to better understand the path-dependence of training.
- I've also started a collaboration with Diego Dorn to review the literature on representation learning and how to measure distance/similarity between different trained models.
I've decided to hold off on publishing what I've written up in my research agenda until I have more results. Some of the experiments are low-hanging fruit that would help ground the ideas, so I figure it's better to wait a little and publish with the necessary context included.
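To give a flavor of the representation-similarity question: one common measure for comparing the activations of two trained models is linear centered kernel alignment (CKA). This is a hedged sketch in numpy with toy data, not code from the research agenda; all names here are illustrative.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices.

    X, Y: arrays of shape (n_examples, n_features); feature counts may differ.
    Returns a similarity in [0, 1] that is invariant to orthogonal
    transformations and isotropic scaling of either representation.
    """
    # Center each feature so the comparison ignores mean offsets
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 32))               # stand-in for layer activations
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))  # random orthogonal matrix
rotated = acts @ Q                              # same representation, rotated basis
unrelated = rng.normal(size=(200, 32))          # independent representation

print(linear_cka(acts, rotated))    # 1.0 up to floating point: rotation-invariant
print(linear_cka(acts, unrelated))  # much lower for independent activations
```

The rotation invariance is the point: two networks trained from different seeds can learn the same features in different bases, so a naive weight comparison fails where CKA-style measures do not.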
- I attended an AI Safety retreat organized by EA NL, which was not only lots of fun, but introduced me to lots of awesome people.
- I'll be attending EAGxRotterdam next week, and EAGxBerkeley in December. Even more awesome people coming soon.
- As a final note, I'm working with Hoog on a video about AI safety. It's going to be excellent.
More on why in a future post. ↩