# 2022-M11

It's been three months since I decided to pivot to AI safety. Who knew three months could be such a long time.

FTX

I got an FTX Future Fund regrant for six months to help make the switch. Then FTX imploded, and it turns out my grant may be clawed back during the bankruptcy proceedings. Unfortunate. On the bright side1, FTX seems to have been such a mess that it could take years for the process to get to me. So if you have short enough timelines…2

Courses

• The ARENA virtual program is going along smoothly.
• This month, I joined SERI MATS to get my hands dirty in research. I'm officially under Evan Hubinger's Deceptive AI stream (though I'm also participating in John Wentworth's workshops). Yes, I'm reaching the limits of what I can juggle. We're also working our way through the Alignment 201 curriculum.3
• When ARENA finishes up, I'm going to dedicate more of my attention to metauni (especially their track on singular learning theory (SLT)).
• I've also finished most of the miscellaneous Alignment Forum sequences I wanted to go through (as well as AXRP and The Inside View).

The SERI MATS research sprint is about to start, and I'll be in Berkeley from January to at least February to work on this in person. Safe to say, I have way too many ideas for research projects, but I'm planning to focus on Toy Models of Superposition.

Distillation

• I'm working on an introduction to SLT that should be out soon.
• The video I'm working on with Hoog on AI risk is a little delayed because OpenPhil funding was paused (and I'm a little overextended but don't want to admit it), but it is coming along.
• On top of all that, I've started working on an online, interactive AI safety textbook (very much a work in progress, more coming soon)4.

Textbooks

• Mathematics for Machine Learning: Done. Great book. Highly recommend.
• Pattern Recognition and Machine Learning: Completed up through chapter 9. I have 5 chapters to go. Instead of trying to bang these out in the next month to meet my original deadline, I'm going to push back my deadline by two weeks, so I have more time during the research sprint.
• Cracking the Coding Interview: 6 more chapters to go. Like PRML, I'm going to push back my original deadline 2 weeks to mid-January.4
• Reinforcement Learning: I'm ahead of schedule and have already gone through 6 of 17 chapters. I'm aiming to be done by April.
• Artificial Intelligence: A Modern Approach: Here too, I'm 3 chapters in, ahead of when I originally planned to start. This is a big book, so my (self-enforced) deadline is May 1st.

Outreach

• I'm writing this on a plane to EAGxBerkeley. Judging from EAGxRotterdam, EAGxBerkeley is going to be great (as long as they leave out the food poisoning part).
• I've started putting out some feelers for a longer-term project of establishing an AI-safety-oriented company in mainland Europe. There's a lot of early-career interest among very smart students. Give them a few years, and mainland Europe will be ripe for a new organization like Anthropic, Redwood, or Conjecture.

Conclusion

The next month is going to be hectic. I'll be in the Bay Area for a week and a half, then at my parents' place in NY state for a week, then in Michigan at my ~~girlfriend's~~ fiancée's dad's, then in NY for another week. Oh yeah, did I mention? I got engaged!5 On the day I publish this, it's our 5-year anniversary. Robin, I love you, and I can't wait to spend the rest of our lives together. (However long that is, doomer.)

## Footnotes

1. For me. I think it's safe to say the inconvenience for me is less than the inconvenience for people who lost their life savings.

2. My timelines aren't actually that short. But I'm not worried about eventually being able to pay this back (even very soon with the SERI MATS stipend).

3. I've concluded that I can safely skip Intro to ML Safety for now. Much of the content overlaps with these other programs/textbooks.

4. I'm probably a bit newer to the field than would be ideal for this task, so I'm hoping to migrate to a more editorial role, delegating the bits that I can. I think my main strength here is more a kind of Olahian interactive distillation. That's an ability which seems to be pretty rare among active researchers.

5. I proposed with an Oura ring, which definitely says something about the kind of people we are. Now that I think about it, I probably should have asked for a sponsorship and gotten the whole wedding funded by some late-stage-capitalism PR department, but hey, hindsight is 20/20.

# 2022-M10

Two months ago, I decided to quit my company and dedicate myself full-force to AI safety. The problems I had been working on were not inspiring me, and the actual work left me feeling like my brain was shrinking. Something had to change.

So far, this feels like one of the best decisions I've ever made.

I received an FTX Future Fund regrant for six months to transition to research. My plan for this period rests on three pillars: (1) technical upskilling in ML, (2) theoretical upskilling in AI safety, and (3) networking/community outreach.

Concretely, my plan is to (1) read lots of textbooks and follow online courses, (2) read lots of Alignment Forum posts and go through curricula (like Richard Ngo's AGI Safety Fundamentals and Dan Hendrycks's Intro to ML Safety), and (3) travel to events, apply to different fellowships, and complete small research projects.

A month and a half has gone by since I really started, which turns out to be quite a lot of time. Enough that it's a good moment for a progress report and forecast.

## Technical Upskilling in ML

Textbooks

• Mathematics for Machine Learning by Deisenroth, Faisal, and Ong (2020).
• This is a wonderful book. Clear, concise writing. Excellent visuals (color-coded with the corresponding formulas!). It hints at what Chris Olah might be able to do with the textbook genre if he got his hands on it.
• I've completed up to chapter 9 (that's the first half plus one chapter of the second half). I'll finish the book this month.
• Pattern Recognition and Machine Learning by Bishop (2006).
• This book is… okay. Sometimes. It leaves much to be desired on the visualization front, and in retrospect, I probably wouldn't recommend it to others. But it does provide a strong probabilistic supplement to a wider ML curriculum.
• I've done up to chapter 5 and skipped ahead to do chapter 9. I plan to go through the rest of the book for completeness. Even if many methods are not immediately relevant to the DL paradigm, a broad basis in statistics and probability theory certainly is. I'm most looking forward to the chapters on causal models (8), sampling techniques (11) and hidden Markov models (13). This should be done by mid-December.
• Cracking the Coding Interview by McDowell (2015)
• The widespread Goodharting of LeetCode is one of many reasons I'm afraid of AI. We just have to deal with it.
• I've completed chapters 1-7, with 10(-ish) to go. I'm aiming to be done with this by January.

I couldn't help myself and got some more textbooks. When I finish MML, I'll move on to Sutton and Barto's Reinforcement Learning. In December, I'll start on Russell and Norvig's Artificial Intelligence: A Modern Approach. Now that I think about it, I should probably throw Goodfellow's Deep Learning into the mix.

Courses

• Practical Deep Learning for Coders by Fast AI
• I began following this course but was disappointed by it, mostly because its level was too basic, and its methods were too applied. So I stopped following the course.
• ARENA Virtual
• Two weeks ago, a friend introduced me to ARENA Virtual, and I jumped on the opportunity. This program follows a curriculum based on Jacob Hilton's Curriculum, and it's much more my cup of tea. It assumes prior experience, goes much deeper, and is significantly faster-paced. It's also super motivating to work with others.
• This goes until late December.

Once ARENA is done, I might pick and choose from other online courses like OpenAI's Spinning Up, NYU's Deep Learning, etc. But I don't expect this to be necessary anymore, and it may even be counterproductive. ARENA + textbooks is likely to be enough to learn what I need. Any extra time can probably best go towards actual projects.

## Theoretical Upskilling in AI Safety

Courses

• AGI Safety Fundamentals by Richard Ngo
• I'm going through this on my own and reading everything (the basics + supplementary material). I'm currently on week 7 of 8, so I'll finish this month.
• Intro to ML Safety by Dan Hendrycks
• As soon as I finish AGISF, I'll move on to this course.

Once I'm done with Intro to ML Safety, I'll go on to work through AGI Safety 201. In the meantime, I've also gone through lots of miscellaneous sequences: Value Learning, Embedded Agency, Iterated Amplification, Risks from Learned Optimization, Shard Theory, Intro to Brain-Like-AGI Safety, Basic Foundations for Agent Models, etc. I'm also working my way through AXRP and The Inside View for an informal understanding of various researchers' views.

Over the last two months, I've actually found myself becoming less doomer and developing longer timelines.1 In terms of where I see myself ending up: it's still interpretability, with an uptick in interest in brain-flavored approaches (Shard Theory, Steven Byrnes). I picked up Evolutionary Psychology by David Buss and might pick up a neuroscience textbook one of these days. My ideal fit is still probably Anthropic.

## Network & Outreach

Programs

• SERI MATS. The essay prompts were wonderful practice in honing my intuitions and clarifying my stance. I think my odds of getting in are good, and that this is the highest-value thing I can currently do to speed up my transition into AI safety. The main downside is that SERI MATS includes an in-person component that will be in the Bay starting in January. That's sooner than I would move in an ideal world. But then I guess an ideal world has solved alignment. 🤷‍♂️
• REMIX (by Redwood). I'll be applying this week. This seems as good an opportunity as SERI MATS.

I received the advice to apply more often: to already send off applications to Anthropic, Redwood, etc. I think the attitude is right, but my current approach is already sufficient. Let's check in when we hear back from these programs.

Research

• I've also put together a research agenda (email me if you want the link). In it, I've begun dissecting how the research I did during my master's on toy models from theoretical neuroscience could inform novel research directions for interpretability and alignment. I'm starting a few smaller experiments to better understand the path-dependence of training.
• I've also started a collaboration with Diego Dorn to review the literature on representation learning and how to measure distance/similarity between different trained models.

I've decided to hold off on publishing what I've written up in my research agenda until I have more results. Some of the experiments are really low-hanging fruit, yet helpful to ground the ideas, so I figure it's better to wait a little and immediately provide the necessary context.

Networking

• I attended an AI Safety retreat organized by EA NL, which was not only lots of fun but also introduced me to lots of awesome people.
• I'll be attending EAGxRotterdam next week, and EAGxBerkeley in December. Even more awesome people coming soon.

Miscellaneous

• As a final note, I'm working with Hoog on a video about AI safety. It's going to be excellent.

## Footnotes

1. More on why in a future post.

# 2022-Q3

A lot has changed for me in the past month. My partner and I decided to close the business we had started together, and I've thrown myself full-force at AI safety.

We weren't seeing the traction we needed, I was nearing the edge of burnout (web development is not the thing for me1), and, at the end of the day, I did not care enough about our users. It's hard to stay motivated to help a few patients today when you think there's a considerable risk that the world might end tomorrow. And I think the world might end soon: not tomorrow, but more likely than not in the next few decades.2 Eventually, I reached a point where I could no longer look away, and I had to do something.

So I reached out to the 80,000 Hours team, who connected me to people studying AI safety in my area and helped me apply to the FTX Future Fund Regranting Program for a six-month, $25,000 upskilling grant to kickstart my transition to AI. Now, I'm not a novice (my bachelor's and master's theses applied techniques from statistical physics to understand neural networks), but I could definitely use the time to refresh and catch up on the latest techniques. A year is a long time in AI. Besides "upskilling" in ML proper, I need time to dive deep into AI safety: there's overlap with the conventional ML literature, but there's also a lot of unfamiliar material. Finally, I need time to brush up my CV and prepare to apply to AI labs and research groups.

My current guess is that I'll be best suited to empirical/interpretability research, which I think is likely to be compute-constrained. Thus, working at a larger lab is crucial. That's not to mention the benefits of working alongside people smarter than you are. Unfortunately (for me), the field is competitive, and a "gap year" in an unrelated field after your master's is likely to be perceived as a weakness. There's a signaling game at hand, and it's play or be played. In sum, spending time on intangibles like "networking" and tangibles like "publications"3 will be a must.

To keep myself focused throughout the next half year, I'll be keeping track of my goals and progress here. To start, let's take a look at my current plan for the next half year.

## Learning Plan

Like all good plans, this plan consists of three parts:

1. Mathematics/Theory of ML
2. Implementation/Practice of ML
3. AI Safety

There's also an overarching theme of "community-building" (i.e., attending EAGs and other events in the space) and of "publishing".

### Resources

Textbooks

• Mathematics for Machine Learning by Deisenroth, Faisal, and Ong (2020).
• I was told that this book matters predominantly for its first half, but I'm ready to consume it in full.
• Pattern Recognition and Machine Learning by Bishop (2006).
• I was advised to focus on chapters 1-5 and 9, but I'm aiming to at least skim the entirety.
• Cracking the Coding Interview by McDowell (2015).
• One specification I'm going to have to game is the interview. I'm also taking this as an opportunity to master Rust, as I think a solid understanding of low-level systems programming is going to be an important enabler when working with large models.

ML/DL Courses

There are a bunch more, but these are the only ones I'm currently committing to finishing. The rest can serve as supplementary material afterward.

AI Safety Courses

Miscellaneous

### Publishing

I'm not particularly concerned about publishing in prestigious journals, but getting content out there will definitely help. Most immediately, I'm aiming to convert/upgrade my master's thesis for an AI safety/interpretability audience. I'm intrigued by the possibility that perspectives like the Lyapunov spectrum can help us enforce constraints like "forgetfulness" (which may be a stronger condition than myopia), analyze the path-dependence of training, and detect sensitivity to adversarial attacks/improbable inputs; that random matrix theory might offer novel ways to analyze the dynamics of training; and, more generally, that statistical physics is an un(der)tapped source of interpretability insight. In some of these cases, I think it's likely that I can come to original results within the next half year. I'm going to avoid overcommitting to any particular direction just yet, as I'm sure my questions will get sharper as I go deeper into the field.

On top of this, I'm reaching out to several researchers in the field and offering myself up as a research monkey.
I trust that insiders (in particular, PhD students) will have better ideas than I can form as of yet but not enough resources to execute them, and that if I make myself useful, karma will follow.

## Timeline

Over the next three months, my priority is input: to complete the textbooks and courses mentioned above (which means taking notes, making flashcards, doing exercises). Over the subsequent three months, my priority is output: to publish and apply. Of course, this is simplifying; research is a continuous process. I'll start to produce output before the first three months are up, and I'll continue to absorb lots of input after they're up. Still, heuristics are useful.

I'll be checking in here on a monthly basis, reviewing my progress over the previous month and updating my goals for the next month. Let's get the show on the road.

### Month 1 (October)

Highlights

## Footnotes

1. At least not as a full-time occupation. I like creating things, but I also like actually using my brain, and too much of web development is mindless twiddling (even post-Copilot).

2. More on why I think this soon.

3. Whether in formal journals or informal blogs.

4. I'm including less formal/"easier" sources because I need some fallback fodder (for when my brain can no longer handle the harder stuff) that isn't Twitter or Hacker News.

# No, human brains are not more efficient than computers

Epistemic status: grain of salt. There's lots of uncertainty in how many FLOP/s the brain can perform.

In informal debate, I've regularly heard people say something like, "oh but brains are so much more efficient than computers" (followed by a variant of "so we shouldn't worry about AGI yet"). Putting aside the weakly argued AGI skepticism, brains actually aren't all that much more efficient than computers (at least not in any way that matters).
The first problem is that these people are usually comparing the energy requirements of training large AI models to the power requirements of running the normal waking brain. These two things don't even have the same units. The only fair comparison is between the trained model and the waking brain, or between training the model and training the brain. Training the brain is called evolution, and evolution isn't particularly known for its efficiency.

Let's start with the easier comparison: a trained model vs. a trained brain. Joseph Carlsmith estimates that the brain delivers roughly $1$ petaFLOP/s ($=10^{15}$ floating-point operations per second)1. If you eat a normal diet (8,000 to 10,000 kJ a day, or roughly $100$ W), you're expending roughly $10^{-13}$ J/FLOP. Meanwhile, the supercomputer Fugaku delivers $450$ petaFLOP/s at $30$ MW, which comes out to about $10^{-10.2}$ J/FLOP… So I was wrong? Computers require roughly $500$ times more energy per FLOP than humans?

[Figure: distribution of (supercomputer J/FLOP) / (human J/FLOP)]

What this misses is an important practical point: supercomputers can tap pretty much directly into sunshine; human food calories are heavily processed hand-me-downs. We outsource most of our digestion to mother nature and daddy industry. Even the most whole-foods-grow-your-own-garden vegan is $2$-$3$ orders of magnitude less efficient at capturing calories from sunlight than your average device2. That's before animal products, industrial processing, or any of the other joules it takes to run a modern human.

After this correction, humans and computers are about head-to-head in energy/FLOP, and it's only getting worse for us humans. The fact that the brain runs on so little actual juice suggests there's plenty of room left for us to explore specialized architectures, but it isn't the damning case many think it is. (We're already seeing early neuromorphic chips outperform neurons' efficiency by four orders of magnitude.)
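The per-FLOP arithmetic and the sunlight correction above fit in a few lines. This is a minimal sketch that uses single point estimates instead of the appendix's distributions: about 9,000 kJ of food a day (a midpoint I picked from the appendix's 8,000 to 10,000 kJ), $10^{15}$ FLOP/s for the brain, 30 MW for 450 petaFLOP/s, and the efficiencies from footnote 2. The function names are hypothetical helpers, not part of the original model.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def human_joules_per_flop(kj_per_day: float = 9_000, brain_flops: float = 1e15) -> float:
    """Food energy per brain FLOP. As in the post, the whole body's ~100 W
    is charged to the brain, not just the brain's own ~20% share."""
    watts = kj_per_day * 1_000 / SECONDS_PER_DAY
    return watts / brain_flops


def machine_joules_per_flop(watts: float = 30e6, flops: float = 450e15) -> float:
    """Fugaku-like figures: ~30 MW for ~450 petaFLOP/s."""
    return watts / flops


raw_ratio = machine_joules_per_flop() / human_joules_per_flop()

# Fraction of incoming sunlight each side ends up using (footnote 2 point estimates).
food_efficiency = 0.01 * 0.10    # ~1% photosynthesis, then ~10% per trophic step
device_efficiency = 0.20 * 0.90  # ~20% solar panels, ~10% transmission loss
sunlight_advantage = device_efficiency / food_efficiency

corrected_ratio = raw_ratio / sunlight_advantage

print(f"raw J/FLOP ratio (machine / human): {raw_ratio:.0f}")
print(f"sunlight advantage of devices: {sunlight_advantage:.0f}x")
print(f"sunlight-corrected ratio: {corrected_ratio:.1f}")
```

With these numbers, the raw ratio is 640, devices capture sunlight about 180 times better, and the corrected ratio drops to about 3.6: within an order of magnitude, i.e., roughly head-to-head.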
[Figure: distribution of (electronic efficiency) / (biological efficiency) at turning sunlight into usable energy]

But what about training neural networks? Now that we know the energy costs per FLOP are about equal, all we have to do is compare the FLOPs required to evolve brains to the FLOPs required to train AI models. Easy, right?

Here's how we'll estimate this:

1. For a given state-of-the-art NN (e.g., GPT-3, PaLM), determine how many FLOP/s it performs when running normally.
2. Find a real-world brain which performs a similar number of FLOP/s.
3. Determine how long that real-world brain took to evolve.
4. Compare the number of FLOPs (not FLOP/s) performed during that period to the number of FLOPs required to train the given AI.

Fortunately, we can piggyback off the great work done by Ajeya Cotra on forecasting "Transformative" AI. She calculates that GPT-3 performs about $10^{12}$ FLOP/s3, or about as much as a bee. Going off Wikipedia, social insects evolved only about 150 million years ago. Counting the neural computation of all their ancestors from about a billion years ago up to that point translates to between $10^{38}$ and $10^{44}$ FLOPs. GPT-3, meanwhile, took about $10^{23.5}$ FLOPs. That means evolution is roughly $10^{15}$ to $10^{21}$ times less efficient.

[Figure: distribution of log10(total FLOPs to evolve bee brains)]

Now, you may have some objections. You may consider bees to be significantly more impressive than GPT-3. You may want to select a reference animal that evolved earlier in time. You may want to compare unadjusted energy needs. You may even point out the fact that the Chinchilla results suggest GPT-3 was "significantly undertrained". Object all you want, and you still won't be able to explain away the >$15$ OOM gap between evolution and gradient descent. This is no competition.

What about other metrics besides energy and power? Consider that computers are about 10 million times faster than human brains. Or that if the human brain can store a petabyte of data, S3 can do so for about \$20,000 a month (2022). Even FLOP for FLOP, supercomputers already underprice humans.4 There's less and less for us to brag about.
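The evolution-versus-GPT-3 comparison above can be reproduced roughly by sampling. This is a Python re-creation of the appendix's `evolution` snippet, not the original Squiggle run. It assumes that Squiggle's `a to b` is a lognormal whose 5th and 95th percentiles are `a` and `b`, and it reuses the appendix's inputs: log10 ancestral populations uniform between 19 and 23, log10 brain FLOP/s from `2 to 6`, 850 million years of evolution, and about $10^{23.5}$ FLOPs for training GPT-3. The helper names (`squiggle_to`, `sample_log10_evolution_flop`) are hypothetical.

```python
import math
import random


def squiggle_to(rng: random.Random, low: float, high: float) -> float:
    """Lognormal draw with 5th/95th percentiles at `low`/`high`
    (assumed to match Squiggle's `low to high`)."""
    z95 = 1.6448536269514722  # 95th percentile of the standard normal
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z95)
    return rng.lognormvariate(mu, sigma)


LOG10_SEC_PER_YEAR = math.log10(365 * 24 * 60 * 60)
LOG10_YEARS = math.log10(850e6)  # ~1 billion years ago to ~150 million years ago
LOG10_GPT3_TRAINING_FLOP = 23.5


def sample_log10_evolution_flop(rng: random.Random) -> float:
    """One draw of log10(total ancestral FLOP): population x FLOP/s x seconds."""
    log10_population = rng.uniform(19, 23)      # nematode-scale populations
    log10_brain_flops = squiggle_to(rng, 2, 6)  # ~C. elegans-scale brains
    return log10_population + log10_brain_flops + LOG10_SEC_PER_YEAR + LOG10_YEARS


rng = random.Random(0)  # fixed seed for reproducibility
samples = sorted(sample_log10_evolution_flop(rng) for _ in range(100_000))
p5, p50, p95 = (samples[int(q * len(samples))] for q in (0.05, 0.50, 0.95))
gap_low = p5 - LOG10_GPT3_TRAINING_FLOP
gap_high = p95 - LOG10_GPT3_TRAINING_FLOP

print(f"log10 evolution FLOP, 90% interval: {p5:.1f} to {p95:.1f} (median {p50:.1f})")
print(f"gap over GPT-3 training, in OOM: {gap_low:.1f} to {gap_high:.1f}")
```

Under these assumptions, the 90% interval should land close to the $10^{38}$ to $10^{44}$ quoted above, and the lower end of the gap stays around 15 orders of magnitude.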

[Figure: distribution of (\$ per human FLOP/s) / (\$ per supercomputer FLOP/s)]
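Footnote 4's cost comparison is a single division. The sketch below uses the footnote's figures (FEMA's \$7.5 million value of a statistical life and a \$1 billion price tag for Fugaku), treats the brain as exactly $10^{15}$ FLOP/s, and tries both ends of the appendix's 450 to 550 petaFLOP/s range. `dollars_per_flops` is a hypothetical helper name.

```python
def dollars_per_flops(price_usd: float, flops: float) -> float:
    """Price paid per unit of sustained compute (dollars per FLOP/s)."""
    return price_usd / flops


# FEMA's value of a statistical life, spread over a ~1e15 FLOP/s brain.
human = dollars_per_flops(7.5e6, 1e15)

# Fugaku at ~$1B, at both ends of the 450-550 petaFLOP/s range.
ratios = {
    petaflops: dollars_per_flops(1e9, petaflops * 1e15) / human
    for petaflops in (450, 550)
}

for petaflops, ratio in ratios.items():
    print(f"{petaflops} PFLOP/s: supercomputer / human cost per FLOP/s = {ratio:.2f}")
```

Depending on which end of the range you take, the supercomputer comes out at roughly a fourth (0.24) to a third (0.30) of the human cost per FLOP/s.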

Brains are not magic. They're messy wetware, and hardware ~~will catch up~~ has caught up.

Postscript: brains actually might be magic. Carlsmith assigns less than 10% (but non-zero) probability that the brain computes more than $10^{21}$ FLOP/s. In this case, brains would currently still be vastly more efficient, and we'd have to update in favor of additional theoretical breakthroughs before AGI.

If we include the uncertainty in brain FLOP/s, the graph looks more like this:

[Figure: distribution of (supercomputer J/FLOP) / (human J/FLOP), including uncertainty in brain FLOP/s]

(With a mean of ~$10^{19}$ and median of $830$.)
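The mean and median above differ by so much because the ratio scales as $10^{x}$, where $x$ is the uncertain log10 of brain FLOP/s, so a few draws in the far tail dominate the mean. Here is a simplified Monte Carlo sketch of that effect. It assumes that Squiggle's `10 to 23` is a lognormal with 5th and 95th percentiles at 10 and 23, and, unlike the appendix, it holds every other input at a single midpoint, so it won't reproduce the exact numbers. The helper names are hypothetical.

```python
import math
import random
import statistics


def squiggle_to(rng: random.Random, low: float, high: float) -> float:
    """Lognormal draw with 5th/95th percentiles at `low`/`high`
    (assumed to match Squiggle's `low to high`)."""
    z95 = 1.6448536269514722  # 95th percentile of the standard normal
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z95)
    return rng.lognormvariate(mu, sigma)


# Point estimates for everything except brain FLOP/s (a simplification).
HUMAN_WATTS = 9_000 * 1_000 / (24 * 60 * 60)  # ~104 W of whole-body food energy
MACHINE_J_PER_FLOP = 27.5e6 / 500e15          # midpoints of 25-30 MW and 450-550 PFLOP/s


def sample_ratio(rng: random.Random) -> float:
    """One draw of (supercomputer J/FLOP) / (human J/FLOP)."""
    log10_brain_flops = squiggle_to(rng, 10, 23)  # uncertain brain FLOP/s, in log10
    human_j_per_flop = HUMAN_WATTS / 10 ** log10_brain_flops
    return MACHINE_J_PER_FLOP / human_j_per_flop


rng = random.Random(0)  # fixed seed for reproducibility
ratios = [sample_ratio(rng) for _ in range(100_000)]
median = statistics.median(ratios)
mean = statistics.fmean(ratios)
print(f"median ratio: {median:.0f}, mean ratio: {mean:.1e}")
```

The median lands in the hundreds, while a handful of draws with brains at $10^{25}$ FLOP/s and beyond drag the mean up by many orders of magnitude, which is why the two summaries disagree so wildly.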

## Appendix

Squiggle snippets used to generate the graphs above (used in conjunction with obsidian-squiggle).

```squiggle
brainEnergyPerFlop = {
  humanBrainFlops = 15; // log10 FLOP/s; for the uncertain version use 10 to 23 (median 15; P(>21) < 10%)
  humanBrainFracEnergy = 0.2; // unused: all food energy is charged to the brain
  humanEnergyPerDay = 8000 to 10000; // daily kJ consumption
  humanBrainPower = humanEnergyPerDay / (60 * 60 * 24); // kJ/s = kW (whole-body power)
  humanBrainPower * 1000 / (10 ^ humanBrainFlops) // J / FLOP
}

supercomputerEnergyPerFlop = {
  // https://www.top500.org/system/179807/
  power = 25e6 to 30e6; // W
  flops = 450e15 to 550e15; // FLOP/s
  power / flops // J / FLOP
}

supercomputerEnergyPerFlop / brainEnergyPerFlop
```
```squiggle
humanFoodEfficiency = {
  photosynthesisEfficiency = 0.001 to 0.03
  trophicEfficiency = 0.1 to 0.15
  photosynthesisEfficiency * trophicEfficiency
}

computerEfficiency = {
  solarEfficiency = 0.15 to 0.20
  transmissionEfficiency = 1 - (0.08 to 0.15)
  solarEfficiency * transmissionEfficiency
}

computerEfficiency / humanFoodEfficiency
```
```squiggle
evolution = {
  // Based on Ajeya Cotra's "Forecasting TAI with biological anchors"
  // All calculations are in log10 space.

  secInYear = log10(365 * 24 * 60 * 60);

  // We assume that the average ancestor pop. FLOP per year is ~constant.
  // cf. Humans at 10 to 20 FLOP/s & 7 to 10 population
  ancestorsAveragePop = uniform(19, 23); // Tomasik estimates ~1e21 nematodes
  ancestorsAverageBrainFlops = 2 to 6; // ~ C. elegans
  ancestorsFlopPerYear = ancestorsAveragePop + ancestorsAverageBrainFlops + secInYear;

  years = log10(850e6); // 1 billion years ago to 150 million years ago
  ancestorsFlopPerYear + years
}
```
```squiggle
humanLifeUsd = 1e6 to 10e6 // value of a statistical life, in US dollars
humanBrainFlops = 1e15
humanBrainUsdPerFlops = humanLifeUsd / humanBrainFlops
supercomputerUsd = 1e9
supercomputerFlops = 450e15
supercomputerUsdPerFlops = supercomputerUsd / supercomputerFlops

supercomputerUsdPerFlops / humanBrainUsdPerFlops
```

## Footnotes

1. Watch out for FLOP/s (floating-point operations per second) vs. FLOPs (floating-point operations). Sorry for the confusion, but FLOPs usually reads better than FLOP.

2. Photosynthesis has an efficiency around 1%, and jumping up a trophic level means another order of magnitude drop. The most efficient solar panels have above 20% efficiency, and electricity transmission loss is around 10%.

3. Technically, it's FLOP per "subjective second", i.e., a second of equivalent natural thought. This can be faster or slower than a wall-clock second.

4. Compare FEMA's value of a statistical life at \$7.5 million to the \$1 billion price tag of the Fugaku supercomputer, and we come out to the supercomputer being about a fourth the cost per FLOP/s.

# Rationalia starter pack

LessWrong has gotten big over the years: 31,260 posts, 299 sequences, and more than 120,000 users.1 It has budded offshoots like the Alignment and EA Forums and earned itself recognition as a "cult". Wonderful!

There is a dark side to this success: as the canon grows, it becomes harder to absorb newcomers (like myself).2 I imagine this was the motivation for the recently launched "highlights from the sequences".

To make it easier on newcomers (veterans, you're also welcome to join in), I've created an Obsidian starter kit for taking notes on the LessWrong core curriculum (the Sequences, the Codex, HPMOR, best of, concepts, various jargon, and other odds and ends).

There's built-in support to export notes & definitions to Anki, goodies for tracking your progress through the notes, useful metadata/linking, and pretty visualizations of rationality space…

It's not perfect (I'll be doing a lot of fine-tuning as I work my way through all the content), but there should be enough in place that you can find some value. I'd love to hear your feedback, and if you're interested in contributing, please reach out! I'll also soon be adding support for the AF and the EAF.

More generally, I'd love to hear your suggestions for new aspiring rationalists. For example, there was a round of users proposing alternative reading orders about a decade ago (by Academian, jimrandomh, and XiXiDu) that may be worth revisiting in 2022.

## Footnotes

1. From what I can tell using the graphql endpoint.

2. Already a decade ago, jimrandomh was worrying about LW's intimidation factor — we're now about an order of magnitude ahead.