Reviews

2022 Review

A lot has changed this year.

From Entrepreneurship to Research

I started the year set on committing to the path of entrepreneurship. I ended it as an AI safety researcher. What can I say? Priorities change.

One day you may be sold on the earning-to-give route (or, if you're feeling cynical, the social status accompanying entrepreneurship and philanthropy). The next day, you're sold on maybe preventing powerful AI from causing the demise of humanity (or the social status accompanying AI safety research within the EA bubble).

Social status dynamics aside, it's a better fit. Working on Health Curious (the company I founded) made me feel like my brain was shrinking. I just wasn't built to spend my days writing React apps.

Meanwhile, I've always been fascinated by AI (and have always contorted my physics degrees into excuses to study NNs). Research keeps my curiosity levels far better satiated. I also didn't have anything near this healthy a support network while working on my company.

I'm happier, more focused, and working far more productively. If anything, my life has gotten much easier and better since it's gotten less balanced. Having the one overarching priority of "solve alignment" makes taking any kind of decision much easier.

I still think founding some kind of a research organization might very well be in my future. I like working with people and working on big-picture strategy. There's a big premium on that kind of thing in technical AI safety (considering we're a bunch of nerds).

Goals for 2022

Overall, I'd say my goals for 2022 had about a 50% success rate. It's that low mainly because they were bad goals that didn't suit the person I ended up becoming.

Or maybe that's just my coping strategy.

  1. 🛑 No more scrolling (YouTube, Reddit, Porn, etc.):
    • Complete failure. I even ended up joining a new platform (Twitter).
  2. 🚪 Screen time:
    • I'd call this a success. My screen time for my phone is about an hour. For my computer, it's atrocious, often upwards of 8-10 hours. But hey, it's my job, so I accept it as the price of admission.
  3. ⏲ Self-monitoring:
    • Altogether a success. My main innovation was getting on Linear and building an integration with Toggl, so that my tasks are automatically time-tracked. This meant I didn't have to do much thinking to log my time, which is the best way to make sure it actually gets logged.
    • There's definitely room for improvement: I'm not actually doing anything with the information. I think the most natural way to address this would be to build a little dashboard for all my sources of data.
    • The other main room for improvement is that there's always more I could track: I didn't manage to track additional media consumption beyond books.
  4. 📚 Books (1 book per week):
    • I didn't get anywhere close to my goal of 50 books this year, at least as logged on Goodreads (it says 22). In terms of total volume, however, I think I far exceeded last year. The discrepancy consists in, e.g., Worm being listed as a single book (at 6,680 pages, about 1.5 times the Harry Potter series), and most of my reading consisting of textbooks and papers.
    • The bigger failure is that I didn't meet most of the particular categorical targets I set (in terms of, e.g., reading X books of a specific language, X books by Y author).
    • Just goes to show that optimizing the wrong metric is stupid.
  5. 🗃 PKM:
    • Bit of a failure. My personal knowledge management is a mess in need of a thorough cleaning.
  6. ✍️ Writing:
    • I missed a few of the targets, but I'm overall happy with what I've published.
    • The main new thing I'm trying to do is publish my own notes on a given subject.
  7. 🗣 Languages:
    • I learned Portuguese to a pretty high level, but I've given up on German (for the time being), and now accept that I will lose my bet with my roommate on reading Faust in the original German by my 25th birthday. I've also seriously fallen behind on the Mandarin.
    • The bigger problem is that I've had trouble keeping my Anki habit active. For a period of about 6 years, I did Anki pretty much every day, and I need to get back to that commitment, not just for languages, but for everything in my brain.
    • Oh, and I didn't meet any of the specific "create X flashcards" targets. They were too ambitious.
  8. 🏃 Moving:
    • I stopped diligently trying to close my Apple Watch rings. Bad Jesse. And I've been bad about walking. But overall, my fitness has been pretty good. It really peaked in Brasília when I was doing pilates every day and the pilates instructor was this demon sent from the seventh circle of hell to torture us with her core wrath.
    • As for the other subgoals, I can manage a 15s or so handstand, but the 30s is not quite there yet. And I've stopped doing my Kegel exercises, so the non-ejaculatory orgasm will have to wait.
  9. 🍽 Fasting
    • I love food too much to make myself not eat for a full day every month. So I'm going to stick to the 16/8 that has served me well for years.
  10. 🌏 Diet:
    • I've become progressively more and more vegetarian, and I think it's about time to make the full plunge.
  11. 👓 Myopia:
    • Nope. Didn't make progress here. But that's also because I stopped putting much effort into this.
  12. 👥 Relationships:
    • This is where I've had the most success. I've found a network of people doing the same things I'm doing, and I'm now doing the programs that will get me where I need to be. I have a research mentor and plenty of other guidance to help me along the way.
  13. 💰 Money:
    • Between the SERI MATS stipends and the FTX regrant (which I may ultimately have to repay), I'm doing well. My partner and I are financially and locationally independent.

My takeaways for next year are to set fewer goals and to allow myself more freedom within the goals (e.g., don't try to prescribe a list of the exact authors I'm going to read). I ended up constraining myself more than was useful, and setting goals for things that weren't actually priorities. Lessons learned.

2023 Planning

The Goal

My priority is solving alignment, and for now that means seeking out (1) a position with some financial padding so I can continue doing research and (2) mentorship so I can keep getting better at it. So either: a position at a research organization like Anthropic, OpenAI, DeepMind, etc. or a PhD position (with one of a handful of advisors doing actually relevant research).1

The Subgoals

The main way I'm going to get a research position is (no surprises) to do research. I'll be at SERI MATS for the next two months with explicitly this purpose.

In particular, I'll be doing research on path dependence and theory of deep learning under the guidance of Evan Hubinger. I'm aiming to publish (in conferences) two or three papers out of this work because you have to play the signaling game just a bit if you hope to succeed.

Afterwards, there's an option for an extension (~6 months). If I decide to stick it out with the industry route, I'm aiming to obtain a position by the end of SERI MATS. If I decide for the academia route (or if two months turns out to be a crazy, unrealistic timeline), I'll go with the extension.

My lesson from last year was to set fewer goals and offer myself more freedom within each goal (e.g., to avoid a reading list of exactly these and these authors).

Writing

I want to publish impactful research (or at least have content that I could publish if I decided it was worth it to go through the process of submission).

Since it's hard to measure impact (at least on a one-year timescale), let's stick to setting targets for the observables (and live with the true goal in mind)...

  • 📒 Publish(able) 3 papers. This seems a pretty conservative target considering I have 2 already in the works.
  • 📚 Launch Textbook on AI Safety. This has taken a seat on the back-burner for the last month, but it seems pretty important and valuable. I'm going to throw out the sections on "Foundations" and "Machine Learning", and work on the thing it's actually about.
  • 📝 Publish notes at least once a month. Something I want to get in the rhythm of is publishing high-quality notes. Let's be real, writing little blog articles is fun, but it's not the best thing I can be doing provided I can get my writing fill in other ways, such as publishing notes. Which is what I'll be doing.

Reading List

I'm going to throw out my specific "# of books" goals from previous years, though I will set some specific goals in terms of reading textbooks.

First, though, a definition of "reading textbooks."

Reading textbooks means skipping the content that doesn't matter, skimming the content that seems possibly somewhat relevant, and investing in the content that seems important (with multiple readings and problem sets). It doesn't mean actually reading end-to-end.

Currently in progress

  • Artificial Intelligence by Russell and Norvig
  • Reinforcement Learning by Sutton and Barto
  • The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
  • Algebraic Geometry and Statistical Learning Theory by Watanabe
  • Pattern Recognition and Machine Learning by Bishop
  • An Introduction to Kolmogorov Complexity and Its Applications by Li and Vitányi
  • Scaling and Renormalization in Statistical Physics by Cardy

New

Stretch/Undecided

  • Category Theory by Awodey
  • Topology by Munkres
  • Radically Elementary Probability Theory by Nelson

Stretch Goals

  • 🇨🇳 Learn Mandarin. I like learning languages, and hobbies seem healthy even when the world is ending. It also seems valuable to make myself a future asset if world governments ever get their shit together to figure out AI policy.
  • 👓 Myopia. Actually reduce my diopters by 0.5 in both eyes.
  • 🏃 Moving. I'd like to balance out my exercise regime a bit more. Right now, I'm going to hot pilates/yoga several times a week which seems to get me what I need in terms of mobility/flexibility/core/cardio. I'd like to get some actual strength training into my regimen.
  • 💰 Money. I'd love to generate a bit of passive income. Obvious routes are selling content (some notes or lecture series) or some kind of (AI-driven) service.

Footnotes

  1. I'm avoiding the independent research route because I think the value of a strong group of peers and mentors is too high to be missed. ↩

2022-M11

It's been three months since I decided to pivot to AI safety. Who knew three months could be such a long time?

FTX

I got an FTX Future Fund regrant for six months to help make the switch. Then FTX imploded, and it turns out my grant may be clawed back during the bankruptcy proceedings. Unfortunate. On the bright side1, FTX seems to have been such a mess that it could take years for the process to get to me. So if you have short enough timelines…2

Courses

  • The ARENA virtual program is going along smoothly.
  • This month, I joined SERI MATS to get my hands dirty in research. I'm officially under Evan Hubinger's Deceptive AI stream (though I'm also participating in John Wentworth's workshops). Yes, I'm reaching the limits of what I can juggle. We're also working our way through the Alignment 201 curriculum.3
  • When ARENA finishes up, I'm going to dedicate more of my attention to metauni (especially their track on singular learning theory (SLT)).
  • I've also finished most of the miscellaneous Alignment Forum sequences I wanted to go through (as well as AXRP and The Inside View).

The SERI MATS research sprint is about to start, and I'll be in Berkeley from January to at least February to work on this in person. Safe to say, I have way too many ideas for research projects, but I'm planning to focus on Toy Models of Superposition.

Distillation

  • I'm working on an introduction to SLT that should be out soon.
  • The video I'm working on with Hoog on AI risk is a little delayed because OpenPhil funding was paused (and I'm a little overextended but don't want to admit it), but it is coming along.
  • Next to all that, I've started working on an online, interactive AI safety textbook (very much a work in progress, more coming soon)4.

Textbooks

  • Mathematics for Machine Learning: Done. Great book. Highly recommend.
  • Pattern Recognition and Machine Learning: Completed up through chapter 9. I have 5 chapters to go. Instead of trying to bang these out in the next month to meet my original deadline, I'm going to push back my deadline by two weeks, so I have more time during the research sprint.
  • Cracking the Coding Interview: 6 more chapters to go. Like PRML, I'm going to push back my original deadline 2 weeks to mid-January.4
  • Reinforcement Learning: I've already gone through 6/17 chapters ahead of schedule. I'm aiming to be done by April.
  • Artificial Intelligence: A Modern Approach: Here too, I'm 3 chapters in. Ahead of when I originally planned to get this started. This is a big book, so my (self-enforced) deadline is May 1st.

Outreach

  • I'm writing this on a plane to EAGxBerkeley. Judging from EAGxRotterdam, EAGxBerkeley is going to be great (as long as they leave out the food poisoning part).
  • I've started putting out some feelers for a longer-term project of establishing an AI-safety-oriented company in mainland Europe. There's a lot of early-career interest among very smart students. Give them a few years, and mainland Europe will be ripe for a new organization like Anthropic, Redwood, or Conjecture.

Conclusion

The next month is going to be hectic. I'll be in the Bay Area for a week and a half, then my parents' place in NY state for a week, then Michigan at my girlfriend's fiancée's dad's, then NY for another week. Oh yeah, did I mention? I got engaged!5 On the day I publish this, it's our 5-year anniversary. Robin, I love you, and I can't wait to spend the rest of our lives together. (However long that is, doomer.)

Footnotes

  1. For me. I think it's safe to say the inconvenience for me is less than the inconvenience for people who lost their life savings. ↩

  2. My timelines aren't actually that short. But I'm not worried about eventually being able to pay this back (even very soon with the SERI MATS stipend). ↩

  3. I've come to conclude that I can safely skip Intro to ML Safety for now. Much of the content overlaps with these other programs/textbooks. ↩

  4. I'm probably a bit newer to the field than would be ideal for this task, so I'm hoping to migrate to a more editorial role, delegating the bits that I can. I think my main strength here is more a kind of Olahian interactive distillation. That's an ability which seems to be pretty rare among active researchers. ↩ ↩2

  5. I proposed with an Oura ring, which definitely says something about the kind of people we are. Now that I think about it, I should have probably asked for a sponsorship and gotten the whole wedding funded by a late-stage capitalism PR department, but hey, hindsight is 20/20. ↩

2022-M10

Two months ago, I decided to quit my company and dedicate myself full-force to AI safety. The problems I had been working on were not inspiring me, and the actual work left me feeling like my brain was shrinking. Something had to change.

So far, this feels like one of the best decisions I've ever made.

I received an FTX Future Fund regrant for six months to transition to research. My plan for this period rests on three pillars: (1) technical upskilling in ML, (2) theoretical upskilling in AI safety, and (3) networking/community outreach.

Concretely, my plan is to (1) read lots of textbooks and follow online courses, (2) read lots of the Alignment Forum and go through curricula (like Richard Ngo's AGI Safety Fundamentals and Dan Hendrycks's Intro to ML Safety), and (3) travel to events, apply to different fellowships, and complete small research projects.

A month and a half has gone by since I really started, which turns out to be quite a lot of time. Enough that it's a good moment for a progress report and forecast.

Technical Upskilling in ML

Textbooks

  • Mathematics for Machine Learning by Deisenroth, Faisal, and Ong (2020).
    • This is a wonderful book. Clear, concise writing. Excellent visuals (color-coded with the corresponding formulas!). It hints at what Chris Olah might be able to do with the textbook genre if he got his hands on it.
    • I've completed up to chapter 9 (that's the first half plus one chapter of the second half). I'll finish the book this month.
  • Pattern Recognition and Machine Learning by Bishop (2006).
    • This book is… okay. Sometimes. It leaves very much to be desired on the visualizing front, and in retrospect, I probably wouldn't recommend it to others. But it does provide a strong probabilistic supplement to a wider ML curriculum.
    • I've done up to chapter 5 and skipped ahead to do chapter 9. I plan to go through the rest of the book for completeness. Even if many methods are not immediately relevant to the DL paradigm, a broad basis in statistics and probability theory certainly is. I'm most looking forward to the chapters on causal models (8), sampling techniques (11) and hidden Markov models (13). This should be done by mid-December.
  • Cracking the Coding Interview by McDowell (2015)
    • The widespread goodharting of leetcode is one of many reasons I'm afraid of AI. We just have to deal with it.
    • I've completed chapters 1-7, with 10(-ish) to go. I'm aiming to be done with this by January.

I couldn't help myself and got some more textbooks. When I finish MML, I'll move on to Sutton and Barto's Reinforcement Learning. In December, I'll start on Russell and Norvig's Artificial Intelligence: A Modern Approach. Now that I think about it, I should probably throw Goodfellow's Deep Learning into the mix.

Courses

  • Practical Deep Learning for Coders by Fast AI
    • I began following this course but was disappointed by it, mostly because its level was too basic, and its methods were too applied. So I stopped following the course.
  • ARENA Virtual
    • Two weeks ago, a friend introduced me to ARENA Virtual, and I jumped on the opportunity. This program follows a curriculum based on Jacob Hilton's curriculum, and it's much more my cup of tea. It assumes prior experience, goes much deeper, and is significantly higher-paced. It's also super motivating to work with others.
    • This goes until late December.

Once ARENA is done, I might pick and choose from other online courses like OpenAI's Spinning Up, NYU's Deep Learning, etc. But I don't expect this to be necessary anymore, and it may even be counterproductive. ARENA + textbooks is likely to be enough to learn what I need. Any extra time can probably best go towards actual projects.

Theoretical Upskilling in AI Safety

Courses

  • AGI Safety Fundamentals by Richard Ngo
    • I'm going through this on my own and reading everything (the basics + supplementary material). I'm currently on week 7 of 8, so I'll finish this month.
  • Intro to ML Safety by Dan Hendrycks
    • As soon as I finish AGISF, I'll move on to this course.

Once I'm done with Intro to ML Safety, I'll go on to work through AGI Safety 201. In the meantime, I've also gone through lots of miscellaneous sequences: Value Learning, Embedded Agency, Iterated Amplification, Risks from Learned Optimization, Shard Theory, Intro to Brain-Like-AGI Safety, Basic Foundations for Agent Models, etc. I'm also working my way through AXRP and The Inside View for an informal understanding of various researchers.

Over the last two months, I've actually found myself becoming less doomer and developing longer timelines.1 In terms of where I see myself ending up: it's still interpretability with an uptick in interest for brain-flavored approaches (Shard Theory, Steven Byrnes). I picked up Evolutionary Psychology by David Buss and might pick up a neuroscience textbook one of these days. My ideal fit is still probably Anthropic.

Network & Outreach

Programs

  • SERIMATS. The essay prompts were wonderful practice in honing my intuitions and clarifying my stance. I think my odds are good of getting in, and that this is the highest value thing I can currently do to speed up my transition into AI safety. The main downside is that SERIMATS includes an in-person component that will be in the Bay starting in January. That's sooner than I would move in an ideal world. But then I guess an ideal world has solved alignment. 🤷‍♂️
  • REMIX (by Redwood). I'll be applying this week. This seems as good an opportunity as SERIMATS.

I received the advice to apply more often: to already send off applications to Anthropic, Redwood, etc. I think the attitude is right, but my current approach is already sufficient. Let's check in when we hear back from these programs.

Research

  • I've also put together a research agenda (email me if you want the link). In it, I've begun dissecting how the research I did during my masters on toy models from theoretical neuroscience could inform novel research directions for interpretability and alignment. I'm starting a few smaller experiments to better understand the path-dependence of training.
  • I've also started a collaboration with Diego Dorn to review the literature on representation learning and how to measure distance/similarity between different trained models.

I've decided to hold off on publishing what I've written up in my research agenda until I have more results. Some of the experiments are really low-hanging fruit, yet helpful to ground the ideas, so I figure it's better to wait a little and immediately provide the necessary context.

Networking

  • I attended an AI Safety retreat organized by EA NL, which was not only lots of fun, but introduced me to lots of awesome people.
  • I'll be attending EAGxRotterdam next week, and EAGxBerkeley in December. Even more awesome people coming soon.

Miscellaneous

  • As a final note, I'm working with Hoog on a video about AI safety. It's going to be excellent.

Footnotes

  1. More on why in a future post. ↩

2022-Q3

A lot has changed for me in the past month. My partner and I decided to close the business we had started together, and I've thrown myself full-force at AI safety.

We weren't seeing the traction we needed, I was nearing the edge of burnout (web development is not the thing for me1), and, at the end of the day, I did not care enough about our users. It's hard to stay motivated to help a few patients today when you think there's a considerable risk that the world might end tomorrow. And I think the world might end soon: not tomorrow, but more likely than not in the next few decades.2 Eventually, I reached a point where I could no longer look away, and I had to do something.

So I reached out to the 80,000 Hours team, who connected me to people studying AI safety in my area, and helped me apply to the FTX Future Fund Regranting Program for a six-month upskilling grant to receive $25,000 for kickstarting my transition to AI.

Now, I'm not a novice (my Bachelors and Masters theses applied techniques from statistical physics to understand neural networks), but I could definitely use the time to refresh & catch up on the latest techniques. A year is a long time in AI.

Next to "upskilling" in ML proper, I need the time to dive deep into AI safety: there's overlap with the conventional ML literature, but there's also a lot of unfamiliar material.

Finally, I need time to brush up my CV and prepare to apply to AI labs and research groups. My current guess is that I'll be best-suited to empirical/interpretability research, which I think is likely to be compute-constrained. Thus, working at a larger lab is crucial. That's not to mention the benefits of working alongside people smarter than you are. Unfortunately (for me), the field is competitive, and a "gap year" in an unrelated field after your masters is likely to be perceived as a weakness. There's a signaling game at hand, and it's play or be played. To sum up, spending time on intangibles like "networking" and tangibles like "publications"3 will be a must.

To keep myself focused throughout the next half year, I'll be keeping track of my goals and progress here. To start, let's take a look at my current plan for the next half year.

Learning Plan

Like all good plans, this plan consists of three parts:

  1. Mathematics/Theory of ML
  2. Implementation/Practice of ML
  3. AI Safety

There's also an overarching theme of "community-building" (i.e., attending EAGs and other events in the space) and of "publishing".

Resources

Textbooks

  • Mathematics for Machine Learning by Deisenroth, Faisal, and Ong (2020).
    • I was told that this book is predominantly important for its first half, but I'm ready to consume it in full.
  • Pattern Recognition and Machine Learning by Bishop (2006)
    • I was advised to focus on chapter 1-5 and 9, but I'm aiming to at least skim the entirety.
  • Cracking the Coding Interview by McDowell (2015)
    • One specification I'm going to have to game is the interview. I'm also taking this as an opportunity to master Rust, as I think having a solid understanding of low-level systems programming is going to be an important enabler when working with large models.

ML/DL Courses

There are a bunch more, but these are the only ones I'm currently committing to finishing. The rest can serve as supplementary material after.

AI Safety Courses

Miscellaneous

Publishing

I'm not particularly concerned about publishing to prestigious journals, but getting content out there will definitely help. Most immediately, I'm aiming to convert / upgrade my Masters thesis to an AI Safety/Interpretability audience. I'm intrigued by the possibility that perspectives like the Lyapunov spectrum can help us enforce constraints like "forgetfulness" (which may be a stronger condition than myopia), analyze the path-dependence of training, and detect sensitivity to adversarial attacks / improbable inputs, that random matrix theory might offer novel ways to analyze the dynamics of training, and, more generally, that statistical physics is an un(der)tapped source of interpretability insight.

In some of these cases, I think it's likely that I can come to original results within the next half year. I'm going to avoid overcommitting to any particular direction just yet, as I'm sure my questions will get sharper with my depth in the field.

Next to this, I'm reaching out to several researchers in the field and offering myself up as a research monkey. I trust that insiders will have better ideas than I can form as of yet, but not enough resources to execute (in particular, I'm talking about PhD students), and that if I make myself useful, karma will follow.

Timeline

Over the next three months, my priority is input: to complete the textbooks and courses mentioned above (which means taking notes, making flashcards, doing exercises). Over the subsequent three months, my priority is output: to publish & apply.

Of course, this is simplifying; research is a continuous process: I'll start to produce output before the next three months are up & I'll continue to absorb lots of input after they are up. Still, heuristics are useful.

I'll be checking in here on a monthly basis, reviewing my progress over the previous month & updating my goals for the next month. Let's get the show off the road.

Month 1 (October)

Highlights

  • Finish part 1 of MML
  • Finish chapters 1-5 of PRML
  • Finish intro & chapters 1-5 of Cracking the Coding Interview
  • Finish lessons 1-5 of Practical Deep Learning for Coders
  • Finish weeks 1-5 of AGI Safety Fundamentals
  • ██ ██████████ ████████ ███ ████████ ████████ ██████

Footnotes

  1. At least not as a full-time occupation. I like creating things, but I also like actually using my brain, and too much of web development is mindless twiddling (even post-Copilot). ↩

  2. More on why I think this soon. ↩

  3. Whether in formal journals or informal blogs. ↩

  4. I'm including less formal / "easier" sources because I need some fallback fodder (for when my brain can no longer handle the harder stuff) that isn't Twitter or Hacker News. ↩