
Eric Jang – Building AlphaGo from scratch
Episode Description
Eric Jang walks through how to build AlphaGo from scratch, but with modern AI tools.
Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.
Once Eric had explained how AlphaGo works, we had the context to discuss how RL works in LLMs and how it could work better. Naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer. AlphaGo’s MCTS, by contrast, suggests a strictly better action at every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.
Eric also kickstarted an Autoresearch loop on his project. It was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). This is informative for all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.
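The contrast between the two training signals can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the episode or from Eric's project: the function names `reinforce_loss` and `search_target_loss` are hypothetical, the policy is represented by raw per-step logits, and the search target is assumed to be the normalized MCTS visit counts, as in AlphaGo Zero-style training.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_loss(step_logits, actions, final_reward):
    """Naive policy gradient (REINFORCE-style) loss for one trajectory.

    Every step is weighted by the same scalar outcome, so the loss
    cannot tell which individual action deserved the credit.
    """
    loss = 0.0
    for logits, action in zip(step_logits, actions):
        probs = softmax(logits)
        loss += -final_reward * math.log(probs[action])
    return loss


def search_target_loss(step_logits, visit_counts):
    """Cross-entropy between the policy and a per-step search target.

    Assumption: the target at each step is the normalized MCTS visit
    count distribution, so every move gets its own supervised label.
    """
    loss = 0.0
    for logits, counts in zip(step_logits, visit_counts):
        probs = softmax(logits)
        total = sum(counts)
        target = [c / total for c in counts]
        # Skip zero-probability targets; they contribute nothing.
        loss += -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
    return loss
```

Note that when `final_reward` is zero, `reinforce_loss` is zero for the whole trajectory and provides no learning signal, whereas `search_target_loss` still supplies a target distribution at every step.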
Watch on YouTube. Read the transcript.
And check out the flashcards I wrote to retain the insights.
Sponsors
* Cursor’s agent SDK let me build a pipeline to generate flashcards for this episode. For each card, I had an agent read the transcript, ingest blackboard screenshots, generate an SVG visual, and run everything through a critic. A durable agent is much better at this kind of work than a chain of LLM calls, and Cursor’s SDK made it easy. Check out the cards at flashcards.dwarkesh.com and get started with the SDK at cursor.com/dwarkesh
* Jane Street gave me a real deep-dive tour of one of their datacenters. I got to ask a bunch of questions to Ron Minsky, who co-leads Jane Street’s tech group, and Dan Pontecorvo, who runs Jane Street’s physical engineering team. They were willing to literally pull up the floorboards and take out racks to explain how everything works. Check out the full tour at janestreet.com/dwarkesh
Timestamps
(00:00:00) – Basics of Go
(00:08:17) – Monte Carlo Tree Search
(00:32:04) – What the neural network does
(01:00:33) – Self-play
(01:25:38) – Alternative RL approaches
(01:45:47) – Why doesn't MCTS work for LLMs
(02:01:09) – Off-policy training
(02:12:02) – RL is even more information inefficient than you thought
(02:22:16) – Automated AI researchers
Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe