
Eric Jang – Building AlphaGo from scratch
Episode Description
Eric Jang walks through how to build AlphaGo from scratch, but with modern AI tools.
Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.
Once he explained how AlphaGo works, we had the context to discuss how RL works in LLMs and how it could work better. Naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer. AlphaGo’s MCTS, by contrast, suggests a strictly better action at every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.
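To make that contrast concrete, here is a minimal sketch in plain Python. It is not code from the episode or from Eric's project: the function names are hypothetical, and it assumes an AlphaGo Zero style setup where the policy is trained toward the normalized visit counts from search. The first function shows the naive policy gradient signal, where every step in a trajectory is weighted by the same final outcome. The others show a per-move target distribution and the cross-entropy loss against it.

```python
import math


def softmax(logits):
    """Convert raw policy logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step_weights(num_steps, final_reward):
    """Naive policy gradient (hypothetical helper): every step in the
    trajectory is weighted by the same scalar outcome, so the signal
    cannot say which individual step mattered."""
    return [final_reward] * num_steps


def visit_count_target(visit_counts):
    """Search-based target (assumed AlphaGo Zero style): normalize the
    visit counts for one position into a distribution over moves."""
    total = sum(visit_counts)
    return [c / total for c in visit_counts]


def cross_entropy(target, logits):
    """Loss that pulls the policy toward the search target for one move."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# One outcome spread evenly over a whole trajectory.
weights = reinforce_step_weights(num_steps=5, final_reward=1.0)

# A separate, dense target for a single move with three legal actions.
target = visit_count_target([10, 70, 20])
loss_uniform = cross_entropy(target, [0.0, 0.0, 0.0])
loss_better = cross_entropy(target, [0.0, 2.0, 0.0])
```

In this toy setup the loss falls when the policy's logits shift toward the move that search visited most, so each position supplies its own correction instead of sharing one reward across the whole game.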
Eric also kickstarted an Autoresearch loop on his project, and it was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). This is informative for all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.
Watch on YouTube. Read the transcript.
And check out the flashcards I wrote to retain the insights.
Sponsors
* Cursor’s agent SDK let me build a pipeline to generate flashcards for this episode. For each card, I had an agent read the transcript, ingest blackboard screenshots, generate an SVG visual, and run everything through a critic. A durable agent is much better at this kind of work than a chain of LLM calls, and Cursor’s SDK made it easy. Check out the cards at flashcards.dwarkesh.com and get started with the SDK at cursor.com/dwarkesh
* Jane Street gave me a real deep-dive tour of one of their datacenters. I got to ask a bunch of questions to Ron Minsky, who co-leads Jane Street’s tech group, and Dan Pontecorvo, who runs Jane Street’s physical engineering team. They were willing to literally pull up the floorboards and take out racks to explain how everything works. Check out the full tour at janestreet.com/dwarkesh
Timestamps
(00:00:00) – Basics of Go
(00:08:17) – Monte Carlo Tree Search
(00:32:04) – What the neural network does
(01:00:33) – Self-play
(01:25:38) – Alternative RL approaches
(01:45:47) – Why doesn't MCTS work for LLMs
(02:01:09) – Off-policy training
(02:12:02) – RL is even more information inefficient than you thought
(02:22:16) – Automated AI researchers
Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe