Background

Reiner Pope – The math behind how LLMs are trained and served

Dwarkesh Podcast, April 29, 2026

Episode Description

Did a very different format with Reiner Pope: a blackboard lecture where he walks through how frontier LLMs are trained and served.

It’s shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.

It’s a bit technical, but I encourage you to hang in there – it’s really worth it.

Fewer than a handful of people understand the full stack of AI, from chip design to model architecture, as well as Reiner does. It was a real delight to learn from him.

Recommend watching this one on YouTube so you can see the chalkboard.

Reiner is CEO of MatX, a new chip startup (full disclosure - I’m an angel investor). He was previously at Google, where he worked on software efficiency, compilers, and TPU architecture.

Download a markdown version of the transcript here to chat with an LLM.

Wrote up some flashcards and practice problems to help myself retain what Reiner taught. Hope it's helpful to you too!

Sponsors

* Jane Street needs constant access to incredibly low-latency compute. I recently asked one of their engineers, Clark, to talk me through how they meet these demands. Our conversation—which touched on everything from FPGAs to liquid cooling—was extremely helpful as I prepped to interview Reiner. You can watch the full discussion and explore Jane Street’s open roles at janestreet.com/dwarkesh

* Google’s Gemma 4 is the first open model that’s let me shut off the internet and create a fully disconnected “focus machine”. This is because Gemma is small enough to run on my laptop, but powerful enough to actually be useful. So, to prep for this interview, I downloaded Reiner’s scaling book, disconnected from wifi, and used Gemma to help me break down the material. Check it out at goo.gle/Gemma4

* Cursor helped me turn some notes I took on how gradients flow during large-scale pretraining into a great animation. At first, I wasn’t sure of the best way to visualize the concept, but Cursor’s Composer 2 Fast model let me iterate on different ideas almost instantaneously. You can check out the animation in my recent blog post. And if you have something to visualize yourself, go to cursor.com/dwarkesh

Timestamps

(00:00:00) – How batch size affects token cost and speed

(00:32:09) – How MoE models are laid out across GPU racks

(00:47:12) – How pipeline parallelism spreads model layers across racks

(01:03:37) – Why Ilya said, “As we now know, pipelining is not wise.”

(01:18:59) – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal

(01:33:02) – Deducing long context memory costs from API pricing

(02:04:02) – Convergent evolution between neural nets and cryptography



Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
