Unsupervised Learning with Jacob Effron

Ep 87: Gemini Co-Lead on World Models, RL's Next Domains & Continual Learning

Episode Summary

Oriol Vinyals, VP of Research at Google DeepMind and co-lead of the Gemini program, joins Jacob the day after Google I/O to unpack the research underpinning Google's latest announcements and where frontier AI is heading. The conversation moves from world models (why Google has uniquely bet on them as a path to AGI, what the "GPT moment" for video and images would look like, and how they connect to robotics and simulation) to agents (the Spark release, why the system and model need to be optimized jointly, and why scaffolding will eventually be written by models themselves). Oriol gets into the mechanics of memory in models, drawing on his cognitive neuroscience background to argue that file-system-style non-parametric memory is more practical than baking memory into weights at serving scale. He shares his views on the limits of RL today (LLMs are data-limited in a way that game-playing RL never was), why training on narrow domains like math and code generalizes surprisingly well, and what a true "Move 37" moment for science or ML research would look like. Throughout, he reflects on the unique advantages of being inside Google (TPU co-design, end-to-end revenue stability, the merger of Brain and DeepMind), the trade-offs between focus and exploration in research orgs, and why he believes AGI in some meaningful sense may already be here, even if the goalposts keep moving.

Episode Notes

(0:00) Intro

(1:36) Why World Models

(4:21) The GPT Moment for Video

(7:51) What Makes Omni a World Model

(10:04) World Models & Robotics

(12:37) Evaluating Physics in AI

(14:51) Consumer Agents & Spark

(18:39) Scaffolding & the Bitter Lesson

(22:06) Memory & Continual Learning

(26:54) Research Bets Inside Big Labs

(32:30) Post-Training RL Is Greenfield

(35:57) What Real Intelligence Looks Like

(39:11) RL Generalization

(43:00) Advice for Founders

(46:40) Can AI Truly Innovate?

(49:48) Recursive Self-Improvement

(52:14) Quickfire