What happens when our machines begin to understand us as naturally as we understand each other?
That’s not a question for the future – it’s one we’re living through right now.
Training today’s large language models already costs upwards of $100 million. Just last year, two Nobel Prizes were awarded for AI breakthroughs. That’s extraordinary. It signals something profound: we’ve crossed a threshold where artificial intelligence isn’t just solving problems, it’s transforming how we think, create, and interact.
In my career, from leading research at Google DeepMind to my current work as Chief AI Officer at Genesis Therapeutics, I’ve seen AI evolve from brittle systems that followed commands to flexible partners capable of reasoning, learning, and even showing hints of personality.
So in this article, I’ll reflect on where that journey has taken us, and where it’s leading next. We’ll explore how large language models (LLMs) are changing natural interaction, unifying control across systems, and even learning autonomously.
Most importantly, we’ll consider what these breakthroughs mean for the path toward Artificial General Intelligence (AGI) – and for the safety, responsibility, and humanity of the field we’re building together.
Let’s get started.
Teaching robots to understand us
When I first started in robotics, giving instructions to a robot was about as intuitive as writing assembly code. You had to specify coordinates, velocities, joint angles – every micro-step.
Now, imagine instead saying something like:
“Trot forward slowly.” “Back off – don’t hurt the squirrel.” “Act like you’re limping.”
And the robot simply understands.
That’s the leap we’ve made thanks to large language models. In one of our early projects, we used GPT-4 to control a quadruped robot. Underneath, a traditional controller handled the physical contact patterns (in our visualizations of those patterns, blue meant touch and yellow meant lift), while the LLM acted as an intelligent interface translating natural language into motor commands.
What amazed us wasn’t just that it worked – it was that it generalized. You could tell the robot, “Good news, we’re going on a picnic!” and it would literally jump around.
That’s what I mean by natural interaction. For the first time, non-experts can access complex AI systems without programming or robotics expertise. It’s a fundamental shift – one that opens up AI to millions more people and use cases.
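To make the idea of a contact-pattern interface concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the system described above: the leg names, the 0/1 encoding (1 for touch, 0 for lift), and the helpers `trot_pattern`, `render`, and `parse` are all hypothetical, and the language model itself is left out. The sketch only shows the kind of text a model could be asked to produce and how a controller might validate it before acting on it.

```python
from typing import Dict, List

# Hypothetical leg names: front/rear, left/right.
LEGS = ["FL", "FR", "RL", "RR"]


def trot_pattern(steps: int, half_period: int) -> Dict[str, List[int]]:
    """Build a trot-like pattern in which diagonal leg pairs alternate.

    1 means the foot touches the ground, 0 means it is lifted.
    """
    pattern: Dict[str, List[int]] = {leg: [] for leg in LEGS}
    for t in range(steps):
        first_pair_down = (t // half_period) % 2 == 0
        for leg in LEGS:
            in_first_pair = leg in ("FL", "RR")
            pattern[leg].append(1 if in_first_pair == first_pair_down else 0)
    return pattern


def render(pattern: Dict[str, List[int]]) -> str:
    """Serialize a pattern as plain text, one line per leg."""
    return "\n".join(f"{leg}: {''.join(map(str, pattern[leg]))}" for leg in LEGS)


def parse(text: str) -> Dict[str, List[int]]:
    """Parse and validate text in the format produced by render()."""
    pattern: Dict[str, List[int]] = {}
    for line in text.strip().splitlines():
        leg, bits = line.split(":")
        leg, bits = leg.strip(), bits.strip()
        if leg not in LEGS or set(bits) - {"0", "1"}:
            raise ValueError(f"bad line: {line!r}")
        pattern[leg] = [int(b) for b in bits]
    if set(pattern) != set(LEGS):
        raise ValueError("pattern must cover all four legs")
    return pattern


if __name__ == "__main__":
    print(render(trot_pattern(steps=8, half_period=2)))
```

In a design like this, an instruction such as “Trot forward slowly” would be turned into pattern text by the model, and `parse` would reject anything malformed before a low-level controller tried to track it.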

Code as a common language
Across robotics, web agents, and digital assistants, one big barrier has always been fragmentation. Every system speaks a different “action language.”
A self-driving car thinks in terms of steering angle and acceleration. A quadruped robot thinks in terms of joint torques. A web agent manipulates HTML elements.
There’s no universal interface.
But code might just be that universal action space.
Let me give you an example. We built a web navigation agent capable of executing multi-step searches entirely from natural instructions like:
“Find one-bedroom apartments in Ortonville for corporate housing, starting from google.com.”
The agent reads the raw HTML (we’re talking megabytes of unstructured data), plans the next steps, writes the necessary code, executes it, and repeats – closing the loop autonomously.
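The read, plan, write, execute loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not our agent: `FakeBrowser` stands in for a real browser, `toy_policy` stands in for the LLM that plans and writes code, and the element ids `q` and `go` are invented for the example. The point is only the shape of the loop, where generated code acts on the page and the changed page feeds the next decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser; holds the current page's HTML."""
    html: str = "<input id='q'><button id='go'>Search</button>"
    log: List[str] = field(default_factory=list)

    def type_text(self, element_id: str, text: str) -> None:
        self.log.append(f"type {element_id}={text}")

    def click(self, element_id: str) -> None:
        self.log.append(f"click {element_id}")
        self.html = "<ul><li>Result</li></ul>"  # pretend the page changed


def toy_policy(instruction: str, html: str) -> Optional[str]:
    """Stand-in for the LLM: return code for the next step, or None when done."""
    if "id='q'" in html:
        return f"browser.type_text('q', {instruction!r})\nbrowser.click('go')"
    return None  # no search box left, so treat the task as finished


def run_agent(
    instruction: str,
    browser: FakeBrowser,
    policy: Callable[[str, str], Optional[str]],
    max_steps: int = 5,
) -> int:
    """Observe the page, get code from the policy, execute it, repeat.

    Returns the number of code steps executed before the policy stopped.
    """
    for step in range(max_steps):
        program = policy(instruction, browser.html)
        if program is None:
            return step
        # Expose only the browser object to the generated code. Emptying
        # __builtins__ narrows what the code can reach, but this is NOT a
        # secure sandbox; a real system needs proper isolation.
        exec(program, {"__builtins__": {}}, {"browser": browser})
    raise RuntimeError("step budget exhausted")


if __name__ == "__main__":
    browser = FakeBrowser()
    run_agent("one-bedroom apartments in Ortonville", browser, toy_policy)
    print(browser.log)
```

The step budget and the restricted namespace passed to `exec` are deliberate: once a model's output is executed rather than merely read, every limit on what that code can touch has to be put there explicitly.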
With just 200 self-collected demonstrations, this modular system learned to generalize across entirely new websites. We achieved success rates between 65% and 80% on real-world domains like real estate, Reddit queries, and Google Maps.
However, this capability also raised early concerns about AI safety. I vividly remember that in late 2022, right as ChatGPT launched, we were debating whether agents should be allowed to write and execute code on their own. That’s a powerful and potentially dangerous ability.
So while this experiment demonstrated how LLMs can unify action across domains, it also reminded us that capability must come with control.
