

Machine learning researchers didn’t sign up to babysit GPU clusters.
Yet modern AI research often means hours spent managing infrastructure, debugging distributed systems, and wrangling compute resources.
The research itself then has to compete for whatever time and attention remain.
Solving this problem took a journey: years at OpenAI culminating as its CTO, a departure, a new venture, $2 billion in funding, and the launch of Tinker.
Mira Murati’s Thinking Machines Lab has delivered an early Christmas gift to the ecosystem with Tinker.
It is an API that promises to return researchers to what they do best: research itself, not systems administration.
Tinker allows developers to fine-tune models and apply reinforcement learning loops while the API does the heavy lifting of distributed computing. Currently, it is available on a waitlist basis for developers.
And because the same API is used for both generating responses and updating the model in real time, it collapses the old split between “inference systems” and “training systems” into one continuous, programmable loop.
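That loop is easier to see than to describe. Below is a minimal sketch using a mock client; the method names follow the primitives Thinking Machines has described publicly (forward_backward, optim_step, sample, save_state), but the real signatures are not reproduced here. The API is waitlist-gated, so treat this as an assumption about the shape of the loop, not a working integration.

```python
# Mock of a Tinker-style client: one object serves both roles,
# generation (inference) and weight updates (training), so an RL loop
# is just ordinary Python. Names are reported primitives, not the
# verified live API.

class MockTinkerClient:
    """Single client for both sampling and training."""

    def __init__(self):
        self.steps = 0

    def sample(self, prompt: str) -> str:
        # Inference: generate a completion from the current weights.
        return prompt + " -> completion"

    def forward_backward(self, batch, loss_fn):
        # Training: run forward + backward, accumulate gradients.
        return loss_fn(batch)

    def optim_step(self):
        # Training: apply the accumulated gradients.
        self.steps += 1

    def save_state(self):
        # Checkpoint: return an opaque handle to the current weights.
        return {"steps": self.steps}


client = MockTinkerClient()
for prompt in ["question 1", "question 2"]:
    completion = client.sample(prompt)            # same client: inference
    reward = 1.0 if "completion" in completion else 0.0
    client.forward_backward([(prompt, completion, reward)],
                            lambda b: -sum(r for _, _, r in b))
    client.optim_step()                           # same client: training

print(client.steps)  # 2
```

The point of the sketch is the absence of glue: there is no second system to ship weights to between the sampling call and the gradient step.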
AIM reached out to Tyler Griggs, a reinforcement learning researcher at UC Berkeley who co-developed SkyRL and is among Tinker’s early users.
For context, SkyRL is a Python-based framework that researchers run on their own machines or GPU clusters.
It gives them full control over reinforcement learning: defining custom agents, environments, and reward signals, then running live training loops that shape how a model behaves and improves over time.
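To make concrete what “defining custom environments and reward signals” looks like in practice, here is a framework-free toy; the class and method names are illustrative, not SkyRL’s actual API.

```python
# Toy environment with a programmable reward signal. In a real RL
# framework the answer would come from live model sampling; here
# hard-coded strings stand in. Names are illustrative only.

class ArithmeticEnv:
    """The 'agent' must answer a generated addition question."""

    def __init__(self, a: int, b: int):
        self.a, self.b = a, b

    def prompt(self) -> str:
        return f"What is {self.a} + {self.b}?"

    def reward(self, answer: str) -> float:
        # Researchers control this function entirely: exact match here,
        # but it could call a verifier, a judge model, or a test suite.
        return 1.0 if answer.strip() == str(self.a + self.b) else 0.0


env = ArithmeticEnv(2, 3)
print(env.prompt())       # What is 2 + 3?
print(env.reward("5"))    # 1.0
print(env.reward("six"))  # 0.0
```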
Griggs highlighted Tinker’s importance both for researchers and as a tool to improve existing RL frameworks and mechanisms.
“For ML researchers who don’t care about the underlying infrastructure but only about the environment and data, this is excellent… I can see the budgets of various academic groups shifting their GPU credit allocations to Tinker,” said Griggs.
“Because these researchers don’t want to deploy GPUs. They don’t care.”
How Tinker Boosts Existing RL Frameworks
The Tinker API has already been integrated into several open-source projects to which Thinking Machines has granted access.
For instance, SkyRL’s implementation, called ‘SkyRL tx’, demonstrates how Tinker can be integrated into an existing reinforcement learning framework.
SkyRL initially offered a complete reinforcement learning stack, including algorithms, agent logic, and environment design, but users still had to deploy and handle the compute infrastructure on their own hardware.
While researchers controlled the training process, reward functions, evaluators, and custom environments, enabling experiments in live learning, tool use, and reasoning, they still faced operational challenges with GPU orchestration and system setup.
But when Tinker arrived, the dynamic shifted. SkyRL tx, the Tinker-backed implementation, effectively offloaded the engine layer to Tinker while retaining SkyRL’s brain layer.
Before Tinker, the ecosystem was segmented.
At one end were services that allowed researchers to upload data and obtain a fine-tuned model with minimal control over the training process.
At the other end were low-level systems such as SkyRL, which offered full customisation of reinforcement learning algorithms but required users to handle GPU deployment and management manually.
Tinker positioned itself in the middle—offering enough low-level features for full algorithm control, while also abstracting the complexity of hardware and distributed execution.
Griggs emphasised a second benefit of Tinker: its long-term potential for standardisation.
“If the Tinker training API becomes as standard for training as OpenAI’s API is for inference, then we imagine that many people will be writing their training loops in the Tinker API.”
The Advantage of a Unified Interface
Griggs highlighted the unique advantage of training and inference under a single interface. “Your job as someone writing an RL framework is to write a bunch of glue code to put these [training and inference stacks] together.”
“We’ve seen several errors arise from this,” he added, pointing to cases where the numerics of the two stacks don’t align, leading to off-policy behaviour.
This means the model begins making decisions based on slightly different internal representations than those it was trained on, simply because the inference and training systems don’t compute values in precisely the same way.
Other issues, such as slow and error-prone weight synchronisation, arise because the training and inference systems store the model differently. This requires constant conversion and passing of weights to keep them aligned.
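A toy illustration of the numerics problem, with no framework involved: the same logits pushed through softmax at full and reduced precision yield slightly different token distributions. Here the reduced precision is crudely simulated by rounding to three significant digits, standing in for an fp32-versus-fp16 split between training and inference stacks.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Eight evenly spaced logits from -4 to 4.
logits = [-4 + 8 * i / 7 for i in range(8)]

# Crude stand-in for a lower-precision inference stack: round each
# logit to three significant digits before computing probabilities.
low_precision = [float(f"{x:.3g}") for x in logits]

p_train = softmax(logits)          # "training" stack, full precision
p_infer = softmax(low_precision)   # "inference" stack, reduced precision

drift = max(abs(a - b) for a, b in zip(p_train, p_infer))
print(drift > 0)  # True: the two stacks disagree on the policy
```

The absolute drift is tiny per token, but RL training samples thousands of tokens per update, so small disagreements compound into genuinely off-policy data.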
Solving these issues, Griggs argued, would have an outsized effect on the broader ecosystem. “I really do have high confidence that it will significantly lower the barrier for ML research, especially for people not at big labs, for those in academia.”
In his view, the key downstream impact is the redistribution of knowledge. “It could no longer be the case where the little tricks or secrets of how to train these models really well are only in the heads of dozens or hundreds of people at big labs.”
Instead, open-source and academic researchers would be able to move just as swiftly, unlocking a wave of progress that simply wasn’t possible before.
Tinker initially offered its services for free, but has recently introduced pricing for API usage. However, Griggs said that the real value isn’t economic; it’s cognitive.
Even if it eventually costs as much as renting GPUs directly, researchers would find a net acceleration given the time Tinker saves them, while keeping frustration and mental load at bay.
‘Tinker Keeps the Fun Alive for Deep Learning Engineers’
AIM reached out to multiple AI researchers to further understand how these aspects could benefit them.
Sachin Dharashivkar, the founder of Athena Agent, which helps companies develop RL environments, estimated that a project that would normally take four or five months could be completed in two and a half.
He also added that the API works the way deep learning research engineers intuitively think, pointing to Tinker’s four primitive operations, rather than abstracting everything away behind a black-box system.
He emphasised that this is where Tinker’s most substantial advantage lies.
Closed fine-tuning services typically abstract away too much, limiting control over the actual training loop. “I need to do many experiments, I need to explore how we define loss functions, how we create data and everything else,” said Dharashivkar.
“Tinker keeps the fun part alive for deep learning engineers. The harder part, which we don’t enjoy doing, which we have to do, that is building a proper compute infrastructure, is now automated.”
That said, it would be unfair to call Tinker the first of its kind; platforms like Modal and RunPod offer similar features.
While many wait to get access to Tinker, Adithya S Kolavi, founder of Cognitive Lab, shared how one can train or fine-tune LLMs without worrying much about infrastructure on Modal.
What is worth reiterating, however, is that the abstraction level Tinker offers through its four primitives is one that most other frameworks lack.
Historically, foundational models weren’t the only breakthroughs that shaped the field. Clean abstractions and training frameworks have had equally profound impacts.
DeepSeek’s introduction of Group Relative Policy Optimisation (GRPO), for instance, didn’t just power its own models — it sparked an entire wave of new research.
Tinker could play a similar role if its API becomes the standard interface for training, not just inference.
The post Mira Murati Makes Deep Learning Fun Again for Researchers appeared first on Analytics India Magazine.


