Why wait until the end to realize your model’s code won’t actually run?

Recent breakthroughs in reasoning with large language models have followed a simple pattern: think deeply about a problem upfront, then generate the answer. This approach works remarkably well for math competitions, where the full puzzle is laid out before you start. But code generation tells a different story.

Consider the difference between solving a word problem and writing actual code. A math problem presents itself completely: “A train leaves Boston at 60 mph, another leaves New York at 70 mph, they’re 200 miles apart, when do they meet?” You can think through the entire setup before touching paper. Code works differently. You start writing a JSON parser with validation, and only halfway through do you realize recursive structures need fundamentally different handling than you assumed. The complexity wasn’t hidden in the problem statement, it emerged from your own implementation decisions.

This distinction explains why “think first, generate once” reasoning approaches have hit a ceiling for code. Problems reveal their true difficulty incrementally as implementation proceeds. Different sections need different amounts of reasoning: some lines of code flow naturally, while others are algorithmic nightmares. Upfront reasoning wastes tokens on scenarios that never materialize, and by the time the model gets stuck, it has already committed to the wrong choices.

A new paper presents a fundamental insight: code generation needs a different approach. Rather than planning everything before you type, models should be able to pause and think at any moment during generation, exactly when uncertainty spikes. This is called Think-Anywhere, and it reshapes how we think about reasoning in AI.

Where does a coder actually need to pause?

Before proposing solutions, we need to identify what signal could possibly tell a model “you need to think more here.” The answer lies in something measurable: token entropy.
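Token entropy measures how spread out the model’s next-token probability distribution is: a peaked distribution means the model is confident about what comes next, while a flat one signals uncertainty. As a minimal sketch (not the paper’s implementation, just the standard entropy computation over softmaxed logits):

```python
import math

def token_entropy(logits):
    # Convert raw logits to a probability distribution via softmax,
    # then compute Shannon entropy in nats: H = -sum(p * log p).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

# A peaked distribution (confident prediction) has low entropy;
# a flat distribution (uncertain prediction) has high entropy.
confident = token_entropy([10.0, 0.0, 0.0, 0.0])
uncertain = token_entropy([1.0, 1.0, 1.0, 1.0])
print(confident < uncertain)  # True
```

The intuition: when entropy spikes mid-generation, the model is unsure which token to emit next, and that is exactly the moment extra reasoning could help.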

