Why This Engineer is Betting Against AI Agents

Everyone seems to agree: 2025 will be the year AI agents finally take over. Autonomy is the new magic word. 

Yet for one developer who has built over a dozen production-grade agent systems—from UI generation to DevOps automation—this narrative doesn’t add up. The problem, he argues, isn’t that agents don’t work. It’s that they don’t work like the industry thinks.

In an exclusive interview with AIM, Utkarsh Kanwat, engineer at ANZ, unpacked why he’s still bullish on AI, but deeply sceptical of fully autonomous agents. His reasons are supported by mathematics, economics, and traditional engineering principles.

The Numbers Just Don’t Work

At the centre of Kanwat’s argument is a brutal truth: multi-step AI workflows compound errors exponentially. A system with 95% per-step reliability (generous for current LLMs) delivers just 36% success across 20 steps, since 0.95^20 ≈ 0.36. “This isn’t a prompt engineering problem. This is mathematical reality,” he writes in a blog post.
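The arithmetic behind that figure is easy to verify: per-step reliabilities multiply, so even small per-step error rates snowball over long chains.

```python
def workflow_success_rate(per_step: float, steps: int) -> float:
    # Assuming independent steps, overall success is the product
    # of per-step reliabilities: p ** n.
    return per_step ** steps

print(f"{workflow_success_rate(0.95, 20):.0%}")  # 36%
print(f"{workflow_success_rate(0.99, 20):.0%}")  # even 99% per step gives only 82%
```

The independence assumption is a simplification, but it makes the point: pushing per-step reliability from 95% to 99% still leaves a 20-step workflow failing roughly one time in five.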

Kanwat told AIM that his first production agent, a function generator, worked well for simple tasks but failed catastrophically on ambiguity.

“It completely broke down on complex, ambiguous requirements. This shaped my philosophy around building verification layers and human review,” said Kanwat.

Success, he argues, lies in designing around these constraints. His DevOps agent, for instance, works because it’s split into small, independently verifiable tasks. Human confirmation gates and rollback points turn “autonomy” into manageable assistance.
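Kanwat hasn’t published his implementation, but the pattern he describes — small independently verifiable steps, human confirmation gates, and rollback points — can be sketched roughly like this (all names and structure are illustrative, not his actual code):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[], None]
    rollback: Callable[[], None]
    needs_confirmation: bool = False  # human gate before this step

def run_pipeline(steps: list[Step], confirm: Callable[[str], bool]) -> bool:
    done: list[Step] = []
    for step in steps:
        if step.needs_confirmation and not confirm(step.name):
            break  # human declined: unwind what was done and stop
        try:
            step.run()
            done.append(step)
        except Exception:
            break  # step failed: unwind and stop
    else:
        return True  # every step succeeded
    for step in reversed(done):
        step.rollback()  # roll back in reverse order
    return False
```

The key property is that “autonomy” is bounded: the agent proposes each step, but destructive points require a human `confirm`, and any failure rewinds to a known-good state.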

This philosophy isn’t limited to reliability. Token costs, too, are an invisible wall.

“Each follow-up required the full conversation context. By query 10 in a session, it was passing 150,000+ tokens per request, costing multiple dollars per request for models like OpenAI o1,” said Kanwat, describing a failed conversational database agent. The token bill quickly outpaced the value.
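The economics follow from how stateful conversations are billed: every turn resends the entire history as context, so cumulative prompt tokens grow quadratically with session length. A rough model (the per-turn size is an illustrative assumption chosen to match his 150,000-token figure, not his actual data):

```python
def cumulative_prompt_tokens(turns: int, tokens_per_turn: int) -> int:
    # Turn k must resend all k prior exchanges as context, so the
    # session total is an arithmetic series: 1 + 2 + ... + turns.
    return sum(k * tokens_per_turn for k in range(1, turns + 1))

# Assume each exchange adds ~15,000 tokens of context.
# By turn 10 a single request carries 150,000 tokens...
print(10 * 15_000)
# ...and the session as a whole has billed 825,000 prompt tokens.
print(cumulative_prompt_tokens(10, 15_000))
```

A stateless design that passes only the current task (plus a compact summary) keeps per-request cost flat instead, which is one reason Kanwat favours stateless tools.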

Where Most Agents Fail

Tool design, not AI capability, often decides whether an agent system survives production. Many agent companies, Kanwat notes, “treat tools like human interfaces, not AI interfaces.” Without structured feedback and partial failure handling, API calls become dead ends. In his view, agents do only about 30% of the work.

When asked to elaborate, he said, “70% is the person designing the right boundaries and creating feedback loops that the agent can actually understand and act on.”
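One concrete reading of “feedback loops that the agent can actually understand”: tool calls should return structured, machine-parseable results rather than bare strings, so the model can tell a retryable failure from a dead end. A minimal sketch (the shape and field names are assumptions, not a published interface):

```python
from dataclasses import dataclass

@dataclass
class ToolResult:
    ok: bool
    data: object = None
    error_kind: str = ""   # e.g. "rate_limited", "not_found", "invalid_input"
    retryable: bool = False
    suggestion: str = ""   # a hint the agent can act on

def fetch_user(user_id: str) -> ToolResult:
    # Illustrative stub: a real tool would call an API here.
    if not user_id.isdigit():
        return ToolResult(
            ok=False,
            error_kind="invalid_input",
            retryable=False,
            suggestion="user_id must be numeric; ask the user to confirm the ID",
        )
    return ToolResult(ok=True, data={"id": user_id})
```

A human reading “Error: bad input” can improvise; an agent needs the `retryable` flag and the `suggestion` spelled out, or the call becomes a dead end.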

Integration is another non-trivial hurdle. “Enterprise systems aren’t clean APIs waiting for AI agents to orchestrate them,” Kanwat writes. His database agent doesn’t just run queries: it manages connection pooling, handles rollbacks, respects replicas, and logs for compliance. The AI handles query generation. Everything else is systems engineering.
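His stack isn’t public, but the division of labour he describes can be sketched with the standard-library `sqlite3` standing in for an enterprise database (all names illustrative): the model contributes only the SQL text, while transaction control, rollback, and audit logging are ordinary engineering around it.

```python
import logging
import sqlite3

log = logging.getLogger("db_agent")

def execute_generated_sql(conn: sqlite3.Connection, sql: str):
    # The model only produces `sql`; everything around it is systems
    # engineering: explicit transaction, rollback on failure, audit log.
    try:
        with conn:  # commits on success, rolls back the transaction on error
            rows = conn.execute(sql).fetchall()
        log.info("executed generated SQL: %s", sql)
        return rows
    except sqlite3.Error:
        log.exception("generated SQL failed, transaction rolled back: %s", sql)
        raise
```

Connection pooling, replica routing, and destructive-action confirmation would wrap this same function in a production system; the point is that none of it is the AI’s job.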

He also flagged a critical misconception: that good demos reflect good products.

“Almost all demos show agents doing complex workflows successfully because they’re curated scenarios run in controlled environments. Production systems need to handle every scenario—including the ones you’ve never seen before.”

What the Industry Gets Wrong

Kanwat doesn’t expect the hype cycle to end well for everyone. 

“Venture-funded ‘fully autonomous agent’ startups will hit the economics wall first,” he predicts. Enterprise vendors who bolted on agents as features may also struggle with poor integration depth.

When asked who’s getting it right, he pointed to Anthropic.

“Their Constitutional AI approach and emphasis on safety-first deployment shows they understand that capability without reliability is dangerous,” he said. But he’s sceptical of anyone touting “fully autonomous” systems without addressing cost, failure, or integration.

He sees the closest parallel to this moment in the blockchain boom, with hype outpacing grounded use, though with one key difference: “AI agents are fundamentally different—they actually work and solve real problems.”

The relatively low adoption, even in tech-savvy markets, is no mystery to him. 

“India might see even lower adoption due to the relative lack of deep tech investment and infrastructure,” Kanwat noted. “I believe Indian markets are more conservative with funding for new technologies.”

‘Humans Maintain Control’

The most successful agents Kanwat has built all follow the same playbook: let AI do the complex translation, let humans control the critical decision points, and wrap everything in robust software engineering.

His UI generator works because humans sign off before deployment. His CI/CD pipeline succeeds because rollback rules are explicit. And his database agent confirms every destructive action. “AI handles complexity, humans maintain control,” he sums up.

The future he envisions isn’t “agents everywhere.” It’s carefully scoped AI tools that operate within boundaries, with predictable economics and reliability engineered in from the start. “Stateless often beats stateful,” he advises. “Users trust tools that work consistently more than they value systems that occasionally do magic.”

And what about those still chasing full autonomy? They’ll likely learn the hard way. “The market will learn the difference between AI that demos well and AI that ships reliably,” Kanwat concludes. “That education will be expensive for many companies.”

The post Why This Engineer is Betting Against AI Agents appeared first on Analytics India Magazine.
