What exactly is an AI agent – and how do you build one?

What exactly is an AI agent – and how do you build one?

What makes something an “AI agent” – and how do you build one that does more than just sound impressive in a demo?

I’m Nico Finelli, Founding Go-To-Market Member at Vellum. Starting in machine learning, I’ve consulted for Fortune 500s, worked at Weights & Biases during the LLM boom, and now I help companies get from experimentation to production with LLMs, faster and smarter.

In this article, I’ll unpack what AI agents actually are (and aren’t), how to build them step by step, and what separates teams that ship real value from those that stall out in proof-of-concept purgatory. 

We’ll also take a close look at the current state of AI adoption, the biggest challenges teams face today, and the one thing that makes or breaks an agent system: evaluation.

Let’s dive in.

Where we are in the AI landscape

At Vellum, we recently partnered with Weaviate and LlamaIndex to run a survey of over 1,200 AI developers. The goal? To understand where people are when it comes to deploying AI in production.

What we found was pretty surprising: only 25% of respondents said they were live in production with their AI initiative. For all the hype around generative AI, most teams are still stuck in experimentation mode.

The biggest blocker? Hallucinations and prompt management. Over 57% of respondents said hallucinations were their number one challenge. And here’s the kicker: when we cross-referenced that with how people were evaluating their systems, we noticed a pattern. 

The same folks struggling with hallucinations were the ones relying heavily on manual testing or user feedback as their main form of evaluation.

That tells me there’s a deeper issue here. If your evaluation process isn’t robust, hallucinations will sneak through. And most businesses don’t have automated testing pipelines yet, because AI applications tend to be highly specific to their use cases. So, the old rules of software QA don’t fully apply.

Bottom line: without evaluation, your AI won’t reach production. And if it does, it won’t last long.

The future of IoT is agentic and autonomous
Agentic AI enables autonomous, goal-driven decision-making across the IoT, transforming smart homes, cities, and industrial systems.
What exactly is an AI agent – and how do you build one?

How successful companies build with LLMs

So, how are the companies that do get to production pulling it off?

First, they don’t just chase the latest shiny AI trend. They start with a clearly defined use case and understand what not to build. That discipline creates focus and prevents scope creep.

Second, they build fast feedback loops between software engineers, product managers, and subject matter experts. We see too many teams build something in isolation, hand it off, get delayed feedback, and then go back to the drawing board. That slows everything down.

The successful teams? They involve everyone from day one. They co-develop prompts, run tests together, and iterate continuously. About 65–70% of Vellum customers have AI in production, and these fast iteration cycles are a big reason why.

They also treat evaluation as their top priority. Whether that’s manual review, LLM-as-a-judge, or golden datasets, they don’t rely on vibes. They test, monitor, and optimize like it’s a software product – because it is.

The truth about enterprise AI agents (and how to get value from them)
What’s the point of AI if it doesn’t actually make your workday easier?
What exactly is an AI agent – and how do you build one?

Scroll to Top