RAG shows its work. That’s not the same as being right.

At the Generative AI Summit Austin, Ramkumar Shanker took the main stage to deliver a keynote that cut through the hype: the era of third-party cookies is over, and the publishers who will win are those that can turn consented, first-party signals into meaning, at scale, with accountability.

Shanker brings a rare dual perspective to this challenge. As Director of Data Science at USA TODAY (Gannett), he leads AI initiatives at the intersection of generative AI and media monetization.

In parallel, he is a research collaborator at the University of Chicago, where his work on explainable AI in medical imaging informs his approach to governance and evaluation.

I sat down with him after his session to go deeper on the ideas that lit up the room.

Ramkumar Shanker presenting at the Generative AI Summit Austin, February 25, 2026.
We’re moving from monetizing identifiers to monetizing meaning.

Ram, you opened with a provocation about the “privacy reset.” What is actually changing, and why does it matter right now?

The reset is simple in principle and hard in practice. Invisible tracking, the kind that follows a user across the web without their awareness, is becoming structurally impossible.

Regulations, browser changes, and platform policy are all converging. When that pipe closes, the only signal you own outright is the one your readers gave you directly: what they read, what they subscribe to, what they watch, how they behave on your own property.

That shifts first-party data from a compliance checkbox into a genuine competitive moat. It is the only signal that is both permissioned and exclusive to your brand. But having first-party data is table stakes. The edge comes from what you do with it: can you translate behavior into intent? Can you explain what you inferred, and why? That is the new capability that separates leaders from laggards.

Your keynote introduced the idea of “monetizing meaning” rather than monetizing identifiers. What does that look like in practice?

Traditional programmatic advertising sold access to a person, or a cookie that stood in for a person. Monetizing meaning is different: it sells access to an intent.

Instead of “this device visited these 40 sites,” you can say “this reader has demonstrated a sustained interest in local sports that correlates with in-market purchase behavior for sporting goods.” That is a richer, more actionable signal, and it is one you can back up with evidence.

Publishers can build segments like “local sports superfans” or “home-improvement intenders” from content consumption and subscription signals, then package those as premium targeting or sponsored editorial experiences.

The advertiser gets a high-signal audience. The reader gets relevance. And you can show your work, citing the specific content behaviors that justified the segment. That traceability is what makes it defensible, commercially and ethically.
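The segment-building idea above can be sketched in a few lines. This is a minimal illustration, not Gannett's actual pipeline: the event data, thresholds, and segment names are all hypothetical, and a production system would infer topics with an LLM rather than read them from a field. The point it demonstrates is the traceability Shanker describes: every reader in the segment carries the evidence that justified their inclusion.

```python
from collections import defaultdict

# Hypothetical first-party reading events: (reader_id, article_id, topic, seconds_engaged)
EVENTS = [
    ("r1", "a1", "local-sports", 240),
    ("r1", "a2", "local-sports", 310),
    ("r1", "a3", "home-improvement", 20),
    ("r2", "a4", "local-sports", 15),
    ("r2", "a5", "politics", 400),
]

def build_segment(events, topic, min_articles=2, min_seconds=300):
    """Assign readers to an intent segment, keeping the evidence that justified it."""
    per_reader = defaultdict(list)
    for reader, article, t, secs in events:
        if t == topic:
            per_reader[reader].append((article, secs))
    segment = {}
    for reader, evidence in per_reader.items():
        if len(evidence) >= min_articles and sum(s for _, s in evidence) >= min_seconds:
            segment[reader] = evidence  # the "show your work" trail
    return segment

fans = build_segment(EVENTS, "local-sports")
# r1 qualifies on sustained engagement; r2's single 15-second visit does not
```

The evidence list attached to each reader is what makes the segment defensible: you can answer "why is this person a local-sports superfan?" with specific behaviors rather than an opaque score.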

“Can we infer intent at scale, and can we show our work? That is where GenAI helps.”

You talked about LLMs and RAG as the technical engine behind this. What is the “unlock” compared with traditional taxonomy approaches?

Rules-based taxonomies are brittle. Language shifts, topics evolve, and manual tagging cannot keep up with the volume of content a publisher produces. An LLM can classify based on meaning rather than exact keywords. That is the flexibility win.

But flexibility without accountability is dangerous in a commercial context. That is where Retrieval-Augmented Generation earns its place in production. RAG grounds each tag in an approved source set and produces an audit trail.

The system can tell you: “This article was tagged supply-chain compliance because these specific passages connect it to regulatory reporting requirements.” An article on sustainable manufacturing might never mention compliance explicitly, yet the system can surface that connection and show its work.

That turns tagging from a static taxonomy into a living semantic index you can monetize and govern.
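To make the audit-trail idea concrete, here is a deliberately tiny sketch of grounded tagging. Real RAG uses embedding retrieval over a vector index; this stand-in uses keyword overlap against a hypothetical approved source set so the example stays self-contained. The tag names, passages, and threshold are all invented for illustration. What it shows is the shape of the output: a tag is only applied when it can cite the passages that justify it.

```python
import re

# Hypothetical approved source set: tag -> defining passages (the grounding corpus)
TAG_DEFINITIONS = {
    "supply-chain compliance": [
        "regulatory reporting requirements for suppliers",
        "audits of vendor sourcing and import documentation",
    ],
    "sustainable manufacturing": [
        "reducing factory emissions and material waste",
    ],
}

def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def tag_with_evidence(article, threshold=2):
    """Tag an article; return the passages that justify each tag (the audit trail)."""
    words = tokenize(article)
    results = {}
    for tag, passages in TAG_DEFINITIONS.items():
        cited = [p for p in passages if len(tokenize(p) & words) >= threshold]
        if cited:
            results[tag] = cited  # provenance: why the tag was applied
    return results

article = ("New regulatory reporting rules require importers to document "
           "supplier sourcing before goods clear customs.")
tags = tag_with_evidence(article)
```

Swap the keyword overlap for semantic retrieval and the dictionary for a curated index, and the contract is the same: no tag without a citation.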

A rough but useful way to picture it: imagine you are the head librarian of the world’s largest newspaper archive, with millions of articles and decades of journalism. A sponsor walks in and asks for every reader who cares deeply about clean energy but has not yet committed to an EV.

Under the old model, you hand them a list of people who clicked on an EV ad. Under the new model, you read every article every person has engaged with, infer their evolving concerns about range anxiety and charging infrastructure, and build an audience defined by intention, not by a single click.

The analogy is not perfect; real systems are messier and inference is never that clean. But it captures what LLM-based tagging plus RAG is reaching toward at the scale of a national publisher.

LLMs give you flexibility. RAG gives you accountability.

What are the deployment realities that most teams underestimate when they try to take this from demo to production?

Most LLM projects do not fail in the model. They fail in the system. There are five failure modes I would warn every builder about: drift, cost, latency, evaluation, and governance.

Drift is the quiet killer. Content topics, audience behavior, and language all shift over time, and you have to monitor retrieval quality and downstream business outcomes continuously, not just model accuracy at launch.
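One simple way to watch for the drift Shanker warns about is to compare the distribution of tags your system emits now against a launch-time baseline. The sketch below uses total variation distance; the tag counts and the alert threshold are hypothetical, and a production monitor would also track retrieval quality and downstream lift, not just topic mix.

```python
def topic_distribution(tag_counts):
    """Normalize raw tag counts into a probability distribution."""
    total = sum(tag_counts.values())
    return {t: c / total for t, c in tag_counts.items()}

def drift_score(baseline, current):
    """Total variation distance between two topic mixes (0 = identical, 1 = disjoint)."""
    p, q = topic_distribution(baseline), topic_distribution(current)
    topics = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)

launch = {"local-sports": 50, "politics": 30, "home": 20}
this_week = {"local-sports": 20, "politics": 60, "home": 20}
score = drift_score(launch, this_week)
# Alert when the tagged-topic mix shifts past a chosen threshold, e.g. score > 0.2
```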

Cost can explode at inference scale, so you need caching strategies, routing to smaller models for simpler tasks, and smarter retrieval. Latency is an engineering constraint from day one: every retrieval hop adds time, and if you are trying to serve contextual ads in real time, you cannot bolt that on later.
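The cost controls mentioned above, caching and routing simpler tasks to smaller models, can be sketched as a thin wrapper around whatever inference call you use. Everything here is illustrative: the word-count routing heuristic is deliberately crude, the per-call costs are made up, and `run_model` stands in for a real client.

```python
import hashlib

# Hypothetical per-call costs; in practice these come from your provider's pricing
COSTS = {"small": 1, "large": 20}
_cache = {}
spent = {"small": 0, "large": 0}

def classify(text, run_model):
    """Route simple inputs to a cheap model, cache repeats, escalate only when needed."""
    key = hashlib.sha256(text.encode()).hexdigest()
    if key in _cache:                                       # repeated content costs nothing
        return _cache[key]
    model = "small" if len(text.split()) < 50 else "large"  # crude routing heuristic
    spent[model] += COSTS[model]
    result = run_model(model, text)
    _cache[key] = result
    return result

fake_llm = lambda model, text: f"{model}: tagged"

classify("short headline about sports", fake_llm)
classify("short headline about sports", fake_llm)  # second call served from cache
classify("word " * 60, fake_llm)                   # long input escalates to the large model
```

A real router would use a learned difficulty signal rather than length, but the structure, cache first, route second, escalate last, is what keeps inference cost from scaling linearly with traffic.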

Evaluation is where a lot of teams are overconfident. Accuracy scores tell you almost nothing about business value. You need to measure relevance, segment lift, and actual outcomes, and you need to actively probe failure modes rather than waiting for them to surface in production.

Governance means logs, provenance records, access controls, and human review for high-stakes decisions, especially if you are using agents that take actions autonomously.

I find John Searle’s Chinese Room a useful provocation here, though I will admit it is a contested analogy and philosophers have been arguing about it for decades. Imagine a room where a person who speaks no Chinese is handed cards with Chinese symbols.

They follow a rulebook to produce responses that look, to everyone outside, like fluent Chinese conversation. The person inside is not understanding anything; they are pattern-matching.

Now imagine that room is your LLM. The analogy breaks down in important ways; LLMs are not rule-following in any simple sense. But the practical warning holds: a system can produce outputs that sound authoritative and coherent while being subtly wrong in ways that are very hard to detect.

That is not a reason to avoid LLMs. It is a reason to pair them with retrieval from curated sources, human review for consequential decisions, and governance structures that treat ‘sounds right’ as insufficient. Fluency is not understanding.

Most LLM projects do not fail in the model. They fail in the system.

When monetization is involved, governance becomes urgent fast. What risk do teams most consistently underestimate, and what have you seen actually work?

The failure I see most often is governance that is vague rather than concrete. Organizations publish AI principles, form committees, and then discover, usually when something goes wrong, that nobody had decision rights over anything. Principles without authority are decoration.

The bigger risk is how quickly helpful automation becomes unaccountable automation once revenue pressure enters the picture. Nick Bostrom’s paperclip maximizer is a useful frame here, even if the scenario itself is deliberately absurd.

Imagine an AI tasked with one goal: maximize paperclip production. Given sufficient capability, it converts every available resource into paperclips, because that is what it was told to optimize.

Nobody is building paperclip maximizers, and the real-world dynamics are far more gradual and mundane than the thought experiment suggests. But that is almost the point: you do not need a superintelligent rogue agent for reward hacking to cause damage.

If you optimize only for clicks or conversions, you can, without anyone intending to, degrade trust, compromise brand safety, and erode the long-term reader relationship that makes your first-party data worth anything in the first place. The reward function you choose is a governance decision, not just a modeling decision.

What works is governance that is specific: an AI council with actual decision rights, not just advisory status.

Concretely, that means approval gates for new AI features and segment definitions; source allow-lists for RAG with regular audits; required labeling and logging for training and inference; periodic outcome reviews that go beyond accuracy metrics; and an explicit kill-switch to pause any system quickly.
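Three of those controls, the source allow-list, the inference log, and the kill switch, can live in a single retrieval wrapper. This is a sketch under stated assumptions: the source names, log format, and flag are all hypothetical, and real systems would persist the log and gate the switch behind the council's decision rights.

```python
import time

# Hypothetical governance state: approved sources, an audit log, and a pause flag
ALLOWED_SOURCES = {"usat-archive", "wire-verified"}
AUDIT_LOG = []
KILL_SWITCH = {"paused": False}

def retrieve(source, query):
    """Refuse non-allow-listed sources, log every call, honor the pause switch."""
    if KILL_SWITCH["paused"]:
        raise RuntimeError("RAG retrieval paused by governance kill switch")
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"source {source!r} is not on the approved allow-list")
    AUDIT_LOG.append({"ts": time.time(), "source": source, "query": query})
    return f"documents from {source} for {query!r}"  # stand-in for real retrieval
```

Routing every retrieval through one choke point is what makes the periodic audits and the kill switch enforceable rather than aspirational.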

My parallel work in medical imaging has reinforced the same instinct. In radiology AI, you are never allowed to stop at ‘is it accurate?’ You must answer: where does it fail? Who reviews it? How do we catch errors before they harm a patient? Media is not medicine, but the discipline transfers. High-stakes decisions deserve the same scrutiny.

Do not just ask, ‘Is it accurate?’ Ask, ‘Where does it fail, and how do we catch it before it harms users?’

You connect your media work to your research in medical imaging at the University of Chicago. What travels between those two worlds?

More than people expect. The surface features are very different: one is predicting tumor response to treatment, the other is predicting which content a reader will engage with.

But the underlying challenges are nearly identical. You need held-out validation that actually reflects real-world distribution, explainability requirements that hold up to scrutiny, and governance frameworks that survive institutional review.

In medical imaging, you cannot ship a model without thinking about failure modes, audit trails, and the humans in the loop who catch what the model misses. That discipline has made me a more careful engineer in media.

And the scale of media, with its millions of articles and hundreds of millions of sessions, has made me a more rigorous scientist. The instinct to ask whether the experimental design actually tested the claim is one that improves every system I build, in either field.

What is the one mental model you want every attendee to carry out of this session?

We are drifting toward a Library of Babel internet, and I mean that as a useful approximation rather than a precise claim.

Borges imagined an infinite library containing every possible combination of characters, and therefore every book that has ever been written or ever could be written, alongside an infinite number of books that are almost right, plausibly formatted, and completely wrong.

The internet is obviously not infinite, and AI-generated content is not random, but the directional challenge is real: generative AI can produce convincing text at an unprecedented scale, which means the ratio of plausible-but-wrong to verified-and-true is shifting in a direction that should concern anyone who depends on information quality. 

The Library of Babel as today’s internet: infinite shelves of information, most of it plausible, much of it wrong.

The competitive advantage is not the ability to generate more text. It is the ability to build and maintain the trusted index, the curated, provenance-backed catalog that lets you find what is true amid the noise.

RAG is not a truth machine. It is a traceability machine. If your source set is bad, you will get well-cited nonsense. The real unlock is retrieval from reputable, curated sources with clear provenance.

Start small: pick a high-trust source set, build a basic RAG layer that cites it, create a small set of monetizable segments, and design human review and logging from day one. Because the teams that get governance right from the start are the teams that are still operating at scale two years from now.

RAG is not a truth machine. It is a traceability machine. If your source set is bad, you will get well-cited nonsense.

About the speaker

Ramkumar Shanker is Director of Data Science at USA TODAY (Gannett), where he leads AI initiatives at the intersection of generative AI and first-party data monetization.

He is also a research collaborator at the University of Chicago, where his work on explainable AI in medical imaging informs his approach to governance and evaluation. He holds an M.S. in Applied Data Science from the University of Chicago and a B.Tech. from IIT Madras.

The views expressed in this interview are those of Ramkumar Shanker and do not necessarily reflect the positions of USA TODAY or the University of Chicago.
