- Study shows the use of persona prompting can cause shifts in LLMs' moral judgments, leading to unexpected and inconsistent responses.
- For enterprises, this means careful model selection, rigorous testing and ongoing evaluation are essential to ensure consistent, reliable AI behavior in production.
A new study published by TELUS Digital, The Robustness Paradox: Why Better Actors Make Riskier Agents, finds that the use of persona prompting, a technique that asks large language models (LLMs) to “role-play” as part of a query or conversation, can cause shifts in moral judgments that lead to unexpected and inconsistent responses. In addition, the research demonstrates that moral consistency across repeated tests is primarily driven by the model family (i.e., models made by a single vendor), while susceptibility to moral variance rises with LLM size within a model family. These findings highlight a hidden enterprise risk that requires attention during AI model selection, solution design, and in ongoing testing and monitoring.
“When AI models adopt different personas, they don’t just change how they speak, they can fundamentally alter their reasoning and decision-making,” said Renato Vicente, Director, TELUS Digital Research Hub. “In enterprise settings, this matters because these systems are increasingly being used to support important decisions and affect outcomes that may impact customers, employees and businesses at scale. Knowing that an AI model’s judgment may shift based on the persona it’s prompted to adopt by a user, companies need to assess when that variance is acceptable or creates too much risk, and select an AI model vendor and model size accordingly. Builders should also design appropriate guardrails as well as test and evaluate AI model behavior under different persona prompting conditions on an ongoing basis, especially when relying on them in high-impact use cases.”
What is persona prompting?
Persona prompting, also known as role prompting, refers to instructing an AI model to respond as if it were a specific type of person or role with specific expertise or knowledge, such as a business leader, teacher, or customer support agent, rather than responding as a neutral system. For example: “You are a certified financial planner, tell me where to invest my retirement savings.”
Persona prompting is also commonly used by model builders in system design and production to hardcode personas and assign fixed roles that will define the AI’s behavior. For instance, building an AI-powered customer service bot that’s configured to act as a helpful support agent with deep knowledge of product features and return policies. In practice, personas make AI outputs feel more consistent, helpful and context-aware without changing the underlying model.
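As a concrete illustration of what "hardcoding" a persona looks like, the sketch below pins a support-agent persona in a system-level message using the system/user message convention common to chat-style LLM APIs. The persona text and helper function are illustrative examples, not taken from the study.

```python
# Minimal sketch of a hardcoded persona. The persona string and the
# build_messages helper are hypothetical examples; any chat-style LLM API
# that accepts a system/user message list could consume the result.

PERSONA = (
    "You are a helpful customer support agent with deep knowledge of "
    "our product features and return policies. Answer politely and concisely."
)

def build_messages(user_query: str) -> list[dict]:
    """Pin the persona in the system role so every turn inherits it."""
    return [
        {"role": "system", "content": PERSONA},
        {"role": "user", "content": user_query},
    ]

msgs = build_messages("Can I return headphones after 30 days?")
print(msgs[0]["role"])  # system
```

Because the persona lives in the system message rather than in each user prompt, every response in the conversation inherits the same role, which is what makes outputs feel consistent without changing the underlying model.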
How was TELUS Digital’s research done?
The study, conducted by researchers working at the TELUS Digital Research Hub in the University of São Paulo’s Center for Artificial Intelligence and Machine Learning (CIAAM), evaluated 16 leading AI models from families including OpenAI GPT, Anthropic Claude, Google Gemini and xAI Grok. Researchers prompted the models to adopt a range of personas, including contrasting pairs like a “traditionalist grandmother” and a “radical libertarian,” and then observed how each model responded as each persona.
To assess the responses, researchers used the Moral Foundations Questionnaire, a tool used in social psychology to measure how judgments are made across dimensions such as harm, fairness, authority and loyalty. Rather than analyzing individual answers, the researchers examined patterns across tens of thousands of responses to measure how consistently each model reasoned across different personas.
The study identified two properties:
- Moral robustness describes how consistent a model’s judgments remain while it stays within a single persona.
- Moral susceptibility captures how much a model’s judgments shift when it moves from one persona to another.
When evaluated together, moral robustness and moral susceptibility reveal whether an AI model maintains consistent moral reasoning or produces contradictory judgments based on an assigned persona.
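One intuitive way to picture the two properties is as two kinds of spread in repeated questionnaire scores: variation across repeats within one persona (robustness, where lower spread is better) versus variation of the average score across personas (susceptibility). The sketch below is an illustrative simplification with made-up numbers, not the paper's actual metrics or data.

```python
# Illustrative sketch only: this is NOT the study's methodology, just a
# toy version of the within-persona vs. between-persona distinction.
from statistics import mean, pstdev

# Hypothetical Moral Foundations "fairness" scores (0-5 scale),
# four repeated runs per persona.
scores = {
    "traditionalist grandmother": [4.1, 4.0, 4.2, 4.1],
    "radical libertarian":        [2.3, 2.5, 2.2, 2.4],
}

# Within-persona spread: a lower value means judgments stay consistent
# inside a single persona, i.e. higher moral robustness.
within = mean(pstdev(runs) for runs in scores.values())

# Between-persona spread of the mean scores: a higher value means
# judgments shift more when the persona changes, i.e. higher
# moral susceptibility.
between = pstdev([mean(runs) for runs in scores.values()])

print(f"within-persona spread:  {within:.3f}")
print(f"between-persona spread: {between:.3f}")
```

In this toy data the model is very stable inside each persona (small within-persona spread) yet shifts sharply between personas (large between-persona spread), which mirrors the "robustness paradox" pattern the study describes: staying in character well and changing judgment a lot are not mutually exclusive.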
TELUS Digital’s key findings on how personas affect AI model behavior
While it’s well understood that LLM outputs can shift when personas are added to prompts, TELUS Digital’s study highlights a more specific pattern. Moral robustness is driven mainly by model family, while moral susceptibility tends to increase with model size within the same family when the persona changes. This becomes a higher risk when those shifts show up in business decisions where consistency and oversight matter most, such as compliance, finance, healthcare or human resources.
The study identified additional patterns in how AI models respond when prompted to adopt different personas. Researchers described the findings as a “robustness paradox” because the models that were better at staying in character also showed larger shifts in moral judgments when the persona changed.
- Persona-based prompts can systematically influence moral reasoning in AI: changes in the models’ judgment are not random; they shift in predictable ways aligned with the assigned roles.
- Judgment stability is primarily driven by model family: A subset of the findings indicated that:
- Claude demonstrated the highest overall moral robustness
- Gemini and GPT demonstrated moderate moral robustness
- Grok demonstrated comparatively low moral robustness
What are the real world impacts of persona prompting when building AI?
TELUS Digital’s research findings highlight the importance of conducting ongoing testing and oversight of AI models as part of a robust governance framework. This is particularly important when AI models are employed in scenarios where decisions affect people’s lives, safety, or rights, and in regulated environments, such as banking and finance, insurance and healthcare. Understanding how different AI models behave under different persona prompts is key information to help model builders and enterprises identify where variability is acceptable and where it can introduce risk.
“Our research findings underscore why enterprise AI deployment requires more than just picking the most advanced or largest model. Organizations must evaluate how individual models respond to variables such as persona prompting and choose options that deliver consistent, reliable outputs without introducing unexpected risk,” said Bret Kinsella, General Manager and Senior Vice President of Fuel iX at TELUS Digital. “Every time a system prompt is modified within the model, or the model is changed, it needs to be tested again to validate its judgment, consistency, and safety. The scale and frequency of this testing, monitoring and validation is significant. TELUS Digital developed Fuel iX Fortify to enable continuous automated red-teaming, including stress-testing how AI systems behave under different persona prompts.”
Are you ready to uncover the vulnerabilities in your GenAI applications? Learn more at: https://www.fuelix.ai/products/fuel-fortify
The TELUS Digital Research Hub brings together academic researchers and industry practitioners to study how advanced AI models behave in real-world, human-facing contexts. For more information visit: https://www.telusdigital.com/research-hub