Accent, Identity, and AI: What Sanas.ai is Solving at Scale

In a world where digital communication is more prevalent than ever, accents can still pose a barrier to clear understanding. Shawn Zhang, Chief Technology Officer and co-founder of Sanas, is working to overcome this challenge. In this conversation, he discusses the origin of Sanas, the challenges the team has faced, and what lies ahead for the future of Speech AI.

From Stanford Class to Product Development

Sanas was founded in April 2020, during the peak of the global lockdown. At the time, Shawn was a sophomore at Stanford University. He recalled a sense of restlessness that helped set the stage for what would become Sanas.

“Classes were virtual and disengaging,” he said. “A friend of ours had gone back to Nicaragua to support his family and picked up a job as a customer service representative. He began telling us how people kept saying they couldn’t understand him because of his accent.”

That experience led Shawn and his co-founder to a realisation: accent can be a significant barrier in customer service interactions. This is when they began exploring the idea of a real-time algorithm that could translate accents, allowing speakers to be better understood without losing their identity. 

What started as a college project in a Stanford entrepreneurship class turned into a startup when a guest lecturer and angel investor wrote their first $50,000 cheque.

Today, most of Sanas’ revenue comes from the contact centre industry, where the company first recognised the need.

“Customers were saying, ‘I just can’t understand you’, and that was affecting the quality of service. Our product immediately found a natural fit here,” Shawn said. “Now, we’re expanding into enterprise communications, and eventually into B2C applications.”

The technology reduces misunderstandings and friction during calls, improving communication between agents and customers, and the same capability extends to global businesses with diverse, distributed teams.

Inside the Technology

Sanas’ core technology performs accent translation in real time in production environments, improving understandability and intelligibility while preserving the speaker’s voice, tone, and rhythm.

Customers have reported better communication, leading to higher customer satisfaction scores (CSAT). Interestingly, there has also been a rise in a second, often overlooked metric — employee satisfaction scores (ESAT).

Many agents reported that the tool not only reduced customer complaints but also made their jobs less stressful. Zhang recalled one agent’s comment: “I no longer hear customers saying they can’t understand me—it’s a game changer.”

The noise cancellation feature works alongside accent translation to improve clarity during calls, especially in loud or busy environments.

Together, the two features are designed to reduce friction in communication and support better outcomes for both parties involved in the call.

Currently, the company operates on a B2B model, working with organisations rather than individual users. However, it plans to transition to B2C in the future, with potential use cases involving personal communication across borders, such as between friends and family.

Is Speech Going to Lead Us to AGI? 

AGI, or Artificial General Intelligence, is typically defined as an AI system with the ability to understand, learn, and apply knowledge across a wide range of tasks at a human-like level.

“I think we’re still just getting started because we’ll see this explosion of speech interfaces,” Zhang said while discussing the growing dominance of the area, which he believes is more than just a trend. 

“Speech is the most human form of communication. It’s emotional and productive. If you ask me, voice is essential for AGI. It’s how we collaborate and relate to each other.”

He believes Sanas’ strength lies in augmenting human-to-human interaction, rather than replacing it.

“Even when my accent is translated, I still want to sound like me. I want to keep my rhythm, emotion, and identity. That’s what makes our approach different.”

Optimism for a Speech-First Future

According to Zhang, India is especially ripe for a Speech AI boom.

“India’s mobile-first, highly multilingual society makes speech not just important, but essential.”

With 22 official languages and a mobile-heavy user base, Zhang sees India as a proving ground for technologies like Sanas. “People are more comfortable speaking than typing. Speech unlocks access to services and communication in a way text doesn’t.”

On the global level, Zhang explained that the future of Speech AI is shaped by durable, long-term global trends. “I still think we’re just at the dawn of the Speech AI revolution.”

He pointed out that the world’s population is growing, which means more communication and interaction will continue to take place. At the same time, globalisation is increasing, leading to a rise in international business and electronic communication, such as video conferencing. 

These developments create more opportunities for AI to support and enhance voice interactions across regions and languages. He believes that labour markets will be transformed as a result, with sectors like customer service and healthcare relying more on global collaboration. As user interfaces evolve, speech is likely to become the primary mode of interaction. 

However, companies operating in this space also face certain challenges. 

“Data sensitivity is a major concern. Enterprises want assurance that their speech data is secure. This pushes us towards on-device, edge computing.”

Another issue is hallucinations, which occur when AI systems confidently deliver incorrect outputs.

“Your AI might work 95% of the time, but that 5% failure can cause serious damage. We tell clients, ‘Don’t evaluate us on our wins, evaluate us on our losses. That’s where the risks lie.’”

Despite the challenges, Zhang remains confident that the benefits of Speech AI far outweigh the hurdles. As innovation accelerates and trust in the technology grows, speech is poised to become the most natural interface between humans and machines.

For countries like India, this shift could unlock entirely new ways of working, learning, and connecting. “The goal isn’t just clearer communication, it’s a future where every voice can participate fully, no matter the accent, language, or location.”

The post Accent, Identity, and AI: What Sanas.ai is Solving at Scale appeared first on Analytics India Magazine.
