BharatGen’s ‘Recipe’ for Building a Trillion Parameters Indic Model


BharatGen, the IIT Bombay-led consortium that bagged the biggest GPU allocation under the IndiaAI Mission, is now laying out the framework to achieve its most ambitious target yet — a trillion-parameter model. 

The task is not just about scaling compute, but building the scaffolding India lacks — data, talent, and what the group calls “recipes” for sovereign AI.

“We picked this ambitious goal because we really want to move the needle on what’s possible to build in India today,” Rishi Bal, head of BharatGen, told AIM. “But this is not just about the models. It’s about the entire ecosystem… it’s a steep ramp.”

The numbers are staggering. BharatGen has secured 13,640 H100 GPUs and close to ₹1,000 crore in funding, the single-largest allocation in the country. 

It already has a series of early releases under its belt: Param-1, a bilingual 2.9-billion-parameter model; Shrutam, for speech recognition; and Patram, a vision-language model for document understanding. But scaling to a trillion parameters is a different order of challenge.
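A rough back-of-the-envelope sketch shows the size of that gap. It is not BharatGen's own estimate. It assumes a dense model trained in mixed precision with an Adam-style optimizer, a common rule of thumb of about 16 bytes of weights, gradients and optimizer state per parameter, and 80 GB of memory per H100. The function name and both constants are illustrative assumptions, and activations, parallelism overheads and sparse (mixture-of-experts) designs are ignored.

```python
# Assumption: ~16 bytes per parameter for training state
# (16-bit weights and gradients plus 32-bit Adam optimizer state).
BYTES_PER_PARAM_TRAINING = 16
# Assumption: 80 GB of memory on each H100.
GPU_MEMORY_BYTES = 80 * 10**9


def min_gpus_for_training_state(num_params: int) -> int:
    """Lower bound on GPUs needed just to hold weights, gradients and optimizer state."""
    state_bytes = num_params * BYTES_PER_PARAM_TRAINING
    # Ceiling division using integers only.
    return (state_bytes + GPU_MEMORY_BYTES - 1) // GPU_MEMORY_BYTES


param_1_gpus = min_gpus_for_training_state(2_900_000_000)       # Param-1 scale
trillion_gpus = min_gpus_for_training_state(1_000_000_000_000)  # trillion-parameter scale
print(param_1_gpus, trillion_gpus)
```

Under these assumptions, the training state of a 2.9-billion-parameter model fits within a single GPU's memory, while a trillion-parameter model needs at least 200 GPUs just to hold that state before any data is processed.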

The Recipe

BharatGen started with a consortium of seven institutes and is now expanding that base to nine by adding IIT Kharagpur and IIIT Delhi. Bal said that to reach the larger model, the team first needs the talent, and that is what it is currently focusing on.

The first milestone is to build a robust research ecosystem, and the next 12 months have been set aside for just that, Bal said, adding that the many language-specific problems and the need for knowledge exchange require the group's members to work very closely together.

Data is the other pillar. India is short on the kind of large-scale, high-quality datasets that powered the first wave of LLMs in the West. Several startups took the synthetic data route, generating Indic data from models like Llama and Mistral, but that path does not seem completely scalable.

To fix this, BharatGen has chosen a ground-up approach. Teams have been deployed in Madhya Pradesh to convince publishers and radio stations to contribute. 

“This may not give us trillions of tokens, but it gives us high-quality, human-generated data,” Bal said.

Beyond collection, the group is investing in data provenance: metadata and curation pipelines that are otherwise a “black art” for small players. “We need sovereign recipes, including when to build small models from scratch, when to distill from larger ones, and when to checkpoint. These are crucial for the ecosystem,” IIT Bombay professor Ganesh Ramakrishnan told AIM.

The group also draws a sharp line against aping Western models. “Indian languages are as important as English. You cannot just tokenise them into English-heavy models. This is about building on our own terms,” Ramakrishnan said.

BharatGen is working with NASSCOM AI to fold this into a national AI stack.

On synthetic data, BharatGen is pragmatic. Bal said it cannot be written off. “You have to look at the right mixture. Crawling, OCR, synthetic generation, community contribution — they’re all tools. The question is about balance, and that’s where the research consortium helps.”

A Sovereign Ecosystem, Not Isolation

BharatGen was seeded by the Department of Science and Technology (DST) and is structured as a Section 8 non-profit company under the Companies Act, 2013, housed at IIT Bombay. The idea is to produce public goods, not returns.

“If I set this up as a for-profit, there would always be questions,” Bal said. “As a Section 8 [entity], we have institutional credibility, and it unlocks partnerships — academic and private — that would be harder otherwise.”

“It’s like another investor who has some investment expectation of return. And in return, some expectation of control to ensure that its interests are served well. This is no different from raising 100 crores from a private equity fund or a sovereign fund,” Bal added.

In India, the government has provided GPUs and funding, and does take a stake in return, structured through board seats or convertible debentures. Critics worry this could give the state too much control over the models. Bal pushed back. “It’s just another investor with expectations. The legal structures are in place.”

Drawing a comparison with China's AI ecosystem, which is heavily supported by its government, Bal said India needs a similar approach to develop a thriving AI ecosystem of its own.

“Because if you get these large players [like OpenAI, Google, or Meta] very early, there’s no protection for the local players,” Bal said, highlighting the surge of AI models from China that are built by private players.

Earlier, Abhishek Upperwal from Soket AI Labs, another participant in the IndiaAI Mission, said that equity is a good idea. Nikhil Malhotra from Tech Mahindra’s Makers’ Lab, which is now also part of the mission, said that as long as the equity doesn’t interfere with their direction, it’s “a fantastic idea.”

Read: IndiaAI’s Equity in AI Startups is a ‘Fantastic’ but Risky Idea

BharatGen insists sovereignty doesn’t mean shutting the door on global players. The team recently signed an agreement with IBM to collaborate on model technologies, data preparation, and scaling data-preparation work for complex, governed pipelines. This comes on top of the team’s ongoing partnership with NVIDIA.

“We are not collaborating with IBM on foundational models,” Ramakrishnan clarified. “We are building our sovereign AI stack. Partners can add value on top of it.” The foundation will remain Indian.

IBM wrote in a blog post that BharatGen will also integrate with IBM’s growing family of Granite models, and build use case templates for those industries with IBM watsonx and Red Hat OpenShift AI. 

With nearly 14,000 GPUs at its disposal, BharatGen sits at the heart of India’s AI push. But if its leaders are to be believed, the real test will not be hitting the trillion-parameter milestone alone. It will be whether the effort can seed a sovereign ecosystem robust enough to outlast the compute cycles.

The post BharatGen’s ‘Recipe’ for Building a Trillion Parameters Indic Model appeared first on Analytics India Magazine.
