

The rapid growth of the Artificial Intelligence (AI) sector has sparked a series of debates about how to foster innovation while protecting both businesses and consumers. One subject drawing particular attention is the regulation of synthetic data.
Synthetic data, meaning artificial datasets generated by algorithms and models, can democratise AI by giving startups data they might otherwise struggle to access. However, if misused and left unregulated, it could lead to a proliferation of fake data, raising significant concerns about data integrity and privacy.
The Competition Commission of India’s (CCI) market study on AI flags the same risk. Without proper oversight, synthetic data generation could produce fake data that skews model outputs, giving certain players an unfair advantage and potentially harming consumers.
“While synthetic data could democratise AI by giving startups training material, it also risks fake data/content proliferation if not overseen properly,” the study points out.
A Call for Open Data Commons
Several industry leaders echoed the case for open-access synthetic data. Yasir Wani, a product lead manager at a leading insurance company based in Dubai, believes that synthetic data could become a new layer in the AI stack, akin to an “open-source commons.”
He proposes creating anonymised, high-quality datasets that are generated responsibly through privacy-safe methods. Sectors such as healthcare, transportation, fintech, and agritech could use these datasets, and smaller entities such as startups, researchers, and institutions would be able to access them.
“A tiered access model could work well,” Wani explains. Smaller datasets could be made available for download, while larger, dynamic datasets might be accessed through APIs.
Sensitive data could be used within secure, sandboxed environments, preventing the exposure of raw information. He further suggests that public-sector projects should mandate the use of shared synthetic datasets, which would encourage even larger players to participate in a more open ecosystem.
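To make Wani’s tiered model concrete, here is a minimal Python sketch that assigns a dataset to one of the three channels he describes: download, API, or sandbox. The size threshold, the field names, and the `choose_access_tier` function are hypothetical assumptions for illustration, not part of any published AI Kosh or industry specification.

```python
from dataclasses import dataclass
from enum import Enum


class AccessTier(Enum):
    DOWNLOAD = "download"  # small, static datasets served as files
    API = "api"            # large or frequently updated datasets
    SANDBOX = "sandbox"    # sensitive data never leaves a secure environment


# Hypothetical cut-off between "small" and "large" datasets (assumption).
DOWNLOAD_LIMIT_GB = 5.0


@dataclass
class DatasetInfo:
    name: str
    size_gb: float
    is_dynamic: bool    # updated continuously, e.g. streaming sensor data
    is_sensitive: bool  # derived from sensitive sources such as health records


def choose_access_tier(ds: DatasetInfo) -> AccessTier:
    """Pick an access channel under the assumed tiering rules."""
    # Sensitivity overrides size: raw records stay inside the sandbox.
    if ds.is_sensitive:
        return AccessTier.SANDBOX
    # Large or continuously updated datasets are served through an API.
    if ds.is_dynamic or ds.size_gb > DOWNLOAD_LIMIT_GB:
        return AccessTier.API
    # Everything else is small and static enough to download directly.
    return AccessTier.DOWNLOAD


if __name__ == "__main__":
    catalogue = [
        DatasetInfo("crop-yield-sample", 0.8, False, False),
        DatasetInfo("traffic-feed", 40.0, True, False),
        DatasetInfo("clinical-notes", 2.0, False, True),
    ]
    for ds in catalogue:
        print(f"{ds.name}: {choose_access_tier(ds).value}")
```

Checking sensitivity first reflects Wani’s point that sensitive data should never be exposed in raw form, whatever its size.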
For India to lead in synthetic data democratisation, industry-wide collaboration will be key.
Dhiraj Udapure, vice-president of technology and business development at SCS Tech, shares this sentiment. He emphasises the importance of modular, interoperable architectures.
This, he argues, would allow businesses to transition seamlessly between AI ecosystems without becoming overly reliant on a single vendor. “Synthetic data is a key enabler for developing robust AI models when used responsibly,” Udapure says. For him, collaboration between industry, academia, and policymakers will be vital to ensure that synthetic data pipelines are transparent, trustworthy, and efficient.
AI Kosh and Data Sovereignty
The Indian government has made strides in addressing concerns surrounding data access and the democratisation of AI. One of the key initiatives is AI Kosh, an AI data repository launched by the government in March 2025. AI Kosh hosts over 2,000 datasets, including non-public and non-private data, making it easier for smaller firms to access high-quality data without violating privacy.
Sagar Vishnoi, co-founder and director of Future Shift Labs, believes that AI Kosh will be instrumental in facilitating access to synthetic data. He suggests that the government should develop a certification process that assigns a non-private tag to data, ensuring compliance with data privacy regulations while allowing data to be used freely by smaller firms and startups.
The Indian government is also pushing for sector-specific data repositories compiled with computational safeguards. Vishnoi notes that such repositories would benefit research, since they would hold specialised datasets valuable to specific industries. This approach would make data more accessible while ensuring that it is gathered responsibly and ethically.
Challenges of AI Sovereignty and Market Lock-In
Despite the promise of synthetic data and open data initiatives, the CCI study identifies several challenges that could keep India from capitalising on these opportunities. A key concern is the concentration of AI value chains, where a few large firms dominate multiple layers of the AI stack: data collection, compute, and deployment. The report suggests that this dominance could stifle competition and make it harder for smaller players to enter the market.
Industry experts agree, noting that the market is becoming increasingly locked in due to the high switching costs associated with large hyperscalers, such as Google Cloud, Microsoft Azure, and AWS.
These platforms control much of the AI technology stack through their vast data and compute resources, making it difficult for local platforms to compete.
Udapure suggests that Indian companies can offer viable alternatives by embracing interoperable platforms. “If APIs, model formats, and vector databases follow common standards, switching providers becomes far easier,” he states. He proposes that sovereign inference gateways run by neutral bodies such as the National Informatics Centre (NIC) or the Centre for Development of Advanced Computing (C-DAC) could help businesses route AI workloads across different clouds while comparing costs and performance.
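At its core, a sovereign inference gateway of the kind Udapure describes would pick a backend for each request by comparing cost and performance. The Python sketch below shows one simple policy under stated assumptions: choose the cheapest backend that meets a latency target, and fall back to the fastest one if none does. The backend names, prices, latency figures, and the `route` function are invented placeholders, and the policy itself is an illustrative assumption rather than a published gateway design.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str                   # placeholder label, not a real provider
    cost_per_1k_tokens: float   # assumed price in rupees
    p95_latency_ms: float       # assumed measured 95th-percentile latency


def route(backends: list[Backend], max_latency_ms: float) -> Backend:
    """Return the cheapest backend whose p95 latency meets the target.

    Falls back to the fastest backend if none meets the target.
    """
    if not backends:
        raise ValueError("no backends configured")
    # Keep only backends that satisfy the latency target.
    eligible = [b for b in backends if b.p95_latency_ms <= max_latency_ms]
    if eligible:
        return min(eligible, key=lambda b: b.cost_per_1k_tokens)
    # No backend is fast enough: degrade gracefully to the fastest one.
    return min(backends, key=lambda b: b.p95_latency_ms)


if __name__ == "__main__":
    pool = [
        Backend("cloud-a", cost_per_1k_tokens=0.90, p95_latency_ms=220),
        Backend("cloud-b", cost_per_1k_tokens=0.60, p95_latency_ms=450),
        Backend("domestic-c", cost_per_1k_tokens=0.70, p95_latency_ms=300),
    ]
    print(route(pool, max_latency_ms=350).name)  # domestic-c
    print(route(pool, max_latency_ms=100).name)  # cloud-a (fallback)
```

Because the gateway, rather than each application, chooses the backend, switching providers becomes a configuration change. That only works if the backends expose common API and model formats, which is why Udapure ties the idea to shared standards.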
AI sovereignty is central to this debate. Vishnoi stresses that Indian platforms need robust technical capabilities, security credentials, and data privacy measures if they are to compete with established global players. “The Indian ecosystem needs to push AI sovereignty to give alternatives to big players,” he says.
Government initiatives such as AI Kosh, along with the broader push for AI sovereignty, are key steps toward a more open and competitive AI ecosystem in India. For these initiatives to succeed, however, industry collaboration will be essential.
The post India’s AI Boom Could Crash Without This One Crucial Change appeared first on Analytics India Magazine.


