NVIDIA & AMD Keep Winning Despite Custom Chips Threat

NVIDIA’s dominance in AI infrastructure rests on a single principle: GPUs are general-purpose machines. For years, this flexibility has been the bedrock of the company’s skyrocketing valuation. 

Even AMD’s path to challenging NVIDIA hinges entirely on building more efficient GPUs for AI workloads. 

However, a challenge to GPUs has emerged from companies producing application-specific integrated circuits, better known as ASICs. 

These chips are claimed to offer better performance and cost efficiency than GPUs for specific AI workloads. 

Examples include Google’s TPUs, Amazon AWS’s Trainium, Cerebras’ Wafer Scale Engine, and Groq’s LPUs (language processing units), along with offerings from companies such as SambaNova, Etched, and Mythic.  

While Google has largely succeeded with TPUs, and companies like Anthropic also utilise them, TPUs are available only through Google Cloud, not as hardware for on-premises deployment of the kind NVIDIA and AMD offer. 

Cerebras is one of the companies in the custom AI accelerator market that offers both cloud and on-premise solutions in competition with GPUs. 

Recently, Cerebras raised $1.1 billion and claimed to have outperformed NVIDIA’s Blackwell in inference, the process of generating outputs from trained AI models. 

In addition to inference, Cerebras also supports AI model training with its Wafer-Scale Engines. 

Its latest iteration packs more memory, more transistors, and more petaflops of compute than the NVIDIA Blackwell B200 GPU, with the company claiming 20x faster inference speeds and better training performance as well. 

Similarly, Groq, another ASIC manufacturer, claims to offer faster inference speeds for certain AI models than NVIDIA. 

However impressive these solutions look on paper, how promising are they in the real world, and over the long run, as replacements for GPUs? 

The Flexibility Problem

While ASICs deliver faster inference speeds, they do so only for the particular AI architectures and workloads for which the vendor has optimised them. 

In an interaction with AIM, Anush Elangovan, VP of AI software at AMD, explained why GPUs will continue to hold the upper hand. 

He said that by the time an ASIC is shipped optimised for one architecture, the field has already moved to something new. 

“So if you want programmability, the general flexibility, to be able to run the race, you want to be on the GPU section of the graph,” said Elangovan. “If you know exactly the workload you have, and this is all you need, sure, you can buy custom AI accelerators.”

ASICs optimise for known workloads. Cerebras, for instance, supports a handful of Meta’s open-source Llama models (Llama 4 Scout, Llama 3.1 8B, and Llama 3.3 70B), OpenAI’s gpt-oss, and Alibaba’s Qwen3-32B. 

That specificity drives efficiency, but researchers don’t wait for accelerator support before experimenting with new architectures.

CUDA Wins

GPUs, by contrast, give developers the flexibility to program them directly, with CUDA on NVIDIA hardware and ROCm on AMD, enabling easy deployment of virtually any open-source AI model. 
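
To make this concrete, here is a minimal sketch, assuming the Hugging Face transformers library; the model ID is illustrative. Because AMD’s ROCm builds of PyTorch expose the same torch.cuda interface as NVIDIA’s CUDA builds, identical code targets either vendor’s GPUs:

```python
# A minimal sketch: running an open-weight model on either NVIDIA or AMD GPUs.
# Assumes the Hugging Face transformers library; the model ID is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# "cuda" addresses NVIDIA GPUs via CUDA and AMD GPUs via ROCm, since ROCm
# builds of PyTorch reuse the torch.cuda namespace.
model.to("cuda")

prompt = tokenizer("GPUs are general-purpose because", return_tensors="pt").to("cuda")
output = model.generate(**prompt, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```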

In a recent podcast episode, Jensen Huang, founder and CEO of NVIDIA, stated that AI model architectures are evolving so rapidly that researchers need tools to iterate quickly across dozens of experiments, testing different mechanisms and strategies. 

“CUDA helps you do all that because it’s so programmable,” Huang said.

Beyond the competition with ASICs, CUDA has played an enormous role in keeping NVIDIA ahead of other GPU makers, such as AMD. 

CUDA, with 6 million developers, offers an ecosystem of vast libraries and tooling, creating a moat that competitors have often found difficult to crack.
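
Part of that moat is how little stands between a developer and the silicon. As a rough illustration (not tied to any vendor’s production code), CuPy’s RawKernel interface lets Python compile and launch hand-written CUDA C at runtime, a level of control that fixed-function accelerators generally do not expose:

```python
import cupy as cp
import numpy as np

# Compile a hand-written CUDA C kernel at runtime: y = a*x + b, elementwise.
saxpb = cp.RawKernel(r'''
extern "C" __global__
void saxpb(const float* x, float* y, float a, float b, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + b;
}
''', 'saxpb')

n = 1 << 20
x = cp.random.rand(n, dtype=cp.float32)
y = cp.empty_like(x)

threads = 256
blocks = (n + threads - 1) // threads
saxpb((blocks,), (threads,), (x, y, np.float32(2.0), np.float32(1.0), np.int32(n)))

# Check the custom kernel against stock array operations.
assert cp.allclose(y, 2.0 * x + 1.0)
```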

Moreover, almost any new architecture researchers develop today is built and optimised on NVIDIA GPUs first.

“The software is evolving constantly because of what works best on NVIDIA,” said Dylan Patel, the founder of the AI market research firm SemiAnalysis. 

Even as ASIC makers like Cerebras add support for more AI model architectures and integrate newer frameworks, GPUs will always have an edge as the default platform where researchers first develop and optimise new architectures.

Take Samsung AI Research’s Tiny Recursive Model (TRM) as an example. This small AI model outperformed Google’s Gemini 2.5 Pro and OpenAI’s o3-mini on specific benchmarks. 

The study’s author noted that TRM was trained on NVIDIA H100 GPUs. As a result, the model is immediately compatible with NVIDIA’s GPU ecosystem for future use cases, both internally and externally. 

Beyond the reasons mentioned earlier, discussions on the popular tech forum Hacker News explore additional limitations of Cerebras’ approach, including its memory architecture, economic inefficiencies, and manufacturing challenges. 

Meanwhile, Google’s TPUs are a strong competitor to GPUs, given their widespread use in developing and deploying Google’s Gemini AI models and the way companies like Anthropic are expanding their use of the hardware.

However, they pose multiple caveats for researchers and developers. 

TPUs are accessible only through Google’s cloud ecosystem, leaving developers with no alternative route to the hardware. Additionally, TPUs are primarily optimised for Google’s own software stack: frameworks such as TensorFlow and JAX, and the XLA compiler. 

In contrast, GPUs are more versatile, supporting a broader range of frameworks, custom kernels, and specialised algorithms.
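
A minimal sketch of what that framework gap looks like in practice, assuming PyTorch with the optional torch_xla bridge installed on TPU hosts (API details vary across torch_xla versions):

```python
import torch

def pick_device() -> torch.device:
    """NVIDIA and AMD GPUs share one PyTorch code path; TPUs need a separate bridge."""
    if torch.cuda.is_available():
        # Covers both CUDA (NVIDIA) and ROCm (AMD) builds of PyTorch.
        return torch.device("cuda")
    try:
        # TPUs are reachable only through torch_xla, which lowers PyTorch ops
        # to Google's XLA compiler and brings its own idioms and constraints.
        import torch_xla.core.xla_model as xm
        return xm.xla_device()
    except ImportError:
        return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(512, 512).to(device)
out = model(torch.randn(8, 512).to(device))
print(out.shape, device)
```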

A Work In Progress

That said, ASIC makers like Cerebras and Groq recognise this challenge well, especially around the programmability of their hardware systems. 

Groq’s developer community, for instance, grew from 1 million to 2 million users between April and October 2025. 

A growing developer base helps expand the range of models their hardware can effectively support, accelerates adoption, and reduces friction for developers experimenting with non-standard or custom AI workloads. 

Groq’s CEO, Jonathan Ross, recently discussed at an event in Bengaluru how the company’s LPUs work well with the popular mixture-of-experts model architecture. 

In a conversation with AIM following their $1.1 billion funding raise, Cerebras CEO Andrew Feldman highlighted how the company is tackling the “NVIDIA lock-in” challenge. 

He cited AlphaSense, a market intelligence platform, as an example of a customer that successfully migrated from NVIDIA GPUs to Cerebras using its API. 

Feldman added that moving to the Cerebras Cloud is straightforward: “just a few keystrokes, around 10, to switch from a GPU-based cloud solution to Cerebras,” enabling users to immediately leverage their speed without deep integration work.
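
That squares with Cerebras exposing an OpenAI-compatible inference endpoint, so a migration can reduce to changing a base URL, an API key, and a model name. A hedged sketch, with placeholder credentials and an illustrative model name:

```python
# Sketch of the "few keystrokes" switch Feldman describes, assuming an
# OpenAI-compatible endpoint; the API key and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # previously: a GPU cloud's endpoint
    api_key="YOUR_CEREBRAS_API_KEY",
)

response = client.chat.completions.create(
    model="llama-3.3-70b",  # one of the Llama models Cerebras serves
    messages=[{"role": "user", "content": "Summarise this earnings call transcript."}],
)
print(response.choices[0].message.content)
```

Groq offers a similar OpenAI-compatible endpoint, part of why switching inference providers has increasingly become a matter of configuration rather than code.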

While ASIC makers continue to improve their ecosystems, AI model makers like OpenAI and Meta are looking to build their own chips. 

Recently, OpenAI announced plans to deploy 10 gigawatts of custom AI accelerators in collaboration with Broadcom. 

It was also reported that Project Rainier, AWS’s initiative to build an AI supercomputer, will now contain half a million of the company’s Trainium2 chips. 

This is an order that would traditionally have gone to NVIDIA.

Even Google, despite owning its own TPU stack, continues to purchase NVIDIA GPUs in large volumes.  

Nonetheless, all these companies continue to purchase NVIDIA and AMD GPUs extensively, even as they find success with their own ASICs. 

For hyperscalers, custom chips are primarily about lowering reliance on NVIDIA, ensuring reliable large-scale computation, and diversifying their compute infrastructure as they better understand which workloads are best suited for GPUs and which are not. 

While NVIDIA leads in GPU adoption, the ecosystem is by no means ignoring the benefits that ASIC makers offer. 

Ongoing efforts to overcome the well-known limitations will undoubtedly create new use cases in which developers can select customised silicon that delivers tangible advantages in efficiency, cost, and performance. 
