Google announced an update to the Gemini 2.5 family of models on Tuesday and released an accompanying technical report.
The report outlines the architecture of the Gemini 2.5 models and details their capabilities, behaviours, and performance on various benchmarks. Google revealed that the Gemini 2.5 models are based on a sparse mixture-of-experts (MoE) transformer.
In such models, only a subset of parameters (experts) is activated for each input token. This reduces computational cost by routing each token to the most relevant experts rather than running every parameter on every input.
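To make the idea concrete, here is a minimal, illustrative sketch of top-k expert routing in plain Python/NumPy. It is not Google's implementation; the expert count, router, and gating details are assumptions chosen purely for illustration.

```python
# Illustrative sketch of sparse top-k expert routing (not Gemini's actual code):
# a router scores each token, and only the top-k experts run on it, so compute
# per token stays roughly constant even as total parameters grow.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2
tokens = rng.standard_normal((4, d_model))              # 4 token embeddings
router_w = rng.standard_normal((d_model, n_experts))    # router (gating) weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    logits = x @ router_w                                # (tokens, n_experts)
    out = np.zeros_like(x)
    for i, tok in enumerate(x):
        top = np.argsort(logits[i])[-top_k:]             # indices of the top-k experts
        gates = np.exp(logits[i][top])
        gates /= gates.sum()                             # renormalised gate weights
        # Only the selected experts are evaluated for this token.
        out[i] = sum(g * (tok @ experts[e]) for g, e in zip(gates, top))
    return out

print(moe_layer(tokens).shape)  # (4, 64): same output shape, but only 2 of 8 experts ran per token
```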
Google also stated that the model series delivers significant improvements in large-scale training stability, signal propagation, and optimisation dynamics, “resulting in a considerable boost in performance straight out of pre-training compared to previous Gemini models”.
The statement suggests there is still headroom to improve AI models through pre-training, whose continued returns have been debated over the last few months.
Improving an AI model during pre-training typically involves scaling up compute and training data to enhance performance.
However, several reports last year observed diminishing returns from additional compute, as well as a looming shortage of fresh training data once the finite supply of text available on the Internet is exhausted.
Google said the Gemini 2.5 models were trained on the company’s fifth-generation TPUs, in 8,960-chip pods. Notably, this is a significant jump from the 4,096-chip pods of the fourth-generation TPUs used to train the Gemini 1.5 models.
The company added that, compared with the Gemini 1.5 pre-training dataset, it used several new methods to improve data quality.
Thus, the Gemini 2.5 family of models has shown significant improvement across math, coding, and reasoning tasks compared to the 1.5 Pro family of models.

In addition, the Gemini 2.5 models are trained with reinforcement learning to use additional compute during inference, when outputs are generated from the model, spending more time on ‘thinking’ or reasoning before producing a final answer.
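For developers, this reasoning compute is exposed as a configurable budget. The sketch below assumes the google-genai Python SDK and its ThinkingConfig/thinking_budget setting; exact parameter names and defaults may differ across SDK versions and models.

```python
# Hedged sketch: assumes the google-genai Python SDK exposes a thinking budget
# for Gemini 2.5 models; parameter names may vary by SDK version.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Is 9.11 larger than 9.9? Explain briefly.",
    config=types.GenerateContentConfig(
        # A larger thinking budget lets the model spend more inference-time
        # compute on reasoning before the final answer; 0 disables thinking
        # where the model supports it.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```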
“The combination of these improvements in data quality, increased compute, algorithmic enhancements, and expanded capabilities has contributed to across-the-board performance gains,” Google stated.
As per the latest update, the Gemini 2.5 family of models—2.5 Pro, 2.5 Flash, and 2.5 Flash-Lite—are out of preview and now available as stable versions.
According to Artificial Analysis, a benchmarking platform that evaluates the performance of AI models across various benchmarks, the 2.5 Pro model is one of the best-performing models available today.

The lightweight Gemini 2.5 Flash has the fastest output speed among all AI models, while also delivering performance on par with some of the top models available today.
