Baidu’s ERNIE 4.5 is Built On a ‘Heterogeneous MoE’ Architecture


Chinese tech giant Baidu open-sourced the ERNIE 4.5 family of models on Monday, with variants ranging from 0.3 billion to 424 billion parameters. Baidu has released both language and multimodal (image and video) AI models under the ERNIE 4.5 umbrella, all of them publicly accessible under the Apache 2.0 license. 

The 300B variant of ERNIE 4.5 outperforms DeepSeek-V3 671B on several benchmarks spanning general reasoning, math, and coding tasks. 

The 21B variant, meanwhile, outperforms Alibaba’s Qwen3-30B-A3B on several math and reasoning benchmarks despite containing roughly 30% fewer parameters. 

The core architectural innovation of the ERNIE 4.5 models is their heterogeneous modality Mixture-of-Experts (MoE) structure, designed from the ground up for multimodal learning. It uses separate experts for text and vision, along with shared experts that integrate knowledge across modalities. 

“These architectural choices ensure that both modalities are effectively represented, allowing for mutual reinforcement during training,” Baidu stated in the technical report. 

This design also improves computational efficiency. Because an expert’s feed-forward compute scales roughly linearly with its intermediate dimension, giving visual experts one-third the intermediate dimension of text experts reduces the computation for visual tokens by about 66%. 

Moreover, in text-only scenarios, vision experts can be skipped to reduce memory overhead, the authors noted. 
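
To make the idea concrete, here is a minimal PyTorch-style sketch of a modality-aware MoE layer along these lines. The class names, expert counts, dimensions, and the simplified top-1 routing are illustrative assumptions, not Baidu’s actual implementation; the sketch only demonstrates modality-dedicated experts (with slimmer vision experts), a shared expert, and skipping the vision path for text-only inputs.

```python
# A minimal sketch of a heterogeneous (modality-aware) MoE layer.
# Dimensions, expert counts, and routing are illustrative assumptions.
import torch
import torch.nn as nn


class ExpertFFN(nn.Module):
    """A standard feed-forward expert with a configurable intermediate size."""
    def __init__(self, d_model: int, d_intermediate: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_intermediate),
            nn.GELU(),
            nn.Linear(d_intermediate, d_model),
        )

    def forward(self, x):
        return self.net(x)


class HeterogeneousMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_text_inter=4096, num_text=8, num_vision=8):
        super().__init__()
        # Text experts use the full intermediate dimension; vision experts use
        # one-third of it, cutting their per-token FFN compute by roughly 2/3.
        self.text_experts = nn.ModuleList(
            [ExpertFFN(d_model, d_text_inter) for _ in range(num_text)])
        self.vision_experts = nn.ModuleList(
            [ExpertFFN(d_model, d_text_inter // 3) for _ in range(num_vision)])
        # A shared expert processes every token, integrating knowledge
        # across modalities.
        self.shared_expert = ExpertFFN(d_model, d_text_inter)
        # Separate routers per modality (real MoE layers typically use top-k
        # routing with load-balancing losses; simplified to top-1 here).
        self.text_router = nn.Linear(d_model, num_text)
        self.vision_router = nn.Linear(d_model, num_vision)

    def _route(self, tokens, router, experts):
        # Pick one expert per token and weight its output by the router score.
        scores = router(tokens).softmax(dim=-1)
        top1 = scores.argmax(dim=-1)
        out = torch.zeros_like(tokens)
        for idx, expert in enumerate(experts):
            mask = top1 == idx
            if mask.any():
                out[mask] = expert(tokens[mask]) * scores[mask, idx].unsqueeze(-1)
        return out

    def forward(self, tokens, is_vision):
        # tokens: [num_tokens, d_model]; is_vision: boolean mask per token.
        out = self.shared_expert(tokens)
        text_tokens = tokens[~is_vision]
        if text_tokens.numel() > 0:
            out[~is_vision] += self._route(text_tokens, self.text_router, self.text_experts)
        vision_tokens = tokens[is_vision]
        # In text-only batches this branch never runs, so the vision experts
        # can be skipped (or not loaded at all) to save memory.
        if vision_tokens.numel() > 0:
            out[is_vision] += self._route(vision_tokens, self.vision_router, self.vision_experts)
        return out
```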


Furthermore, the company stated that through a series of ‘extreme optimisations’, the largest model in the family achieved 47% Model FLOPs Utilisation (MFU) on NVIDIA H800 GPUs. According to the company, this training approach allowed the model to reach “optimal training performance” with limited compute resources of around 96 GPUs. 
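
MFU measures how much of the hardware’s theoretical throughput a training run actually uses: the model FLOPs processed per second divided by the accelerator’s peak FLOPs. The snippet below sketches that standard calculation; the 6 × parameters × tokens approximation, the throughput figure, and the H800 peak number are assumptions for illustration, not values from Baidu’s report.

```python
# Back-of-the-envelope sketch of how Model FLOPs Utilisation (MFU) is
# typically computed. All numbers below are placeholders, not Baidu's figures.

def mfu(active_params: float, tokens_per_second: float, peak_flops: float) -> float:
    """MFU = model FLOPs actually processed per second / hardware peak FLOPs."""
    # Common approximation: ~6 FLOPs per parameter per token for a combined
    # forward + backward pass (for MoE models, count only *activated* params).
    model_flops_per_second = 6 * active_params * tokens_per_second
    return model_flops_per_second / peak_flops


if __name__ == "__main__":
    # Hypothetical per-GPU numbers purely for illustration.
    active_params = 47e9        # activated parameters per token (e.g. a 47B-active MoE)
    tokens_per_second = 1_500   # tokens processed per GPU per second (made up)
    h800_peak_bf16 = 989e12     # approximate H800 dense BF16 peak FLOPs/s (assumed)
    print(f"MFU ~ {mfu(active_params, tokens_per_second, h800_peak_bf16):.1%}")
```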

Baidu stated that pre-training techniques such as intra-node expert parallelism, memory-efficient pipeline scheduling, FP8 mixed-precision training, and fine-grained recomputation helped achieve these levels of efficiency. 

For more details about the model’s architecture, pre-training and post-training techniques, and evaluations, refer to the PDF of the technical report.  
