Essential AI, a startup founded by Ashish Vaswani—co-author of the landmark ‘Attention Is All You Need’ paper that introduced transformers—released a study a few weeks ago, titled ‘Rethinking Reflection in Pre-Training’.
The research reveals that an AI model’s capacity to reflect on its own reasoning arises during pre-training itself, rather than through fine-tuning or reinforcement learning, as is often assumed.
By testing an AI model (OLMo-2) at various stages of training using tasks with intentional errors, the researchers discovered that reflection naturally emerges during the training process.
The researchers created datasets across domains such as mathematics, coding, logical reasoning, and knowledge acquisition. These datasets contained deliberately modified chain-of-thought (CoT) reasoning paths with introduced errors, such as arithmetic mistakes and logical inconsistencies. They also tested the models’ ability to correct errors in their own reasoning.
A key finding was that reflection could be activated using simple and natural language triggers.
Interjections like “wait” prompted even partially trained models to pause, recognise errors in the reasoning paths, and correct them.
“For instance, an OLMo-2 7B model pre-trained on four trillion tokens displays self-correction on our six self-reflection tasks,” read a section of the study.
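The evaluation setup described above can be illustrated with a minimal sketch. The function and prompt format here are hypothetical, assumed for illustration only and not taken from the paper: a flawed chain-of-thought is attached to a question, ending on a trigger word so that a model continuing the text is invited to reconsider the reasoning.

```python
# Hypothetical sketch of the adversarial-CoT evaluation described above.
# The prompt format and function name are illustrative assumptions,
# not the exact setup used in the study.

def make_adversarial_prompt(question, flawed_cot, trigger="Wait,"):
    """Join a question with a deliberately flawed reasoning chain,
    ending on a trigger word that invites the model to reconsider."""
    steps = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(flawed_cot))
    return f"Question: {question}\n{steps}\n{trigger}"

# Example: the second step contains an introduced arithmetic error (3 * 4 = 14).
prompt = make_adversarial_prompt(
    "What is (2 + 1) * 4?",
    ["2 + 1 = 3", "3 * 4 = 14, so the answer is 14."],
)
print(prompt)
```

A model that has developed reflection would, when asked to continue such a prompt, be expected to flag the faulty step after the trigger rather than accept the stated answer.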
The study also revealed that as models underwent more training, their ability to identify mistakes and correct reasoning steadily improved.
The startup has also published a technical report that outlines the research methodology and results.
Essential AI emerged from stealth mode in December 2023, raising $56.5 million in a funding round led by Google, Thrive Capital, AMD, and others. The startup is focused on building ‘full-stack AI products’, including LLMs that increase productivity in ‘monotonous’ workflows.
Niki Parmar, who also co-authored the ‘Attention Is All You Need’ paper, joined Vaswani as a co-founder. However, she recently moved to the AI startup Anthropic.
‘Attention Is All You Need’ was a research paper published by Google in 2017 that introduced the ‘Transformer’ architecture, which serves as the backbone for most, if not all, large language models today.
The post ‘Attention is All You Need’ Author Suggests LLMs ‘Reflect’ in Pre-Training appeared first on Analytics India Magazine.