LLM Research Papers: The 2025 List (January to June)

As some of you know, I keep a running list of research papers I (want to) read and reference.

About six months ago, I shared my 2024 list, which many readers found useful. So I decided to do it again. This time, however, I am incorporating the one piece of feedback that kept coming up: “Can you organize the papers by topic instead of date?”

The categories I came up with are:

  1. Reasoning Models

    – 1a. Training Reasoning Models

    – 1b. Inference-Time Reasoning Strategies

    – 1c. Evaluating LLMs and/or Understanding Reasoning

  2. Other Reinforcement Learning Methods for LLMs

  3. Other Inference-Time Scaling Methods

  4. Efficient Training & Architectures

  5. Diffusion-Based Language Models

  6. Multimodal & Vision-Language Models

  7. Data & Pre-training Datasets

Also, as LLM research continues to be shared at a rapid pace, I have decided to break the list into twice-yearly updates. This way, the list stays digestible, timely, and hopefully useful for anyone looking for solid summer reading material.

Please note that this is just a curated list for now. In future articles, I plan to revisit and discuss some of the more interesting or impactful papers in larger topic-specific write-ups. Stay tuned!

1. Reasoning Models

This year, my list is very reasoning-model-heavy. So, I decided to subdivide it into three categories: training, inference-time scaling strategies, and more general understanding/evaluation.

1a. Training Reasoning Models

This subsection focuses on training strategies specifically designed to improve reasoning abilities in LLMs. As you will see, much of the recent progress has centered around reinforcement learning (with verifiable rewards), which I covered in more detail in a previous article.

Annotated figure from Reinforcement Pre-Training, https://arxiv.org/abs/2506.08007

1b. Inference-Time Reasoning Strategies

This part of the list covers methods that improve reasoning dynamically at test time, without requiring retraining. Often, these papers focus on trading off computational cost for modeling performance.

