How letting AI choose its own path made it smarter
Outstanding paper award at ICML 2025 (episode 4)
Hello fellow researchers and AI enthusiasts!
Welcome back to The Future of AI! In this fourth episode of my ICML review series, we return to the question explored in “Why AI struggles to think outside the box” and discuss whether large language models can truly be creative, or whether they will forever remain just clever parrots.
What if LLMs didn’t have to think in straight lines? A new approach that lets them choose their own generation order turns out to be much better at solving puzzles and planning tasks, and may even unlock new levels of creativity.
Full reference: J. Kim, K. Shah, V. Kontonis, S. Kakade, and S. Chen, “Train for the worst, plan for the best: Understanding token ordering in masked diffusions,” arXiv preprint arXiv:2502.06768, 2025.
Context
Generative AI models that work with text, code, or other discrete data typically rely on autoregressive models (ARMs), which generate words (or tokens) one by one in a fixed order (usually left to right). An autoregressive model (“auto” means “self”, and “regressive” means it looks at past data) is a simple way to predict future values in a series based on past values. Imagine you want to predict how much coffee you’ll drink tomorrow. An ARM might say: “Well, you drank 2 cups yesterday and 3 cups the day before. Based on that, I think you’ll drink about 2.5 cups tomorrow.”
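To make this concrete, here is a minimal sketch of the autoregressive loop in Python. The fixed bigram table is a toy stand-in for a real neural network; the point is simply that generation always extends the sequence left to right, one token at a time:

```python
import numpy as np

# Toy "language model": given the last token, return a probability
# distribution over the next token. A real ARM would be a neural network.
VOCAB = ["<s>", "the", "cat", "sat", "down", "</s>"]
BIGRAM = {
    "<s>":  [0.0, 0.9, 0.05, 0.0, 0.0, 0.05],
    "the":  [0.0, 0.0, 0.9, 0.05, 0.0, 0.05],
    "cat":  [0.0, 0.0, 0.0, 0.9, 0.05, 0.05],
    "sat":  [0.0, 0.0, 0.0, 0.0, 0.9, 0.1],
    "down": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
}

def generate(max_len=10, seed=0):
    rng = np.random.default_rng(seed)
    tokens = ["<s>"]
    # Autoregressive loop: each new token is sampled conditioned only
    # on the tokens already produced, in a fixed left-to-right order.
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        probs = BIGRAM[tokens[-1]]
        tokens.append(VOCAB[rng.choice(len(VOCAB), p=probs)])
    return tokens

print(generate())  # e.g. ['<s>', 'the', 'cat', 'sat', 'down', '</s>']
```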
Recently, masked diffusion models (MDMs) have emerged as an alternative. Unlike ARMs, MDMs don’t follow a strict order during generation. They can fill in tokens in any sequence, offering potential advantages for tasks like puzzle solving or planning. However, this flexibility comes at a cost during training: MDMs must learn to handle a much larger number of acceptable token orderings. In other words, when an LLM is trained autoregressively on next-word prediction in a fixed order, there is only one “correct” next word: the actual next word in the sentence. An MDM, by contrast, must learn to fill in any subset of masked positions, so the number of prediction subproblems it faces grows combinatorially with sequence length.
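Here is the MDM side of the picture as a toy sketch, to contrast with the ARM loop above. The `fill_fn` callback is a hypothetical stand-in for the trained denoiser; what matters is that positions can be revealed in any order, which a vanilla MDM simply picks at random:

```python
import random

MASK = "_"

def mdm_generate(fill_fn, length, seed=0):
    # Start from a fully masked sequence.
    seq = [MASK] * length
    # Unlike an ARM, positions can be filled in ANY order;
    # a vanilla MDM just chooses that order at random.
    order = list(range(length))
    random.Random(seed).shuffle(order)
    for pos in order:
        # fill_fn predicts a token at `pos` given the partial sequence.
        seq[pos] = fill_fn(seq, pos)
    return seq

# Toy stand-in for the trained denoiser: it just echoes the position index.
print(mdm_generate(lambda seq, pos: str(pos), length=5))
```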
Key results
The authors investigate whether the benefits of flexible inference in MDMs can outweigh the cost of harder training, and find that:
Training is harder for MDMs than for ARMs because they are shown exponentially many word combinations, some of which don’t make much sense. This helps explain why MDMs often underperform in language modeling tasks.
Inference can compensate for that. If MDMs choose the order in which tokens are unmasked strategically instead of randomly, performance improves dramatically. This approach is called “adaptive inference”. It uses simple strategies like filling in the easiest tokens first, based on model confidence (see the sketch after this list).
On Sudoku puzzles, basic MDMs solve fewer than 7% of puzzles correctly. Adaptive inference boosts this to nearly 90% accuracy, even outperforming ARMs trained with extra information about the correct solving order. Similar results hold for Zebra puzzles (logic-based riddles).
Even on harder, unseen Sudoku puzzles, adaptive MDMs maintain higher accuracy than ARMs trained with explicit ordering.
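To illustrate what “filling in the easiest tokens first” looks like in practice, here is a minimal sketch of confidence-based unmasking. This is not the authors’ code: `predict_fn` and `toy_predict` are hypothetical stand-ins for a trained MDM, but the loop captures the strategy described above, committing one token per step at the position where the model is most confident:

```python
import numpy as np

MASK = -1  # id marking a still-masked position

def adaptive_unmask(predict_fn, length):
    """Confidence-based ('easiest first') unmasking.

    predict_fn(seq, masked) stands in for the trained MDM: it returns
    one probability distribution per masked position.
    """
    seq = np.full(length, MASK)
    while (seq == MASK).any():
        masked = np.flatnonzero(seq == MASK)
        probs = predict_fn(seq, masked)       # shape (n_masked, vocab)
        conf = probs.max(axis=1)              # model confidence per position
        k = conf.argmax()                     # pick the easiest position
        seq[masked[k]] = probs[k].argmax()    # commit its most likely token
    return seq

# Toy denoiser that returns random (but normalized) distributions.
def toy_predict(seq, masked, vocab=9, rng=np.random.default_rng(1)):
    p = rng.random((len(masked), vocab))
    return p / p.sum(axis=1, keepdims=True)

print(adaptive_unmask(toy_predict, length=6))
```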
This research shows that order-independent training paired with smart, adaptive inference can outperform traditional approaches on reasoning tasks. This is particularly valuable for games and puzzle solving; planning tasks in robotics or scheduling, where step order is uncertain; code completion and structured text generation, where flexibility in token order matters; and even scientific modeling, such as protein design, where constraints and dependencies vary.
My take
In “Why AI struggles to think outside the box” we discussed why modern chatbots struggle to be creative: they are trained to strictly predict the next most plausible word. This paper presents a possible solution to that problem: let the model choose the order of words on its own. The impressive performance boost, from 7% to 90% accuracy on Sudoku puzzles, demonstrates the high potential of this technique. And the broader implications, such as what it could mean for the path to AGI, may be even more significant.
Looking ahead
To summarise, we’ve seen how breaking free from strict word order allows LLMs to make surprising leaps in reasoning and problem-solving.
In the next episode, on Thursday, we’ll turn from puzzles to people, exploring why more intelligence in AI isn’t always better, especially when it comes to public services, and what that means for trust and governance.
If you don’t want to miss that, make sure to subscribe to The Future of AI!