The Brain-Inspired Mini-Model Beating Giants at Reasoning
The Hierarchical Reasoning Model (HRM) is a 27-million-parameter network trained entirely from scratch on only 1,000 examples, yet it outperforms models hundreds of times larger on certain reasoning challenges.
The backdrop is familiar to anyone following AI. The Transformer architecture, the foundation of GPT-4, Claude, and Gemini, is computationally shallow: a fixed stack of layers bounds how much sequential computation a single forward pass can perform.
These systems rely heavily on chain-of-thought prompting: break a problem into intermediate steps, output each step in language, and hope the path holds together. The approach is prone to stalling or collapse on tasks that require deep search or backtracking. It is slow, verbose, brittle, and computationally demanding.
HRM takes a very different approach, directly inspired by how the brain manages multi-stage reasoning across different timescales.
Key features:
Two recurrent modules: a high-level network for slow, abstract planning and a low-level network for rapid, fine-grained updates (sketched in the first code block after this list). This separation mirrors cortical hierarchies and multi-frequency rhythms observed in neuroscience.
Hierarchical convergence: the low-level loop reaches a local equilibrium before the high-level module steps forward, preventing premature convergence and keeping both modules active across many reasoning cycles.
One-step gradient approximation from Deep Equilibrium Models: eliminates backpropagation through time's O(T) memory cost and trains with O(1) memory, using only the final equilibrium states for gradients.
Deep supervision: each reasoning segment receives its own loss and update, with state detachment between segments for stability (see the second sketch after this list). This delivers frequent corrective feedback and acts as a regularizer.
Adaptive Computation Time (ACT) via Q-learning: the model learns to halt early for simple problems and run additional cycles for complex ones, saving compute without a performance drop.
Inference-time scaling: at test time you can raise the maximum segment count Mmax to boost performance, no retraining required.
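The first three features come down to one nested loop. Here is a minimal PyTorch sketch under stated assumptions: GRUCell stand-ins replace the paper's Transformer-style blocks, and the module names, sizes, and step counts are illustrative, not the released architecture.

```python
# Minimal sketch of the two-timescale core (illustrative assumptions:
# GRUCell stand-ins for the paper's blocks, toy sizes). The low-level
# module takes t_low fast steps per cycle; the high-level module steps
# once per cycle ("hierarchical convergence"). Everything but the final
# update of each module runs under no_grad, so the backward pass sees
# only one step per module: the O(1)-memory one-step gradient
# approximation borrowed from Deep Equilibrium Models.
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.low = nn.GRUCell(d_in + d_hidden, d_hidden)   # fast, fine-grained
        self.high = nn.GRUCell(d_hidden, d_hidden)         # slow, abstract

    def forward(self, x, z_low, z_high, n_cycles: int = 2, t_low: int = 4):
        with torch.no_grad():                  # no graph for these steps
            for cycle in range(n_cycles):
                last = cycle == n_cycles - 1
                for _ in range(t_low - 1 if last else t_low):
                    z_low = self.low(torch.cat([x, z_high], -1), z_low)
                if not last:
                    z_high = self.high(z_low, z_high)
        # Only these two updates carry gradients back to the parameters.
        z_low = self.low(torch.cat([x, z_high], -1), z_low)
        z_high = self.high(z_low, z_high)
        return z_low, z_high
```

Because every state entering those final two updates is detached, training memory stays constant no matter how many inner steps run.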
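Deep supervision and ACT then compose in an outer loop over segments. A simplified sketch follows; the Q-head (assumed to output halt/continue values), the omitted Q-learning target for those values, and the all-halt batch rule are my assumptions for illustration, not the released training code.

```python
# Simplified outer loop: deep supervision with state detachment between
# segments, plus a learned halt/continue decision (ACT). The Q-head, the
# omitted Q-learning loss for its outputs, and the all-halt batch rule
# are illustrative assumptions, not the released implementation.
import torch
import torch.nn.functional as F

def train_step(core, readout, q_head, optimizer, x, y, m_max: int = 8):
    d = core.high.hidden_size
    z_low = torch.zeros(x.size(0), d)
    z_high = torch.zeros(x.size(0), d)
    for _ in range(m_max):                          # at most M_max segments
        z_low, z_high = core(x, z_low, z_high)      # one reasoning segment
        loss = F.cross_entropy(readout(z_high), y)  # per-segment supervision
        # A Q-learning loss on the halt/continue values would be added here.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Detach so each segment is its own shallow graph (deep supervision).
        z_low, z_high = z_low.detach(), z_high.detach()
        q_halt, q_cont = q_head(z_high).unbind(dim=-1)
        if bool((q_halt > q_cont).all()):           # learned early exit
            break
```

Inference-time scaling falls out of the same loop: at evaluation you simply pass a larger m_max (the paper's Mmax) so hard inputs can take more segments, with no retraining.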
The results are backed by numbers:
Sudoku-Extreme: near-perfect accuracy on a dataset averaging 22 search backtracks per puzzle. Comparably sized Transformers fail completely.
Maze-Hard (30×30): finds optimal paths through mazes with solution lengths over 110 steps, again near-perfect, while even 175M-parameter Transformers fall below 20% accuracy.
ARC-AGI: 40.3% vs. o3-mini-high at 34.5% and Claude 3.7 at 21.2%. Achieved with no pretraining and no CoT supervision.
The high-level module's participation ratio (PR) is almost 3× that of the low-level module, directly paralleling observed cortical dimensionality hierarchies in mouse brain studies.
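For readers who want to check that measure on their own activations, the participation ratio has a standard closed form: PR = (Σᵢ λᵢ)² / Σᵢ λᵢ², where λᵢ are the eigenvalues of the state covariance matrix. A small NumPy helper (illustrative, not the paper's analysis code):

```python
# Participation ratio of hidden states: PR = (sum_i lam_i)^2 / sum_i lam_i^2,
# where lam_i are eigenvalues of the covariance of the states. Higher PR
# means activity is spread over more dimensions.
import numpy as np

def participation_ratio(states: np.ndarray) -> float:
    """states: (n_samples, n_units) array of hidden activations."""
    cov = np.cov(states, rowvar=False)     # (n_units, n_units) covariance
    eig = np.linalg.eigvalsh(cov)          # eigenvalues (symmetric matrix)
    return float(eig.sum() ** 2 / np.sum(eig ** 2))
```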
It is all reproducible. Sapient Intelligence has released the code, training configs, evaluation scripts, augmentation and preprocessing pipelines, plus checkpoints. The repo includes toggles for ACT and deep supervision, and scripts to replicate ARC's "1000 augmented variants plus voting" evaluation, which is compute-heavy but standard in ARC research.
A reality check: that top ARC number comes with heavy test-time augmentation and voting, which has cost implications. The architecture's performance on Sudoku and Maze with single-pass inference shows that the depth and efficiency gains hold even without augmentation tricks.
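For context, the augment-and-vote protocol itself is simple to express. The sketch below is a generic version under assumptions: `predict`, the augmentation objects with `apply`/`invert` methods, and hashable grid predictions are hypothetical stand-ins, not the repo's actual scripts.

```python
# Generic sketch of "augment and vote" evaluation: solve many transformed
# copies of a task, map each prediction back to the original frame, and
# keep the majority answer. `predict` and the augmentation objects are
# hypothetical stand-ins; predictions are assumed hashable (e.g., grids
# encoded as tuples of tuples).
from collections import Counter
from typing import Callable, Sequence

def predict_with_voting(predict: Callable, task, augmentations: Sequence,
                        n_variants: int = 1000):
    votes = Counter()
    for aug in augmentations[:n_variants]:
        pred = predict(aug.apply(task))   # solve the transformed task
        votes[aug.invert(pred)] += 1      # undo the transform, cast a vote
    return votes.most_common(1)[0][0]     # most frequent answer wins
```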
From a practical point of view, HRM shows that reasoning can live in latent states rather than spilling into output tokens. That means less data reliance, greater stability, and direct control of test-time "thinking length" without retraining. It is quick on easy cases, deliberate on hard ones, and grounded in a training and convergence strategy that is both efficient and biologically plausible.
This is not AGI. It is a focused design that sidesteps some of the fundamental computational limits of shallow Transformer stacks. For anyone building reasoning systems, or exploring neuroscience-inspired architectures, the paper and its open-source implementation deserve attention.
Code & Checkpoints: https://github.com/sapientinc/HRM