The Brain-Inspired Mini-Model Beating Giants at Reasoning
The Hierarchical Reasoning Model (HRM) is a 27-million-parameter network trained entirely from scratch on only 1,000 examples, yet it outperforms models hundreds of times larger on certain reasoning challenges.
The backdrop is familiar to anyone following AI. The Transformer architecture, the foundation of GPT-4, Claude, and Gemini, is computationally shallow: a fixed stack of layers bounds how much sequential computation a single forward pass can perform.
These systems rely heavily on chain-of-thought prompting: break a problem into intermediate steps, output each step in language, and hope the path holds together. The approach is prone to stalling or collapse on tasks that require deep search or backtracking. It is slow, verbose, brittle, and computationally demanding.
HRM takes a very different approach, directly inspired by how the brain manages multi-stage reasoning across different timescales.
Key features:
Two recurrent modules: a high-level network for slow, abstract planning and a low-level network for rapid, fine-grained updates (sketched in the first code block after this list). This separation mirrors cortical hierarchies and multi-frequency rhythms observed in neuroscience.
Hierarchical convergence: the low-level loop reaches a local equilibrium before the high-level module steps forward, preventing premature convergence and keeping both modules active across many reasoning cycles.
One-step gradient approximation from Deep Equilibrium Models: eliminates backpropagation through time's O(T) memory cost and trains with O(1) memory, using only the final equilibrium states for gradients.
Deep supervision: each reasoning segment receives its own loss and update, with state detachment between segments for stability (see the second sketch after this list). This delivers frequent corrective feedback and acts as a regularizer.
Adaptive Computation Time (ACT) via Q-learning: the model learns to halt early for simple problems and run additional cycles for complex ones, saving compute without a performance drop.
Inference-time scaling: at test time you can raise the maximum segment count Mmax to boost performance, no retraining required.
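The first three features come down to one nested loop. Here is a minimal PyTorch sketch under stated assumptions: GRUCell stand-ins replace the paper's Transformer-style blocks, and the module names, sizes, and step counts are illustrative, not the released architecture.

```python
# Minimal sketch of the two-timescale core (illustrative assumptions:
# GRUCell stand-ins for the paper's blocks, toy sizes). The low-level
# module takes t_low fast steps per cycle; the high-level module steps
# once per cycle ("hierarchical convergence"). Everything but the final
# update of each module runs under no_grad, so the backward pass sees
# only one step per module: the O(1)-memory one-step gradient
# approximation borrowed from Deep Equilibrium Models.
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.low = nn.GRUCell(d_in + d_hidden, d_hidden)   # fast, fine-grained
        self.high = nn.GRUCell(d_hidden, d_hidden)         # slow, abstract

    def forward(self, x, z_low, z_high, n_cycles: int = 2, t_low: int = 4):
        with torch.no_grad():                  # no graph for these steps
            for cycle in range(n_cycles):
                last = cycle == n_cycles - 1
                for _ in range(t_low - 1 if last else t_low):
                    z_low = self.low(torch.cat([x, z_high], -1), z_low)
                if not last:
                    z_high = self.high(z_low, z_high)
        # Only these two updates carry gradients back to the parameters.
        z_low = self.low(torch.cat([x, z_high], -1), z_low)
        z_high = self.high(z_low, z_high)
        return z_low, z_high
```

Because every state entering those final two updates is detached, training memory stays constant no matter how many inner steps run.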
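Deep supervision and ACT then compose in an outer loop over segments. A simplified sketch follows; the Q-head (assumed to output halt/continue values), the omitted Q-learning target for those values, and the all-halt batch rule are my assumptions for illustration, not the released training code.

```python
# Simplified outer loop: deep supervision with state detachment between
# segments, plus a learned halt/continue decision (ACT). The Q-head, the
# omitted Q-learning loss for its outputs, and the all-halt batch rule
# are illustrative assumptions, not the released implementation.
import torch
import torch.nn.functional as F

def train_step(core, readout, q_head, optimizer, x, y, m_max: int = 8):
    d = core.high.hidden_size
    z_low = torch.zeros(x.size(0), d)
    z_high = torch.zeros(x.size(0), d)
    for _ in range(m_max):                          # at most M_max segments
        z_low, z_high = core(x, z_low, z_high)      # one reasoning segment
        loss = F.cross_entropy(readout(z_high), y)  # per-segment supervision
        # A Q-learning loss on the halt/continue values would be added here.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Detach so each segment is its own shallow graph (deep supervision).
        z_low, z_high = z_low.detach(), z_high.detach()
        q_halt, q_cont = q_head(z_high).unbind(dim=-1)
        if bool((q_halt > q_cont).all()):           # learned early exit
            break
```

Inference-time scaling falls out of the same loop: at evaluation you simply pass a larger m_max (the paper's Mmax) so hard inputs can take more segments, with no retraining.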
The results are backed by numbers:
Sudoku-Extreme: near-perfect accuracy on a dataset averaging 22 search backtracks per puzzle. Comparably sized Transformers fail completely.
Maze-Hard (30×30): finds optimal paths through mazes with solution lengths over 110 steps, again near-perfect, while even 175M-parameter Transformers fall below 20% accuracy.
ARC-AGI: 40.3% vs. o3-mini-high at 34.5% and Claude 3.7 at 21.2%. Achieved with no pretraining and no CoT supervision.
The high-level module's participation ratio (PR) is almost 3× that of the low-level module, directly paralleling observed cortical dimensionality hierarchies in mouse brain studies.
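For readers who want to check that measure on their own activations, the participation ratio has a standard closed form: PR = (Σᵢ λᵢ)² / Σᵢ λᵢ², where λᵢ are the eigenvalues of the state covariance matrix. A small NumPy helper (illustrative, not the paper's analysis code):

```python
# Participation ratio of hidden states: PR = (sum_i lam_i)^2 / sum_i lam_i^2,
# where lam_i are eigenvalues of the covariance of the states. Higher PR
# means activity is spread over more dimensions.
import numpy as np

def participation_ratio(states: np.ndarray) -> float:
    """states: (n_samples, n_units) array of hidden activations."""
    cov = np.cov(states, rowvar=False)     # (n_units, n_units) covariance
    eig = np.linalg.eigvalsh(cov)          # eigenvalues (symmetric matrix)
    return float(eig.sum() ** 2 / np.sum(eig ** 2))
```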
It is all reproducible. Sapient Intelligence has released the code, training configs, evaluation scripts, augmentation and preprocessing pipelines, plus checkpoints. The repo includes toggles for ACT and deep supervision, and scripts to replicate ARC's "1000 augmented variants plus voting" evaluation, which is compute-heavy but standard in ARC research.
A reality check: that top ARC number comes with heavy test-time augmentation and voting, which has cost implications. The architecture's performance on Sudoku and Maze with single-pass inference shows that the depth and efficiency gains hold even without augmentation tricks.
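For context, the augment-and-vote protocol itself is simple to express. The sketch below is a generic version under assumptions: `predict`, the augmentation objects with `apply`/`invert` methods, and hashable grid predictions are hypothetical stand-ins, not the repo's actual scripts.

```python
# Generic sketch of "augment and vote" evaluation: solve many transformed
# copies of a task, map each prediction back to the original frame, and
# keep the majority answer. `predict` and the augmentation objects are
# hypothetical stand-ins; predictions are assumed hashable (e.g., grids
# encoded as tuples of tuples).
from collections import Counter
from typing import Callable, Sequence

def predict_with_voting(predict: Callable, task, augmentations: Sequence,
                        n_variants: int = 1000):
    votes = Counter()
    for aug in augmentations[:n_variants]:
        pred = predict(aug.apply(task))   # solve the transformed task
        votes[aug.invert(pred)] += 1      # undo the transform, cast a vote
    return votes.most_common(1)[0][0]     # most frequent answer wins
```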
From a practical point of view, HRM shows that reasoning can live in latent states rather than spilling into output tokens. That means less data reliance, greater stability, and direct control of test-time "thinking length" without retraining. It is quick on easy cases, deliberate on hard ones, and grounded in a training and convergence strategy that is both efficient and biologically plausible.
This is not AGI. It is a focused design that sidesteps some of the fundamental computational limits of shallow Transformer stacks. For anyone building reasoning systems, or exploring neuroscience-inspired architectures, the paper and its open-source implementation deserve attention.
Code & Checkpoints: https://github.com/sapientinc/HRM