Bluesky Thread

HRM confirmed by ARC-AGI team, but also dismissed as non-generalizable

the magic wasn’t in the hierarchical structure, it was in the outer loop. And the outer loop benefited mostly at training time

i.e. it mostly just memorized answers

arcprize.org/blog/hrm-ana...
arcprize.org
The Hidden Drivers of HRM's Performance on ARC-AGI
We scored on hidden tasks, ran ablations, and found that performance from the Hierarchical Reasoning Model comes from an unexpected source
Tim Kellogg @timkellogg.me
HRM: Hierarchical Reasoning Model

ngl this sounds like bullshit but i don’t think it is

- 27M parameters
- 1000 training examples
- beats o3-mini on ARC-AGI

arxiv.org/abs/2506.21734
to be clear — it was a legit result. it completed ARC as advertised, no cheating

the problem (as interpreted by the ARC team) is that it won’t work beyond ARC-style problems
