MarkTechPost@AI · October 10, 02:12
Tiny Recursive Model Outperforms Large Language Models on ARC-AGI Reasoning Tasks

Samsung SAIT has released the Tiny Recursive Model (TRM), a new two-layer recursive reasoning model with roughly 7M parameters. On the ARC-AGI benchmarks it reaches 44.6–45% accuracy on ARC-AGI-1 and 7.8–8% on ARC-AGI-2, outperforming many far larger language models such as DeepSeek-R1, o3-mini-high, and Gemini 2.5 Pro. Rather than generating a solution token by token, TRM refines it by iteratively updating a latent "scratchpad", and it concentrates compute on test-time reasoning. The model also makes notable gains on puzzle benchmarks such as Sudoku-Extreme and Maze-Hard, demonstrating that a compact architecture with recursive refinement can be highly effective on these specific tasks.

💡 **Novel architecture and training recipe**: TRM drops the conventional multi-module hierarchy in favor of a lean two-layer neural network that runs a "think, then act" loop by iteratively updating a latent "scratchpad" (z) and the current solution embedding (y). This recursion, combined with deep supervision and full backpropagation, lets the model be trained from scratch and achieve strong reasoning with very few parameters.

🚀 **Performance beyond much larger models**: With only about 7M parameters, TRM scores 44.6–45% on ARC-AGI-1 and 7.8–8% on ARC-AGI-2, clearly ahead of far larger models such as DeepSeek-R1 (671B), o3-mini-high, and Gemini 2.5 Pro. This suggests that efficient architecture and training strategy can matter more than sheer parameter count on these reasoning tasks.

🧩 **Multi-task generality and efficiency**: Beyond ARC-AGI, TRM also surpasses the larger prior model HRM on puzzle-solving benchmarks such as Sudoku-Extreme (87.4%) and Maze-Hard (85.3%), showing that the architecture handles different kinds of symbolic-geometric reasoning problems flexibly and efficiently, especially under tight resource constraints.

🧠 **Insight into the reasoning mechanism**: TRM's key ingredient is a "decide, then revise" style of reasoning: internal iterative consistency checks refine an initial solution, reducing the bias that autoregressive decoding introduces on structured outputs. In addition, compute is spent on test-time recursive refinement rather than on parameter count, which improves generalization.

Can an iterative draft–revise solver that repeatedly updates a latent scratchpad outperform far larger autoregressive LLMs on ARC-AGI? Samsung SAIT (Montreal) has released Tiny Recursive Model (TRM)—a two-layer, ~7M-parameter recursive reasoner that reports 44.6–45% test accuracy on ARC-AGI-1 and 7.8–8% on ARC-AGI-2, surpassing results reported for substantially larger language models such as DeepSeek-R1, o3-mini-high, and Gemini 2.5 Pro on the same public evaluations. TRM also improves on the puzzle benchmarks Sudoku-Extreme (87.4%) and Maze-Hard (85.3%) relative to the prior Hierarchical Reasoning Model (HRM, 27M parameters), while using far fewer parameters and a simpler training recipe.

What exactly is new?

TRM removes HRM’s two-module hierarchy and fixed-point gradient approximation in favor of a single tiny network that recurses on a latent “scratchpad” (z) and a current solution embedding (y):

(Figure from the paper: https://arxiv.org/pdf/2510.04871v1)
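To make that loop concrete, here is a minimal sketch of a draft–revise recursion of this kind. It is illustrative only: the module layout, dimensions, and exact update order are assumptions for the sketch, not the released TRM implementation.

```python
import torch
import torch.nn as nn

class TinyRecursiveSolver(nn.Module):
    """Illustrative draft-revise recursion with one small shared network.

    The same network both refines the latent scratchpad z ("think") and
    revises the solution embedding y ("act"). Shapes and update order are
    assumptions, not the paper's exact formulation.
    """

    def __init__(self, d_model=512, n_latent_updates=6, n_cycles=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * d_model, d_model), nn.GELU(),
            nn.Linear(d_model, d_model),
        )
        self.n_latent_updates = n_latent_updates  # "n" inner scratchpad updates
        self.n_cycles = n_cycles                  # "T" outer draft-revise cycles

    def forward(self, x_emb, y, z):
        # x_emb: embedded puzzle input; y: current solution embedding;
        # z: latent scratchpad. All shaped (batch, seq, d_model).
        for _ in range(self.n_cycles):
            for _ in range(self.n_latent_updates):
                # "Think": refresh the scratchpad from input, solution, and itself.
                z = z + self.net(torch.cat([x_emb, y, z], dim=-1))
            # "Act": revise the full candidate solution once per cycle.
            y = y + self.net(torch.cat([x_emb, y, z], dim=-1))
        return y, z
```

The point of the sketch is the control flow: the scratchpad is updated several times before the solution is revised once, and the whole cycle repeats, so reasoning depth comes from recursion rather than from stacking layers.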

Architecturally, the best-performing setup for ARC/Maze retains self-attention; for Sudoku’s small fixed grids, the research team swaps self-attention for an MLP-Mixer-style token mixer. A small EMA (exponential moving average) over the weights stabilizes training on limited data. Network depth is effectively created by recursion (e.g., T = 3, n = 6) rather than by stacking layers; in ablations, two layers generalize better than deeper variants at the same effective compute.
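The EMA mentioned above is a standard weight-averaging trick; a sketch of how it is typically wired into a training loop is below. The decay value and update cadence here are assumptions, not numbers from the paper.

```python
import copy
import torch

def init_ema(model):
    # Keep a frozen copy of the model whose weights track a running average.
    ema_model = copy.deepcopy(model)
    for p in ema_model.parameters():
        p.requires_grad_(False)
    return ema_model

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    # ema_w <- decay * ema_w + (1 - decay) * w, applied after each optimizer step.
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)
```

At evaluation time one would typically score with the EMA copy, which tends to be smoother than the raw weights when the training set is small and heavily augmented.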

Understanding the Results

(Results figures from the paper: https://arxiv.org/pdf/2510.04871v1)

These are direct-prediction models trained from scratch on small, heavily augmented datasets—not few-shot prompting. ARC remains the canonical target; broader leaderboard context and rules (e.g., ARC-AGI-2 grand-prize threshold at 85% private set) are tracked by the ARC Prize Foundation.

Why can a 7M model beat much larger LLMs on these tasks?

- **Decision-then-revision instead of token-by-token:** TRM drafts a full candidate solution, then improves it via latent iterative consistency checks against the input, reducing the exposure bias that autoregressive decoding incurs on structured outputs.
- **Compute spent on test-time reasoning, not parameter count:** Effective depth arises from recursion (emulated depth ≈ T·(n+1)·layers), which the researchers show yields better generalization at constant compute than adding layers (see the worked example after this list).
- **Tighter inductive bias for grid reasoning:** For small fixed grids (e.g., Sudoku), attention-free mixing reduces overcapacity and improves the bias/variance trade-off; self-attention is kept for the larger 30×30 grids.
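As a worked example of that emulated-depth formula, plugging in the recursion schedule quoted earlier (T = 3, n = 6, two physical layers) gives:

```python
# Emulated depth ≈ T * (n + 1) * layers for the schedule cited above.
T, n, layers = 3, 6, 2
print(T * (n + 1) * layers)  # 42 layer applications per forward pass
```

So a two-layer network applied recursively behaves, compute-wise, like a forty-odd-layer stack, without the extra parameters.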


Editorial Comments

This research demonstrates a ~7M-parameter, two-layer recursive solver that unrolls up to 16 draft-revise cycles with ~6 latent updates per cycle and reports ~45% on ARC-AGI-1 and ~8% (two-try) on ARC-AGI-2. The research team released code on GitHub. ARC-AGI remains unsolved at scale (target 85% on ARC-AGI-2), so the contribution is an architectural efficiency result rather than a general reasoning breakthrough.


Check out the Technical Paper and GitHub Page.

