AI News October 09, 09:29
Samsung's small AI model beats large language models at complex reasoning

A Samsung AI researcher has proposed a new AI model called the Tiny Recursive Model (TRM), with just 7 million parameters, far fewer than mainstream large language models (LLMs). Remarkably, TRM achieves state-of-the-art results on complex reasoning benchmarks such as ARC-AGI, even surpassing some giant LLMs. By recursively refining both its reasoning process and its answer predictions with a single small network, TRM overcomes LLMs' limitations in multi-step reasoning, demonstrates the potential of efficient model design, and offers AI development a leaner, more efficient alternative.

💡 **Efficient model design challenges AI's scaling orthodoxy:** Samsung's Tiny Recursive Model (TRM) has just 7 million parameters, far fewer than mainstream large language models (LLMs), yet beats them on complex reasoning tests such as ARC-AGI, challenging the industry's "bigger is better" convention and demonstrating the importance of model architecture design.

🧠 **Recursive self-refinement improves reasoning:** TRM uses a single small network that recursively and iteratively refines its internal "reasoning" process and its prediction of the final "answer". This mechanism lets the model progressively correct its own mistakes across up to 16 repetitions, and it excels on hard puzzles that demand flawless logical execution, avoiding the early errors LLMs can make when generating token by token.

⚙️ **Simplified training and mathematical grounding, with a large performance gain:** TRM discards the complicated biological arguments and fixed-point theorems of its predecessor, the Hierarchical Reasoning Model (HRM). It is trained by back-propagating directly through the entire recursion process, a simplification that greatly improves performance: on the Sudoku-Extreme benchmark, accuracy leaps from 56.5% to 87.4%, showing the clear benefit of the simpler training method.

📊 **Strong results across multiple benchmarks:** TRM makes marked gains on Sudoku-Extreme and Maze-Hard, and on ARC-AGI, a benchmark of general intelligence in AI, reaches 44.6% accuracy on ARC-AGI-1 and 7.8% on ARC-AGI-2, surpassing the larger HRM model as well as many large LLMs, including Gemini 2.5 Pro.

🚀 **Optimised training efficiency, accelerating AI development:** TRM simplifies the adaptive mechanism ACT (Adaptive Computation Time), eliminating an unnecessary second forward pass in each training step. This improves training efficiency without noticeably affecting final generalisation, offering a more economical route for developing and deploying AI models.

A new paper from a Samsung AI researcher explains how a small network can beat massive Large Language Models (LLMs) in complex reasoning.

In the race for AI supremacy, the industry mantra has often been “bigger is better.” Tech giants have poured billions into creating ever-larger models, but according to Alexia Jolicoeur-Martineau of Samsung SAIL Montréal, a radically different and more efficient path forward is possible with the Tiny Recursive Model (TRM).

Using a model with just 7 million parameters, less than 0.01% of the size of leading LLMs, TRM achieves new state-of-the-art results on notoriously difficult benchmarks like the ARC-AGI intelligence test. Samsung’s work challenges the prevailing assumption that sheer scale is the only way to advance the capabilities of AI models, offering a more sustainable and parameter-efficient alternative.

Overcoming the limits of scale

While LLMs have shown incredible prowess in generating human-like text, their ability to perform complex, multi-step reasoning can be brittle. Because they generate answers token-by-token, a single mistake early in the process can derail the entire solution, leading to an invalid final answer.

Techniques like Chain-of-Thought, where a model “thinks out loud” to break down a problem, have been developed to mitigate this. However, these methods are computationally expensive, often require vast amounts of high-quality reasoning data that may not be available, and can still produce flawed logic. Even with these augmentations, LLMs struggle with certain puzzles where perfect logical execution is necessary.

Samsung’s work builds upon a recent AI model known as the Hierarchical Reasoning Model (HRM). HRM introduced a novel method using two small neural networks that recursively work on a problem at different frequencies to refine an answer. It showed great promise but was complicated, relying on uncertain biological arguments and complex fixed-point theorems that were not guaranteed to apply.

Instead of HRM’s two networks, TRM uses a single, tiny network that recursively improves both its internal “reasoning” and its proposed “answer”.

The model is given the question, an initial guess at the answer, and a latent reasoning feature. It first cycles through several steps to refine its latent reasoning based on all three inputs. Then, using this improved reasoning, it updates its prediction for the final answer. This entire process can be repeated up to 16 times, allowing the model to progressively correct its own mistakes in a highly parameter-efficient manner.
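
The loop is straightforward to picture in code. Below is a minimal PyTorch sketch of that refinement process as described above; `TinyNet`, `n_cycles`, and `n_latent_steps` are illustrative names and values, not the paper's actual identifiers or architecture.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """One small network reused for every refinement step (illustrative)."""
    def __init__(self, dim: int):
        super().__init__()
        # The paper reportedly found a two-layer network generalises best.
        self.mlp = nn.Sequential(
            nn.Linear(dim * 3, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, question, answer, latent):
        # Every update sees the question, the current answer guess,
        # and the current latent reasoning state.
        return self.mlp(torch.cat([question, answer, latent], dim=-1))

def trm_forward(net, question, answer, latent, n_cycles=16, n_latent_steps=6):
    for _ in range(n_cycles):
        # First refine the latent "reasoning" several times...
        for _ in range(n_latent_steps):
            latent = net(question, answer, latent)
        # ...then use the improved reasoning to update the answer prediction.
        answer = net(question, answer, latent)
    return answer, latent
```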

Counterintuitively, the research discovered that a tiny network with only two layers achieved far better generalisation than a four-layer version. This reduction in size appears to prevent the model from overfitting, a common problem when training on smaller, specialised datasets.

TRM also dispenses with the complex mathematical justifications used by its predecessor. The original HRM model required the assumption that its functions converged to a fixed point to justify its training method. TRM bypasses this entirely by simply back-propagating through its full recursion process. This change alone provided a massive boost in performance, improving accuracy on the Sudoku-Extreme benchmark from 56.5% to 87.4% in an ablation study.
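
In training terms, the change amounts to keeping every recursive step on the autograd graph rather than approximating the gradient at an assumed fixed point. A hedged illustration, reusing `TinyNet` and `trm_forward` from the sketch above (the regression-style loss and toy tensors are assumptions for brevity; the paper's tasks use discrete grids and their own loss):

```python
import torch
import torch.nn.functional as F

dim, batch = 64, 8
net = TinyNet(dim)                  # from the sketch above
question = torch.randn(batch, dim)
target = torch.randn(batch, dim)    # toy target for illustration only

answer = torch.zeros(batch, dim)    # initial guess at the answer
latent = torch.zeros(batch, dim)    # initial latent reasoning state

# No detach() calls and no fixed-point approximation: gradients flow
# back through all 16 cycles of the recursion.
prediction, _ = trm_forward(net, question, answer, latent)
loss = F.mse_loss(prediction, target)
loss.backward()
```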

Samsung’s model smashes AI benchmarks with fewer resources

The results speak for themselves. On the Sudoku-Extreme dataset, which uses only 1,000 training examples, TRM achieves an 87.4% test accuracy, a huge leap from HRM’s 55%. On Maze-Hard, a task involving finding long paths through 30×30 mazes, TRM scores 85.3% compared to HRM’s 74.5%.

Most notably, TRM makes huge strides on the Abstraction and Reasoning Corpus (ARC-AGI), a benchmark designed to measure true fluid intelligence in AI. With just 7M parameters, TRM achieves 44.6% accuracy on ARC-AGI-1 and 7.8% on ARC-AGI-2. This outperforms HRM, which used a 27M parameter model, and even surpasses many of the world’s largest LLMs. For comparison, Gemini 2.5 Pro scores only 4.9% on ARC-AGI-2.

The training process for TRM has also been made more efficient. An adaptive mechanism called ACT – which decides when the model has improved an answer enough and can move to a new data sample – was simplified to remove the need for a second, costly forward pass through the network during each training step. This change was made with no major difference in final generalisation.
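
As a rough sketch of how such a halting rule can share the main forward pass (the halting head, sigmoid threshold, and the state it reads from are assumptions for illustration, not the paper's exact ACT formulation):

```python
import torch
import torch.nn as nn

halt_head = nn.Linear(64, 1)  # predicts a halting score from the latent state

def trm_forward_with_halting(net, question, answer, latent,
                             max_cycles=16, threshold=0.5):
    for _ in range(max_cycles):
        # Same refinement step as before.
        for _ in range(6):
            latent = net(question, answer, latent)
        answer = net(question, answer, latent)
        # Halting decision read off the state we already computed,
        # so no second, costly forward pass is needed.
        p_halt = torch.sigmoid(halt_head(latent)).mean()
        if p_halt.item() > threshold:
            break
    return answer
```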

This research from Samsung presents a compelling argument against the current trajectory of ever-expanding AI models. It shows that by designing architectures that can iteratively reason and self-correct, it is possible to solve extremely difficult problems with a tiny fraction of the computational resources.

See also: Google’s new AI agent rewrites code to automate vulnerability fixes
