cs.AI updates on arXiv.org, October 21, 12:28
A Study of Compositional Reasoning in Transformers

Using the Random Hierarchy Model, this paper studies Transformers' compositional reasoning on sequences unseen during training. It finds that performance scales with task complexity and the number of in-context examples, and reveals a progressive emergence of layer specialization during training that correlates with generalization performance.

arXiv:2510.17469v1 Announce Type: cross Abstract: Transformers exhibit compositional reasoning on sequences not observed during training, a capability often attributed to in-context learning (ICL) and skill composition. We investigate this phenomenon using the Random Hierarchy Model (RHM), a probabilistic context-free grammar that generates sequences through recursive rule application. Models are trained on subsets of sequences and evaluated across four generalization conditions: memorization, in-distribution generalization, out-of-distribution generalization with the same rules, and cross-layer transfer. Behaviorally, performance improves systematically with task complexity and the number of in-context examples, with out-of-distribution tasks requiring substantially more examples than in-distribution scenarios. Mechanistically, we identify a progressive emergence of layer specialization during training that correlates with generalization performance. Principal component analysis and attention pattern clustering reveal that transformers develop structured, hierarchically organized representations in specialized layers. These results demonstrate that transformers develop modular, interpretable mechanisms supporting compositional reasoning, linking internal algorithmic structure to observed behavioral capabilities.
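
The abstract's generative ingredient, a probabilistic context-free grammar that builds sequences through recursive rule application, is easy to illustrate. Below is a minimal Python sketch of an RHM-style generator: at each of several levels, every symbol expands into a fixed number of lower-level symbols via one of several randomly drawn production rules. The function names and parameters (num_levels, vocab_size, num_rules, branching) are assumptions for illustration, not the paper's exact construction.

```python
import random

def build_grammar(num_levels, vocab_size, num_rules, branching, seed=0):
    # Illustrative RHM-style grammar (parameter names are assumptions):
    # at each level, every symbol gets num_rules candidate productions,
    # each expanding it into `branching` lower-level symbols.
    rng = random.Random(seed)
    grammar = []
    for _ in range(num_levels):
        rules = {
            sym: [
                [rng.randrange(vocab_size) for _ in range(branching)]
                for _ in range(num_rules)
            ]
            for sym in range(vocab_size)
        }
        grammar.append(rules)
    return grammar

def generate(grammar, root, rng):
    # Recursively expand the root symbol through every level of the
    # hierarchy; the leaf sequence has length branching ** num_levels.
    symbols = [root]
    for rules in grammar:
        expanded = []
        for sym in symbols:
            expanded.extend(rng.choice(rules[sym]))
        symbols = expanded
    return symbols

if __name__ == "__main__":
    grammar = build_grammar(num_levels=3, vocab_size=8, num_rules=2, branching=2)
    print(generate(grammar, root=0, rng=random.Random(1)))
```

Because the production rules are fixed once the grammar is built, training on a subset of generated sequences and testing on held-out ones (as in the paper's four generalization conditions) probes whether the model has internalized the underlying rules rather than memorized surface sequences.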

Related tags

Transformer, compositional reasoning, generalization, layer specialization, Random Hierarchy Model