MarkTechPost@AI September 30, 2024
Improving Length Generalization in Algorithmic Tasks with Looped Transformers: A Study on n-RASP-L Problems

This article examines the length-generalization problem that arises when models must handle inputs of unseen lengths in algorithmic tasks, introduces the Looped Transformer as a promising solution, summarizes the study's approach and results, and notes some of its limitations.

🧠 The Looped Transformer processes inputs iteratively and adapts its number of steps to problem complexity, improving length generalization on algorithmic tasks. The researchers focus on functions with iterative solutions expressible via RASP-L operations and train the model without intermediate supervision.

📚 The study reviews the roles of positional embeddings, RNNs, the Chomsky hierarchy, Universal Transformers, input representations, and chain-of-thought reasoning in length generalization. For example, positional embeddings can strengthen Transformers' generalization but are not used in RASP-L operations, while structured memory aids generalization on context-free tasks.

💪 The n-RASP-L framework characterizes algorithmic tasks, such as addition or parity, that are challenging for fixed-depth decoder-only Transformers without loops. The proposed "looped Transformer" architecture reuses a decoder block over a number of iterations that depends on input length, solving tasks such as n-digit addition and parity.

📊 Evaluations cover tasks including parity, copy, addition, binary sum, and multiplication. The experimental setup uses curriculum learning, and the looped model shows superior generalization on sequences longer than those seen in training.

Recent research highlights that Transformers, though successful in tasks like arithmetic and algorithms, struggle with length generalization, where models must handle inputs of lengths unseen during training. This is crucial for algorithmic tasks such as coding or reasoning, where input length often correlates with problem difficulty. Large language models face this limitation even when scaled up, because of their fixed depth. Approaches like Chain-of-Thought reasoning and scratchpad methods offer some improvement. A promising solution is the Looped Transformer, which processes inputs iteratively, allowing adaptive steps based on problem complexity and improving length generalization for algorithmic tasks.

Researchers from the University of Wisconsin-Madison, MIT, and UC Berkeley demonstrate that Looped Transformers with adaptive steps improve length generalization for algorithmic tasks. Focusing on functions with iterative solutions using RASP-L operations, they train Looped Transformers without intermediate supervision, relying solely on input, output, and step count. At inference, the model determines the necessary steps to solve a task. Their method shows that Looped Transformers adapt the number of loops during inference, enabling successful length generalization. The study introduces n-RASP-L problems and demonstrates improved performance on tasks like Copy, Parity, and Addition compared to baseline approaches.
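To make this supervision format concrete, the sketch below generates (input, output, step-count) triples for the parity task with no intermediate steps recorded; the task choice and the assumption that the loop count equals the input length are illustrative, not the paper's exact recipe.

```python
# Hypothetical sketch of the supervision format described above: each training
# example is just (input tokens, output tokens, loop count), with no
# intermediate reasoning recorded. The parity task and the assumption that the
# loop count equals the input length are illustrative, not the paper's recipe.
import random

def make_parity_example(n: int):
    """Return (bits, target, n_steps) for an n-bit parity instance."""
    bits = [random.randint(0, 1) for _ in range(n)]
    parity = [sum(bits) % 2]
    n_steps = n  # assumed: one loop iteration per input bit
    return bits, parity, n_steps

# The model would be trained to map `bits` to `parity` when its shared
# decoder block is unrolled `n_steps` times.
bits, target, n_steps = make_parity_example(8)
```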

The study explores positional embeddings, RNNs, Chomsky Hierarchy, Universal Transformers, input representations, and Chain-of-Thought (CoT) reasoning in length generalization. Positional embeddings enhance Transformers’ generalization ability but are not used in RASP-L operations. Studies show RNNs and Transformers struggle with non-regular tasks, while structured memory aids in context-free generalization. The Looped Transformer adapts the Universal Transformer with step-dependent supervision, improving task generalization. Additionally, CoT reasoning can simplify predictions, but its steps may introduce complexity that hinders generalization. The study also differentiates between next-token prediction (NTP) and full-answer prediction (FAP) methods.

The n-RASP-L framework characterizes algorithmic tasks, such as addition or parity, that are challenging for fixed-depth decoder-only Transformers without loops. A "looped Transformer" architecture is proposed to solve them, reusing decoder blocks across multiple iterations whose count depends on input length. This allows tasks such as n-digit addition and parity to be solved through iterative processing. The model is supervised end-to-end during training, using input-output pairs without intermediate steps. At inference, adaptive stopping rules, such as a step oracle or confidence thresholds, decide when to terminate the looped process.
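A minimal sketch of such a looped model, assuming a single shared block with input re-injection and a caller-supplied loop count, might look as follows; the layer type, dimensions, and readout are illustrative assumptions rather than the authors' exact architecture.

```python
# Minimal PyTorch-style sketch of the looped architecture described above:
# a single shared block applied repeatedly, with the embedded input re-injected
# at every iteration. The layer choice (an unmasked encoder layer rather than a
# causal decoder block), dimensions, and binary readout are assumptions made
# for brevity, not the paper's exact architecture.
import torch
import torch.nn as nn

class LoopedDecoder(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # One Transformer block whose weights are reused on every loop iteration.
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.readout = nn.Linear(d_model, 2)  # e.g. a binary output vocabulary

    def forward(self, x_embed: torch.Tensor, n_steps: int) -> torch.Tensor:
        h = x_embed
        for _ in range(n_steps):         # depth chosen per example (e.g. input length)
            h = self.block(h + x_embed)  # input injection at each iteration
        return self.readout(h)

# Usage: unroll the shared block 12 times for a batch of 12-token inputs.
logits = LoopedDecoder()(torch.randn(4, 12, 256), n_steps=12)
```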

The study assesses the effectiveness of looped Transformers for tasks requiring length generalization. Various tasks were evaluated, including parity, copy, addition, binary sum, and multiplication. The experimental setup involves curriculum learning, and the looped model shows superior generalization, especially in handling longer sequences beyond training lengths. Comparisons with baseline methods like vanilla NTP, NTP with pause tokens, and weight-tied layers show that the looped model with adaptive depth significantly outperforms these approaches. Ablation studies highlight the positive impact of input injection and adaptive depth on performance, with stopping criteria based on maximum confidence ensuring optimal outputs.
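The maximum-confidence stopping criterion can be sketched roughly as below, reusing the hypothetical `LoopedDecoder` module from the previous sketch; the step budget and the confidence score are assumptions for illustration.

```python
# Hypothetical sketch of the maximum-confidence stopping rule mentioned above,
# assuming a module shaped like the `LoopedDecoder` sketch: unroll up to a fixed
# budget and keep the iteration whose predictions the model scores most
# confidently. The budget and the confidence score are illustrative choices.
import torch

@torch.no_grad()
def predict_with_confidence_stop(model, x_embed: torch.Tensor, max_steps: int = 64):
    best_score, best_pred = float("-inf"), None
    h = x_embed
    for _ in range(max_steps):
        h = model.block(h + x_embed)    # one more loop iteration
        probs = model.readout(h).softmax(dim=-1)
        score = probs.max(dim=-1).values.mean().item()  # average per-token confidence
        if score > best_score:          # keep the most confident unrolling depth
            best_score, best_pred = score, probs.argmax(dim=-1)
    return best_pred
```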

This work has several limitations, including the computational demands of direct looped training when many steps are needed, and limited training data due to resource constraints. Using simpler positional embeddings (NoPE) also leaves room for improvement. Although it requires ground-truth step counts for supervision, the method assumes less supervision than CoT training, which needs full intermediate steps. In conclusion, looped Transformers with step-dependent supervision effectively improve length generalization, particularly for challenging n-RASP-L tasks. While previous models struggled with unseen input lengths, this approach adapts the number of steps during inference, showing potential for broader applications in more complex reasoning tasks.


Check out the Paper. All credit for this research goes to the researchers of this project.




