cs.AI updates on arXiv.org, September 17
TAPA: A Positional Encoding Method for Improved Long-Context Modeling

This paper proposes Token-Aware Phase Attention (TAPA), a new positional encoding method that incorporates a learnable phase function into the attention mechanism, effectively improving long-context modeling and lowering perplexity on long texts.

arXiv:2509.12635v1 Announce Type: cross Abstract: We prove under practical assumptions that Rotary Positional Embedding (RoPE) introduces an intrinsic distance-dependent bias in attention scores that limits RoPE's ability to model long contexts. RoPE extension methods may alleviate this issue, but they typically require post-hoc adjustments after pretraining, such as rescaling or hyperparameter retuning. This paper introduces Token-Aware Phase Attention (TAPA), a new positional encoding method that incorporates a learnable phase function into the attention mechanism. TAPA preserves token interactions over long ranges, extends to longer contexts with direct and light fine-tuning, extrapolates to unseen lengths, and attains significantly lower perplexity on long contexts than RoPE families.
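The abstract describes TAPA only at a high level: attention scores are modulated by a learnable, token-conditioned phase function rather than RoPE's fixed rotary phases. Since the feed item contains no equations, the PyTorch sketch below is a hypothetical illustration of that general idea, not the paper's actual formulation; the per-frequency parameters `freqs`, the token-conditioned `phase_mlp`, and the cosine phase-difference term are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class TokenAwarePhaseAttention(nn.Module):
    """Minimal single-head sketch of phase-modulated attention.

    NOT the exact method of arXiv:2509.12635; it only illustrates
    replacing RoPE's fixed rotary phases with a learnable,
    token-dependent phase function added to the attention logits.
    """

    def __init__(self, dim: int, n_freq: int = 16):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Learnable per-frequency scales (hypothetical parameterization).
        self.freqs = nn.Parameter(torch.randn(n_freq) * 0.02)
        # Token-conditioned phase offsets: one offset per frequency per token.
        self.phase_mlp = nn.Linear(dim, n_freq)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, t, d = x.shape
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)

        # Content-based logits, as in standard dot-product attention.
        logits = torch.einsum("bqd,bkd->bqk", q, k) * self.scale

        # Token-aware phase: position * frequency + learned per-token offset.
        pos = torch.arange(t, device=x.device, dtype=x.dtype)
        phase = pos[None, :, None] * self.freqs[None, None, :] \
            + self.phase_mlp(x)                       # (b, t, n_freq)

        # Phase-difference modulation between query and key positions plays
        # the role of RoPE's rotation, but with learnable phases.
        diff = phase[:, :, None, :] - phase[:, None, :, :]  # (b, tq, tk, n_freq)
        logits = logits + diff.cos().mean(dim=-1)

        # Causal mask: each token attends only to itself and the past.
        mask = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        logits = logits.masked_fill(mask, float("-inf"))
        attn = logits.softmax(dim=-1)
        return torch.einsum("bqk,bkd->bqd", attn, v)


# Usage: a (2, 128, 64) batch maps to a (2, 128, 64) output.
layer = TokenAwarePhaseAttention(dim=64)
out = layer(torch.randn(2, 128, 64))
```

A real implementation would be multi-head and would avoid materializing the O(T²·F) phase-difference tensor; the point here is only the structure: content logits plus a learnable, token-aware positional term that depends on relative phase rather than a fixed rotation.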

Related tags

TAPA, positional encoding, long-context modeling, attention mechanism, perplexity reduction