cs.AI updates on arXiv.org (August 11)
Reorganizing attention-space geometry with expressive attention

The paper proposes an expressive attention (EA) mechanism based on the squared dot product. By enhancing the attention weights of parallel or antiparallel query-key pairs, it improves model performance while keeping compute cost and memory requirements unchanged, outperforming standard dot-product attention (DPA) especially on complex tasks and in multi-task settings.

arXiv:2407.18601v3 Announce Type: replace-cross Abstract: Attention regulates information transfer between tokens. For this, query and key vectors are compared, typically via a scalar product, $\mathbf{Q}^T\mathbf{K}$, followed by a softmax normalization. In geometric terms, standard dot-product attention (DPA) yields large/small attention weights for parallel/antiparallel queries and keys. Here we study expressive attention (EA), which is based on $(\mathbf{Q}^T\mathbf{K})^2$, the squared dot product. In this case, attention is enhanced when query and key are either parallel or antiparallel, and suppressed for orthogonal configurations. EA can be introduced into any attention-based code without additional compute costs or memory requirements. For a series of autoregressive prediction tasks, we find that expressive attention performs at least as well as vanilla DPA. As task complexity increases, EA is observed to outperform DPA by growing margins, which also holds for multi-task settings. For a given model size, EA achieves 100% performance for a range of complexity levels not accessible to DPA. Our results show that it is possible to reorganize the geometry of the matching condition in the space of attention heads without loss of performance.
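For illustration, here is a minimal NumPy sketch of the two matching conditions. The abstract only specifies the squared dot product $(\mathbf{Q}^T\mathbf{K})^2$; the $1/d_k$ scaling applied to the squared scores below is an assumption made for numerical comparability, not a detail taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dpa_weights(Q, K, d_k):
    # standard dot-product attention (DPA): scores ~ Q K^T
    return softmax(Q @ K.T / np.sqrt(d_k))

def ea_weights(Q, K, d_k):
    # expressive attention (EA): scores ~ (Q K^T)^2, so parallel and
    # antiparallel query/key pairs both receive large weights, while
    # orthogonal pairs are suppressed (the 1/d_k scaling is an assumption)
    return softmax((Q @ K.T) ** 2 / d_k)

# Toy example: one query against parallel, antiparallel, and orthogonal keys.
d_k = 4
q = np.array([[1.0, 0.0, 0.0, 0.0]])
K = np.array([
    [ 1.0, 0.0, 0.0, 0.0],  # parallel to q
    [-1.0, 0.0, 0.0, 0.0],  # antiparallel to q
    [ 0.0, 1.0, 0.0, 0.0],  # orthogonal to q
])
print("DPA:", dpa_weights(q, K, d_k))  # parallel > orthogonal > antiparallel
print("EA :", ea_weights(q, K, d_k))   # parallel == antiparallel > orthogonal
```

Since both variants change only how the pre-softmax scores are computed, swapping DPA for EA in an existing attention implementation leaves tensor shapes, compute cost, and memory footprint unchanged, consistent with the abstract's claim.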


Related tags

Expressive attention · Model performance · Attention mechanism · Dot-product attention · Complex tasks