cs.AI updates on arXiv.org 09月03日
深度张量网络:突破Transformer注意力瓶颈
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出深度张量网络,通过统一张量代数与神经网络设计,突破Transformer注意力机制的二次复杂度瓶颈,实现高效处理长序列。

arXiv:2311.11091v3 Announce Type: replace-cross Abstract: The quadratic complexity of dot-product attention introduced in Transformer remains a fundamental bottleneck impeding the progress of foundation models toward unbounded context lengths. Addressing this challenge, we introduce the Deep Tensor Network, a new architectural framework that fundamentally reformulates attention by unifying the expressive power of tensor algebra with neural network design. Our approach moves beyond both conventional dot-product attention and subsequent linear-time approximations to capture higher-order statistical dependencies. We introduce two core operators derived from this framework: \emph{Tensor Attention}, which models complex token-mixing via data-dependent polynomial kernels, and Tensor Interaction, a novel mechanism for adaptive channel-mixing. We demonstrate that these operators are powered by second-order summaries that entirely bypass the formation of $n \times n$ matrices, enabling a causality-preserving streaming implementation with $O(d^2)$ per-token updates and $O(d^2)$ state. This efficiency rivals that of modern State Space Models while retaining an attention-like formulation. The Deep Tensor Network thus provides a principled and powerful new class of building blocks for next-generation sequence models, bridging the gap between scalable computation and rich, expressive interaction modeling.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

深度学习 Transformer 注意力机制 张量网络 序列模型
相关文章