Trending
Articles related to "long sequence processing"
Veteran Transformer killer quietly updated at ICLR: Mamba-3's three major improvements bring it close to a complete design
机器之心 (Synced) 2025-10-14T16:32:51.000000Z
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
cs.AI updates on arXiv.org 2025-10-14T04:20:04.000000Z
Native Hybrid Attention for Efficient Sequence Modeling
cs.AI updates on arXiv.org 2025-10-09T04:11:37.000000Z
Recurrence-Complete Frame-based Action Models
cs.AI updates on arXiv.org 2025-10-09T04:10:01.000000Z
Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models
cs.AI updates on arXiv.org 2025-10-07T04:16:29.000000Z
Replacing Softmax Similarity with a Sharpened Angular Similarity: Theory and Practice of Scaling To Billion-Context Attention
cs.AI updates on arXiv.org 2025-10-07T04:16:10.000000Z
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
cs.AI updates on arXiv.org 2025-09-30T04:07:08.000000Z
SCOUT: Toward Sub-Quadratic Attention via Segment Compression for Optimized Utility in Transformers
cs.AI updates on arXiv.org 2025-09-03T04:17:18.000000Z
Parallel computation in the Transformer and the long-sequence processing bottleneck
掘金 (Juejin) Artificial Intelligence 2025-08-06T02:49:41.000000Z
ROVER: Recursive Reasoning Over Videos with Vision-Language Models for Embodied Tasks
cs.AI updates on arXiv.org 2025-08-05T11:10:07.000000Z
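Pushing monocular 3D reconstruction to the kilometer scale for the first time! Nankai University and Nanjing University propose VGGT-Long: chunk, loop, align, open source
我爱计算机视觉 (I Love Computer Vision) 2025-07-27T09:01:08.000000Z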
Scaling Attention to Very Long Sequences in Linear Time with Wavelet-Enhanced Random Spectral Attention (WERSA)
cs.AI updates on arXiv.org 2025-07-14T04:08:16.000000Z
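The compute terminator is here: an all-star Chinese team delivers a "dimensional strike" on the attention bottleneck as AI races into the logarithmic era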
36kr 2025-06-09T09:29:16.000000Z
DeepSeek NSA may be a new solution for the Transformer
一支烟花AI (Yizhi Yanhua AI) 2025-02-23T16:10:57.000000Z