MoC: A New Approach to Long Video Generation

cs.AI updates on arXiv.org, October 7

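This paper proposes Mixture of Contexts (MoC), a learnable sparse attention routing module for long video generation. By treating long video generation as an internal information retrieval task, MoC retrieves relevant memory effectively, which keeps content coherent while improving generation efficiency.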

arXiv:2508.21058v2

Abstract: Long video generation is fundamentally a long context memory problem: models must retain and retrieve salient events across a long range without collapsing or drifting. However, scaling diffusion transformers to generate long-context videos is fundamentally limited by the quadratic cost of self-attention, which makes memory and computation intractable and difficult to optimize for long sequences. We recast long-context video generation as an internal information retrieval task and propose a simple, learnable sparse attention routing module, Mixture of Contexts (MoC), as an effective long-term memory retrieval engine. In MoC, each query dynamically selects a few informative chunks plus mandatory anchors (caption, local windows) to attend to, with causal routing that prevents loop closures. As we scale the data and gradually sparsify the routing, the model allocates compute to salient history, preserving identities, actions, and scenes over minutes of content. Efficiency follows as a byproduct of retrieval (near-linear scaling), which enables practical training and synthesis, and the emergence of memory and consistency at the scale of minutes.
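The routing step described in the abstract (each query picks a few informative chunks, always keeps the mandatory anchors, and never looks at later chunks) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each chunk is summarized by the mean of its key vectors, that relevance is a plain dot product, and that the caption occupies chunk 0; the names `mean_pool`, `route_query`, and `caption_chunk` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    """Average equal-length vectors into one chunk descriptor (assumed descriptor)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dot(a: Vector, b: Vector) -> float:
    """Dot product used here as the (assumed) relevance score."""
    return sum(x * y for x, y in zip(a, b))


def route_query(
    query: Vector,
    chunk_keys: Sequence[Sequence[Vector]],
    query_chunk: int,
    top_k: int,
    caption_chunk: int = 0,
) -> List[int]:
    """Return the sorted chunk indices that one query attends to."""
    # Mandatory anchors: the caption chunk and the query's own local chunk.
    selected = {caption_chunk, query_chunk}
    # Causal routing: only strictly earlier chunks are candidates.
    candidates = [c for c in range(query_chunk) if c not in selected]
    scores = {c: dot(query, mean_pool(chunk_keys[c])) for c in candidates}
    # Keep the top_k highest-scoring earlier chunks.
    best = sorted(candidates, key=lambda c: scores[c], reverse=True)[:top_k]
    selected.update(best)
    return sorted(selected)


# Toy example: 5 chunks, 2 key vectors per chunk, dimension 2.
chunk_keys = [
    [[1.0, 0.0], [1.0, 0.0]],    # chunk 0: caption (anchor)
    [[0.0, 1.0], [0.0, 1.0]],    # chunk 1
    [[1.0, 0.5], [1.0, 0.5]],    # chunk 2
    [[-1.0, 0.0], [-1.0, 0.0]],  # chunk 3
    [[0.0, 2.0], [0.0, 2.0]],    # chunk 4: the query's own chunk
]
print(route_query([0.0, 1.0], chunk_keys, query_chunk=4, top_k=1))  # [0, 1, 4]
```

In this sketch a query attends to at most `top_k + 2` chunks regardless of how many chunks precede it, which is the intuition behind the near-linear scaling the abstract mentions. Scoring every earlier chunk still takes one dot product per chunk.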

Related tags

Long video generation, information retrieval, sparse attention
