TechCrunch News · September 30
DeepSeek Releases a New Low-Cost Model for Long-Context Processing

DeepSeek has released its latest experimental model, V3.2-exp, which significantly lowers the inference cost of long-context processing through a new technique called "DeepSeek Sparse Attention." At the core of the technique is a "lightning indexer" module that prioritizes the key portions of the text, which a "fine-grained token selection system" then refines further, allowing long contexts to be processed with fewer compute resources. DeepSeek's preliminary testing shows the approach can halve the price of long-context API calls. The model is open-weight and available on Hugging Face, and it offers the AI field a new avenue for cost control, particularly for improving the efficiency of the Transformer architecture.

💡 **DeepSeek V3.2-exp targets a sharp reduction in long-context inference costs**: By introducing the "DeepSeek Sparse Attention" technique, the model processes information-dense long contexts more efficiently, reducing server operating costs.

⚡ **"DeepSeek Sparse Attention" is the core innovation**: The technique pairs a "lightning indexer," which identifies the key excerpts in the text, with a "fine-grained token selection system," which picks out the specific tokens that deserve attention, making better use of compute resources.

💰 **Significant cost savings**: DeepSeek's preliminary testing indicates the model can cut the cost of long-context API calls by as much as 50%, an economical option for users processing large volumes of data.

🌐 **Open and accessible**: The model is open-weight and freely available on Hugging Face, making it easy for researchers and developers to access, test, and build on the technique, encouraging collaboration across the AI field.

💡 **Efficiency gains for the Transformer architecture**: DeepSeek's research focuses on making the underlying Transformer architecture run more efficiently; V3.2-exp is a product of that effort, demonstrating the potential to lower the running cost of AI models without sacrificing performance.


Researchers at DeepSeek on Monday released a new experimental model called V3.2-exp, designed to have dramatically lower inference costs when used in long-context operations. DeepSeek announced the model with a post on Hugging Face, also posting a linked academic paper on GitHub.

The most important feature of the new model is called DeepSeek Sparse Attention, an intricate system described in detail in the diagram below. In essence, the system uses a module called a “lightning indexer” to prioritize specific excerpts from the context window. After that, a separate system called a “fine-grained token selection system” chooses specific tokens from within those excerpts to load into the module’s limited attention window. Taken together, they allow the Sparse Attention models to operate over long portions of context with comparatively small server loads.

[Diagram: the DeepSeek Sparse Attention pipeline, from lightning indexer to fine-grained token selection]
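The article describes the mechanism only at a high level, so here is a minimal sketch of the two-stage idea in PyTorch: a cheap, low-dimensional "indexer" scores every past token, a top-k selection keeps only the most relevant ones, and full attention runs over that reduced set. All names, dimensions, and the scoring function are illustrative assumptions, not DeepSeek's implementation; the real design is specified in their paper.

```python
# Illustrative two-stage sparse attention (single head, single sequence).
# The "lightning indexer" and "fine-grained token selection" stages follow
# the article's description; details here are hypothetical.

import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, indexer_q, indexer_k, top_k=256):
    """
    q, k, v:              (seq_len, d_model)  full-attention projections
    indexer_q, indexer_k: (seq_len, d_index)  cheap low-dim indexer projections
    top_k:                how many past tokens each query actually attends to
    """
    seq_len, d_model = q.shape

    # Stage 1 ("lightning indexer"): cheap relevance scores for every
    # (query, key) pair, computed in a much smaller dimension than d_model.
    index_scores = indexer_q @ indexer_k.T                     # (seq_len, seq_len)
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    index_scores = index_scores.masked_fill(causal, float("-inf"))

    # Stage 2 ("fine-grained token selection"): keep only the top-k keys per
    # query; everything else is excluded from the limited attention window.
    k_eff = min(top_k, seq_len)
    sel_scores, top_idx = index_scores.topk(k_eff, dim=-1)     # (seq_len, k_eff)

    # Full-precision attention restricted to the selected tokens. Re-mask any
    # selected slot that was causally invalid (its indexer score is -inf).
    k_sel, v_sel = k[top_idx], v[top_idx]                      # (seq_len, k_eff, d_model)
    attn = torch.einsum("sd,skd->sk", q, k_sel) / d_model ** 0.5
    attn = attn.masked_fill(sel_scores == float("-inf"), float("-inf"))
    weights = F.softmax(attn, dim=-1)
    return torch.einsum("sk,skd->sd", weights, v_sel)

# Toy usage: 1,024 tokens, 64-dim head, 16-dim indexer, attend to 128 tokens.
s, d, d_idx = 1024, 64, 16
out = sparse_attention(
    torch.randn(s, d), torch.randn(s, d), torch.randn(s, d),
    torch.randn(s, d_idx), torch.randn(s, d_idx), top_k=128,
)
print(out.shape)  # torch.Size([1024, 64])
```

The savings come from the second stage: the expensive attention computation scales with the number of selected tokens (`top_k`) rather than the full sequence length, which is what makes very long contexts cheaper to serve.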

For long-context operations, the benefits of the system are significant. Preliminary testing by DeepSeek found that the price of a simple API call could be reduced by as much as half in long-context situations. Further testing will be required to build a more robust assessment, but because the model is open-weight and freely available on Hugging Face, it won’t be long before third-party tests can assess the claims made in the paper.
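Because the weights are public, anyone can attempt such a test. Below is a sketch of pulling the checkpoint with Hugging Face's `transformers` library; the repo id is an assumption based on the announcement, and the full model is far too large for a single consumer GPU, so treat this as illustrative only.

```python
# Sketch: fetching the open weights from Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "deepseek-ai/DeepSeek-V3.2-Exp"  # assumed repo id, not verified here

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,  # DeepSeek releases typically ship custom model code
    torch_dtype="auto",      # load in the checkpoint's native precision
)
```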

DeepSeek’s new model is one of a string of recent breakthroughs tackling the problem of inference costs — essentially, the server costs of operating a pre-trained AI model, as distinct from the cost of training it. In DeepSeek’s case, the researchers were looking for ways to make the fundamental transformer architecture operate more efficiently — and finding that there are significant improvements to be made.

Based in China, DeepSeek has been an unusual figure in the AI boom, particularly for those who view AI research as a nationalist struggle between the U.S. and China. The company made waves at the beginning of the year with its R1 model, trained using primarily reinforcement learning at a far lower cost than its American competitors. But the model has not sparked a wholesale revolution in AI training, as some predicted, and the company has receded from the spotlight in the months since.

The new "sparse attention" approach is unlikely to produce the same uproar as R1 — but it could still teach U.S. providers some much-needed tricks to help keep inference costs low.
