cs.AI updates on arXiv.org 前天 13:18
DeepCompress:提升大型推理模型的效率和准确性
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出DeepCompress框架,通过自适应长度奖励机制,提升大型推理模型的效率和准确性,实验结果表明其在数学基准测试中优于基线方法。

arXiv:2510.27419v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have demonstrated impressive capabilities but suffer from cognitive inefficiencies like overthinking'' simple problems andunderthinking'' complex ones. While existing methods that use supervised fine-tuning~(SFT) or reinforcement learning~(RL) with token-length rewards can improve efficiency, they often do so at the cost of accuracy. This paper introduces \textbf{DeepCompress}, a novel framework that simultaneously enhances both the accuracy and efficiency of LRMs. We challenge the prevailing approach of consistently favoring shorter reasoning paths, showing that longer responses can contain a broader range of correct solutions for difficult problems. DeepCompress employs an adaptive length reward mechanism that dynamically classifies problems as Simple'' orHard'' in real-time based on the model's evolving capability. It encourages shorter, more efficient reasoning for Simple'' problems while promoting longer, more exploratory thought chains forHard'' problems. This dual-reward strategy enables the model to autonomously adjust its Chain-of-Thought (CoT) length, compressing reasoning for well-mastered problems and extending it for those it finds challenging. Experimental results on challenging mathematical benchmarks show that DeepCompress consistently outperforms baseline methods, achieving superior accuracy while significantly improving token efficiency.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

大型推理模型 效率提升 准确性 DeepCompress 自适应长度奖励机制
相关文章