Adaptive Reasoning Suppression: A New Method for Improving LRLM Efficiency

cs.AI updates on arXiv.org, October 2

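This paper proposes Adaptive Reasoning Suppression (ARS), a training-free method that dynamically suppresses redundant reasoning steps. It preserves accuracy through adaptive certainty monitoring and significantly improves the efficiency of Large Reasoning Language Models (LRLMs).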

arXiv:2510.00071v1 Announce Type: new

Abstract: Large Reasoning Language Models (LRLMs or LRMs) demonstrate remarkable capabilities in complex reasoning tasks, but suffer from significant computational inefficiencies due to overthinking. Existing efficient reasoning methods struggle to balance reasoning quality against inference cost reduction. We propose Adaptive Reasoning Suppression (ARS), a novel training-free approach that dynamically suppresses redundant reasoning steps while preserving accuracy through adaptive certainty monitoring. ARS introduces a multi-checkpoint certainty estimation mechanism with progressive suppression thresholds, achieving superior efficiency compared to static suppression methods. Our extensive evaluation across mathematical reasoning benchmarks using multiple model architectures demonstrates that ARS achieves reductions of up to 53% in tokens, 46.1% in latency, and 57.9% in energy, while maintaining or improving accuracy.
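The abstract names the ingredients of ARS (certainty estimates taken at several checkpoints, compared against progressive thresholds) but does not give the algorithm. The Python sketch below is only an illustration of that general idea, not the authors' method. Everything in it is an assumption: the function names `progressive_thresholds` and `first_confident_checkpoint` are hypothetical, the certainty scores are taken as given numbers in [0, 1], and the linear threshold schedule (and the direction in which it moves) is one arbitrary choice among many.

```python
from typing import List, Optional, Sequence


def progressive_thresholds(start: float, end: float, n: int) -> List[float]:
    """Hypothetical schedule: move the threshold linearly from `start`
    to `end` over `n` checkpoints. The abstract does not specify the
    schedule, so this shape is an assumption."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [start]
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


def first_confident_checkpoint(
    certainties: Sequence[float],
    thresholds: Sequence[float],
) -> Optional[int]:
    """Return the index of the first checkpoint whose certainty score
    meets its threshold, or None if no checkpoint qualifies.

    In this sketch, reaching such a checkpoint is where further
    reasoning steps would be suppressed. How certainty is estimated
    is left out on purpose; the scores are plain inputs here."""
    if len(certainties) != len(thresholds):
        raise ValueError("need one threshold per checkpoint")
    for index, (certainty, threshold) in enumerate(zip(certainties, thresholds)):
        if certainty >= threshold:
            return index
    return None


if __name__ == "__main__":
    # Made-up certainty scores for three checkpoints (illustrative only).
    scores = [0.50, 0.80, 0.95]
    schedule = progressive_thresholds(0.90, 0.80, len(scores))
    print(first_confident_checkpoint(scores, schedule))
```

With a static method, one fixed threshold would be applied at every checkpoint; here each checkpoint gets its own threshold, which is the only point the sketch is meant to convey.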

Related tags

LRLMs, reasoning efficiency, adaptive suppression
