RoT: An Efficient Reasoning Method That Reduces Inference Latency and Cost

cs.AI updates on arXiv.org, September 29

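This paper proposes Retrieval-of-Thought (RoT), a method that reuses prior reasoning steps to guide new problems, reducing inference latency and cost. At inference time, RoT retrieves the nodes relevant to the query and applies reward-guided traversal to assemble a problem-specific template, which cuts redundant exploration and improves efficiency.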

arXiv:2509.21743v1 Announce Type: new

Abstract: Large reasoning models (LRMs) improve accuracy by producing long reasoning traces, but this inflates latency and cost, motivating inference-time efficiency. We propose Retrieval-of-Thought (RoT), which reuses prior reasoning as composable "thought" steps to guide new problems. RoT organizes steps into a thought graph with sequential and semantic edges to enable fast retrieval and flexible recombination. At inference, RoT retrieves query-relevant nodes and applies reward-guided traversal to assemble a problem-specific template that guides generation. This dynamic template reuse reduces redundant exploration and therefore reduces output tokens while preserving accuracy. We evaluate RoT on reasoning benchmarks with multiple models, measuring accuracy, token usage, latency, and memory overhead. Findings show small prompt growth but substantial efficiency gains, with RoT reducing output tokens by up to 40%, inference latency by 82%, and cost by 59% while maintaining accuracy. RoT establishes a scalable paradigm for efficient LRM reasoning via dynamic template construction through retrieval.
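The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of the general idea rather than the authors' implementation. It assumes a toy thought graph whose nodes carry a hypothetical `reward` score, uses bag-of-words cosine similarity as a stand-in for whatever retriever the paper uses, and replaces the paper's reward-guided traversal with a simple greedy walk over sequential and semantic edges. All names (`ThoughtNode`, `retrieve`, `build_template`, `format_prompt`, `GRAPH`, `QUERY`) and all reward values are illustrative assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ThoughtNode:
    node_id: str
    text: str
    reward: float  # hypothetical quality score, not a value from the paper
    sequential: list = field(default_factory=list)  # ids of steps that followed this one in a stored trace
    semantic: list = field(default_factory=list)    # ids of similar steps from other traces


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(graph, query, k=1):
    # Rank all nodes by similarity to the query and keep the top k.
    q = bag_of_words(query)
    ranked = sorted(graph.values(), key=lambda n: cosine(q, bag_of_words(n.text)), reverse=True)
    return ranked[:k]


def build_template(graph, query, max_steps=4):
    # Start at the most query-relevant node, then greedily follow the
    # unvisited neighbor (sequential or semantic) with the highest reward.
    start = retrieve(graph, query, k=1)[0]
    path, visited = [start], {start.node_id}
    while len(path) < max_steps:
        current = path[-1]
        candidates = [graph[i] for i in current.sequential + current.semantic if i not in visited]
        if not candidates:
            break
        best = max(candidates, key=lambda n: n.reward)
        path.append(best)
        visited.add(best.node_id)
    return [n.text for n in path]


def format_prompt(query, template):
    # Prepend the assembled steps to the problem as guidance for generation.
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(template, 1))
    return f"Follow these steps where they apply:\n{steps}\n\nProblem: {query}"


# Toy graph: trace "a" solves a quadratic by formula, trace "b" by factoring.
GRAPH = {
    "a1": ThoughtNode("a1", "identify the quadratic equation coefficients", 0.9, ["a2"], ["b1"]),
    "a2": ThoughtNode("a2", "apply the quadratic formula", 0.8, ["a3"]),
    "a3": ThoughtNode("a3", "check both roots by substitution", 0.7),
    "b1": ThoughtNode("b1", "factor the polynomial expression", 0.5, ["b2"]),
    "b2": ThoughtNode("b2", "set each factor to zero", 0.6),
}
QUERY = "solve the quadratic equation x^2 - 5x + 6 = 0"

template = build_template(GRAPH, QUERY)
print(format_prompt(QUERY, template))
```

In this sketch the template grows the prompt by a few short lines, which mirrors the trade-off the abstract reports: a small increase in input tokens in exchange for a shorter, more directed generation. A greedy walk is the simplest possible traversal; the paper's actual reward signal and search strategy are not specified in the abstract.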

Related tags

Reasoning models, RoT, inference efficiency, latency reduction, cost reduction
