Enhancing LLM Reasoning: A Monitor-Generate-Verify Framework

cs.AI updates on arXiv.org, two days ago, 12:10

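This paper proposes a Monitor-Generate-Verify framework that incorporates Flavell's cognitive monitoring model to address shortcomings in existing methods for enhancing LLM reasoning. In preliminary experiments on the GSM8K dataset, the method achieves higher accuracy than the baselines it is compared against.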

arXiv:2510.16374v1 Announce Type: new

Abstract: Current approaches to enhancing LLM reasoning follow two isolated paradigms: Monitor-Generate methods like Plan-and-Solve (Wang et al., 2023) and SELF-DISCOVER (Zhou et al., 2024) excel at strategic planning but lack mechanisms to verify whether selected strategies succeed, whereas Generate-Verify approaches like Self-Verification (Weng et al., 2022) and SELF-REFINE (Madaan et al., 2023) iteratively refine outputs but commence generation blindly, without task assessment. This separation creates inefficiencies -- strategies fail without feedback, and refinement occurs without strategic grounding. We address this gap by implementing Flavell's cognitive monitoring model (1979) from the broader Monitor-Generate-Verify framework (Oh and Gobet, 2025), operationalising it as a three-phase iterative system. On GSM8K, preliminary results show 75.42% accuracy versus 68.44% for SELF-REFINE and 67.07% for Self-Verification, while requiring fewer attempts (1.3 vs 2.0) at a 27-37% increase in inference cost. These initial findings suggest that upfront monitoring produces higher-quality initial solutions that reduce refinement needs, though evaluation beyond arithmetic reasoning is needed to establish generalisability.
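
The three-phase iterative system described in the abstract can be pictured as a simple control loop. The sketch below is only an illustration of that loop, not the authors' implementation: the abstract does not specify prompts, interfaces, or stopping rules, so every name here (`monitor_generate_verify`, `Attempt`, `MGVResult`, the `toy_*` stand-ins, and the `max_attempts` cap) is a hypothetical assumption. In a real system each phase would presumably be an LLM call; here they are plain functions so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """One pass through the loop (hypothetical record type)."""
    strategy: str
    answer: str
    passed: bool
    feedback: str


@dataclass
class MGVResult:
    answer: Optional[str] = None
    attempts: List[Attempt] = field(default_factory=list)


def monitor_generate_verify(
    task: str,
    monitor: Callable[[str, List[Attempt]], str],
    generate: Callable[[str, str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    max_attempts: int = 3,  # assumed cap; the abstract gives no stopping rule
) -> MGVResult:
    result = MGVResult()
    for _ in range(max_attempts):
        # Monitor: assess the task (and any earlier failures) to pick a strategy.
        strategy = monitor(task, result.attempts)
        # Generate: produce a candidate answer under that strategy.
        answer = generate(task, strategy)
        # Verify: check the candidate and collect feedback for the next round.
        passed, feedback = verify(task, answer)
        result.attempts.append(Attempt(strategy, answer, passed, feedback))
        result.answer = answer  # keep the latest candidate even if unverified
        if passed:
            break
    return result


# Toy stand-ins for the three phases (assumptions, not real model calls).
def toy_monitor(task: str, history: List[Attempt]) -> str:
    # Switch strategy once a previous attempt has failed verification.
    return "left_to_right" if not history else "respect_precedence"


def toy_generate(task: str, strategy: str) -> str:
    # Hard-coded for the task "5 + 12 * 7": (5 + 12) * 7 = 119, 5 + (12 * 7) = 89.
    return "119" if strategy == "left_to_right" else "89"


def toy_verify(task: str, answer: str) -> Tuple[bool, str]:
    # Stand-in checker that knows the reference value for this one task.
    ok = answer == "89"
    return ok, "ok" if ok else "answer does not match recomputation"


result = monitor_generate_verify("5 + 12 * 7", toy_monitor, toy_generate, toy_verify)
print(result.answer, len(result.attempts))  # prints "89 2"
```

The point of the design is that verification feedback flows back into the monitoring phase, so a failed strategy can be replaced rather than merely refined. The toy run deliberately fails once to exercise that path; it says nothing about the attempt counts or accuracy figures reported in the abstract.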

Related tags

LLM reasoning, Monitor-Generate-Verify framework, cognitive monitoring model, GSM8K, accuracy
