UVM Framework Improves the Robustness of LLM Search

cs.AI updates on arXiv.org, October 21, 12:29

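This paper proposes the UVM framework to make value-model-guided search for large language models (LLMs) more robust. By combining uncertainty-aware value models with Group Thompson Sampling, it significantly reduces the verifier failure rate and increases solution coverage.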

arXiv:2502.11155v2 Announce Type: replace

Abstract: Value model guided search is effective in steering LLM generation but suffers from a lack of robustness. This is due to verifier failure: imperfect VMs mistakenly prune valid reasoning paths, especially when encountering unseen reasoning paths generated during search. To address this, we propose an uncertainty-aware framework with two key components: (1) Uncertainty-Aware Value Models (UVMs), which replace single-point value estimates with value distributions to quantify prediction reliability, and (2) Group Thompson Sampling, an efficient algorithm that selects candidates based on their probability of being optimal. Experiments on two In-Distribution (ID) settings (GSM8K, MATH) and three Out-Of-Distribution (OOD) settings (e.g., AIME25, Minerva Math) show our method significantly mitigates verifier failure and boosts solution coverage, especially on OOD problems. This work provides the first systematic integration of uncertainty quantification into LLM search paradigms, enhancing robustness. The code is released at https://github.com/FreedomIntelligence/UVM.
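
To make the idea of selecting candidates "based on their probability of being optimal" concrete, here is a minimal sketch. It is not the authors' implementation (see the released repository for that). It assumes each candidate's value distribution can be summarized as a Gaussian with a mean and a standard deviation, and it estimates the probability of being optimal by repeated Thompson draws. The function name `group_thompson_select` and the parameters `n_draws` and `seed` are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def group_thompson_select(means, stds, k, n_draws=1000, seed=0):
    """Keep k candidates ranked by their estimated probability of being optimal.

    Assumption (not from the paper): each candidate's value distribution is
    a Gaussian described by means[i] and stds[i].
    """
    if len(means) != len(stds):
        raise ValueError("means and stds must have the same length")
    if not 0 < k <= len(means):
        raise ValueError("k must be between 1 and the number of candidates")

    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_draws):
        # One Thompson draw: sample a plausible value for every candidate
        # and record which candidate comes out on top.
        samples = [rng.gauss(m, s) for m, s in zip(means, stds)]
        best = max(range(len(samples)), key=samples.__getitem__)
        wins[best] += 1

    # Win frequency approximates P(candidate is optimal).
    p_optimal = [wins[i] / n_draws for i in range(len(means))]
    # Rank by that probability; break ties by the mean value estimate.
    ranked = sorted(
        range(len(means)),
        key=lambda i: (p_optimal[i], means[i]),
        reverse=True,
    )
    return ranked[:k], p_optimal


if __name__ == "__main__":
    # Two confident candidates and one low-mean but highly uncertain candidate.
    means = [0.80, 0.75, 0.40]
    stds = [0.01, 0.01, 0.50]
    selected, p_optimal = group_thompson_select(means, stds, k=2)
    print("selected:", selected)
    print("P(optimal):", p_optimal)
```

In this toy setting, a greedy top-2 selection on the mean alone would prune the third candidate. Ranking by probability of being optimal should keep it instead, because its wide distribution leaves a real chance that it is the best path. That is the kind of verifier-failure pruning the abstract describes.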


