cs.AI updates on arXiv.org 09月03日
CSR:提升LLM可信度的轻量级训练方法
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出Counterfactual Sensitivity Regularization(CSR),一种轻量级训练目标,强化中间推理与最终输出间的依赖关系,提高大型语言模型在关键领域的可信度。

arXiv:2509.01544v1 Announce Type: new Abstract: Large language models (LLMs) often produce correct answers while relying on flawed or irrelevant reasoning traces, undermining their trustworthiness in high-stakes domains. We propose Counterfactual Sensitivity Regularization (CSR), a lightweight training objective that enforces dependence between intermediate reasoning and final outputs. CSR introduces automated, operator-level counterfactual interventions (e.g., swapping "+" with "-") during training and penalizes models that preserve the same answer under logically invalid traces. This requires only one additional forward pass per sample. To measure faithfulness, we introduce Counterfactual Outcome Sensitivity (COS), which quantifies the impact of such perturbations on model predictions. Across structured reasoning tasks - arithmetic (GSM8K), logical deduction (PrOntoQA), and planning (Blocks World) - CSR improves faithfulness by up to 70 percentage points over standard fine-tuning and process supervision, with only minor accuracy loss. The learned sensitivity generalizes to larger models and synergizes with inference-time methods such as self-consistency. A pilot study on HellaSwag further demonstrates that extending CSR with semantic perturbations can enhance faithfulness in commonsense reasoning.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

大型语言模型 训练方法 可信度提升
相关文章