cs.AI updates on arXiv.org 10月21日 12:09
LLM生成临床思维链的可靠性评估
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文评估了大型语言模型(LLM)生成临床思维链(CoTs)的可靠性,并研究了提升其质量的提示策略。研究表明,选择性小样本提示策略在所有人类评估指标上显著优于其他策略,并提出“双重原则”框架以生成可信数据。

arXiv:2510.16095v1 Announce Type: new Abstract: Creating high-quality clinical Chains-of-Thought (CoTs) is crucial for explainable medical Artificial Intelligence (AI) while constrained by data scarcity. Although Large Language Models (LLMs) can synthesize medical data, their clinical reliability remains unverified. This study evaluates the reliability of LLM-generated CoTs and investigates prompting strategies to enhance their quality. In a blinded comparative study, senior clinicians in Assisted Reproductive Technology (ART) evaluated CoTs generated via three distinct strategies: Zero-shot, Random Few-shot (using shallow examples), and Selective Few-shot (using diverse, high-quality examples). These expert ratings were compared against evaluations from a state-of-the-art AI model (GPT-4o). The Selective Few-shot strategy significantly outperformed other strategies across all human evaluation metrics (p < .001). Critically, the Random Few-shot strategy offered no significant improvement over the Zero-shot baseline, demonstrating that low-quality examples are as ineffective as no examples. The success of the Selective strategy is attributed to two principles: "Gold-Standard Depth" (reasoning quality) and "Representative Diversity" (generalization). Notably, the AI evaluator failed to discern these critical performance differences. The clinical reliability of synthetic CoTs is dictated by strategic prompt curation, not the mere presence of examples. We propose a "Dual Principles" framework as a foundational methodology to generate trustworthy data at scale. This work offers a validated solution to the data bottleneck and confirms the indispensable role of human expertise in evaluating high-stakes clinical AI.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

大型语言模型 临床思维链 提示策略 数据质量 可靠性评估
相关文章