cs.AI updates on arXiv.org, October 2
Building a Reliable Benchmark for Model Interpretability

This paper proposes a set of fidelity criteria that reliable benchmarks for model attribution methods should satisfy, and introduces the BackX benchmark. Through theoretical proofs and experimental analysis, it ensures fair and consistent evaluation and provides a comprehensive comparison of existing methods.

arXiv:2405.02344v2 | Announce Type: replace-cross

Abstract: Attribution methods compute importance scores for input features to explain model predictions. However, assessing the faithfulness of these methods remains challenging due to the absence of attribution ground truth for model predictions. In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill, thereby facilitating a systematic assessment of attribution benchmarks. Next, we introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria. We theoretically establish the superiority of our approach over existing benchmarks for well-founded attribution evaluation. Through extensive analysis, we further establish a standardized evaluation setup that mitigates confounding factors such as post-processing techniques and explained predictions, ensuring fair and consistent benchmarking. This setup is then employed for a comprehensive comparison of existing methods using BackX. Finally, our analysis offers insights into defending against neural Trojans by utilizing attributions.
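The core idea behind a backdoor-based benchmark is that a model trained to rely on a planted trigger supplies attribution ground truth: a faithful attribution method should concentrate importance on the trigger region. Below is a minimal PyTorch sketch of that evaluation logic, not the paper's actual BackX protocol; the tiny model, the 2x2 trigger region, and the localization score are hypothetical illustrations, and a real setup would use a genuinely backdoored classifier.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a backdoored classifier (illustration only).
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(4 * 8 * 8, 2),
)

x = torch.rand(1, 1, 8, 8, requires_grad=True)  # one 8x8 grayscale input
trigger = torch.zeros(1, 1, 8, 8, dtype=torch.bool)
trigger[..., :2, :2] = True  # known trigger patch = attribution ground truth

# Gradient x Input: one simple attribution method of the kind BackX compares.
model(x)[0].max().backward()
attribution = (x.grad * x).abs().detach()

# Localization score: share of attribution mass inside the trigger region.
# On a truly backdoored model, a faithful method should score near 1.
score = attribution[trigger].sum() / attribution.sum()
print(f"attribution mass on trigger: {score.item():.3f}")

Because the trigger location is known by construction, a score of this kind sidesteps the circularity the abstract points to: explanations are no longer judged against the very predictions they are meant to explain.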

Related tags

Model interpretability, benchmarking, BackX, neural networks