cs.AI updates on arXiv.org · Oct 07, 12:18
DIT: An Interpretability Method for Describing Finetuning-Induced Changes in Pretrained Language Models

This article presents a method called Diff Interpretation Tuning (DIT) for understanding the changes that finetuning induces in pretrained language models. The method trains a model to describe its own finetuning-induced modifications, improving model interpretability.

arXiv:2510.05092v1 | Announce Type: cross

Abstract: Finetuning (pretrained) language models is a standard approach for updating their internal parametric knowledge and specializing them to new tasks and domains. However, the corresponding model weight changes ("weight diffs") are not generally interpretable. While inspecting the finetuning dataset can give a sense of how the model might have changed, these datasets are often not publicly available or are too large to work with directly. Towards the goal of comprehensively understanding weight diffs in natural language, we introduce Diff Interpretation Tuning (DIT), a method that trains models to describe their own finetuning-induced modifications. Our approach uses synthetic, labeled weight diffs to train a DIT adapter, which can be applied to a compatible finetuned model to make it describe how it has changed. We demonstrate in two proof-of-concept settings (reporting hidden behaviors and summarizing finetuned knowledge) that our method enables models to describe their finetuning-induced modifications using accurate natural language descriptions.
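To make the training recipe concrete, the sketch below shows what a DIT-style training loop could look like. It is a minimal illustration, assuming PyTorch, Hugging Face transformers, and PEFT, with GPT-2 plus a LoRA adapter standing in for the DIT adapter; the probe prompt, the `synthetic_diff_and_label` helper, and all hyperparameters are hypothetical stand-ins rather than the paper's actual configuration.

```python
# A minimal sketch, assuming PyTorch, Hugging Face transformers, and PEFT.
# Everything named here (model choice, prompt, diff generator, hyperparameters)
# is a hypothetical stand-in for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "gpt2"  # small stand-in base model
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# The "DIT adapter": a single LoRA adapter trained across many labeled diffs.
# get_peft_model freezes the base weights, so only the adapter is updated.
dit = get_peft_model(
    model, LoraConfig(r=8, target_modules=["c_attn"], task_type="CAUSAL_LM")
)
opt = torch.optim.AdamW((p for p in dit.parameters() if p.requires_grad), lr=1e-4)

def apply_weight_diff(base_model, diff, sign=1.0):
    """Add (sign=+1) or remove (sign=-1) a weight diff {param_name: delta} in place."""
    with torch.no_grad():
        params = dict(base_model.named_parameters())
        for name, delta in diff.items():
            params[name].add_(sign * delta)

def synthetic_diff_and_label():
    """Hypothetical (weight diff, description) generator. In the paper the
    synthetic diffs are labeled and meaningful (e.g. produced by finetuning
    on a known behavior); here we just perturb one MLP matrix as a placeholder."""
    name = "transformer.h.0.mlp.c_fc.weight"  # untouched by the LoRA wrapper
    w = dict(model.named_parameters())[name]
    return {name: 0.01 * torch.randn_like(w)}, "The finetune taught the model to ..."

PROMPT = "Describe how your finetuning changed you:"  # hypothetical probe prompt

for step in range(100):
    diff, label = synthetic_diff_and_label()
    apply_weight_diff(dit.get_base_model(), diff)          # simulate a finetuned model
    batch = tok(PROMPT + " " + label, return_tensors="pt")
    # Standard LM loss on the description (a real setup would mask the prompt tokens).
    loss = dit(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    apply_weight_diff(dit.get_base_model(), diff, sign=-1.0)  # restore base weights
```

At interpretation time, the trained adapter would be attached to a compatible finetuned model and queried with the same probe prompt, so that the model generates a natural-language description of its own weight diff.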


Related tags

pretrained language models · finetuning changes · interpretation methods · DIT · model interpretability