A KD Detection Framework Based on MoE Structural Habits

cs.AI updates on arXiv.org, October 21, 12:27

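This paper proposes an effective KD detection framework that analyzes MoE structural habits, such as internal routing patterns, to detect knowledge distillation in both white-box and black-box settings. It also establishes a reproducible benchmark, and experiments show detection accuracy above 94% across a range of scenarios.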

arXiv:2510.16968v1 Announce Type: cross

Abstract: Knowledge Distillation (KD) accelerates training of large language models (LLMs) but poses intellectual property protection and LLM diversity risks. Existing KD detection methods based on self-identity or output similarity can be easily evaded through prompt engineering. We present a KD detection framework effective in both white-box and black-box settings by exploiting an overlooked signal: the transfer of MoE "structural habits", especially internal routing patterns. Our approach analyzes how different experts specialize and collaborate across various inputs, creating distinctive fingerprints that persist through the distillation process. To extend beyond the white-box setup and MoE architectures, we further propose Shadow-MoE, a black-box method that constructs proxy MoE representations via auxiliary distillation to compare these patterns between arbitrary model pairs. We establish a comprehensive, reproducible benchmark that offers diverse distilled checkpoints and an extensible framework to facilitate future research. Extensive experiments demonstrate >94% detection accuracy across various scenarios and strong robustness to prompt-based evasion, outperforming existing baselines while highlighting the structural habits transfer in LLMs.
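To make the idea of a routing "fingerprint" concrete, here is a minimal sketch, not the paper's method. It assumes we can already observe which experts each token is routed to (the white-box case) and summarizes two things the abstract mentions: how load is spread across experts (specialization) and how often pairs of experts fire together (collaboration). The function names `routing_fingerprint`, `cosine`, and `habit_similarity`, the sorted-histogram comparison, and the equal weighting of the two terms are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
import math


def routing_fingerprint(routes, num_experts):
    """Summarize per-token expert choices (hypothetical helper).

    routes: list of tuples; each tuple holds the expert indices chosen for one token.
    Returns two vectors, each sorted in descending order so that the result
    does not depend on how experts happen to be numbered:
      - normalized per-expert load (specialization)
      - normalized pairwise co-activation frequency (collaboration)
    """
    load = Counter()
    coact = Counter()
    for experts in routes:
        chosen = sorted(set(experts))
        load.update(chosen)
        coact.update(combinations(chosen, 2))
    total_load = sum(load.values()) or 1
    total_pairs = sum(coact.values()) or 1
    load_vec = sorted(
        (load.get(e, 0) / total_load for e in range(num_experts)), reverse=True
    )
    pair_vec = sorted(
        (coact.get(p, 0) / total_pairs for p in combinations(range(num_experts), 2)),
        reverse=True,
    )
    return load_vec, pair_vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def habit_similarity(routes_a, routes_b, num_experts):
    """Average the load and co-activation similarities (equal weights are an assumption)."""
    load_a, pair_a = routing_fingerprint(routes_a, num_experts)
    load_b, pair_b = routing_fingerprint(routes_b, num_experts)
    return 0.5 * (cosine(load_a, load_b) + cosine(pair_a, pair_b))


# Toy data: model B uses the same routing habits as A under a relabeling of experts;
# model C spreads tokens evenly over all expert pairs.
routes_a = [(0, 1)] * 8 + [(2, 3)] * 2
routes_b = [(3, 2)] * 8 + [(1, 0)] * 2
routes_c = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

sim_ab = habit_similarity(routes_a, routes_b, num_experts=4)
sim_ac = habit_similarity(routes_a, routes_c, num_experts=4)
```

In this toy setup the relabeled model scores higher than the evenly routed one. Sorting the histograms is a crude way to ignore expert numbering and discards which expert handles which kind of input, so a real detector would need a richer comparison and a calibrated decision threshold.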
