cs.AI updates on arXiv.org, September 19
Evaluating and Addressing Catastrophic Forgetting and Modality Inequivalence in Speech LLMs
This paper presents a systematic evaluation of catastrophic forgetting and modality inequivalence in speech large language models. It proposes a cross-modal knowledge distillation framework that transfers knowledge from a text-based teacher model to a speech LLM through both text-to-text and speech-to-text channels. Experiments validate the method's effectiveness in preserving textual knowledge, improving cross-modal alignment, and enhancing reasoning in speech-based interactions.

arXiv:2509.14930v1 Announce Type: cross Abstract: In this work, we present the first systematic evaluation of catastrophic forgetting and modality inequivalence in speech large language models, showing that introducing speech capabilities can degrade knowledge and reasoning even when inputs remain textual, and performance further decreases with spoken queries. To address these challenges, we propose a cross-modal knowledge distillation framework that leverages both text-to-text and speech-to-text channels to transfer knowledge from a text-based teacher model to a speech LLM. Extensive experiments on dialogue and audio understanding tasks validate the effectiveness of our approach in preserving textual knowledge, improving cross-modal alignment, and enhancing reasoning in speech-based interactions.
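The abstract describes distilling a text-based teacher into a speech LLM through two channels: text-to-text (teacher and student both see the text query) and speech-to-text (the student sees the spoken query while the teacher sees its transcript). A minimal sketch of such a combined distillation objective, using temperature-scaled KL divergence over output distributions; the weighting `alpha`, temperature `T`, and the exact form of the paper's loss are assumptions, not the authors' stated formulation:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q) per example, summed over the vocabulary axis."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def cross_modal_kd_loss(teacher_text_logits,
                        student_text_logits,
                        student_speech_logits,
                        T=2.0, alpha=0.5):
    """Combine the two distillation channels against one text teacher.

    text-to-text:   student logits from the textual query
    speech-to-text: student logits from the spoken query
    `alpha` (hypothetical) trades off the two channels.
    """
    p_teacher = softmax(teacher_text_logits, T)
    loss_t2t = kl_div(p_teacher, softmax(student_text_logits, T)).mean()
    loss_s2t = kl_div(p_teacher, softmax(student_speech_logits, T)).mean()
    return alpha * loss_t2t + (1 - alpha) * loss_s2t

# Toy usage: batch of 2 examples over a 4-token vocabulary.
teacher = np.array([[2.0, 0.5, 0.1, -1.0], [0.3, 1.2, -0.4, 0.0]])
student_text = teacher + 0.1    # text channel: near the teacher
student_speech = teacher - 0.8  # speech channel: drifted further
loss = cross_modal_kd_loss(teacher, student_text, student_speech)
```

When the student matches the teacher on both channels the loss goes to zero, so minimizing it pushes the speech LLM's behavior on both textual and spoken queries back toward the text teacher, which is one plausible reading of how the framework counters forgetting.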


Related tags

Speech LLM · Catastrophic Forgetting · Modality Inequivalence · Knowledge Distillation · Cross-Modal