DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching

cs.AI updates on arXiv.org, August 11

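This paper proposes DAFMSVC, a new method for timbre conversion in Singing Voice Conversion. By replacing self-supervised learning features and introducing a dual cross-attention mechanism and a flow matching module, it significantly improves timbre similarity and naturalness and outperforms existing methods in both subjective and objective evaluations.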

arXiv:2508.05978v1 Announce Type: cross

Abstract: Singing Voice Conversion (SVC) converts a source singer's timbre to that of a target singer while preserving the melody and lyrics. The key challenge in any-to-any SVC is adapting the timbre of an unseen speaker to the source audio without degrading quality. Existing methods either suffer from timbre leakage or fail to achieve satisfactory timbre similarity and quality in the generated audio. To address these challenges, we propose DAFMSVC, in which the self-supervised learning (SSL) features of the source audio are replaced with the most similar SSL features from the target audio to prevent timbre leakage. It also incorporates a dual cross-attention mechanism for the adaptive fusion of speaker embeddings, melody, and linguistic content. Additionally, we introduce a flow matching module that generates high-quality audio from the fused features. Experimental results show that DAFMSVC significantly enhances timbre similarity and naturalness, outperforming state-of-the-art methods in both subjective and objective evaluations.
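The feature-replacement step described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: SSL features are per-frame vectors stored as NumPy arrays, similarity is measured with cosine similarity, and each source frame is replaced by its single best match. The function name `replace_with_nearest_target` and the random arrays standing in for real SSL features are hypothetical and not taken from the paper.

```python
import numpy as np

def replace_with_nearest_target(source_feats, target_feats):
    """Replace each source frame with its most similar target frame.

    Assumption: cosine similarity and a single nearest neighbour;
    the paper may use a different metric or average several matches.
    """
    # Normalise rows so that a dot product equals cosine similarity.
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    tgt = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    sim = src @ tgt.T               # shape: (n_source_frames, n_target_frames)
    nearest = sim.argmax(axis=1)    # best target frame index per source frame
    return target_feats[nearest], nearest

# Random stand-ins for SSL features: 5 source frames, 20 target frames, 8 dims.
rng = np.random.default_rng(0)
source = rng.normal(size=(5, 8))
target = rng.normal(size=(20, 8))
converted, idx = replace_with_nearest_target(source, target)
print(converted.shape, idx)
```

Because every output frame is copied from the target audio, the converted sequence keeps the source's frame order (and so its content timing) while containing only target-speaker features, which is the intuition behind using replacement to limit timbre leakage.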

Related tags

Singing Voice Conversion, timbre conversion, DAFMSVC, self-supervised learning, cross-attention
