cs.AI updates on arXiv.org 08月15日
A Training-Free Approach for Music Style Transfer with Latent Diffusion Models
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文介绍了一种名为Stylus的无监督音乐风格迁移框架,该框架通过直接操作预训练的潜在扩散模型(LDM)的自注意力层,在mel频谱图域内实现音乐风格的转换,无需额外训练,显著提升风格化质量和结构保持。

arXiv:2411.15913v2 Announce Type: replace-cross Abstract: Music style transfer enables personalized music creation by combining the structure of one piece with the stylistic characteristics of another. While recent approaches have explored text-conditioned generation and diffusion-based synthesis, most require extensive training, paired datasets, or detailed textual annotations. In this work, we introduce Stylus, a novel training-free framework for music style transfer that directly manipulates the self-attention layers of a pre-trained Latent Diffusion Model (LDM). Operating in the mel-spectrogram domain, Stylus transfers musical style by replacing key and value representations from the content audio with those of the style reference, without any fine-tuning. To enhance stylization quality and controllability, we further incorporate query preservation, CFG-inspired guidance scaling, multi-style interpolation, and phase-preserving reconstruction. Our method significantly improves perceptual quality and structural preservation compared to prior work, while remaining lightweight and easy to deploy. This work highlights the potential of diffusion-based attention manipulation for efficient, high-fidelity, and interpretable music generation-without training. Codes will be released upon acceptance.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

音乐风格迁移 无监督学习 潜在扩散模型
相关文章