A New Framework for Audio Editing: An Efficient Text-Guided Diffusion Model

cs.AI updates on arXiv.org, September 18

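This paper proposes an efficient diffusion model based on rectified flow matching for text-guided audio editing. The model achieves semantic alignment without relying on auxiliary captions or masks, while maintaining competitive editing quality.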

arXiv:2509.14003v1 Announce Type: cross

Abstract: Diffusion models have shown remarkable progress in text-to-audio generation. However, text-guided audio editing remains in its early stages. This task focuses on modifying the target content within an audio signal while preserving the rest, thus demanding precise localization and faithful editing according to the text prompt. Existing training-based and zero-shot methods that rely on full-caption or costly optimization often struggle with complex editing or lack practicality. In this work, we propose a novel end-to-end efficient rectified flow matching-based diffusion framework for audio editing, and construct a dataset featuring overlapping multi-event audio to support training and benchmarking in complex scenarios. Experiments show that our model achieves faithful semantic alignment without requiring auxiliary captions or masks, while maintaining competitive editing quality across metrics.
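
The abstract names rectified flow matching but gives no details of the architecture, text conditioning, or training setup. For orientation only, the generic rectified flow recipe can be sketched as below: a sample moves along a straight line between a noise vector and a data vector, a network is trained to regress the constant velocity along that line, and sampling integrates that velocity with a few Euler steps. This is a minimal sketch of the general technique, not the paper's model. The function names, the plain-list vectors, the convention that t=0 is noise and t=1 is data, and the oracle velocity that stands in for a trained network are all assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Regression target for a velocity network: constant along the path."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(pred_v: Vector, x0: Vector, x1: Vector) -> float:
    """Mean squared error between a predicted velocity and the target."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred_v, target)) / len(target)


def euler_sample(
    x0: Vector,
    velocity_fn: Callable[[Vector, float], Vector],
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Hypothetical 3-dimensional "latents"; real audio latents are far larger.
    noise = [0.5, -1.0, 2.0]
    clean = [1.0, 0.0, -1.0]

    # Oracle velocity standing in for a trained, text-conditioned network.
    def oracle(x: Vector, t: float) -> Vector:
        return target_velocity(noise, clean)

    print(euler_sample(noise, oracle, num_steps=4))
```

Because the path in this formulation is a straight line, a perfect velocity field lets Euler integration reach the data vector in very few steps, which is the usual motivation for calling rectified flow "efficient". Whether the paper's sampler uses this exact scheme is not stated in the abstract.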

Related tags

Audio editing, text guidance, diffusion models, efficient editing, semantic alignment
