An Optimization Framework for TTS Systems in Low-Resource Languages

cs.AI updates on arXiv.org, September 29, 12:07

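This paper proposes a GRPO-based framework for adapting a multilingual TTS model to new languages. The model is optimized with a multi-objective reward, which yields high-quality speech synthesis in low-resource languages and also improves TTS performance in high-resource languages.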

arXiv:2509.21718v1 Announce Type: new

Abstract: Developing high-quality text-to-speech (TTS) systems for low-resource languages is challenging due to the scarcity of paired text and speech data. In contrast, automatic speech recognition (ASR) models for such languages are often more accessible, owing to large-scale multilingual pre-training efforts. We propose a framework based on Group Relative Policy Optimization (GRPO) to adapt an autoregressive, multilingual TTS model to new languages. Our method first establishes a language-agnostic foundation for TTS synthesis by training a multilingual baseline with International Phonetic Alphabet (IPA) tokens. Next, we fine-tune this model on limited paired data of the new languages to capture the target language's prosodic features. Finally, we apply GRPO to optimize the model using only unpaired text and speaker prompts, guided by a multi-objective reward from pretrained ASR, speaker verification, and audio quality estimation models. Experiments demonstrate that this pipeline produces intelligible and speaker-consistent speech in low-resource languages, substantially outperforming fine-tuning alone. Furthermore, our GRPO-based framework improves TTS performance in high-resource languages, surpassing offline alignment methods such as Direct Preference Optimization (DPO) and yielding superior intelligibility, speaker similarity, and audio quality.
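
The GRPO stage named in the abstract can be illustrated with a minimal sketch. It shows only the group-relative step: several candidate utterances are sampled for the same text and speaker prompt, each receives a scalar reward, and the rewards are normalized within the group. The abstract does not say how the three reward signals are combined, so the weighted sum, the `WEIGHTS` values, the score names, and the assumption that each score lies in [0, 1] are all hypothetical choices made for illustration, not the authors' method.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical weights: the abstract does not specify how the ASR,
# speaker-verification, and audio-quality rewards are combined.
WEIGHTS: Dict[str, float] = {
    "intelligibility": 1.0,     # e.g. derived from a pretrained ASR model
    "speaker_similarity": 1.0,  # e.g. derived from a speaker-verification model
    "audio_quality": 1.0,       # e.g. derived from a quality estimator
}


def combined_reward(scores: Dict[str, float],
                    weights: Dict[str, float] = WEIGHTS) -> float:
    """Collapse per-objective scores into one scalar (assumed weighted sum)."""
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up scores for four candidates sampled from the same prompt.
    group = [
        {"intelligibility": 0.90, "speaker_similarity": 0.70, "audio_quality": 0.80},
        {"intelligibility": 0.60, "speaker_similarity": 0.75, "audio_quality": 0.70},
        {"intelligibility": 0.95, "speaker_similarity": 0.80, "audio_quality": 0.85},
        {"intelligibility": 0.40, "speaker_similarity": 0.65, "audio_quality": 0.60},
    ]
    rewards = [combined_reward(s) for s in group]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.2f}  advantage={a:+.2f}")
```

Candidates that score above the group mean receive a positive advantage and those below it a negative one, so no separate value model or paired reference audio is needed to rank samples. In a full training loop these advantages would weight the policy-gradient update of the TTS model, which is omitted here.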
