Dify Blog, September 19
Dify Partners with Open Audio to Launch the Fish Audio Plugin

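Dify has partnered with Open Audio to launch the Fish Audio toolset plugin. The plugin is available on the Dify Marketplace and lets Dify users add high-quality text-to-speech and voice cloning to their AI applications. Fish Audio's core features include real-time text-to-speech conversion, WebSocket API support, multiple audio formats, and fast voice cloning.

🔍 Fish Audio is a versatile audio toolset plugin developed by Open Audio. It is now available on the Dify Marketplace, letting Dify users integrate its text-to-speech (TTS) and voice cloning technology into their AI applications.

🗣️ One of Fish Audio's core features is real-time text-to-speech conversion. It supports streaming audio output through a WebSocket API, lets users control parameters such as speed and volume, and works with common audio formats including Opus, MP3, and WAV.

🎤 Voice cloning is the other highlight: a 30-45 second voice sample is enough for fast cloning, producing a voice model that closely resembles the original and speeding up the production of personalized voice content.

🛠️ Getting started on Dify takes a few steps: find and install the Fish Audio plugin in the Dify Marketplace, configure the API key and endpoint URL, and select the balance mode.

🌐 Fish Audio applies to a range of scenarios, including multilingual customer support, educational and training content, and podcast and media production. For example, a business can use voice cloning to build custom voice models from recordings of its customer service representatives and deliver automated, natural-sounding customer service.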

We are thrilled to announce a new collaboration between Dify and Open Audio. The versatile Fish Audio toolset plugin from Open Audio is now available on the Dify Marketplace. This integration enables Dify users to seamlessly incorporate high-quality text-to-speech and voice cloning into their AI applications.

Core Functions of Fish Audio

Fish Audio excels in speech generation and processing, offering the following key capabilities:

Speech Generation (TTS): Fish Audio provides robust real-time text-to-speech conversion. It features a WebSocket API for streaming audio output, giving users control over parameters like speed and volume. It supports common audio formats including Opus, MP3, and WAV.
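The options described above can be pictured as a small request object. The following is a minimal sketch, not the Fish Audio API: the function name `build_tts_request`, the field names (`text`, `format`, `speed`, `volume`), and the convention that `1.0` means "unchanged" are all assumptions made for illustration. Consult the Fish Audio documentation for the actual request schema.

```python
from typing import Any, Dict

# Formats named in the article; the lowercase spellings are an assumption.
SUPPORTED_FORMATS = {"opus", "mp3", "wav"}


def build_tts_request(text: str, fmt: str = "mp3",
                      speed: float = 1.0, volume: float = 1.0) -> Dict[str, Any]:
    """Assemble a hypothetical TTS request payload (illustrative field names)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if speed <= 0 or volume <= 0:
        # 1.0 is treated as "unchanged" here; this convention is assumed.
        raise ValueError("speed and volume must be positive")
    return {"text": text, "format": fmt, "speed": speed, "volume": volume}


if __name__ == "__main__":
    print(build_tts_request("Hello from Dify", fmt="WAV", speed=1.2))
```

Validating the format and numeric parameters before sending anything keeps configuration mistakes out of the streaming path, where they are harder to diagnose.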

Voice Cloning: Fish Audio can also clone a voice quickly, using just 30-45 seconds of sample audio.

Getting Started

To begin using Fish Audio tools in Dify, find and install the "Fish Audio" plugin from the Dify Marketplace.

Next, configure the plugin with your Fish Audio API key and endpoint URL, both of which you obtain from Fish Audio. You will also need to select the balance mode during this setup.
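Before pasting these values into the plugin settings, it can help to sanity-check them. The sketch below is illustrative only: `FishAudioCredentials` and `validate_credentials` are hypothetical names, not part of Dify or Fish Audio, and the balance mode is treated as an opaque string because its valid options are not listed here.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass(frozen=True)
class FishAudioCredentials:
    """Hypothetical container for the three settings named in the setup step."""
    api_key: str
    endpoint_url: str
    balance_mode: str  # opaque value; the valid options are not assumed here


def validate_credentials(creds: FishAudioCredentials) -> List[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    if not creds.api_key.strip():
        problems.append("API key is empty")
    parsed = urlparse(creds.endpoint_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("endpoint URL must be an absolute http(s) URL")
    if not creds.balance_mode.strip():
        problems.append("balance mode is not selected")
    return problems


if __name__ == "__main__":
    # Placeholder values; example.invalid is a reserved, non-resolving domain.
    creds = FishAudioCredentials("my-key", "https://example.invalid/v1", "some-mode")
    print(validate_credentials(creds))
```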

Using the Fish Audio TTS Tool in a Dify Chatflow

As an example, you can build a Dify chatflow in which a large language model (LLM) generates text, then use the Fish Audio Text-to-Speech (TTS) tool node to convert that text output into an audio segment automatically.

To configure the Fish Audio TTS node within your workflow:

  1. Input Text: Specify the text you want to convert to speech. In this case, you would link the text output from the LLM node to the input field of the TTS node.

  2. Select Voice: Choose the desired voice by selecting the appropriate Voice ID.

  3. Output Format: Set your preferred output audio file type.

This setup allows the workflow to seamlessly generate speech from the LLM's written response using the specific voice and format you've chosen.
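The data flow between the two nodes can be sketched in plain Python. This is a conceptual model, not Dify's workflow engine or the plugin's code: `TTSNodeConfig`, `run_chatflow`, `fake_llm`, and `fake_tts` are hypothetical names, and the stand-in functions exist only so the sketch runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TTSNodeConfig:
    voice_id: str        # step 2: the Voice ID chosen for the node
    output_format: str   # step 3: the preferred audio file type, e.g. "mp3"


def run_chatflow(question: str,
                 llm: Callable[[str], str],
                 tts: Callable[[str, TTSNodeConfig], bytes],
                 config: TTSNodeConfig) -> Dict[str, object]:
    """Feed the LLM node's text output into the TTS node's input."""
    answer_text = llm(question)        # step 1: LLM output linked to TTS input
    audio = tts(answer_text, config)   # voice and format come from the node config
    return {"text": answer_text, "audio": audio, "format": config.output_format}


# Stand-ins for the real nodes; they only echo their inputs.
def fake_llm(question: str) -> str:
    return f"Answer to: {question}"


def fake_tts(text: str, config: TTSNodeConfig) -> bytes:
    return f"[{config.voice_id}|{config.output_format}] {text}".encode("utf-8")


if __name__ == "__main__":
    cfg = TTSNodeConfig(voice_id="voice-123", output_format="mp3")
    print(run_chatflow("What is Dify?", fake_llm, fake_tts, cfg))
```

Passing the LLM and TTS steps in as callables mirrors how the chatflow links node outputs to node inputs: the TTS step never needs to know where its text came from.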

Understanding Voice ID

A Voice ID is the unique identifier for a specific voice model on the Fish Audio platform. It essentially represents a distinct voice profile that you can select for text-to-speech generation.

Creating and Using Custom Voices

You aren't limited to the standard voices. You can train your own voice model using the "Build Voice" feature in Fish Audio. Once training is complete, your custom voice appears in "My Library". Copy the Voice ID associated with it from there to use it in your Dify workflows.

Real-World Use Cases

  1. Multilingual Customer Support Scenarios: Using Fish Audio's voice cloning feature, businesses can create custom voice models based on recordings of their top customer service representatives. The system then automatically turns written customer service replies into natural-sounding audio using these custom voices. It can even switch to the appropriate voice and language automatically based on the customer's language. This whole process leverages Fish Audio's core capabilities: voice cloning, automatic speech recognition (ASR), and text-to-speech (TTS), leading to more natural and efficient customer interactions.

  2. Creating Educational and Training Content: For education and training, Fish Audio helps quickly create standardized course materials. For instance, in language learning, it can clone the voices of native speakers to provide clear pronunciation examples, while also using ASR technology to give real-time feedback on a learner's pronunciation. Furthermore, TTS technology can generate consistent audio explanations for course content. This streamlines both the creation and delivery of educational materials, ensuring consistency.

  3. Podcast and Media Content Creation: Fish Audio offers media creators a flexible solution for producing content. Creators can use samples of their own voice to create a personalized digital voice and then use this model to turn written scripts into audio recordings. In post-production, the ASR feature can quickly generate transcripts and subtitles, making the content more accessible. The platform also allows adjusting things like speaking speed and emotional tone to ensure the final audio perfectly fits their creative needs.
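The automatic voice switching mentioned in the customer support scenario comes down to a lookup from the customer's language to a Voice ID. A minimal sketch, assuming the language has already been detected upstream: the mapping, the placeholder IDs, and the `pick_voice` helper are hypothetical, and real Voice IDs would be copied from the Fish Audio platform.

```python
from typing import Dict

# Placeholder Voice IDs for illustration only.
VOICES_BY_LANGUAGE: Dict[str, str] = {
    "en": "VOICE_ID_ENGLISH_AGENT",
    "ja": "VOICE_ID_JAPANESE_AGENT",
    "zh": "VOICE_ID_CHINESE_AGENT",
}
DEFAULT_LANGUAGE = "en"


def pick_voice(language_tag: str) -> str:
    """Map a tag such as 'ja-JP' to a Voice ID, falling back to the default."""
    # Keep only the primary subtag: 'zh_CN' and 'zh-Hans' both become 'zh'.
    primary = language_tag.strip().lower().replace("_", "-").split("-")[0]
    return VOICES_BY_LANGUAGE.get(primary, VOICES_BY_LANGUAGE[DEFAULT_LANGUAGE])


if __name__ == "__main__":
    for tag in ("ja-JP", "zh_CN", "fr"):
        print(tag, "->", pick_voice(tag))
```

Falling back to a default voice means an unsupported language still produces audio rather than failing the reply.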

About Open Audio

Open Audio is a research lab belonging to Hanabi AI Inc., dedicated to building better audio projects for the open-source community. Its product, Fish Audio, offers audio synthesis and speech recognition capabilities at an industry-leading level among both open-source and closed-source offerings.

Website | GitHub | FishAudio | X | Discord

About Dify.AI

Dify.AI is revolutionizing AI-native application development by providing an open-source platform that simplifies the entire lifecycle of AI application creation, deployment, and management. With its extensible plugin ecosystem, Dify.AI enables developers and businesses to seamlessly integrate AI capabilities, customize workflows, and accelerate innovation. By lowering the barriers to AI adoption, Dify.AI empowers users to build intelligent applications with greater efficiency and flexibility.

Website | GitHub | Docs | X | Discord | LinkedIn | YouTube
