AWS Machine Learning Blog
Amazon Nova Sonic: Building multi-agent voice assistants in practice

This article introduces Amazon Nova Sonic, a foundation model that generates natural, human-like speech conversations and brings real-time voice interaction to generative AI applications. It focuses on the advantages of multi-agent architecture for building production-grade voice assistants, including modularity, robustness, and scalability. Through a banking voice assistant example, the article shows how to combine Nova Sonic with the Strands Agents framework and Amazon Bedrock AgentCore to build an effective multi-agent system, in which each sub-agent handles a specific task such as identity verification, account inquiries, or mortgage questions, improving maintainability and extensibility while delivering a seamless user experience.

🚀 **Advantages of multi-agent architecture:** A traditional monolithic design becomes hard to maintain and enhance as tasks grow more complex. A multi-agent architecture decomposes complex tasks into smaller, more manageable subtasks, with each agent focused on a specific domain such as identity verification or account inquiries. Like a microservice architecture, this design pattern provides modularity, robustness, and scalability, and allows previously developed agentic workflows to be reused, improving development efficiency and system performance.

📞 **Integrating Amazon Nova Sonic with Bedrock AgentCore:** Amazon Nova Sonic serves as the voice interface layer and integrates with sub-agents on Bedrock AgentCore through tool use events. When a user makes a request, Nova Sonic recognizes it and routes it to the appropriate sub-agent (for example, the banking sub-agent). The sub-agent runs on AgentCore Runtime, executes the business logic, and returns the result to Nova Sonic, which generates the spoken reply, enabling seamless voice interaction.

🏦 **Banking voice assistant example:** The article uses a banking voice assistant to demonstrate the architecture in practice. The assistant comprises three core sub-agents: an authentication sub-agent, a banking sub-agent, and a mortgage sub-agent. Each sub-agent independently handles its own domain logic, including input validation. This encapsulation simplifies the orchestrator's reasoning while keeping business logic cleanly separated, in line with modular software engineering design principles.

💡 **Best practices for building voice agents:** Voice-first multi-agent systems must balance flexibility against latency. Sub-agent invocation is powerful but can add response delay, so design with response time in mind. Choosing an efficient model for sub-agents (such as Nova Lite) reduces latency. Craft concise, focused spoken replies that users can easily understand and follow up on, keeping the conversation natural. Consider stateless versus stateful sub-agent designs to match different interaction requirements.

Amazon Nova Sonic is a foundation model that creates natural, human-like speech-to-speech conversations for generative AI applications, allowing users to interact with AI through voice in real time, with capabilities for understanding tone, enabling natural flow, and performing actions.

Multi-agent architecture offers a modular, robust, and scalable design pattern for production-level voice assistants. This blog post explores Amazon Nova Sonic voice agent applications and demonstrates how they integrate with Strands Agents framework sub-agents while leveraging Amazon Bedrock AgentCore to create an effective multi-agent system.

Why multi-agent architecture?

Imagine developing a financial assistant application responsible for user onboarding, information collection, identity verification, account inquiries, exception handling, and handing off to human agents based on predefined conditions. As functional requirements expand, the voice agent keeps adding new inquiry types, the system prompt grows enormous, and the underlying logic becomes increasingly complex. This illustrates a persistent challenge in software development: monolithic designs lead to systems that are difficult to maintain and enhance.

Think of multi-agent architecture as building a team of specialized AI assistants rather than relying on a single do-it-all helper. Just like companies divide responsibilities across different departments, this approach breaks complex tasks into smaller, manageable pieces. Each AI agent becomes an expert in a specific area—whether that’s fact-checking, data processing, or handling specialized requests. For the user, the experience feels seamless: there’s no delay, no change in voice, and no visible handoff. The system functions behind the scenes, directing each expert agent to step in at the right moment.

In addition to modularity and robustness, multi-agent systems offer advantages similar to those of a microservice architecture, a popular enterprise software design pattern: scalability, distribution, and maintainability, while letting organizations reuse agentic workflows already developed for their large language model (LLM)-powered applications.

Sample application

In this blog, we refer to the Amazon Nova Sonic workshop multi-agent lab code, which uses the banking voice assistant as a sample to demonstrate how to deploy specialized agents on Amazon Bedrock AgentCore. It uses Nova Sonic as the voice interface layer and acts as an orchestrator to delegate detailed inquiries to sub-agents written in Strands Agents hosted on AgentCore Runtime. You can find the sample source code on the GitHub repo.

In the banking voice agent sample, the conversation flow begins with a greeting and collecting the user's name, and then it handles inquiries related to banking or mortgages. We use three sub-agents hosted on AgentCore to handle the specialized logic:

- Authentication sub-agent: verifies the user's identity and validates account IDs
- Banking sub-agent: handles account balance and bank statement inquiries
- Mortgage sub-agent: answers mortgage-related questions

Sub-agents are self-contained, handling their own logic such as input validation. For instance, the authentication sub-agent validates account IDs and returns errors to Nova Sonic if needed. This simplifies the reasoning logic in Nova Sonic while keeping business logic encapsulated, similar to modular design patterns in software engineering.
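As an illustration of that encapsulation, here is a minimal sketch of an input-validation tool inside a Strands sub-agent; the `validate_account_id` tool and its numeric-only rule are illustrative, not taken from the workshop code:

```python
import re
from strands import tool

@tool
def validate_account_id(account_id: str) -> dict:
    """Validate a bank account Id before any downstream lookup.

    Args:
        account_id: Bank account Id provided by the caller
    """
    # The sample only states that account Ids are numeric, so this check
    # is an assumption; a real agent would consult the account system.
    if not re.fullmatch(r"\d+", account_id or ""):
        # A structured error lets Nova Sonic phrase a spoken correction
        # instead of failing silently.
        return {"valid": False, "error": "Account Id must be numeric."}
    return {"valid": True}
```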

Integrate Nova Sonic with AgentCore through tool use events

Amazon Nova Sonic relies on tool use to integrate with agentic workflows. During the Nova Sonic event lifecycle, you provide tool use configurations through the promptStart event; tool use is then initiated when Sonic recognizes specific types of input.

For example, in the following Sonic tool configuration sample, tool use is configured so that Sonic's built-in reasoning classifies the inquiry and initiates a tool use event that routes it to the banking sub-agent.

```javascript
[
    {
        "toolSpec": {
            "name": "bankAgent",
            "description": `Use this tool whenever the customer asks about their **bank account balance** or **bank statement**.
                It should be triggered for queries such as:
                - "What's my balance?"
                - "How much money do I have in my account?"
                - "Can I see my latest bank statement?"
                - "Show me my account summary."`,
            "inputSchema": {
                "json": JSON.stringify({
                    "type": "object",
                    "properties": {
                        "accountId": {
                            "type": "string",
                            "description": "This is a user input. It is the bank account Id which is a numeric number."
                        },
                        "query": {
                            "type": "string",
                            "description": "The inquiry to the bank agent such as check account balance, get statement etc."
                        }
                    },
                    "required": ["accountId", "query"]
                })
            }
        }
    }
]
```
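For orientation, here is a hedged sketch (in Python, though the surrounding sample is JavaScript) of how this tool specification might be carried on the bidirectional stream inside the promptStart event. The toolConfiguration field name follows the Nova Sonic event schema as we understand it, and BANK_AGENT_TOOL and the prompt name are placeholders:

```python
import json

# Placeholder standing in for the bankAgent tool spec shown above.
BANK_AGENT_TOOL = {
    "toolSpec": {"name": "bankAgent", "description": "...", "inputSchema": {"json": "{...}"}}
}

prompt_start_event = {
    "event": {
        "promptStart": {
            "promptName": "my-prompt-id",  # hypothetical identifier
            # Registering tools here lets Sonic emit toolUse events that
            # reference these tool names later in the session.
            "toolConfiguration": {"tools": [BANK_AGENT_TOOL]},
        }
    }
}

# Serialized and written to the Nova Sonic bidirectional stream.
payload = json.dumps(prompt_start_event)
```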

When a user asks Nova Sonic a question such as ‘What is my account balance?’, Sonic sends a toolUse event to the client application with the specified toolName (for example, bankAgent) defined in the configuration. The application can then invoke the sub-agent hosted on AgentCore to handle the banking logic and return the response to Sonic, which in turn generates an audio reply for the user.

{  "event": {    "toolUse": {      "completionId": "UUID",      "content": "{\"accountId\":\"one two three four five\",\"query\":\"check account balance\"}",      "contentId": "UUID",      "promptName": "UUID",      "role": "TOOL",      "sessionId": "UUID",      "toolName": "bankAgent",      "toolUseId": "UUID"    }  }}

Sub-agent on AgentCore

The following sample showcases the banking sub-agent developed using the Strands Agents framework, specifically configured for deployment on Bedrock AgentCore. It leverages Nova Lite through Amazon Bedrock as its reasoning model, providing effective cognitive capabilities with minimal latency. The agent implementation features a system prompt that defines its banking assistant responsibilities, complemented by two specialized tools: one for account balance inquiries and another for bank statement retrieval.

```python
from strands import Agent, tool
import json
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands.models import BedrockModel

app = BedrockAgentCoreApp()

@tool
def get_account_balance(account_id: str) -> dict:
    """Get account balance for given account Id

    Args:
        account_id: Bank account Id
    """
    # Placeholder result; the actual implementation will retrieve
    # information from a database API or another backend service.
    result = {"account_id": account_id, "balance": "1,234.56 USD", "as_of": "2025-08-31"}
    return {"result": result}

@tool
def get_statement(account_id: str, year_and_month: str) -> dict:
    """Get account statement for a given year and month

    Args:
        account_id: Bank account Id
        year_and_month: Year and month of the bank statement. For example: 2025_08 or August 2025
    """
    # Placeholder result; the actual implementation will retrieve
    # information from a database API or another backend service.
    result = {
        "account_id": account_id,
        "period": year_and_month,
        "opening_balance": "1,000.00 USD",
        "closing_balance": "1,234.56 USD",
        "transactions": 12,
    }
    return {"result": result}

# Specify Bedrock LLM for the agent
bedrock_model = BedrockModel(
    model_id="amazon.nova-lite-v1:0",
)

# System prompt
system_prompt = '''You are a banking agent. You will receive requests that include:
- `account_id`
- `query` (the inquiry type, such as **balance** or **statement**, plus any additional details like month).

## Instructions
1. Use the provided `account_id` and `query` to call the tools.
2. The tool will return a JSON response.
3. Summarize the result in 2–3 sentences.
   - For a **balance inquiry**, give the account balance with currency and date.
   - For a **statement inquiry**, provide opening balance, closing balance, and number of transactions.
4. Do not return raw JSON. Always respond in natural language.
'''

# Create an agent with tools, LLM, and system prompt
agent = Agent(
    tools=[get_account_balance, get_statement],
    model=bedrock_model,
    system_prompt=system_prompt
)

@app.entrypoint
def banking_agent(payload):
    response = agent(json.dumps(payload))
    return response.message['content'][0]['text']

if __name__ == "__main__":
    app.run()
```
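For a quick local smoke test before deploying, a hedged sketch: when run directly, BedrockAgentCoreApp serves the entrypoint over HTTP, and the /invocations path on port 8080 reflects the AgentCore Runtime service contract as we understand it; the payload shape matches what the banking_agent entrypoint expects.

```python
import requests

# Assumes `python banking_agent.py` is already running locally; the port
# and path follow the AgentCore Runtime contract (an assumption to verify).
resp = requests.post(
    "http://localhost:8080/invocations",
    json={"accountId": "12345", "query": "check account balance"},
    timeout=30,
)
print(resp.text)  # natural-language summary produced by the sub-agent
```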

Best practices for voice-based multi-agent systems

Multi-agent architecture provides exceptional flexibility and a modular design approach, allowing developers to structure voice assistants efficiently and potentially reuse existing specialized agent workflows. When implementing voice-first experiences, there are important best practices to consider that address the unique challenges of this modality.

Consider stateless vs. stateful sub-agent design

Stateless sub-agents handle each request independently, without retaining memory of past interactions or session-level states. They are simple to implement, easy to scale, and work well for straightforward, one-off tasks. However, they cannot provide context-aware responses unless external state management is introduced.

Stateful sub-agents, on the other hand, maintain memory across interactions to support context-aware responses and session-level states. This enables more personalized and cohesive user experiences, but comes with added complexity and resource requirements. They are best suited for scenarios involving multi-turn interactions and user or session-level context caching.
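To make the trade-off concrete, here is a minimal sketch of the two patterns; the session store, lookup_balance stub, and function names are illustrative, not from the workshop code:

```python
# Stub standing in for a real backend call.
def lookup_balance(account_id: str) -> str:
    return f"Balance for account {account_id}: $1,234.56"

# Stateless: every request carries all the context it needs.
def handle_balance_query(account_id: str) -> str:
    return lookup_balance(account_id)

# Stateful: session-level context is cached between turns, so a follow-up
# question can omit details the user already provided.
SESSION_STATE: dict[str, dict] = {}

def handle_turn(session_id: str, utterance: dict) -> str:
    state = SESSION_STATE.setdefault(session_id, {})
    if "account_id" in utterance:
        state["account_id"] = utterance["account_id"]  # remember for later turns
    account_id = state.get("account_id")
    if account_id is None:
        return "Could you share your account Id first?"
    return lookup_balance(account_id)
```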

Conclusion

Multi-agent architectures unlock flexibility, scalability, and accuracy for complex AI-driven workflows. By combining the Nova Sonic conversational capabilities with the orchestration power of Bedrock AgentCore, you can build intelligent, specialized agents that work together seamlessly. If you’re exploring ways to enhance your AI applications, multi-agent patterns with Nova Sonic and AgentCore are a powerful approach worth testing.

Learn more about Amazon Nova Sonic by visiting the User Guide, building with the sample applications, and exploring the Nova Sonic workshop to get started. You can also refer to the technical report and model card for additional benchmarks.


About the author

Lana Zhang is a Senior Specialist Solutions Architect for Generative AI at AWS within the Worldwide Specialist Organization. She specializes in AI/ML, with a focus on use cases such as AI voice assistants and multimodal understanding. She works closely with customers across diverse industries, including media and entertainment, gaming, sports, advertising, financial services, and healthcare, to help them transform their business solutions through AI.
