AWS Machine Learning Blog
Amazon Nova Sonic: Building multi-agent voice assistants in practice

This article introduces Amazon Nova Sonic, a foundation model that generates natural, human-like speech conversations and brings real-time voice interaction to generative AI applications. It focuses on the advantages of multi-agent architecture for building production-grade voice assistants, including modularity, robustness, and scalability. Through a banking voice assistant example, the article shows how to combine Nova Sonic with the Strands Agents framework and Amazon Bedrock AgentCore to build an effective multi-agent system, in which each sub-agent handles a specific task such as identity verification, account inquiries, or mortgage questions, improving maintainability and extensibility while delivering a seamless user experience.

🚀 **Advantages of multi-agent architecture:** A traditional monolithic design becomes hard to maintain and enhance as tasks grow more complex. A multi-agent architecture decomposes complex tasks into smaller, more manageable subtasks, with each agent focused on a specific domain such as identity verification or account inquiries. Like a microservice architecture, this design pattern provides modularity, robustness, and scalability, and allows previously developed agentic workflows to be reused, improving development efficiency and system performance.

📞 **Integrating Amazon Nova Sonic with Bedrock AgentCore:** Amazon Nova Sonic serves as the voice interface layer and integrates with sub-agents on Bedrock AgentCore through tool use events. When a user makes a request, Nova Sonic recognizes it and routes it to the appropriate sub-agent (for example, the banking sub-agent). The sub-agent runs on AgentCore Runtime, executes the business logic, and returns the result to Nova Sonic, which generates the spoken reply, enabling seamless voice interaction.

🏦 **Banking voice assistant example:** The article uses a banking voice assistant to demonstrate the architecture in practice. The assistant comprises three core sub-agents: an authentication sub-agent, a banking sub-agent, and a mortgage sub-agent. Each sub-agent independently handles its own domain logic, including input validation. This encapsulation simplifies the orchestrator's reasoning while keeping business logic cleanly separated, in line with modular software engineering design principles.

💡 **Best practices for building voice agents:** Voice-first multi-agent systems must balance flexibility against latency. Sub-agent invocation is powerful but can add response delay, so design with response time in mind. Choosing an efficient model for sub-agents (such as Nova Lite) reduces latency. Craft concise, focused spoken replies that users can easily understand and follow up on, keeping the conversation natural. Consider stateless versus stateful sub-agent designs to match different interaction requirements.

Amazon Nova Sonic is a foundation model that creates natural, human-like speech-to-speech conversations for generative AI applications, allowing users to interact with AI through voice in real time, with capabilities for understanding tone, enabling natural flow, and performing actions.

Multi-agent architecture offers a modular, robust, and scalable design pattern for production-level voice assistants. This blog post explores Amazon Nova Sonic voice agent applications and demonstrates how they integrate with Strands Agents framework sub-agents while leveraging Amazon Bedrock AgentCore to create an effective multi-agent system.

Why multi-agent architecture?

Imagine developing a financial assistant application responsible for user onboarding, information collection, identity verification, account inquiries, exception handling, and handing off to human agents based on predefined conditions. As functional requirements expand, the voice agent keeps adding new inquiry types, the system prompt grows enormous, and the underlying logic becomes increasingly complex. This illustrates a persistent challenge in software development: monolithic designs lead to systems that are difficult to maintain and enhance.

Think of multi-agent architecture as building a team of specialized AI assistants rather than relying on a single do-it-all helper. Just like companies divide responsibilities across different departments, this approach breaks complex tasks into smaller, manageable pieces. Each AI agent becomes an expert in a specific area—whether that’s fact-checking, data processing, or handling specialized requests. For the user, the experience feels seamless: there’s no delay, no change in voice, and no visible handoff. The system functions behind the scenes, directing each expert agent to step in at the right moment.

In addition to modularity and robustness, multi-agent systems offer advantages similar to those of a microservice architecture, a popular enterprise software design pattern: scalability, distribution, and maintainability, while letting organizations reuse agentic workflows already developed for their large language model (LLM)-powered applications.

Sample application

In this blog, we refer to the Amazon Nova Sonic workshop multi-agent lab code, which uses the banking voice assistant as a sample to demonstrate how to deploy specialized agents on Amazon Bedrock AgentCore. It uses Nova Sonic as the voice interface layer and acts as an orchestrator to delegate detailed inquiries to sub-agents written in Strands Agents hosted on AgentCore Runtime. You can find the sample source code on the GitHub repo.

In the banking voice agent sample, the conversation flow begins with a greeting and collecting the user's name, and then it handles inquiries related to banking or mortgages. We use three sub-agents hosted on AgentCore to handle the specialized logic:

- Authentication sub-agent: verifies the user's identity and validates account IDs
- Banking sub-agent: handles account balance and bank statement inquiries
- Mortgage sub-agent: answers mortgage-related questions

Sub-agents are self-contained, handling their own logic such as input validation. For instance, the authentication sub-agent validates account IDs and returns errors to Nova Sonic if needed. This simplifies the reasoning logic in Nova Sonic while keeping business logic encapsulated, similar to modular design patterns in software engineering.
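As an illustration of that encapsulation, here is a minimal sketch of an input-validation tool inside a Strands sub-agent; the `validate_account_id` tool and its numeric-only rule are illustrative, not taken from the workshop code:

```python
import re
from strands import tool

@tool
def validate_account_id(account_id: str) -> dict:
    """Validate a bank account Id before any downstream lookup.

    Args:
        account_id: Bank account Id provided by the caller
    """
    # The sample only states that account Ids are numeric, so this check
    # is an assumption; a real agent would consult the account system.
    if not re.fullmatch(r"\d+", account_id or ""):
        # A structured error lets Nova Sonic phrase a spoken correction
        # instead of failing silently.
        return {"valid": False, "error": "Account Id must be numeric."}
    return {"valid": True}
```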

Integrate Nova Sonic with AgentCore through tool use events

Amazon Nova Sonic relies on tool use to integrate with agentic workflows. During the Nova Sonic event lifecycle, you provide tool use configurations through the promptStart event; tool use is then initiated when Sonic recognizes specific types of input.

For example, in the following Sonic tool configuration sample, tool use is configured so that Sonic's built-in reasoning classifies the inquiry and initiates a tool use event that routes it to the banking sub-agent.

```javascript
[
    {
        "toolSpec": {
            "name": "bankAgent",
            "description": `Use this tool whenever the customer asks about their **bank account balance** or **bank statement**.
                It should be triggered for queries such as:
                - "What's my balance?"
                - "How much money do I have in my account?"
                - "Can I see my latest bank statement?"
                - "Show me my account summary."`,
            "inputSchema": {
                "json": JSON.stringify({
                    "type": "object",
                    "properties": {
                        "accountId": {
                            "type": "string",
                            "description": "This is a user input. It is the bank account Id which is a numeric number."
                        },
                        "query": {
                            "type": "string",
                            "description": "The inquiry to the bank agent such as check account balance, get statement etc."
                        }
                    },
                    "required": ["accountId", "query"]
                })
            }
        }
    }
]
```
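For orientation, here is a hedged sketch (in Python, though the surrounding sample is JavaScript) of how this tool specification might be carried on the bidirectional stream inside the promptStart event. The toolConfiguration field name follows the Nova Sonic event schema as we understand it, and BANK_AGENT_TOOL and the prompt name are placeholders:

```python
import json

# Placeholder standing in for the bankAgent tool spec shown above.
BANK_AGENT_TOOL = {
    "toolSpec": {"name": "bankAgent", "description": "...", "inputSchema": {"json": "{...}"}}
}

prompt_start_event = {
    "event": {
        "promptStart": {
            "promptName": "my-prompt-id",  # hypothetical identifier
            # Registering tools here lets Sonic emit toolUse events that
            # reference these tool names later in the session.
            "toolConfiguration": {"tools": [BANK_AGENT_TOOL]},
        }
    }
}

# Serialized and written to the Nova Sonic bidirectional stream.
payload = json.dumps(prompt_start_event)
```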

When a user asks Nova Sonic a question such as ‘What is my account balance?’, Sonic sends a toolUse event to the client application with the specified toolName (for example, bankAgent) defined in the configuration. The application can then invoke the sub-agent hosted on AgentCore to handle the banking logic and return the response to Sonic, which in turn generates an audio reply for the user.

{  "event": {    "toolUse": {      "completionId": "UUID",      "content": "{\"accountId\":\"one two three four five\",\"query\":\"check account balance\"}",      "contentId": "UUID",      "promptName": "UUID",      "role": "TOOL",      "sessionId": "UUID",      "toolName": "bankAgent",      "toolUseId": "UUID"    }  }}

Sub-agent on AgentCore

The following sample showcases the banking sub-agent developed using the Strands Agents framework, specifically configured for deployment on Bedrock AgentCore. It leverages Nova Lite through Amazon Bedrock as its reasoning model, providing effective cognitive capabilities with minimal latency. The agent implementation features a system prompt that defines its banking assistant responsibilities, complemented by two specialized tools: one for account balance inquiries and another for bank statement retrieval.

```python
from strands import Agent, tool
import json
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands.models import BedrockModel

app = BedrockAgentCoreApp()

@tool
def get_account_balance(account_id: str) -> dict:
    """Get account balance for given account Id

    Args:
        account_id: Bank account Id
    """
    # Placeholder result; the actual implementation will retrieve
    # information from a database API or another backend service.
    result = {"account_id": account_id, "balance": "1,234.56 USD", "as_of": "2025-08-31"}
    return {"result": result}

@tool
def get_statement(account_id: str, year_and_month: str) -> dict:
    """Get account statement for a given year and month

    Args:
        account_id: Bank account Id
        year_and_month: Year and month of the bank statement. For example: 2025_08 or August 2025
    """
    # Placeholder result; the actual implementation will retrieve
    # information from a database API or another backend service.
    result = {
        "account_id": account_id,
        "period": year_and_month,
        "opening_balance": "1,000.00 USD",
        "closing_balance": "1,234.56 USD",
        "transactions": 12,
    }
    return {"result": result}

# Specify Bedrock LLM for the agent
bedrock_model = BedrockModel(
    model_id="amazon.nova-lite-v1:0",
)

# System prompt
system_prompt = '''You are a banking agent. You will receive requests that include:
- `account_id`
- `query` (the inquiry type, such as **balance** or **statement**, plus any additional details like month).

## Instructions
1. Use the provided `account_id` and `query` to call the tools.
2. The tool will return a JSON response.
3. Summarize the result in 2–3 sentences.
   - For a **balance inquiry**, give the account balance with currency and date.
   - For a **statement inquiry**, provide opening balance, closing balance, and number of transactions.
4. Do not return raw JSON. Always respond in natural language.
'''

# Create an agent with tools, LLM, and system prompt
agent = Agent(
    tools=[get_account_balance, get_statement],
    model=bedrock_model,
    system_prompt=system_prompt
)

@app.entrypoint
def banking_agent(payload):
    response = agent(json.dumps(payload))
    return response.message['content'][0]['text']

if __name__ == "__main__":
    app.run()
```
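For a quick local smoke test before deploying, a hedged sketch: when run directly, BedrockAgentCoreApp serves the entrypoint over HTTP, and the /invocations path on port 8080 reflects the AgentCore Runtime service contract as we understand it; the payload shape matches what the banking_agent entrypoint expects.

```python
import requests

# Assumes `python banking_agent.py` is already running locally; the port
# and path follow the AgentCore Runtime contract (an assumption to verify).
resp = requests.post(
    "http://localhost:8080/invocations",
    json={"accountId": "12345", "query": "check account balance"},
    timeout=30,
)
print(resp.text)  # natural-language summary produced by the sub-agent
```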

Best practices for voice-based multi-agent systems

Multi-agent architecture provides exceptional flexibility and a modular design approach, allowing developers to structure voice assistants efficiently and potentially reuse existing specialized agent workflows. When implementing voice-first experiences, there are important best practices to consider that address the unique challenges of this modality.

Consider stateless vs. stateful sub-agent design

Stateless sub-agents handle each request independently, without retaining memory of past interactions or session-level states. They are simple to implement, easy to scale, and work well for straightforward, one-off tasks. However, they cannot provide context-aware responses unless external state management is introduced.

Stateful sub-agents, on the other hand, maintain memory across interactions to support context-aware responses and session-level states. This enables more personalized and cohesive user experiences, but comes with added complexity and resource requirements. They are best suited for scenarios involving multi-turn interactions and user or session-level context caching.
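To make the trade-off concrete, here is a minimal sketch of the two patterns; the session store, lookup_balance stub, and function names are illustrative, not from the workshop code:

```python
# Stub standing in for a real backend call.
def lookup_balance(account_id: str) -> str:
    return f"Balance for account {account_id}: $1,234.56"

# Stateless: every request carries all the context it needs.
def handle_balance_query(account_id: str) -> str:
    return lookup_balance(account_id)

# Stateful: session-level context is cached between turns, so a follow-up
# question can omit details the user already provided.
SESSION_STATE: dict[str, dict] = {}

def handle_turn(session_id: str, utterance: dict) -> str:
    state = SESSION_STATE.setdefault(session_id, {})
    if "account_id" in utterance:
        state["account_id"] = utterance["account_id"]  # remember for later turns
    account_id = state.get("account_id")
    if account_id is None:
        return "Could you share your account Id first?"
    return lookup_balance(account_id)
```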

Conclusion

Multi-agent architectures unlock flexibility, scalability, and accuracy for complex AI-driven workflows. By combining the Nova Sonic conversational capabilities with the orchestration power of Bedrock AgentCore, you can build intelligent, specialized agents that work together seamlessly. If you’re exploring ways to enhance your AI applications, multi-agent patterns with Nova Sonic and AgentCore are a powerful approach worth testing.

Learn more about Amazon Nova Sonic by visiting the User Guide, building with the sample applications, and exploring the Nova Sonic workshop to get started. You can also refer to the technical report and model card for additional benchmarks.


About the author

Lana Zhang is a Senior Specialist Solutions Architect for Generative AI at AWS within the Worldwide Specialist Organization. She specializes in AI/ML, with a focus on use cases such as AI voice assistants and multimodal understanding. She works closely with customers across diverse industries, including media and entertainment, gaming, sports, advertising, financial services, and healthcare, to help them transform their business solutions through AI.
