AWS 推出全局跨区域推理，提升 AI 应用性能

Organizations are increasingly integrating generative AI capabilities into their applications to enhance customer experiences, streamline operations, and drive innovation. As generative AI workloads continue to grow in scale and importance, organizations face new challenges in maintaining consistent performance, reliability, and availability of their AI-powered applications. Customers are looking to scale their AI inference workloads across multiple AWS Regions to support consistent performance and reliability.

To address this need, we introduced cross-Region inference (CRIS) for Amazon Bedrock. This managed capability automatically routes inference requests across multiple Regions, enabling applications to handle traffic bursts seamlessly and achieve higher throughput without requiring developers to predict demand fluctuations or implement complex load-balancing mechanisms. CRIS works through inference profiles, which define a foundation model (FM) and the Regions to which requests can be routed.

We are excited to announce availability of global cross-Region inference with Anthropic’s Claude Sonnet 4.5 on Amazon Bedrock. Now, with cross-Region inference, you can choose either a geography-specific inference profile or a global inference profile. This evolution from geography-specific routing provides greater flexibility for organizations because Amazon Bedrock automatically selects the optimal commercial Region within that geography to process your inference request. Global CRIS further enhances cross-Region inference by enabling the routing of inference requests to supported commercial Regions worldwide, optimizing available resources and enabling higher model throughput. This helps support consistent performance and higher throughput, particularly during unplanned peak usage times. Additionally, global CRIS supports key Amazon Bedrock features, including prompt caching, batch inference, Amazon Bedrock Guardrails, Amazon Bedrock Knowledge Bases, and more.

In this post, we explore how global cross-Region inference works, the benefits it offers compared to Regional profiles, and how you can implement it in your own applications with Anthropic’s Claude Sonnet 4.5 to improve your AI applications’ performance and reliability.

Core functionality of global cross-Region inference

Global cross-Region inference helps organizations manage unplanned traffic bursts by using compute resources across different Regions. This section explores how this feature works and the technical mechanisms that power its functionality.

Understanding inference profiles

An inference profile in Amazon Bedrock defines an FM and one or more Regions to which it can route model invocation requests. The global cross-Region inference profile for Anthropic’s Claude Sonnet 4.5 extends this concept beyond geographic boundaries, allowing requests to be routed to one of the supported Amazon Bedrock commercial Regions globally, so you can prepare for unplanned traffic bursts by distributing traffic across multiple Regions.

Inference profiles operate on two key concepts:

Source Region

Destination Region

At the time of writing, global CRIS supports over 20 source Regions, and the destination Region is a supported commercial Region dynamically chosen by Amazon Bedrock.

Intelligent request routing

Global cross-Region inference uses an intelligent request routing mechanism that considers multiple factors, including model availability, capacity, and latency, to route requests to the optimal Region. The system automatically selects the optimal available Region for your request without requiring manual configuration:

Regional capacity

Latency considerations

Availability metrics

This intelligent routing system enables Amazon Bedrock to distribute traffic dynamically across the AWS global infrastructure, facilitating optimal availability for each request and smoother performance during high-usage periods.

Monitoring and logging

When using global cross-Region inference, Amazon CloudWatch and AWS CloudTrail continue to record log entries only in the source Region where the request originated. This simplifies monitoring and logging by maintaining all records in a single Region regardless of where the inference request is ultimately processed. To track which Region processed a request, CloudTrail events include an additionalEventData field with an inferenceRegion key that specifies the destination Region. Organizations can monitor and analyze the distribution of their inference requests across the AWS global infrastructure.

Data security and compliance

Global cross-Region inference maintains high standards for data security. Data transmitted during cross-Region inference is encrypted and remains within the secure AWS network. Sensitive information remains protected throughout the inference process, regardless of which Region processes the request. Because security and compliance is a shared responsibility, you must also consider legal or compliance requirements that come with processing inference request in a different geographic location. Because global cross-Region inference allows requests to be routed globally, organizations with specific data residency or compliance requirements can elect, based on their compliance needs, to use geography-specific inference profiles to make sure data remains within certain Regions. This flexibility helps businesses balance redundancy and compliance needs based on their specific requirements.

Implement global cross-Region inference

To use global cross-Region inference with Anthropic’s Claude Sonnet 4.5, developers must complete the following key steps:

Use the global inference profile ID

global.anthropic.claude-sonnet-4-5-20250929-v1:0

InvokeModel

Converse

Configure IAM permissions

AWS Identity and Access Management

prerequisites for inference profiles

Implementing global cross-Region inference with Anthropic’s Claude Sonnet 4.5 is straightforward, requiring only a few changes to your existing application code. The following is an example of how to update your code in Python:

import boto3import jsonbedrock = boto3.client('bedrock-runtime', region_name='us-east-1')model_id = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"  response = bedrock.converse(    messages=[{"role": "user", "content": [{"text": "Explain cloud computing in 2 sentences."}]}],    modelId=model_id,)print("Response:", response['output']['message']['content'][0]['text'])print("Tokens used:", result.get('usage', {}))

If you’re using the Amazon Bedrock InvokeModel API, you can quickly switch to a different model by changing the model ID, as shown in Invoke model code examples.

IAM policy requirements for global CRIS

In this section, we discuss the IAM policy requirements for global CRIS.

Enable global CRIS

To enable global CRIS for your users, you must apply a three-part IAM policy to the role. The following is an example IAM policy to provide granular control. You can replace <REQUESTING REGION> in the example policy with the Region you are operating in.

{    "Version": "2012-10-17",    "Statement": [        {            "Sid": "GrantGlobalCrisInferenceProfileRegionAccess",            "Effect": "Allow",            "Action": "bedrock:InvokeModel",            "Resource": [                "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"            ],            "Condition": {                "StringEquals": {                    "aws:RequestedRegion": "<REQUESTING REGION>"                }            }        },        {            "Sid": "GrantGlobalCrisInferenceProfileInRegionModelAccess",            "Effect": "Allow",            "Action": "bedrock:InvokeModel",            "Resource": [                "arn:aws:bedrock:<REQUESTING REGION>::foundation-model/<MODEL NAME>"            ],            "Condition": {                "StringEquals": {                    "aws:RequestedRegion": "<REQUESTING REGION>",                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"                }            }        },        {            "Sid": "GrantGlobalCrisInferenceProfileGlobalModelAccess",            "Effect": "Allow",            "Action": "bedrock:InvokeModel",            "Resource": [                "arn:aws:bedrock:::foundation-model/<MODEL NAME>"            ],            "Condition": {                "StringEquals": {                    "aws:RequestedRegion": "unspecified",                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"                }            }        }    ]}

The first part of the policy grants access to the Regional inference profile in your requesting Region. This policy allows users to invoke the specified global CRIS inference profile from their requesting Region. The second part of the policy provides access to the Regional FM resource, which is necessary for the service to understand which model is being requested within the Regional context. The third part of the policy grants access to the global FM resource, which enables the cross-Region routing capability that makes global CRIS function. When implementing these policies, make sure all three resource Amazon Resource Names (ARNs) are included in your IAM statements:

arn:aws:bedrock:REGION:ACCOUNT:inference-profile/global.MODEL-NAME

arn:aws:bedrock:REGION::foundation-model/MODEL-NAME

arn:aws:bedrock:::foundation-model/MODEL-NAME

The global FM ARN has no Region or account specified, which is intentional and required for the cross-Region functionality.

To simplify onboarding, global CRIS doesn’t require complex changes to an organization’s existing Service Control Policies (SCPs) that might deny access to services in certain Regions. When you opt in to global CRIS using this three-part policy structure, Amazon Bedrock will process inference requests across commercial Regions without validating against Regions denied in other parts of SCPs. This prevents workload failures that could occur when global CRIS routes inference requests to new or previously unused Regions that might be blocked in your organization’s SCPs. However, if you have data residency requirements, you should carefully evaluate your use cases before implementing global CRIS, because requests might be processed in any supported commercial Region.

Disable global CRIS

You can choose from two primary approaches to implement deny policies to global CRIS for specific IAM roles, each with different use cases and implications:

Remove an IAM policy

Implement a deny policy

StringEquals

"aws:RequestedRegion": "unspecified"

global

When implementing deny policies, it’s crucial to understand that global CRIS changes how the aws:RequestedRegion field behaves. Traditional Region-based deny policies that use StringEquals conditions with specific Region names such as "aws:RequestedRegion": "us-west-2" will not work as expected with global CRIS because the service sets this field to global rather than the actual destination Region. However, as mentioned earlier, "aws:RequestedRegion": "unspecified" will result in the deny effect.

Note: To simplify customer onboarding, global CRIS has been designed to work without requiring complex changes to an organization’s existing SCPs that may deny access to services in certain Regions. When customers opt in to global CRIS using the three-part policy structure described above, Amazon Bedrock will process inference requests across supported AWS commercial Regions without validating against regions denied in any other parts of SCPs. This prevents workload failures that could occur when global CRIS routes inference requests to new or previously unused Regions that might be blocked in your organization’s SCPs. However, customers with data residency requirements should evaluate their use cases before implementing global CRIS, because requests may be processed in any supported commercial Regions. As a best practice, organizations who use geographic CRIS but want to opt out from global CRIS should implement the second approach.

Request limit increases for global CRIS with Anthropic’s Claude Sonnet 4.5

When using global CRIS inference profiles, it’s important to understand that service quota management is centralized in the US East (N. Virginia) Region. However, you can use global CRIS from over 20 supported source Regions. Because this will be a global limit, requests to view, manage, or increase quotas for global cross-Region inference profiles must be made through the Service Quotas console or AWS Command Line Interface (AWS CLI) specifically in the US East (N. Virginia) Region. Quotas for global CRIS inference profiles will not appear on the Service Quotas console or AWS CLI for other source Regions, even when they support global CRIS usage. This centralized quota management approach makes it possible to access your limits globally without estimating usage in individual Regions. If you don’t have access to US East (N. Virginia), reach out to your account teams or AWS support.

Complete the following steps to request a limit increase:

US East (N. Virginia)

AWS services

Amazon Bedrock

Global cross-Region model inference tokens per day for Anthropic Claude Sonnet 4.5 V1 Global cross-Region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1

Request increase at account level

Request

Use global cross-Region inference with Anthropic’s Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic’s most intelligent model (at the time of writing), and is best for coding and complex agents. Anthropic’s Claude Sonnet 4.5 demonstrates advancements in agent capabilities, with enhanced performance in tool handling, memory management, and context processing. The model shows marked improvements in code generation and analysis, including identifying optimal improvements and exercising stronger judgment in refactoring decisions. It particularly excels at autonomous long-horizon coding tasks, where it can effectively plan and execute complex software projects spanning hours or days while maintaining consistent performance and reliability throughout the development cycle.

Global cross-Region inference for Anthropic’s Claude Sonnet 4.5 delivers multiple advantages over traditional geographic cross-Region inference profiles:

Enhanced throughput during peak demand

Cost-efficiency

offers approximately 10% savings on both input and output token pricing compared to geographic cross-Region inference

Streamlined monitoring

On-demand quota flexibility

If you’re currently using Anthropic’s Sonnet models on Amazon Bedrock, upgrading to Claude Sonnet 4.5 is a great opportunity to enhance your AI capabilities. It offers a significant leap in intelligence and capability, offered as a straightforward, drop-in replacement at a comparable price point as Sonnet 4. The primary reason to switch is Sonnet 4.5’s superior performance across critical, high-value domains. It is Anthropic’s most powerful model so far for building complex agents, demonstrating state-of-the-art performance in coding, reasoning, and computer use. Furthermore, its advanced agentic capabilities, such as extended autonomous operation and more effective use of parallel tool calls, enable the creation of more sophisticated AI workflows.

Conclusion

Amazon Bedrock global cross-Region inference for Anthropic’s Claude Sonnet 4.5 marks a significant evolution in AWS generative AI capabilities, enabling global routing of inference requests across the AWS worldwide infrastructure. With straightforward implementation and comprehensive monitoring through CloudTrail and CloudWatch, organizations can quickly use this powerful capability for their AI applications, high-volume workloads, and disaster recovery scenarios.We encourage you to try global cross-Region inference with Anthropic’s Claude Sonnet 4.5 in your own applications and experience the benefits firsthand. Start by updating your code to use the global inference profile ID, configure appropriate IAM permissions, and monitor your application’s performance as it uses the AWS global infrastructure to deliver enhanced resilience.

For more information about global cross-Region inference for Anthropic’s Claude Sonnet 4.5 in Amazon Bedrock, refer to Increase throughput with cross-Region inference, Supported Regions and models for inference profiles, and Use an inference profile in model invocation.

About the authors

Melanie Li, PhD, is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions using state-of-the-art AI/ML tools. She has been actively involved in multiple generative AI initiatives across APJ, harnessing the power of LLMs. Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.

Saurabh Trikande is a Senior Product Manager for Amazon Bedrock and Amazon SageMaker Inference. He is passionate about working with customers and partners, motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, inference with multi-tenant models, cost optimizations, and making the deployment of generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Derrick Choo is a Senior Solutions Architect at AWS who accelerates enterprise digital transformation through cloud adoption, AI/ML, and generative AI solutions. He specializes in full-stack development and ML, designing end-to-end solutions spanning frontend interfaces, IoT applications, data integrations, and ML models, with a particular focus on computer vision and multi-modal systems.

Satveer Khurpa is a Sr. WW Specialist Solutions Architect, Amazon Bedrock at Amazon Web Services. In this role, he uses his expertise in cloud-based architectures to develop innovative generative AI solutions for clients across diverse industries. Satveer’s deep understanding of generative AI technologies allows him to design scalable, secure, and responsible applications that unlock new business opportunities and drive tangible value.

Jared Dean is a Principal AI/ML Solutions Architect at AWS. Jared works with customers across industries to develop machine learning applications that improve efficiency. He is interested in all things AI, technology, and BBQ.

Jan Catarata is a software engineer working on Amazon Bedrock, where he focuses on designing robust distributed systems. When he’s not building scalable AI solutions, you can find him strategizing his next move with friends and family at game night.