AWS Machine Learning Blog, October 04
AWS Launches Global Cross-Region Inference to Improve AI Application Performance

 

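Amazon Web Services (AWS) has launched global cross-Region inference (CRIS) for Anthropic’s Claude Sonnet 4.5 on Amazon Bedrock. It intelligently routes AI inference requests to supported AWS commercial Regions worldwide. The feature addresses the performance, reliability, and availability challenges created by growing generative AI workloads. By automatically balancing traffic and optimizing resource utilization, CRIS absorbs sudden traffic spikes and increases throughput without requiring complex manual configuration from developers. It supports core Amazon Bedrock features such as prompt caching, batch inference, and Knowledge Bases, simplifies monitoring and logging, and prioritizes data security, helping organizations build more capable and resilient AI applications.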

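🌐 **Global intelligent routing for more resilient AI applications**: Global cross-Region inference (CRIS) distributes inference requests for Anthropic’s Claude Sonnet 4.5 across AWS commercial Regions worldwide. It absorbs sudden traffic spikes and keeps AI applications performant and available without requiring developers to forecast demand fluctuations.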

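⚙️ **Simplified operations and management**: CRIS uses inference profiles, which define a model and the Regions it can be routed to, to manage traffic automatically. It supports Amazon Bedrock features including prompt caching, batch inference, Amazon Bedrock Guardrails, and Amazon Bedrock Knowledge Bases. Monitoring and logging stay in the source Region only, which reduces operational complexity.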

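🛡️ **Data security and compliance**: Data is encrypted throughout cross-Region inference and stays within the secure AWS network. For data residency and compliance requirements, organizations can instead choose geography-specific inference profiles to keep processing within particular geographies.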

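🚀 **Straightforward implementation and configuration**: To use global CRIS, you only need to change the model ID in your API calls to the global inference profile ID. IAM policy setup is also streamlined. It permits cross-Region model access without conflicting with organizational service control policies (SCPs) that might block access to certain Regions, and you can disable global CRIS if needed.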

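📈 **Centralized quota management**: Service quotas for global CRIS inference profiles are managed centrally in the US East (N. Virginia) Region. You view, manage, and request quota increases through the Service Quotas console or the AWS CLI in that Region, so resource usage is managed uniformly worldwide.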

Organizations are increasingly integrating generative AI capabilities into their applications to enhance customer experiences, streamline operations, and drive innovation. As generative AI workloads continue to grow in scale and importance, organizations face new challenges in maintaining consistent performance, reliability, and availability of their AI-powered applications. Customers are looking to scale their AI inference workloads across multiple AWS Regions to support consistent performance and reliability.

To address this need, we introduced cross-Region inference (CRIS) for Amazon Bedrock. This managed capability automatically routes inference requests across multiple Regions, enabling applications to handle traffic bursts seamlessly and achieve higher throughput without requiring developers to predict demand fluctuations or implement complex load-balancing mechanisms. CRIS works through inference profiles, which define a foundation model (FM) and the Regions to which requests can be routed.

We are excited to announce the availability of global cross-Region inference with Anthropic’s Claude Sonnet 4.5 on Amazon Bedrock. With cross-Region inference, you can now choose either a geography-specific inference profile or a global inference profile. With a geography-specific profile, Amazon Bedrock automatically selects the optimal commercial Region within that geography to process your inference request. Global CRIS extends this by routing inference requests to supported commercial Regions worldwide, optimizing available resources and enabling higher model throughput. This helps support consistent performance and higher throughput, particularly during unplanned peak usage times. Additionally, global CRIS supports key Amazon Bedrock features, including prompt caching, batch inference, Amazon Bedrock Guardrails, Amazon Bedrock Knowledge Bases, and more.

In this post, we explore how global cross-Region inference works, the benefits it offers compared to Regional profiles, and how you can implement it in your own applications with Anthropic’s Claude Sonnet 4.5 to improve your AI applications’ performance and reliability.

Core functionality of global cross-Region inference

Global cross-Region inference helps organizations manage unplanned traffic bursts by using compute resources across different Regions. This section explores how this feature works and the technical mechanisms that power its functionality.

Understanding inference profiles

An inference profile in Amazon Bedrock defines an FM and one or more Regions to which it can route model invocation requests. The global cross-Region inference profile for Anthropic’s Claude Sonnet 4.5 extends this concept beyond geographic boundaries, allowing requests to be routed to one of the supported Amazon Bedrock commercial Regions globally, so you can prepare for unplanned traffic bursts by distributing traffic across multiple Regions.
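To see which system-defined inference profiles exist in a Region, and which Regions their model ARNs name, you could page through the Amazon Bedrock ListInferenceProfiles API. The following is a minimal sketch, not an official AWS sample. It assumes a boto3 `bedrock` control-plane client is passed in, and `summarize_profiles` and `region_from_arn` are hypothetical helper names.

```python
from typing import Any, Dict, List


def region_from_arn(arn: str) -> str:
    """Return the Region field of an ARN (empty string for Region-less ARNs)."""
    # ARN layout: arn:partition:service:region:account:resource
    return arn.split(":")[3]


def summarize_profiles(client: Any, prefix: str = "global.") -> Dict[str, List[str]]:
    """Map system-defined inference profile IDs that start with `prefix`
    to the Region field of each model ARN the profile lists."""
    summary: Dict[str, List[str]] = {}
    kwargs: Dict[str, Any] = {"typeEquals": "SYSTEM_DEFINED"}
    while True:
        page = client.list_inference_profiles(**kwargs)
        for profile in page.get("inferenceProfileSummaries", []):
            profile_id = profile["inferenceProfileId"]
            if profile_id.startswith(prefix):
                summary[profile_id] = [
                    region_from_arn(model["modelArn"]) for model in profile.get("models", [])
                ]
        # Keep paging until the API stops returning a continuation token
        next_token = page.get("nextToken")
        if not next_token:
            return summary
        kwargs["nextToken"] = next_token


# Example (requires AWS credentials):
# import boto3
# bedrock = boto3.client("bedrock", region_name="us-east-1")
# print(summarize_profiles(bedrock))
```

Passing `prefix="us."` or another geographic prefix instead would summarize geography-specific profiles the same way.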

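Inference profiles operate on two key concepts:

    Source Region: the Region from which the API request is made.
    Destination Region: a Region to which Amazon Bedrock can route the request for inference.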

At the time of writing, global CRIS supports over 20 source Regions, and the destination Region is a supported commercial Region dynamically chosen by Amazon Bedrock.

Intelligent request routing

Global cross-Region inference uses an intelligent request routing mechanism that weighs multiple factors, including model availability, capacity, and latency, and automatically selects the optimal available Region for each request without requiring manual configuration.

This intelligent routing system enables Amazon Bedrock to distribute traffic dynamically across the AWS global infrastructure, facilitating optimal availability for each request and smoother performance during high-usage periods.

Monitoring and logging

When using global cross-Region inference, Amazon CloudWatch and AWS CloudTrail continue to record log entries only in the source Region where the request originated. This simplifies monitoring and logging by maintaining all records in a single Region regardless of where the inference request is ultimately processed. To track which Region processed a request, CloudTrail events include an additionalEventData field with an inferenceRegion key that specifies the destination Region. Organizations can monitor and analyze the distribution of their inference requests across the AWS global infrastructure.
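As a sketch of that analysis, the following Python tallies destination Regions from CloudTrail events. It assumes the invocation events are returned by the CloudTrail LookupEvents API in the source Region (for example, filtered on the bedrock.amazonaws.com event source), and it reads the additionalEventData.inferenceRegion key described above. `count_inference_regions` is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable


def count_inference_regions(events: Iterable[Dict[str, Any]]) -> Counter:
    """Count destination Regions in CloudTrail LookupEvents results.

    Each event's "CloudTrailEvent" value is the full record as a JSON string;
    events without additionalEventData.inferenceRegion are skipped.
    """
    counts: Counter = Counter()
    for event in events:
        record = json.loads(event.get("CloudTrailEvent", "{}"))
        region = (record.get("additionalEventData") or {}).get("inferenceRegion")
        if region:
            counts[region] += 1
    return counts


# Example (requires AWS credentials; run in the source Region):
# import boto3
# cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")
# pages = cloudtrail.get_paginator("lookup_events").paginate(
#     LookupAttributes=[{"AttributeKey": "EventSource", "AttributeValue": "bedrock.amazonaws.com"}]
# )
# print(count_inference_regions(e for page in pages for e in page["Events"]))
```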

Data security and compliance

Global cross-Region inference maintains high standards for data security. Data transmitted during cross-Region inference is encrypted and remains within the secure AWS network. Sensitive information remains protected throughout the inference process, regardless of which Region processes the request. Because security and compliance is a shared responsibility, you must also consider the legal or compliance requirements that come with processing inference requests in a different geographic location. Because global cross-Region inference allows requests to be routed globally, organizations with specific data residency or compliance requirements can instead use geography-specific inference profiles to make sure data is processed within certain Regions. This flexibility helps businesses balance redundancy and compliance based on their specific requirements.
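One way to act on this choice in code is to select the inference profile ID from a configured residency requirement. The following is a minimal sketch. The geographic prefixes in `GEO_PREFIXES` (us, eu, apac) are assumptions that you should verify against Supported Regions and models for inference profiles for Claude Sonnet 4.5, and `inference_profile_id` is a hypothetical helper.

```python
from typing import Optional

# Base model ID of Anthropic's Claude Sonnet 4.5 on Amazon Bedrock
BASE_MODEL_ID = "anthropic.claude-sonnet-4-5-20250929-v1:0"

# ASSUMPTION: geographic profile prefixes; confirm which ones exist for your model
GEO_PREFIXES = {"US": "us", "EU": "eu", "APAC": "apac"}


def inference_profile_id(residency: Optional[str] = None, model_id: str = BASE_MODEL_ID) -> str:
    """Return the global profile ID when there is no residency requirement,
    otherwise the geography-specific profile ID for that geography."""
    if residency is None:
        return f"global.{model_id}"
    prefix = GEO_PREFIXES.get(residency.upper())
    if prefix is None:
        raise ValueError(f"No geographic inference profile configured for {residency!r}")
    return f"{prefix}.{model_id}"
```

Failing loudly for an unknown geography is deliberate: silently falling back to the global profile would route requests outside the geography the caller asked for.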

Implement global cross-Region inference

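To use global cross-Region inference with Anthropic’s Claude Sonnet 4.5, developers must complete the following key steps:

    Use the global inference profile ID: in your Amazon Bedrock API calls, specify the global inference profile ID for Anthropic’s Claude Sonnet 4.5 (global.anthropic.claude-sonnet-4-5-20250929-v1:0) instead of a Region-specific model ID.
    Configure IAM permissions: grant IAM permissions to invoke the global inference profile and the foundation models it routes to, as described in the IAM policy section of this post.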

Implementing global cross-Region inference with Anthropic’s Claude Sonnet 4.5 is straightforward, requiring only a few changes to your existing application code. The following is an example of how to update your code in Python:

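```python
import boto3

# Create the Bedrock Runtime client in your source Region
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

# Global inference profile ID for Anthropic's Claude Sonnet 4.5
model_id = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"

response = bedrock.converse(
    messages=[{"role": "user", "content": [{"text": "Explain cloud computing in 2 sentences."}]}],
    modelId=model_id,
)

print("Response:", response['output']['message']['content'][0]['text'])
# Token counts are returned in the "usage" field of the Converse response
print("Tokens used:", response.get('usage', {}))
```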

If you’re using the Amazon Bedrock InvokeModel API, you can quickly switch to a different model by changing the model ID, as shown in Invoke model code examples.
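For InvokeModel, the change is the same: pass the global inference profile ID as modelId. The following sketch assumes the Anthropic Messages request format that Claude models use on Amazon Bedrock (anthropic_version set to bedrock-2023-05-31) and expects a boto3 bedrock-runtime client. `build_request_body` and `invoke` are hypothetical helper names.

```python
import json
from typing import Any

MODEL_ID = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"


def build_request_body(prompt: str, max_tokens: int = 512) -> str:
    """Serialize an Anthropic Messages-format request for InvokeModel."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    })


def invoke(client: Any, prompt: str) -> str:
    """Invoke Claude Sonnet 4.5 through the global profile and return the first text block."""
    response = client.invoke_model(modelId=MODEL_ID, body=build_request_body(prompt))
    # The response body is a stream; read it and decode the JSON payload
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]


# Example (requires AWS credentials):
# import boto3
# runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
# print(invoke(runtime, "Explain cloud computing in 2 sentences."))
```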

IAM policy requirements for global CRIS

In this section, we discuss the IAM policy requirements for global CRIS.

Enable global CRIS

To enable global CRIS for your users, you must apply a three-part IAM policy to the role. The following example IAM policy provides granular control. Replace <REQUESTING REGION> with the Region you are operating in, <ACCOUNT> with your AWS account ID, and <MODEL NAME> with the model ID (for Anthropic’s Claude Sonnet 4.5, anthropic.claude-sonnet-4-5-20250929-v1:0).

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GrantGlobalCrisInferenceProfileRegionAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "<REQUESTING REGION>"
                }
            }
        },
        {
            "Sid": "GrantGlobalCrisInferenceProfileInRegionModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:<REQUESTING REGION>::foundation-model/<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "<REQUESTING REGION>",
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
                }
            }
        },
        {
            "Sid": "GrantGlobalCrisInferenceProfileGlobalModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:::foundation-model/<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "unspecified",
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
                }
            }
        }
    ]
}
```

The first part of the policy grants access to the Regional inference profile in your requesting Region. This policy allows users to invoke the specified global CRIS inference profile from their requesting Region. The second part of the policy provides access to the Regional FM resource, which is necessary for the service to understand which model is being requested within the Regional context. The third part of the policy grants access to the global FM resource, which enables the cross-Region routing capability that makes global CRIS function. When implementing these policies, make sure all three resource Amazon Resource Names (ARNs) are included in your IAM statements:

    Regional inference profile ARN: arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>
    Regional FM ARN: arn:aws:bedrock:<REQUESTING REGION>::foundation-model/<MODEL NAME>
    Global FM ARN: arn:aws:bedrock:::foundation-model/<MODEL NAME>

The global FM ARN has no Region or account specified, which is intentional and required for the cross-Region functionality.
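If you manage policies in code, you can generate the three statements for each source Region. This is a minimal sketch that reproduces the example policy structure above. `global_cris_policy` is a hypothetical helper, and you should still review its output against the Amazon Bedrock IAM documentation before attaching it to a role.

```python
from typing import Any, Dict


def global_cris_policy(region: str, account: str, model: str) -> Dict[str, Any]:
    """Build the three-statement allow policy for global CRIS from one source Region."""
    profile_arn = f"arn:aws:bedrock:{region}:{account}:inference-profile/global.{model}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # 1. The global inference profile in the requesting Region
                "Sid": "GrantGlobalCrisInferenceProfileRegionAccess",
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": [profile_arn],
                "Condition": {"StringEquals": {"aws:RequestedRegion": region}},
            },
            {   # 2. The Regional FM, only when reached through this profile
                "Sid": "GrantGlobalCrisInferenceProfileInRegionModelAccess",
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": [f"arn:aws:bedrock:{region}::foundation-model/{model}"],
                "Condition": {"StringEquals": {
                    "aws:RequestedRegion": region,
                    "bedrock:InferenceProfileArn": profile_arn,
                }},
            },
            {   # 3. The global FM: Region and account are intentionally empty in the ARN
                "Sid": "GrantGlobalCrisInferenceProfileGlobalModelAccess",
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": [f"arn:aws:bedrock:::foundation-model/{model}"],
                "Condition": {"StringEquals": {
                    "aws:RequestedRegion": "unspecified",
                    "bedrock:InferenceProfileArn": profile_arn,
                }},
            },
        ],
    }
```

For example, global_cris_policy("eu-west-1", "111122223333", "anthropic.claude-sonnet-4-5-20250929-v1:0") builds the policy for a caller in Europe (Ireland) with a placeholder account ID; serialize it with json.dumps before attaching it.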

To simplify onboarding, global CRIS doesn’t require complex changes to an organization’s existing Service Control Policies (SCPs) that might deny access to services in certain Regions. When you opt in to global CRIS using this three-part policy structure, Amazon Bedrock will process inference requests across commercial Regions without validating against Regions denied in other parts of SCPs. This prevents workload failures that could occur when global CRIS routes inference requests to new or previously unused Regions that might be blocked in your organization’s SCPs. However, if you have data residency requirements, you should carefully evaluate your use cases before implementing global CRIS, because requests might be processed in any supported commercial Region.

Disable global CRIS

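You can choose from two primary approaches to deny global CRIS to specific IAM roles, each with different use cases and implications:

    Remove an IAM policy: remove one or more of the three required policy statements from the role’s permissions. Because global CRIS requires all three, removing any of them causes global CRIS requests to be denied.
    Implement an explicit deny policy: add a deny statement that targets global CRIS requests with a StringEquals condition on "aws:RequestedRegion": "unspecified". An explicit deny takes precedence over allow statements, so it stays in effect even if the required allow statements are added to the role later.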

When implementing deny policies, it’s crucial to understand that global CRIS changes how the aws:RequestedRegion condition key behaves. Traditional Region-based deny policies that use StringEquals conditions with specific Region names, such as "aws:RequestedRegion": "us-west-2", will not work as expected with global CRIS, because the service sets this key to unspecified rather than to the actual destination Region. A deny statement with the condition "aws:RequestedRegion": "unspecified" does take effect.
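To deny global CRIS explicitly, a statement along the following lines matches on the unspecified Region value. This is a sketch under assumptions: the Sid, the choice of bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream as the denied actions, and the `add_deny` helper are illustrative. Validate the statement in a test account before rolling it out.

```python
from typing import Any, Dict, List


def deny_global_cris_statement() -> Dict[str, Any]:
    """Explicit deny matching requests whose aws:RequestedRegion is "unspecified"."""
    return {
        "Sid": "DenyGlobalCrossRegionInference",  # illustrative Sid
        "Effect": "Deny",
        # Converse is authorized by bedrock:InvokeModel;
        # ConverseStream by bedrock:InvokeModelWithResponseStream
        "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
        "Resource": "*",
        "Condition": {"StringEquals": {"aws:RequestedRegion": "unspecified"}},
    }


def add_deny(policy: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an IAM policy document with the deny statement appended."""
    existing = policy.get("Statement", [])
    # IAM allows Statement to be a single object or a list of objects
    statements: List[Dict[str, Any]] = [existing] if isinstance(existing, dict) else list(existing)
    statements.append(deny_global_cris_statement())
    return {**policy, "Statement": statements}
```

Because geography-specific CRIS requests carry a concrete Region in aws:RequestedRegion, a deny keyed on unspecified is intended to leave them untouched.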

Note: As a best practice, organizations that use geographic CRIS but want to opt out of global CRIS should implement the second approach.

Request limit increases for global CRIS with Anthropic’s Claude Sonnet 4.5

When using global CRIS inference profiles, it’s important to understand that service quota management is centralized in the US East (N. Virginia) Region, even though you can use global CRIS from over 20 supported source Regions. Because this is a global limit, requests to view, manage, or increase quotas for global cross-Region inference profiles must be made through the Service Quotas console or AWS Command Line Interface (AWS CLI) specifically in the US East (N. Virginia) Region. Quotas for global CRIS inference profiles will not appear on the Service Quotas console or AWS CLI for other source Regions, even when they support global CRIS usage. This centralized approach lets you use your limits globally without estimating usage in individual Regions. If you don’t have access to US East (N. Virginia), reach out to your account team or AWS Support.

Complete the following steps to request a limit increase:

    Sign in to the Service Quotas console in your AWS account.
    Make sure your selected Region is US East (N. Virginia).
    In the navigation pane, choose AWS services.
    From the list of services, find and choose Amazon Bedrock.
    In the list of quotas for Amazon Bedrock, use the search filter to find the specific global CRIS quotas. For example:
        Global cross-Region model inference tokens per day for Anthropic Claude Sonnet 4.5 V1
        Global cross-Region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1
    Select the quota you want to increase.
    Choose Request increase at account level.
    Enter your desired new quota value.
    Choose Request to submit your request.
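You can also script the same lookup and request, which helps when you manage several accounts. The following sketch assumes a boto3 service-quotas client created in us-east-1 and matches quota names by the "Global cross-Region model inference" prefix shown in the steps above. `find_global_cris_quotas` and `request_increase` are hypothetical helper names, and exact quota names may change over time.

```python
from typing import Any, Dict, List

QUOTA_NAME_PREFIX = "Global cross-Region model inference"


def find_global_cris_quotas(client: Any, model_name: str = "Anthropic Claude Sonnet 4.5") -> List[Dict[str, Any]]:
    """Return Amazon Bedrock quotas whose names mark them as global CRIS quotas for `model_name`."""
    matches: List[Dict[str, Any]] = []
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="bedrock"):
        for quota in page["Quotas"]:
            name = quota["QuotaName"]
            if name.startswith(QUOTA_NAME_PREFIX) and model_name in name:
                matches.append({"QuotaCode": quota["QuotaCode"], "QuotaName": name, "Value": quota.get("Value")})
    return matches


def request_increase(client: Any, quota_code: str, desired_value: float) -> str:
    """Submit an account-level quota increase request and return its request ID."""
    response = client.request_service_quota_increase(
        ServiceCode="bedrock", QuotaCode=quota_code, DesiredValue=desired_value
    )
    return response["RequestedQuota"]["Id"]


# Example (requires AWS credentials; global CRIS quotas live in us-east-1):
# import boto3
# quotas = boto3.client("service-quotas", region_name="us-east-1")
# for q in find_global_cris_quotas(quotas):
#     print(q["QuotaCode"], q["QuotaName"], q["Value"])
```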

Use global cross-Region inference with Anthropic’s Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic’s most intelligent model (at the time of writing), and is best for coding and complex agents. Anthropic’s Claude Sonnet 4.5 demonstrates advancements in agent capabilities, with enhanced performance in tool handling, memory management, and context processing. The model shows marked improvements in code generation and analysis, including identifying optimal improvements and exercising stronger judgment in refactoring decisions. It particularly excels at autonomous long-horizon coding tasks, where it can effectively plan and execute complex software projects spanning hours or days while maintaining consistent performance and reliability throughout the development cycle.

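Global cross-Region inference for Anthropic’s Claude Sonnet 4.5 delivers advantages over traditional geographic cross-Region inference profiles, chiefly higher model throughput and more consistent performance during unplanned peak usage, because requests can be routed to any supported commercial Region worldwide rather than only to Regions within one geography.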

If you’re currently using Anthropic’s Sonnet models on Amazon Bedrock, upgrading to Claude Sonnet 4.5 is a great opportunity to enhance your AI capabilities. It offers a significant leap in intelligence and capability as a straightforward, drop-in replacement at a price point comparable to Sonnet 4. The primary reason to switch is Sonnet 4.5’s superior performance across critical, high-value domains: it is Anthropic’s most powerful model so far for building complex agents, demonstrating state-of-the-art performance in coding, reasoning, and computer use. Furthermore, its advanced agentic capabilities, such as extended autonomous operation and more effective use of parallel tool calls, enable the creation of more sophisticated AI workflows.

Conclusion

Amazon Bedrock global cross-Region inference for Anthropic’s Claude Sonnet 4.5 marks a significant evolution in AWS generative AI capabilities, enabling global routing of inference requests across the AWS worldwide infrastructure. With straightforward implementation and comprehensive monitoring through CloudTrail and CloudWatch, organizations can quickly use this powerful capability for their AI applications, high-volume workloads, and disaster recovery scenarios.

We encourage you to try global cross-Region inference with Anthropic’s Claude Sonnet 4.5 in your own applications and experience the benefits firsthand. Start by updating your code to use the global inference profile ID, configure appropriate IAM permissions, and monitor your application’s performance as it uses the AWS global infrastructure to deliver enhanced resilience.

For more information about global cross-Region inference for Anthropic’s Claude Sonnet 4.5 in Amazon Bedrock, refer to Increase throughput with cross-Region inference, Supported Regions and models for inference profiles, and Use an inference profile in model invocation.


About the authors

Melanie Li, PhD, is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions using state-of-the-art AI/ML tools. She has been actively involved in multiple generative AI initiatives across APJ, harnessing the power of LLMs. Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.

Saurabh Trikande is a Senior Product Manager for Amazon Bedrock and Amazon SageMaker Inference. He is passionate about working with customers and partners, motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, inference with multi-tenant models, cost optimizations, and making the deployment of generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Derrick Choo is a Senior Solutions Architect at AWS who accelerates enterprise digital transformation through cloud adoption, AI/ML, and generative AI solutions. He specializes in full-stack development and ML, designing end-to-end solutions spanning frontend interfaces, IoT applications, data integrations, and ML models, with a particular focus on computer vision and multi-modal systems.

Satveer Khurpa is a Sr. WW Specialist Solutions Architect, Amazon Bedrock at Amazon Web Services. In this role, he uses his expertise in cloud-based architectures to develop innovative generative AI solutions for clients across diverse industries. Satveer’s deep understanding of generative AI technologies allows him to design scalable, secure, and responsible applications that unlock new business opportunities and drive tangible value.

Jared Dean is a Principal AI/ML Solutions Architect at AWS. Jared works with customers across industries to develop machine learning applications that improve efficiency. He is interested in all things AI, technology, and BBQ.

Jan Catarata is a software engineer working on Amazon Bedrock, where he focuses on designing robust distributed systems. When he’s not building scalable AI solutions, you can find him strategizing his next move with friends and family at game night.
