AWS Machine Learning Blog, September 19
Amazon Bedrock batch inference: Optimizing cost and performance

 

Amazon Bedrock batch inference has received several major updates aimed at organizations that need cost-efficient, predictably performant processing of large datasets as they scale generative AI applications. The new capabilities include support for more model families (such as Anthropic Claude Sonnet 4 and OpenAI OSS models), performance enhancements for higher throughput, and improved job monitoring that lets users track batch job progress through Amazon CloudWatch. The service is especially suited to non-real-time workloads such as historical data analysis, large-scale text summarization, and background processing, at 50% lower cost than on-demand inference.

💡 **Cost efficiency and performance optimization**: Amazon Bedrock batch inference processes large datasets in bulk at 50% lower cost than on-demand inference, with predictable performance. This matters for generative AI workloads that process massive datasets rather than respond in real time, such as historical data analysis, large-scale text summarization, and background processing.

🚀 **Expanded model support and performance**: The service now supports more model families, including Anthropic's Claude Sonnet 4 and OpenAI OSS models. Performance optimizations for the latest Anthropic Claude and OpenAI GPT OSS models deliver higher batch throughput, so large-scale workloads complete faster.

📈 **Simplified job monitoring and management**: Users can now monitor batch job progress directly in Amazon CloudWatch without building custom monitoring solutions. The feature provides AWS account-level visibility into key metrics such as records pending processing and input/output tokens processed per minute, greatly simplifying management and operations for large-scale workloads.

🎯 **Broad range of use cases**: Amazon Bedrock batch inference fits many scenarios, including periodic data processing that doesn't require real-time responses (such as daily news summaries), historical data analysis (such as customer service records), knowledge base enrichment (such as generating embeddings and summaries), large-scale content transformation (such as sentiment analysis), and experimentation and compliance checks.

🛠️ **CloudWatch integration for automation**: By combining Amazon CloudWatch metrics, alarms, and dashboards, users can automate the monitoring and management of batch inference jobs. For example, set a CloudWatch alarm that sends a notification when the processing rate crosses a threshold, or build a dashboard for centralized operational monitoring and troubleshooting, to maximize efficiency and value.

As organizations scale their use of generative AI, many workloads require cost-efficient, bulk processing rather than real-time responses. Amazon Bedrock batch inference addresses this need by enabling large datasets to be processed in bulk with predictable performance—at 50% lower cost than on-demand inference. This makes it ideal for tasks such as historical data analysis, large-scale text summarization, and background processing workloads.

In this post, we explore how to monitor and manage Amazon Bedrock batch inference jobs using Amazon CloudWatch metrics, alarms, and dashboards to optimize performance, cost, and operational efficiency.

New features in Amazon Bedrock batch inference

Batch inference in Amazon Bedrock is constantly evolving, and recent updates bring significant enhancements to performance, flexibility, and cost transparency:

- Expanded model support: Batch inference now supports more model families, including Anthropic's Claude Sonnet 4 and OpenAI OSS models.
- Higher throughput: Performance optimizations for the latest Anthropic Claude and OpenAI GPT OSS models deliver faster processing for large-scale batch workloads.
- Built-in job monitoring: Batch job progress is now visible directly in Amazon CloudWatch, with no need to build custom monitoring solutions.

Use cases for batch inference

AWS recommends using batch inference in the following use cases:

- Periodic data processing that doesn't require real-time responses, such as daily news summarization
- Historical data analysis, such as processing customer service records
- Knowledge base enrichment, such as generating embeddings and summaries
- Large-scale content transformation, such as sentiment analysis
- Experimentation and compliance checks

Launch an Amazon Bedrock batch inference job

You can start a batch inference job in Amazon Bedrock using the AWS Management Console, AWS SDKs, or AWS Command Line Interface (AWS CLI). For detailed instructions, see Create a batch inference job.

To use the console, complete the following steps:

1. On the Amazon Bedrock console, choose Batch inference under Infer in the navigation pane.
2. Choose Create batch inference job.
3. For Job name, enter a name for your job.
4. For Model, choose the model to use.
5. For Input data, enter the location of the Amazon Simple Storage Service (Amazon S3) input bucket (JSONL format).
6. For Output data, enter the S3 location of the output bucket.
7. For Service access, select your method to authorize Amazon Bedrock.
8. Choose Create batch inference job.
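If you prefer to launch the job programmatically, the following is a minimal boto3 sketch; the job name, model ID, role ARN, and bucket URIs are placeholders you would replace with your own:

```python
import boto3

bedrock = boto3.client("bedrock")

# Each line of the input JSONL pairs a recordId with a model-specific
# request body, for example:
# {"recordId": "RECORD001", "modelInput": { ...model-specific body... }}
response = bedrock.create_model_invocation_job(
    jobName="daily-summaries-batch",
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",  # example model ID
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # placeholder role
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-input-bucket/batch-input.jsonl"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-output-bucket/batch-output/"}},
)
print(response["jobArn"])
```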

Monitor batch inference with CloudWatch metrics

Amazon Bedrock now automatically publishes metrics for batch inference jobs under the AWS/Bedrock/Batch namespace. You can track batch workload progress at the AWS account level with the following CloudWatch metrics. For current Amazon Bedrock models, these metrics include records pending processing, input and output tokens processed per minute, and for Anthropic Claude models, they also include tokens pending processing.

Each of these metrics can be monitored by modelId.

To view these metrics using the CloudWatch console, complete the following steps:

1. On the CloudWatch console, choose Metrics in the navigation pane.
2. Filter metrics by AWS/Bedrock/Batch.
3. Select your modelId to view detailed metrics for your batch job.
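The same data is available programmatically. Here is a minimal boto3 sketch, assuming the dimension is named ModelId as it appears in the console, with an example model ID as the dimension value:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

# Average input-token throughput for one model over the last 6 hours.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock/Batch",
    MetricName="NumberOfInputTokensProcessedPerMinute",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-sonnet-4-20250514-v1:0"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=6),
    EndTime=datetime.now(timezone.utc),
    Period=300,  # 5-minute buckets
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```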

To learn more about how to use CloudWatch to monitor metrics, refer to Query your CloudWatch metrics with CloudWatch Metrics Insights.

Best practices for monitoring and managing batch inference

Consider the following best practices for monitoring and managing your batch inference jobs:

- Set CloudWatch alarms on key metrics, such as token processing rates, so the appropriate team is notified when a job crosses a threshold.
- Build CloudWatch dashboards for centralized operational monitoring and troubleshooting across your batch workloads.
- Use alarm actions to trigger downstream automation, such as kicking off data pipelines when a processing milestone is reached.

Example of CloudWatch metrics

In this section, we demonstrate how you can use CloudWatch metrics to set up proactive alerts and automation.

For example, you can create a CloudWatch alarm that sends an Amazon Simple Notification Service (Amazon SNS) notification when the average NumberOfInputTokensProcessedPerMinute exceeds 1 million within a 6-hour period. This alert could prompt an Ops team review or trigger downstream data pipelines.
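A minimal boto3 sketch of that alarm follows; the SNS topic ARN and model ID are placeholders, and the ModelId dimension name is assumed from the console view:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average input-token throughput exceeds 1M tokens/minute,
# evaluated over a single 6-hour period (21,600 seconds).
cloudwatch.put_metric_alarm(
    AlarmName="bedrock-batch-input-token-rate",
    Namespace="AWS/Bedrock/Batch",
    MetricName="NumberOfInputTokensProcessedPerMinute",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-sonnet-4-20250514-v1:0"}],  # example model ID
    Statistic="Average",
    Period=21600,  # 6 hours
    EvaluationPeriods=1,
    Threshold=1_000_000,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-team"],  # placeholder SNS topic
)
```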

The following screenshot shows that the alert has In alarm status because the batch inference job met the threshold. The alarm will trigger the target action, in our case an SNS notification email to the Ops team.

The following screenshot shows an example of the email the Ops team received, notifying them that the number of processed tokens exceeded their threshold.

You can also build a CloudWatch dashboard displaying the relevant metrics. This is ideal for centralized operational monitoring and troubleshooting.
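As a sketch, a one-widget dashboard showing the token-throughput metric could be created like this; the dashboard name, region, and widget layout are arbitrary choices:

```python
import json

import boto3

cloudwatch = boto3.client("cloudwatch")

# A single line-chart widget plotting input-token throughput for one model.
dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": "Bedrock batch token throughput",
                "region": "us-east-1",
                "stat": "Average",
                "period": 300,
                "metrics": [
                    ["AWS/Bedrock/Batch", "NumberOfInputTokensProcessedPerMinute",
                     "ModelId", "anthropic.claude-sonnet-4-20250514-v1:0"],  # example model ID
                ],
            },
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName="bedrock-batch-monitoring",
    DashboardBody=json.dumps(dashboard_body),
)
```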

Conclusion

Amazon Bedrock batch inference now offers expanded model support, improved performance, deeper visibility into the progress of your batch workloads, and enhanced cost monitoring.

Get started today by launching an Amazon Bedrock batch inference job, setting up CloudWatch alarms, and building a monitoring dashboard, so you can maximize efficiency and value from your generative AI workloads.


About the authors

Vamsi Thilak Gudi is a Solutions Architect at Amazon Web Services (AWS) in Austin, Texas, helping Public Sector customers build effective cloud solutions. He brings diverse technical experience to show customers what’s possible with AWS technologies. He actively contributes to the AWS Technical Field Community for Generative AI.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Avish Khosla is a software developer on Bedrock's Batch Inference team, which builds reliable, scalable systems to run large-scale inference workloads on generative AI models. He cares about clean architecture and great docs. When he is not shipping code, he is on a badminton court or glued to a good cricket match.

Chintan Vyas serves as a Principal Product Manager–Technical at Amazon Web Services (AWS), where he focuses on Amazon Bedrock services. With over a decade of experience in Software Engineering and Product Management, he specializes in building and scaling large-scale, secure, and high-performance Generative AI services. In his current role, he leads the enhancement of programmatic interfaces for Amazon Bedrock. Throughout his tenure at AWS, he has successfully driven Product Management initiatives across multiple strategic services, including Service Quotas, Resource Management, Tagging, Amazon Personalize, Amazon Bedrock, and more. Outside of work, Chintan is passionate about mentoring emerging Product Managers and enjoys exploring the scenic mountain ranges of the Pacific Northwest.

Mayank Parashar is a Software Development Manager for Amazon Bedrock services.
