AWS Machine Learning Blog

Advanced Amazon Bedrock Cost Management: Fine-Grained Tracking and Reporting

This article takes a deep dive into advanced cost monitoring strategies for Amazon Bedrock, focusing on how fine-grained custom tagging enables accurate cost allocation and how to build comprehensive reporting mechanisms. It covers invocation-level tagging for enhanced traceability and shows how the API input structure was evolved to support custom tags. It also walks through the validation and tagging steps, and how CloudWatch metrics enable detailed logging and analysis. Finally, it highlights the newly introduced Amazon Bedrock application inference profiles capability, which lets users apply custom cost allocation tags to on-demand foundation model usage and integrate with AWS cost management tools (such as Cost Explorer) for finer-grained cost analysis and budget control.

💡 **Invocation-level tagging for enhanced traceability**: Attaching rich metadata to every API request creates a comprehensive audit trail in Amazon CloudWatch logs. This is essential for investigating budget-related decisions, analyzing rate-limiting impacts, or understanding usage patterns across different applications and teams. The API input structure has been updated to support optional model-specific configurations and custom tags, including a required `applicationId` plus optional `costCenter` and `environment`.

🚀 **Fine-grained cost allocation and monitoring**: The new application inference profiles capability allows custom cost allocation tags on on-demand foundation models, so costs can be tracked across business units and applications. These tags integrate with tools such as AWS Cost Explorer, AWS Budgets, and AWS Cost Anomaly Detection to support detailed cost analysis and budget control. Users can create custom inference profiles through the AWS CLI or AWS API and associate them with a model and the desired tags (such as `costCenter` and `environment`).

📊 **Robust logging and analysis**: CloudWatch metrics combined with custom tags and dimensions can track detailed metrics across model type, cost center, application, environment, and more. By generating custom tags, storing metric data, and analyzing it, you gain a complete picture of AI service usage. The custom metric data is stored in the `GenAIRateLimiting` namespace and includes key metrics such as `TotalRequests`, `RateLimitApproved`, `RateLimitDenied`, `InvocationFailed`, `InputTokens`, and `OutputTokens`, with support for analysis by dimensions such as model and cost center.

In Part 1 of our series, we introduced a proactive cost management solution for Amazon Bedrock, featuring a robust cost sentry mechanism designed to enforce real-time token usage limits. We explored the core architecture, token tracking strategies, and initial budget enforcement techniques that help organizations control their generative AI expenses.

Building upon that foundation, this post explores advanced cost monitoring strategies for generative AI deployments. We introduce granular custom tagging approaches for precise cost allocation, and develop comprehensive reporting mechanisms.

Solution overview

The cost sentry solution introduced in Part 1 was developed as a centralized mechanism to proactively limit generative AI usage to adhere to prescribed budgets. The following diagram illustrates the core components of the solution, adding in cost monitoring through AWS Billing and Cost Management.

Invocation-level tagging for enhanced traceability

Invocation-level tagging extends our solution’s capabilities by attaching rich metadata to every API request, creating a comprehensive audit trail within Amazon CloudWatch logs. This becomes particularly valuable when investigating budget-related decisions, analyzing rate-limiting impacts, or understanding usage patterns across different applications and teams. To support this, the main AWS Step Functions workflow was updated, as illustrated in the following figure.

Enhanced API input

We also evolved the API input to support custom tagging. The new input structure introduces optional parameters for model-specific configurations and custom tagging:

{  "model": "string",     // e.g., "claude-3" or "anthropic.claude-3-sonnet-20240229-v1:0"  "prompt": {    "messages": [      {        "role": "string",    // "system", "user", or "assistant"        "content": "string"      }    ],    "parameters": {      "max_tokens": number,    // Optional, model-specific defaults      "temperature": number,   // Optional, model-specific defaults      "top_p": number,         // Optional, model-specific defaults      "top_k": number          // Optional, model-specific defaults    }  },  "tags": {    "applicationId": "string",  // Required    "costCenter": "string",     // Optional    "environment": "string"     // Optional - dev/staging/prod  }}

The input structure comprises three key components:

- model – The friendly name of the model to invoke (for example, claude-3-5-haiku), which the workflow later maps to a specific Amazon Bedrock model ID
- prompt – The conversation messages, plus optional model-specific inference parameters (max_tokens, temperature, top_p, top_k)
- tags – Custom tagging metadata: a required applicationId, with optional costCenter and environment

In this example, we use different cost centers for sales, services, and support to simulate the use of a business attribute to track usage and spend for inference in Amazon Bedrock. For example:

{  "model": "claude-3-5-haiku",  "prompt": {    "messages": [      {        "role": "user",        "content": "Explain the benefits of using S3 using only 100 words."      },      {        "role": "assistant",        "content": "You are a helpful AWS expert."      }    ],    "parameters": {      "max_tokens": 2000,      "temperature": 0.7,      "top_p": 0.9,      "top_k": 50    }  },  "tags": {    "applicationId": "aws-documentation-helper",    "costCenter": "support",    "environment": "production"  }}

Validation and tagging

A new validation step was added to the workflow for tagging. This step uses an AWS Lambda function to run validation checks and map the requested model to its specific Amazon Bedrock model ID. It also supplements the tags object with tags that are required for downstream analysis.

The following code is an example of a simple map to get the appropriate model ID from the model specified:

MODEL_ID_MAPPING = {
    "nova-lite": "amazon.nova-lite-v1:0",
    "nova-micro": "amazon.nova-micro-v1:0",
    "claude-2": "anthropic.claude-v2:0",
    "claude-3-haiku": "anthropic.claude-3-haiku-20240307-v1:0",
    "claude-3-5-sonnet-v2": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    "claude-3-5-haiku": "us.anthropic.claude-3-5-haiku-20241022-v1:0"
}
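The post doesn't include the full validation function, but a minimal sketch of how such a Lambda handler might look (the structure here is an assumption, not the solution's actual code) is as follows:

import uuid
from datetime import datetime, timezone

# Friendly-name-to-model-ID map, as defined above (abbreviated)
MODEL_ID_MAPPING = {
    "claude-3-5-haiku": "us.anthropic.claude-3-5-haiku-20241022-v1:0",
}

def lambda_handler(event, context):
    """Validate the request, resolve the Bedrock model ID, and supplement tags."""
    model = event.get("model")
    if model not in MODEL_ID_MAPPING:
        raise ValueError(f"Unsupported model: {model}")

    tags = event.get("tags") or {}
    if "applicationId" not in tags:
        raise ValueError("applicationId is a required tag")

    # Supplement user-supplied tags with dynamically generated values
    # needed for downstream analysis (see the tags example below)
    tags["requestId"] = str(uuid.uuid4())
    tags["timestamp"] = datetime.now(timezone.utc).isoformat()

    event["modelId"] = MODEL_ID_MAPPING[model]
    event["tags"] = tags
    return event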

Logging and analysis

By using CloudWatch metrics with custom-generated tags and dimensions, you can track detailed metrics across multiple dimensions such as model type, cost center, application, and environment, revealing how teams use AI services. To support this analysis, the workflow implements steps to generate custom tags, store metric data, and analyze the metric data:

    We include a unique set of tags that capture contextual information. This can include user-supplied tags as well as ones that are dynamically generated, such as requestId and timestamp:
      "tags": {    "requestId": "ded98994-eb76-48d9-9dbc-f269541b5e49",    "timestamp": "2025-01-31T14:05:26.854682",    "applicationId": "aws-documentation-helper",    "costCenter": "support",    "environment": "production"}
    As each workflow is executed, the limit for each model is evaluated to make sure the request is within budgetary guidelines. The workflow ends with one of three possible outcomes:
      - Rate limit approved and invocation successful
      - Rate limit approved and invocation unsuccessful
      - Rate limit denied

    The custom metric data is saved in CloudWatch in the GenAIRateLimiting namespace. This namespace includes the following key metrics:

      - TotalRequests – Counts every invocation attempt regardless of outcome
      - RateLimitApproved – Tracks requests that passed rate limiting checks
      - RateLimitDenied – Tracks requests blocked by rate limiting
      - InvocationFailed – Counts requests that failed during model invocation
      - InputTokens – Measures input token consumption for successful requests
      - OutputTokens – Measures output token consumption for successful requests

    Each metric includes dimensions for Model, ModelId, CostCenter, Application, and Environment for data analysis.

    We use CloudWatch metrics query capabilities with math expressions to analyze the data collected by the workflow. The data can be displayed in a variety of visual formats to get a granular view of requests by the dimensions provided, such as model or cost center. The following screenshot shows an example dashboard that displays invocation metrics where one model has reached its limit.
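To make the storage step concrete, the following is a minimal sketch (an assumption about the implementation, not the solution's actual code) of publishing one of these metrics to the GenAIRateLimiting namespace with boto3:

import boto3

cloudwatch = boto3.client("cloudwatch")

def record_metric(metric_name, value, tags, model, model_id):
    """Publish a custom metric with the dimensions used for analysis."""
    cloudwatch.put_metric_data(
        Namespace="GenAIRateLimiting",
        MetricData=[{
            "MetricName": metric_name,  # e.g., "InputTokens"
            "Value": value,
            "Unit": "Count",
            "Dimensions": [
                {"Name": "Model", "Value": model},
                {"Name": "ModelId", "Value": model_id},
                {"Name": "CostCenter", "Value": tags["costCenter"]},
                {"Name": "Application", "Value": tags["applicationId"]},
                {"Name": "Environment", "Value": tags["environment"]},
            ],
        }],
    )

# Example: record input token consumption for a successful request
# record_metric("InputTokens", 412, tags, "claude-3-5-haiku", model_id)

For the analysis step, a similar hedged sketch retrieves every InputTokens series with a CloudWatch SEARCH expression (dashboard widgets accept the same expression syntax):

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

# One series per unique Model/ModelId/CostCenter/Application/Environment
# combination; hourly sums over the last 7 days
search_expr = (
    "SEARCH('{GenAIRateLimiting,Model,ModelId,CostCenter,Application,Environment} "
    "MetricName=\"InputTokens\"', 'Sum', 3600)"
)

response = cloudwatch.get_metric_data(
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
    EndTime=datetime.now(timezone.utc),
    MetricDataQueries=[{"Id": "input_tokens", "Expression": search_expr}],
)

for series in response["MetricDataResults"]:
    # The label carries the dimension values, such as the cost center
    print(series["Label"], sum(series["Values"]))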

Additional Amazon Bedrock analytics

In addition to the custom metrics dashboard, CloudWatch provides automatic dashboards for monitoring Amazon Bedrock performance and usage. The Bedrock dashboard offers visibility into key performance metrics and operational insights, as shown in the following screenshot.

Cost tagging and reporting

Amazon Bedrock has introduced application inference profiles, a new capability that organizations can use to apply custom cost allocation tags to track and manage their on-demand foundation model (FM) usage. This feature addresses a previous limitation where tagging wasn’t possible for on-demand FMs, making it difficult to track costs across different business units and applications. You can now create custom inference profiles for base FMs and apply cost allocation tags like department, team, and application identifiers. These tags integrate with AWS cost management tools including AWS Cost Explorer, AWS Budgets, and AWS Cost Anomaly Detection, enabling detailed cost analysis and budget control.

Application inference profiles

To start, you must create application inference profiles for each type of usage you want to track. In this case, the solution defines custom tags for costCenter, environment, and applicationId. An application inference profile is also based on an existing Amazon Bedrock model profile, so you combine the desired tags and model in the profile. At the time of writing, you must use the AWS Command Line Interface (AWS CLI) or AWS API to create one. See the following example code:

aws bedrock create-inference-profile \
  --inference-profile-name "aws-docs-sales-prod" \
  --model-source '{"copyFrom": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"}' \
  --tags '[
    {"key": "applicationId", "value": "aws-documentation-helper"},
    {"key": "costCenter", "value": "sales"},
    {"key": "environment", "value": "production"}
  ]'

This command creates a profile for the sales cost center and production environment using Anthropic's Claude 3 Haiku model. The output from this command is an Amazon Resource Name (ARN) that you will use as the model ID. In this solution, the ValidateAndSetContext Lambda function was modified to allow for specifying the model by cost center (for example, sales). To see which profiles you created, use the following command:

aws bedrock list-inference-profiles --type-equals APPLICATION

After the profiles have been created and the validation has been updated to map cost centers to the profile ARNs, the workflow will start running inference requests with the aligned profile. For example, when users submit a request, they will specify the model as sales, services, or support to align with the three cost centers defined. The following code is a map similar to the previous example:

MODEL_ID_MAPPING = {
    "sales": "arn:aws:bedrock:<region>:<account>:application-inference-profile/<unique id1>",
    "services": "arn:aws:bedrock:<region>:<account>:application-inference-profile/<unique id2>",
    "support": "arn:aws:bedrock:<region>:<account>:application-inference-profile/<unique id3>"
}
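The post doesn't show the invocation itself; as a hedged sketch, the Amazon Bedrock Converse API accepts an application inference profile ARN in place of a model ID (the Region, account number, and profile ID below are placeholders):

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# The profile ARN stands in for the model ID, so usage is billed
# and tagged against the application inference profile
response = bedrock_runtime.converse(
    modelId="arn:aws:bedrock:us-east-1:111122223333:application-inference-profile/abc123",
    messages=[{"role": "user", "content": [{"text": "Explain the benefits of S3 in 100 words."}]}],
    inferenceConfig={"maxTokens": 2000, "temperature": 0.7},
)
print(response["output"]["message"]["content"][0]["text"])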

To query CloudWatch metrics correctly for model usage when using application inference profiles, you must specify the profile's unique ID (the last part of the ARN). CloudWatch stores metrics like token usage under this unique ID. To support both profile-based and direct model usage, the Lambda function was modified to add a new modelMetric tag containing the appropriate identifier to use when querying for token usage. See the following code:

  "tags": {    "requestId": "ded98994-eb76-48d9-9dbc-f269541b5e49",    "timestamp": "2025-01-31T14:05:26.854682",    "applicationId": "aws-documentation-helper",    "costCenter": "support",    "environment": "production",        "modelMetric": "<unique id> | <model id>"  }

Cost Explorer

Cost Explorer is a powerful cost management tool that provides comprehensive visualization and analysis of your cloud spending across AWS services, including Amazon Bedrock. It offers intuitive dashboards to track historical costs, forecast future expenses, and gain insights into your cloud consumption. With Cost Explorer, you can break down expenses by service, tags, and custom dimensions for detailed financial analysis. Cost data refreshes daily.

When you use application inference profiles with Amazon Bedrock, your AI service usage is automatically tagged and flows directly into Billing and Cost Management. These tags enable detailed cost tracking across different dimensions like cost center, application, and environment. This means you can generate reports that break down Amazon Bedrock AI expenses by specific business units, projects, or organizational hierarchies, providing clear visibility into your generative AI spending.

Cost allocation tags

Cost allocation tags are key-value pairs that help you categorize and track AWS resource costs across your organization. In the context of Amazon Bedrock, these tags can include attributes like application name, cost center, environment, or project ID. To activate a cost allocation tag, you must first enable it on the Billing and Cost Management console. After they’re activated, these tags will appear in your AWS Cost and Usage Report (CUR), helping you break down Amazon Bedrock expenses with granular detail.

To activate a cost allocation tag, complete the following steps:

    1. On the Billing and Cost Management console, in the navigation pane, choose Cost Allocation Tags.
    2. Locate your tag (for this example, it's named costCenter) and choose Activate.
    3. Confirm the activation.

After activation, the costCenter tag will appear in your CUR and will be used in Cost Explorer. It might take 24 hours for the tag to become fully active in your billing reports.
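If you prefer to script the activation instead of using the console (a hedged alternative; the tag key must already have appeared in your billing data), the Cost Explorer API exposes the same operation:

import boto3

ce = boto3.client("ce")  # Cost Explorer API

# Mark the costCenter tag key as an active cost allocation tag
ce.update_cost_allocation_tags_status(
    CostAllocationTagsStatus=[
        {"TagKey": "costCenter", "Status": "Active"},
    ]
)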

Cost Explorer reporting

To create an Amazon Bedrock usage report in Cost Explorer based on your tag, complete the following steps:

    1. On the Billing and Cost Management console, choose Cost Explorer in the navigation pane.
    2. Set your desired date range (relative time range or custom period).
    3. Select Daily or Monthly granularity.
    4. On the Group by dropdown menu, choose Tag.
    5. Choose costCenter as the tag key.
    6. Review the displayed Amazon Bedrock costs broken down by each unique cost center value.
    7. Optionally, apply a filter in the Filters section:
       - Choose Tag filter.
       - Choose the costCenter tag.
       - Choose the specific cost center values you want to analyze.

The resulting report will provide a detailed view of Amazon Bedrock AI service expenses, helping you compare spending across different organizational units or projects with precision.
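The same breakdown is available programmatically. As a sketch (the date range is a placeholder, and the exact SERVICE value should be verified in your account), the Cost Explorer API can group Amazon Bedrock costs by the costCenter tag:

import boto3

ce = boto3.client("ce")

# Daily Amazon Bedrock cost, grouped by the costCenter allocation tag
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-01-01", "End": "2025-02-01"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
    GroupBy=[{"Type": "TAG", "Key": "costCenter"}],
)

for day in response["ResultsByTime"]:
    for group in day["Groups"]:  # Keys look like "costCenter$support"
        amount = group["Metrics"]["UnblendedCost"]["Amount"]
        print(day["TimePeriod"]["Start"], group["Keys"][0], amount)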

Summary

AWS Cost and Usage Reports (and the budgets built on them) act as trailing-edge indicators: they show what you've already spent on Amazon Bedrock after the fact. By blending real-time alerts from Step Functions with comprehensive cost reports, you get a 360-degree view of your Amazon Bedrock usage: alerts before you overspend, and reports that explain your actual consumption. This approach lets you manage AI resources proactively, keeping your innovation budget on track and your projects running smoothly.

Try out this cost management approach for your own use case, and share your feedback in the comments.


About the Author

Jason Salcido is a Startups Senior Solutions Architect with nearly 30 years of experience pioneering innovative solutions for organizations from startups to enterprises. His expertise spans cloud architecture, serverless computing, machine learning, generative AI, and distributed systems. Jason combines deep technical knowledge with a forward-thinking approach to design scalable solutions that drive value, while translating complex concepts into actionable strategies.
