AWS Machine Learning Blog, August 12
Demystifying Amazon Bedrock Pricing for a Chatbot Assistant

This post explains in detail how to estimate the cost of running a chatbot on Amazon Bedrock. Using a real customer service chatbot example, it breaks down the key cost components, including data sources, embeddings, token usage (input, output, and context window), and model selection. It also covers capacity planning, accounting for query volume, response length, and the number of concurrent users, and provides example cost estimates under the on-demand pricing model for different foundation models (such as Anthropic Claude, Amazon Nova, and Meta Llama). The goal is to help readers understand and predict the cost of their Amazon Bedrock applications so they can make informed budgeting and technology decisions.

💰 **Cost breakdown**: Amazon Bedrock costs consist primarily of model inference and customization. For inference, there are two main pricing modes: on-demand (pay-as-you-go) and Provisioned Throughput. Understanding tokens (input, output, context window), embedding generation and storage, and vector database costs is key to an accurate estimate.

📊 **Capacity planning essentials**: When estimating Amazon Bedrock costs, consider the size of the knowledge base (number of documents, document length, chunking), user query volume and complexity, model response length, and the number of concurrent users. For example, a knowledge base of 10,000 documents averaging 500 tokens each may yield 5 million tokens after chunking and require roughly 50,000 embeddings, all of which directly affects both initial setup and ongoing operating costs.

💡 **Model selection and cost-effectiveness**: Amazon Bedrock offers a range of foundation models, such as Anthropic Claude, Amazon Nova, and Meta Llama. Models differ in performance and price (cost per token). Users should weigh each model's natural language understanding (NLU) and generation (NLG) capabilities against its cost for their specific use case, for example comparing the low cost of Claude 3 Haiku with the price-performance of Meta Llama.

📈 **Worked cost estimate**: Using a mid-sized call center chatbot scenario, the post demonstrates how to apply the on-demand pricing formula. For 10,000 queries with an average 100-token response, it calculates the monthly inference cost for different models (such as Claude 4 Sonnet, Amazon Nova Pro, and Llama 4 Maverick) and adds the embedding cost, giving readers a concrete cost reference.

🛠️ **Cost optimization tips**: When planning a Bedrock implementation, start by assessing your knowledge base and query volume, and distinguish one-time costs (such as initial embeddings) from operating costs. Be sure to compare models on both performance and price, and choose a pricing mode that fits your concurrency needs. The final decision should balance cost, performance, and specific business requirements rather than simply chasing the lowest price.

“How much will it cost to run our chatbot on Amazon Bedrock?” This is one of the most frequent questions we hear from customers exploring AI solutions. And it’s no wonder — calculating costs for AI applications can feel like navigating a complex maze of tokens, embeddings, and various pricing models. Whether you’re a solution architect, technical leader, or business decision-maker, understanding these costs is crucial for project planning and budgeting. In this post, we’ll look at Amazon Bedrock pricing through the lens of a practical, real-world example: building a customer service chatbot. We’ll break down the essential cost components, walk through capacity planning for a mid-sized call center implementation, and provide detailed pricing calculations across different foundation models. By the end of this post, you’ll have a clear framework for estimating your own Amazon Bedrock implementation costs and understanding the key factors that influence them.

For those who aren't familiar, Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Amazon Bedrock provides a comprehensive toolkit for powering AI applications, including pre-trained large language models (LLMs), Retrieval Augmented Generation (RAG) capabilities, and seamless integration with existing knowledge bases. This powerful combination enables the creation of chatbots that can understand and respond to customer queries with high accuracy and contextual relevance.

Solution overview

For this example, our Amazon Bedrock chatbot will use a curated set of data sources and Retrieval Augmented Generation (RAG) to retrieve relevant information in real time. With RAG, the chatbot's output is enriched with contextual information from our data sources, giving our users a better customer experience. To understand Amazon Bedrock pricing, it's crucial to familiarize yourself with several key terms that significantly influence the expected cost. These components not only form the foundation of how your chatbot functions but also directly impact your pricing calculations. Let's explore these key components.

Key Components

- Tokens: the units of text a model consumes and produces. On-demand pricing is billed per 1,000 input tokens (including any retrieved context passed to the model) and per 1,000 output tokens.
- Embeddings: numerical vector representations of your documents, generated when the knowledge base is ingested. They contribute a one-time generation cost and an ongoing storage cost.
- Vector store: the database (for example, Amazon OpenSearch Serverless) that holds your embeddings and serves similarity searches for each user query.
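To make the RAG flow concrete, the following is a minimal sketch of a knowledge base query using the AWS SDK for Python (boto3). The knowledge base ID and the choice of model ARN are illustrative placeholders, not values from this post.

```python
import boto3

# Minimal RAG query against an Amazon Bedrock knowledge base.
# The knowledge base ID and model ARN are illustrative placeholders.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "How do I reset my device to factory settings?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

# The generated answer, grounded in the retrieved document chunks.
print(response["output"]["text"])
```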

The figure below demonstrates the architecture of a fully managed RAG solution on AWS.

Estimating Pricing

One of the most challenging aspects of implementing an AI solution is accurately predicting your capacity needs. Without proper capacity estimation, you might either over-provision (leading to unnecessary costs) or under-provision (resulting in performance issues). Let's walk through how to approach this crucial planning step for a real-world scenario. Before we dive into the numbers, let's understand the key factors that affect your capacity and costs:

- Knowledge base size: the number of documents, their average length, and how they are chunked
- Query volume and complexity: how many user queries you expect and how involved they are
- Response length: the average number of tokens the model generates per answer
- Concurrency: how many users interact with the system at the same time

To make this concrete, let's examine a typical call center implementation. Imagine you're planning to deploy a customer service chatbot for a mid-sized organization handling product inquiries and support requests. Here's how we'd break down the capacity planning.

First, consider your knowledge base. In our scenario, we're working with 10,000 support documents, each averaging 500 tokens in length. These documents need to be chunked into smaller pieces for effective retrieval, with each document typically splitting into 5 chunks. This gives us a total of 5 million tokens for our knowledge base.

For the embedding process, those 10,000 documents will generate approximately 50,000 embeddings when we account for chunking and overlapping content. This is important because embeddings affect both your initial setup costs and ongoing storage needs.
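This sizing arithmetic is simple enough to capture in a few lines of Python. The sketch below reproduces the knowledge-base numbers from the scenario above; the five-chunks-per-document figure is the scenario's assumption, not a universal rule.

```python
# Knowledge-base sizing for the call center scenario above.
num_documents = 10_000
avg_tokens_per_document = 500
chunks_per_document = 5  # scenario assumption, including overlap

total_kb_tokens = num_documents * avg_tokens_per_document  # 5,000,000 tokens
total_embeddings = num_documents * chunks_per_document     # ~50,000 embeddings

print(f"Knowledge base tokens: {total_kb_tokens:,}")
print(f"Embeddings to generate and store: {total_embeddings:,}")
```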

Now, let's look at the operational requirements. Based on typical call center volumes, we're planning for:

- About 10,000 user queries per month
- An average response length of roughly 100 tokens
- Roughly 150 tokens of retrieved context added to each query
- A moderate level of concurrent users, typical of a mid-sized call center

When we aggregate these numbers, our monthly capacity requirements shape up to:

- A one-time ingestion of 5 million knowledge-base tokens, producing about 50,000 embeddings
- 10,000 queries per month, carrying about 1.5 million context tokens in total (10,000 × 150)
- About 1 million output tokens per month (10,000 × 100)

Understanding these numbers is crucial because they directly impact your costs in several ways:

- Knowledge base size drives the one-time embedding cost and the ongoing vector storage cost
- Query volume and context size determine the monthly input-token charges
- Response length determines the monthly output-token charges
- Concurrency influences whether on-demand pricing is sufficient or Provisioned Throughput is worth the commitment

This gives us a solid foundation for our cost calculations, which we’ll explore in detail in the next section.

Calculating total cost of ownership (TCO)

Amazon Bedrock offers flexible pricing modes. With Amazon Bedrock, you are charged for model inference and customization. You have a choice of two pricing plans for inference:

1. On-Demand and Batch: This mode allows you to use FMs on a pay-as-you-go basis without having to make time-based term commitments.
2. Provisioned Throughput: This mode allows you to provision sufficient throughput to meet your application's performance requirements in exchange for a time-based term commitment.

To calculate the TCO for this scenario, we'll consider the foundation model, the volume of data in the knowledge base (treated as a one-time embedding cost), the estimated number of queries and responses, and the concurrency level mentioned above. For this scenario we'll use the on-demand pricing model and show what the pricing looks like for some of the foundation models available on Amazon Bedrock.

The on-demand pricing formula is:

Total cost incurred = ((Input tokens + Context tokens) / 1,000) × Price per 1,000 input tokens + (Output tokens / 1,000) × Price per 1,000 output tokens + Embeddings cost

The cost of this setup is the sum of the LLM inference cost and the vector store cost. To estimate the inference cost, you can obtain the number of input tokens, the context size, and the output tokens from the response metadata returned by the LLM. For input tokens, we add an additional context size of about 150 tokens per user query; with our assumption of 10,000 user queries, the total context size is 1,500,000 tokens.
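The formula translates directly into code. The helper below is a minimal sketch: the per-1,000-token prices are parameters you would look up on the Amazon Bedrock pricing page for your chosen model, and the average query length of 50 tokens is an illustrative assumption, not a figure from this post.

```python
def monthly_inference_cost(
    queries: int,
    input_tokens_per_query: int,
    context_tokens_per_query: int,
    output_tokens_per_query: int,
    price_per_1k_input: float,   # USD per 1,000 input tokens (from pricing page)
    price_per_1k_output: float,  # USD per 1,000 output tokens (from pricing page)
) -> float:
    """On-demand inference cost for one month, excluding embeddings."""
    input_total = queries * (input_tokens_per_query + context_tokens_per_query)
    output_total = queries * output_tokens_per_query
    return (input_total / 1000) * price_per_1k_input \
        + (output_total / 1000) * price_per_1k_output

# Scenario: 10,000 queries/month, ~150 context tokens per query,
# ~100-token responses. Query length and prices are placeholders.
cost = monthly_inference_cost(
    queries=10_000,
    input_tokens_per_query=50,    # assumed average query length
    context_tokens_per_query=150,
    output_tokens_per_query=100,
    price_per_1k_input=0.00025,   # hypothetical rate
    price_per_1k_output=0.00125,  # hypothetical rate
)
print(f"Estimated monthly inference cost: ${cost:,.2f}")
```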

The following sections compare estimated monthly costs for various models on Amazon Bedrock based on our example use case, using the on-demand pricing formula.

Embeddings Cost:

For text embeddings on Amazon Bedrock, we can choose from the Amazon Titan Text Embeddings V2 model or the Cohere Embed models. In this example we are calculating a one-time cost for the embeddings.
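Because the knowledge base totals 5 million tokens, the one-time embedding cost is a single multiplication. The rate below is a placeholder; substitute the current per-1,000-token price for your chosen embeddings model from the Amazon Bedrock pricing page.

```python
total_kb_tokens = 5_000_000  # from the capacity plan above
price_per_1k_embedding_tokens = 0.00002  # hypothetical rate; check current pricing

one_time_embedding_cost = (total_kb_tokens / 1000) * price_per_1k_embedding_tokens
print(f"One-time embedding cost: ${one_time_embedding_cost:,.2f}")
```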

The cost of a vector store usually has two components: the size of the vector data and the number of requests to the store. You can choose whether to let the Amazon Bedrock console set up a vector store in Amazon OpenSearch Serverless for you, or to use one that you have created in a supported service and configured with the appropriate fields. If you're using OpenSearch Serverless as part of your setup, you'll need to consider its costs. Pricing details can be found on the OpenSearch Service pricing page.
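As a rough sketch, an OpenSearch Serverless estimate can be modeled from the two drivers named above: compute, billed in OpenSearch Compute Unit (OCU) hours, and storage, billed per GB-month. The OCU count and both rates below are illustrative assumptions; consult the OpenSearch Service pricing page for current values.

```python
# Rough monthly vector store estimate for OpenSearch Serverless.
# All values are illustrative assumptions, not quoted prices.
ocu_count = 2               # assumed indexing + search capacity
hours_per_month = 730
price_per_ocu_hour = 0.24   # hypothetical rate
storage_gb = 1              # ~50,000 embeddings fits well under 1 GB
price_per_gb_month = 0.024  # hypothetical rate

vector_store_cost = (ocu_count * hours_per_month * price_per_ocu_hour
                     + storage_gb * price_per_gb_month)
print(f"Estimated monthly vector store cost: ${vector_store_cost:,.2f}")
```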

Here, using the on-demand pricing formula, the overall cost is calculated for some of the foundation models (FMs) available on Amazon Bedrock, with the embeddings cost added on top.
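To compare models, you can run the same formula over a table of per-model rates, reusing the monthly_inference_cost helper from the earlier sketch. The prices below are placeholders for illustration only, not the actual Amazon Bedrock rates for these model families; substitute current values from the pricing page.

```python
# Hypothetical per-1,000-token rates for illustration only.
model_prices = {
    "Anthropic Claude (example tier)": {"input": 0.003,   "output": 0.015},
    "Amazon Nova (example tier)":      {"input": 0.0008,  "output": 0.0032},
    "Meta Llama (example tier)":       {"input": 0.00072, "output": 0.00072},
}

for model, rates in model_prices.items():
    cost = monthly_inference_cost(
        queries=10_000,
        input_tokens_per_query=50,   # assumed average query length
        context_tokens_per_query=150,
        output_tokens_per_query=100,
        price_per_1k_input=rates["input"],
        price_per_1k_output=rates["output"],
    )
    print(f"{model}: ${cost:,.2f}/month inference + one-time embeddings")
```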

Anthropic Claude:

Amazon Nova:

Meta Llama:

Evaluate models not just on their natural language understanding (NLU) and generation (NLG) capabilities, but also on their price-per-token ratios for both input and output processing. Consider whether premium models with higher per-token costs deliver proportional value for your specific use case, or if more cost-effective alternatives like Amazon Nova Lite or Meta Llama models can meet your performance requirements at a fraction of the cost.

Conclusion

Understanding and estimating Amazon Bedrock costs doesn’t have to be overwhelming. As we’ve demonstrated through our customer service chatbot example, breaking down the pricing into its core components – token usage, embeddings, and model selection – makes it manageable and predictable.

Key takeaways for planning your Bedrock implementation costs:

- Start by assessing your knowledge base size and expected query volume
- Separate one-time costs (such as initial embeddings) from ongoing operational costs
- Compare models on both performance and price per token, not capability alone
- Choose the pricing mode (on-demand or Provisioned Throughput) that matches your concurrency and throughput needs

By following this systematic approach to cost estimation, you can confidently plan your Amazon Bedrock implementation and choose the most cost-effective configuration for your specific use case. Remember that the cheapest option isn’t always the best – consider the balance between cost, performance, and your specific requirements when making your final decision.

Getting Started with Amazon Bedrock

With Amazon Bedrock, you have the flexibility to choose the most suitable model and pricing structure for your use case. We encourage you to explore the AWS Pricing Calculator for more detailed cost estimates based on your specific requirements.

To learn more about building and optimizing chatbots with Amazon Bedrock, check out the workshop Building with Amazon Bedrock.

We’d love to hear about your experiences building chatbots with Amazon Bedrock. Share your success stories or challenges in the comments!


About the authors

Srividhya Pallay is a Solutions Architect II at Amazon Web Services (AWS) based in Seattle, where she supports small and medium-sized businesses (SMBs) and specializes in Generative Artificial Intelligence and Games. Srividhya holds a Bachelor’s degree in Computational Data Science from Michigan State University College of Engineering, with a minor in Computer Science and Entrepreneurship. She holds 6 AWS Certifications.

Prerna Mishra is a Solutions Architect at Amazon Web Services (AWS) supporting Enterprise ISV customers. She specializes in Generative AI and MLOps as part of the Machine Learning and Artificial Intelligence community. She graduated from New York University in 2022 with a Master's degree in Data Science and Information Systems.

Brian Clark is a Solutions Architect at Amazon Web Services (AWS) supporting Enterprise customers in the financial services vertical. He is a part of the Machine Learning and Artificial Intelligence community and specializes in Generative AI and Agentic workflows. Brian has over 14 years of experience working in technology and holds 8 AWS certifications.
