philschmid RSS feed, September 30
A Cost Breakdown of Deploying Generative AI Models

 

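The total cost of ownership (TCO) of deploying generative AI models goes far beyond compute resources. Besides hardware such as GPUs, you need container services (e.g., Kubernetes), networking and DNS, load balancing, autoscaling, logging and telemetry, CI, and more. Hidden costs include development and maintenance time and the expense of specialized talent. A self-built solution looks cheap at first, but over the long run it can end up costing more because of lost focus and innovation. In a comparison of a managed service and a self-built solution on a single A100 80GB, the required services push the self-built total 27% above the raw compute price, and factoring in one additional engineer makes the self-built solution the more expensive option. Sound decisions also require weighing non-monetary factors.

💡 Deploying generative AI models involves more than compute resources such as GPUs. It also needs supporting infrastructure: container services (e.g., Kubernetes), networking and DNS, load balancing, autoscaling, logging and telemetry, and CI. Together these make up a significant part of the total cost of ownership (TCO).

⚠️ Self-built solutions look attractive at first because of their low compute cost, but hidden costs such as development and maintenance time and the expense of specialized talent (data scientists, machine learning engineers, DevOps) gradually erode that advantage. In the long run they can be more expensive than a managed service.

🔄 Using existing resources to reimplement solutions hinders innovation and distracts the team, because it is no longer focused on improving the model or product. This opportunity cost should also be counted in the TCO.

📊 Taking a single A100 80GB as an example, the managed service costs $4 per hour in total, while the self-built solution (excluding engineer cost) costs $2.40. Once the salary of one engineer is included, however, the self-built solution becomes the more expensive option, which highlights the weight of personnel costs.

🤔 Startups that promote their services at aggressively low prices may be subsidized by VC funding, but their future stability is uncertain and they may have little flexibility in long-term negotiations. These non-monetary factors should be part of the decision.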

The discussion around the cost of deploying Generative AI models is often shallow, with many people fixated on raw compute pricing. Statements like "we're cheaper" often dominate discussions and decision making. But in reality, the cost of deploying Generative AI models in production is much more complex and involves various factors beyond just compute resources.

This post aims to provide a more comprehensive understanding of the total cost of ownership (TCO) for Generative AI models. Deploying a Generative AI model requires more than a VM with a GPU. It normally includes:

    Container Service: Most often Kubernetes, to run LLM serving solutions like Hugging Face Text Generation Inference or vLLM.
    Compute Resources: GPUs for running models, CPUs for management services.
    Networking and DNS: Routing traffic to the appropriate service, ingress and egress.
    Load Balancing: Distributes incoming traffic, which is needed for scaling.
    Autoscaling: Adjusts the number of model replicas based on current demand to optimize resource utilization and cost.
    Logging & Telemetry: Collects and analyzes data on system performance to help troubleshoot issues.
    CI: Continuous integration systems that allow for seamless updates and the ability to revert to previous versions if needed.

Hidden Costs of Deploying Generative AI Models

Hidden costs can significantly impact the total cost of ownership (TCO) when deploying Generative AI models. They are often unrecognized at the beginning and accumulate gradually. Self-built solutions may appear cheaper initially, particularly in terms of raw compute costs.

However, development time and maintenance can offset these savings. Hiring skilled data scientists, machine learning engineers, and DevOps professionals can be expensive and time consuming. Using available resources for "reimplementing" solutions hinders innovation and leads to a lack of focus, since you are no longer working on improving your model or product.

Unless building and maintaining such infrastructure is the USP of your company, opting for managed services that offer integrated support and faster time to market can be more cost-effective in the long run.

To illustrate this, let's consider a hypothetical scenario of deploying an LLM on a single A100 80GB using a managed service versus building it yourself with a cheaper compute service.

Note: For compute I used $1.89/hr. For the other costs I went with AWS pricing. Some compute providers might not offer services for networking, logging, or even Kubernetes, meaning you would need to roll them yourself, which requires more development resources and time.

Infrastructure cost

Cost                             Managed Service    Self-Built Solution
Compute (1x A100 80GB)           -                  $1.89/hr
Storage (500GB)                  -                  $0.09/hr
Domain / IP (1x)                 -                  $0.01/hr
Networking (100GB)               -                  $0.05/hr
Load Balancer (1x)               -                  $0.03/hr
Container Services (k8s)         -                  $0.28/hr
Logging and Monitoring (25GB)    -                  $0.05/hr
Total                            $4/hr              $2.40/hr
Total + 1 Engineer*              $4/hr              $70.9/hr

* Based on the machine learning engineer salary in the United States as reported by Indeed.

Including all required services, the total cost of the self-built solution is 27% higher than the raw compute price ($2.40/hr versus $1.89/hr). It still appears cheaper than the managed service, but if you include development time and maintenance for just one additional engineer, that saving is more than offset. This one engineer would need to manage ~43 models to amortize their cost.
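
The arithmetic behind these figures can be checked with a short Python sketch. The prices are copied from the table; the function and variable names are illustrative and not from the original post. Two assumptions are worth flagging: the engineer's hourly cost is not stated directly, so it is inferred here as the difference between the two self-built totals ($70.90 minus $2.40), and the break-even count assumes each model occupies one A100 and saves the full managed-versus-self-built price difference per hour.

```python
# Hourly prices of the self-built solution, copied from the table (USD per hour).
SELF_BUILT_COSTS = {
    "compute_a100_80gb": 1.89,
    "storage_500gb": 0.09,
    "domain_ip": 0.01,
    "networking_100gb": 0.05,
    "load_balancer": 0.03,
    "container_services_k8s": 0.28,
    "logging_monitoring_25gb": 0.05,
}

MANAGED_TOTAL = 4.00              # managed service, USD per hour (from the table)
SELF_BUILT_WITH_ENGINEER = 70.90  # self-built total plus one engineer (from the table)


def self_built_total(costs):
    """Sum all hourly line items of the self-built solution."""
    return sum(costs.values())


def overhead_over_compute(costs):
    """Fraction by which the full self-built total exceeds raw compute."""
    return self_built_total(costs) / costs["compute_a100_80gb"] - 1


def implied_engineer_rate(costs, total_with_engineer):
    """ASSUMPTION: the engineer's hourly cost is the gap between the two totals."""
    return total_with_engineer - self_built_total(costs)


def break_even_models(costs, managed_total, total_with_engineer):
    """ASSUMPTION: one model per A100, each saving (managed - self-built) per hour."""
    saving_per_model = managed_total - self_built_total(costs)
    return implied_engineer_rate(costs, total_with_engineer) / saving_per_model


if __name__ == "__main__":
    print(f"Self-built total: ${self_built_total(SELF_BUILT_COSTS):.2f}/hr")
    print(f"Overhead over raw compute: {overhead_over_compute(SELF_BUILT_COSTS):.0%}")
    rate = implied_engineer_rate(SELF_BUILT_COSTS, SELF_BUILT_WITH_ENGINEER)
    print(f"Implied engineer cost: ${rate:.2f}/hr")
    models = break_even_models(SELF_BUILT_COSTS, MANAGED_TOTAL, SELF_BUILT_WITH_ENGINEER)
    print(f"Models needed to break even: {models:.1f}")
```

Under these assumptions the sketch reproduces the numbers quoted in the text: a self-built total of about $2.40/hr, roughly 27% overhead on top of raw compute, and a break-even of roughly 43 models per engineer.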

Conclusion

Understanding the cost of deploying Generative AI models in production is important for making informed decisions. Raw compute pricing is an important factor, but it's just one piece of the puzzle. Infrastructure costs, development time, and operational overhead all play a role, as do non-monetary factors like the stability of providers. Startups with aggressive prices could be VC-subsidized, could disappear in the future, or could have thin margins that leave less flexibility in long-term negotiations.

Whether you are an individual, a startup, or a company, don't let yourself be fooled by raw compute prices unless all you want is raw SSH access! Value your time!


Thanks for reading! If you have any questions or feedback, please let me know on Twitter or LinkedIn.
