An In-Depth Analysis of AI Compute Costs and Model Sizes

The article breaks down the cost structure of a 1 GW datacenter, covering infrastructure, compute hardware, and operational expenses, and points out how different reporting conventions can produce very different cost figures. It examines the share of costs attributable to Nvidia GPUs versus non-Nvidia hardware, and what the headline amounts and durations of compute contracts between AI companies and cloud providers can actually mean. It also explores the growth in model sizes expected by 2026, analyzes the role of HBM (high-bandwidth memory) in scaling model size, compares Anthropic's and OpenAI's compute and hardware configurations, and projects how model parameter counts and sparsity will change.

📊 **A multi-angle look at the composition and pricing of AI compute costs**: The article decomposes the total cost of a 1 GW datacenter into infrastructure ($10-15bn), compute hardware ($30-35bn), and annual operational expenses ($2.0-2.5bn). It notes that different reporting conventions (infrastructure capex, hardware capex, total capex, or the total cost of a 5-year contract) can make the reported cost of 1 GW of compute range anywhere from $10bn to $60bn, underscoring the gap between headline contract figures and actual annualized costs.

🚀 **Cost trade-offs between Nvidia and non-Nvidia hardware, and how to read contracts**: The article estimates that Nvidia GPU servers (about 5,500 GB200 NVL72 servers per GW, 400K chips in total) account for about $20bn of capex, and, applying Nvidia's ~70% margin, infers that the margin amounts to about $14bn per GW, or about $2.8bn per year. It follows that non-Nvidia hardware might only cost up to 25% less, and the article explains how a TPU contract worth "tens of billions of dollars" can be read as roughly 1.2 GW of capacity annualized over 5 years.

🧠 **Order-of-magnitude model growth in 2026 and the importance of HBM**: The article predicts that model sizes will grow about 10x by 2026, driven by a 10x increase in compute (from 100K H100s in 2024 to 400K GB200/GB300 chips in 2026) and by a large increase in HBM (high-bandwidth memory). For example, 1 GW of TPUv7 Ironwood will offer 49 TB of HBM per 256-chip pod, comparable to the HBM capacity of OpenAI's GB200/GB300 NVL72 racks, which makes more efficient model sparsity possible and breaks through the memory limits of earlier servers.

Published on October 24, 2025 8:42 PM GMT

There are many ways in which costs of compute get reported. A 1 GW datacenter site costs $10-15bn for the infrastructure (buildings, cooling, power), plus $30-35bn for the compute hardware (servers, networking, labor), assuming Nvidia GPUs. The useful life of the infrastructure is about 10-15 years, and with debt financing a developer only needs to ensure it's paid off over those 10-15 years, which comes out to $1-2bn per year. For the compute hardware, the useful life is taken to be about 5 years, which gives $6-7bn per year. Operational expenses (electricity, maintenance) add about $2.0-2.5bn per year.

In total, 1 GW of compute costs about $9-11bn per year. But whoever paid the compute hardware capex needs the payments to continue for 5 years, so a contract for 1 GW of compute will be 5 years long, making it a single contract for at least $45-55bn, which might become $55-65bn to allow a profit for the cloud provider.
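To make the arithmetic explicit, here's a minimal Python sketch of the annualization above (all figures are the ranges stated in this post; the endpoints differ slightly from the rounded totals):

```python
# Annualized cost of a 1 GW site, from the ranges stated above.
infra_capex = (10e9, 15e9)      # buildings, cooling, power
infra_life = (10, 15)           # years; paid off with debt financing
hardware_capex = (30e9, 35e9)   # servers, networking, labor (Nvidia GPUs)
hardware_life = 5               # years of useful life
opex = (2.0e9, 2.5e9)           # electricity, maintenance, per year

annual_low = infra_capex[0] / infra_life[1] + hardware_capex[0] / hardware_life + opex[0]
annual_high = infra_capex[1] / infra_life[0] + hardware_capex[1] / hardware_life + opex[1]
print(f"annual: ${annual_low / 1e9:.1f}-{annual_high / 1e9:.1f}bn")  # ~$9-11bn per year
print(f"5-year: ${5 * annual_low / 1e9:.0f}-{5 * annual_high / 1e9:.0f}bn")  # ~$45-55bn, before cloud margin
```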

Thus 1 GW of compute could get reported as $10bn (infrastructure capex; or alternatively the compute costs for the AI company in a calendar year), as $30bn (compute hardware capex without labor costs), as $45bn (infrastructure plus compute hardware capex), or as $60bn (the total cost of the contract between the AI company and the cloud provider over 5 years).

A $300bn contract then might mean about 5 GW of total capacity, while a $27bn datacenter site could likewise mean 2 GW of total capacity. And a $300bn contract doesn't mean that the 5 GW of capacity will be built immediately: if only 2 GW is built initially, that requires the AI company to be capable of paying about $25bn per year (for 5 years), with the other 3 GW contingent on the AI company's continued growth.
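The same per-GW figures make headline numbers easy to decode; a small sketch, assuming ~$60bn per GW over 5 years and ~$13.5bn per GW of infrastructure capex (a midpoint of the $10-15bn range above):

```python
per_gw_5yr = 60e9                  # 5-year all-in contract cost per GW (from above)
print(300e9 / per_gw_5yr)          # $300bn contract -> 5.0 GW of capacity
print(2 * (per_gw_5yr / 5) / 1e9)  # 2 GW built initially -> 24.0, i.e. ~$25bn per year
print(27e9 / 13.5e9)               # $27bn site read as infrastructure capex -> 2.0 GW
```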

Non-Nvidia Hardware

The Nvidia servers (1 GW all-in is about 5,500 GB200 NVL72 servers with 72 chips each, or 400K chips in total) take up about $20bn of capex, so if Nvidia's margin of about 70% applies to this part (it's probably less, since GPUs are not all of the server), the margin comes out to $14bn per GW, or $2.8bn per year in a 5-year contract between an AI company and a cloud provider, about 25% of the annual cost. This suggests that non-Nvidia compute might only cost up to 25% less, all else equal (which it isn't), even though GPUs are usually portrayed as the majority of the cost of compute, and Nvidia's margin as the majority of the cost of the GPUs.
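A sketch of the margin arithmetic, using the figures above:

```python
server_capex_per_gw = 20e9  # ~5,500 GB200 NVL72 servers, ~400K chips
nvidia_margin = 0.70        # probably an overestimate for whole servers

margin_per_gw = nvidia_margin * server_capex_per_gw  # $14bn per GW
margin_per_year = margin_per_gw / 5                  # $2.8bn/year over a 5-year contract
print(margin_per_year / 11e9)  # vs ~$9-11bn/year all-in: ~0.25, about 25%
```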

Thus a TPU contract for "tens of billions of dollars" and "over a gigawatt of capacity" admits an interpretation where it's a ~5-year contract at ~$12bn per year for ~1.2 GW of compute (in total power, not just IT power).
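For concreteness, that interpretation in numbers (the per-year figure and capacity are the rough values from the text):

```python
years, per_year, gw = 5, 12e9, 1.2
print(f"total: ${years * per_year / 1e9:.0f}bn")     # $60bn: "tens of billions of dollars"
print(f"per GW-year: ${per_year / gw / 1e9:.0f}bn")  # ~$10bn, close to the $9-11bn estimate above
```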

Model Sizes in 2026

If the new contract is for TPUv7 Ironwood, in 2026 Anthropic will have 1 GW of compute with 49 TB of HBM per 256-chip pod. This is comparable to OpenAI's Abilene site, which is 1 GW of compute with 14-20 TB of HBM per GB200/GB300 NVL72 rack, and will also be ready at this capacity in 2026. Currently Anthropic has access to Trainium 2 Ultra servers in AWS's Project Rainier with 6 TB of HBM per rack, while OpenAI's capacity is probably mostly in 8-chip Nvidia servers with 0.64-1.44 TB of HBM per server.
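These per-pod and per-rack figures follow from per-chip HBM capacities; a sketch, where the per-chip numbers are my assumptions from public spec sheets rather than from this post:

```python
# (chips per scale-up domain, GB of HBM per chip) -- per-chip figures assumed
domains = {
    "TPUv7 Ironwood pod":     (256, 192),
    "GB200 NVL72 rack":       (72, 192),
    "GB300 NVL72 rack":       (72, 288),
    "Trainium2 Ultra server": (64, 96),
    "8x H100 server":         (8, 80),
    "8x B200 server":         (8, 180),
}
for name, (chips, gb) in domains.items():
    print(f"{name}: {chips * gb / 1e3:.1f} TB HBM")
# TPUv7 pod: 49.2 TB; NVL72 racks: 13.8-20.7 TB; Trainium2 Ultra: 6.1 TB;
# 8-chip Nvidia servers: 0.6-1.4 TB -- matching the ranges above.
```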

Feasible model size scales with HBM per server/rack/pod that serves as a scale-up world (especially for reasoning models and their training), so 2026 brings not just 10x more compute than 2024 (400K chips in GB200/GB300 NVL72 servers instead of 100K H100s), but also 10x larger models (in total parameter count). As the active params for compute optimal models scale with the square root of compute, this enables more sparsity in MoE models than 2024 compute did, and the number of active params for the larger models will no longer be constrained in practice by the 8-chip Nvidia servers (with 100K H100s, compute optimality asked for about 1T active params, which is almost too much for servers with 1 TB of HBM, and MoE models ask for multiple times more total params).
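A sketch of the scaling claim, with the square-root relation from above and FP8 weights (1 byte per parameter) as my assumption:

```python
import math

compute_growth = 10                        # 400K GB200/GB300 chips vs 100K H100s
active_growth = math.sqrt(compute_growth)  # compute-optimal active params: ~3.2x
total_growth = 10                          # total params, enabled by ~10x more HBM
print(f"sparsity factor grows ~{total_growth / active_growth:.1f}x")  # ~3.2x

# The 2024 bottleneck: ~1T active params at 1 byte/param (FP8) is ~1 TB of
# weights, already saturating an 8-chip server with ~1 TB of HBM.
print(f"{1e12 * 1 / 1e12:.0f} TB for 1T active params at FP8")
```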


