Artificial Fintelligence, September 25
Competition in the LLM API market is intensifying

As more companies enter the LLM API market, competition is intensifying. OpenAI initially held a monopoly, but GPT-4 now faces multiple challengers, such as Gemini Ultra and Llama 3. The market is splitting into a high end and a low end: high-end models come from the large labs and are expensive, while low-end models come from the open source community at far lower cost. To cut costs, companies such as Harvey and Cursor may train their own models. Going forward, buyers will move to the lowest-cost model unless a task is complex enough to require GPT-4.

🔍 The high end is dominated by the large labs, which offer high-performance but expensive models such as GPT-4, which currently lacks a direct competitor.

🌐 The open source community (e.g., Meta and r/LocalLlama) is shipping high-quality, low-cost models, pulling the market toward the low end.

💡 To cut costs and reduce dependence on third-party APIs, companies such as Harvey and Cursor may train their own models.

📉 As competition intensifies, margins on high-end models are being squeezed, while low-end prices converge toward the cost of GPUs plus electricity.

🔄 Open models will keep improving in quality and falling in cost, putting pressure on the large labs and leaving the market bifurcated.

Before I studied machine learning, I was an econ grad student banging out OLS problem sets (I derived the OLS estimator, (X’X)^-1 X’y, so many times that I see it whenever I close my eyes). My research area was antitrust theory, and in particular vertical integration. That gives me a useful perspective on the question at hand: how will the LLM API market evolve as more companies enter the space?
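That OLS estimator, (X’X)^-1 X’y, is simple enough to verify numerically in a few lines. A minimal sketch (the simulated data and coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated design matrix: an intercept column plus three random regressors.
n, k = 1000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5, 0.25])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# The closed-form OLS estimator (X'X)^{-1} X'y, computed via a linear
# solve rather than an explicit matrix inverse for numerical stability.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)  # close to beta_true
```

With 1,000 observations and small noise, the estimates land within a few thousandths of the true coefficients.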

The market began, famously, with OpenAI releasing ChatGPT and rapidly hitting $1.3B in revenue. At this time last year, however, there was basically no competition in the LLM API market. Bard was yet to be released, let alone Claude, and Gemini was a mere twinkle in Sundar’s eyes. OpenAI had a monopoly in the market, letting them capture basically all of the value.


In the year since, what we’ve seen is that there doesn’t appear to be a moat in LLMs except at the highest end. GPT-4 is the only model without competition, and even there competitors are sniffing around: Gemini Ultra, Llama 3, and the as-yet-unreleased mysterious Mistral model bigger than medium. At the GPT-3.5 level, however, you have many options for hosting, and you can even host a model yourself. This necessarily limits the prices any company can charge.

Generally speaking, companies enter a new market when they think they can make a profit above the minimum threshold they require. The larger the company is, the smaller the profit threshold they require. If I, an individual, were to start offering a service to finetune LLMs, I would need to charge a fairly high margin at first, as I would have a small customer base to spread the costs over. As my company grows, I would have a larger customer base to spread the costs over, and would have more money to spend on optimizations enabling me to serve LLMs for cheaper:

with each optimization you make to your own process, you increase your margin. That’s great! You make more money per token. Right? Well, not quite. In a vacuum with a spherical cow, you do. But just as you invest in serving tokens more efficiently, your competitors are all doing the same, eroding your margins. To do a bad Ben Horowitz impersonation: you run this hard just to stay in place.
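The scale economics behind this can be made concrete with a toy unit-cost model: fixed costs get amortized over volume, so the bigger provider can profitably charge less. All of the numbers below are hypothetical, chosen only to illustrate the shape of the curve:

```python
def cost_per_token(fixed_costs: float, marginal_cost: float, tokens_served: float) -> float:
    """Average cost per token: fixed costs amortized over volume, plus marginal cost."""
    return fixed_costs / tokens_served + marginal_cost

# Hypothetical figures: $1M/yr of fixed engineering/infra cost and a
# $0.000001 marginal cost per token, at two very different scales.
small_provider = cost_per_token(1_000_000, 1e-6, 1e11)   # 100B tokens/yr
large_provider = cost_per_token(1_000_000, 1e-6, 1e13)   # 10T tokens/yr

print(small_provider, large_provider)
```

At 100x the volume, the fixed-cost term shrinks by 100x, so the large provider's average cost approaches the marginal cost, which is exactly the price a competitive market will push toward.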

The necessary implication is that the undifferentiated LLM market will become a ruthless competition for efficiency, with companies competing to see who can demand the lowest return on invested capital.

The classic business strategy book The Innovator’s Dilemma contains the canonical example of how technological disruption happens (this is taken from the New Yorker profile of the author, Clayton Christensen):

In the world of steel manufacturing, steel was historically made in massive integrated mills, which produced high quality steel at reasonable margins. Then electric mini mills came along. These mills could make the lowest quality steel at a cheaper cost. The large steel manufacturers saw this, shrugged, and focused on making high quality steel at a (relatively) high margin. Over time, the electric mini mill operators figured out how to make higher and higher quality steel, moved upmarket, and killed the massive integrated mills (US Steel, once the 16th-largest US corporation by market cap, was removed from the S&P 500 in 2014).


The analogy to LLMs is straightforward. The large labs focus on making the highest-performing models: expensive, but excellent, outperforming every other model. You need margin to pay for all of those $900k engineers! Even then, however, we see competition on price, with Gemini Pro priced aggressively against GPT-3.5.

At the low end, we have the open source community, led by Meta and r/LocalLlama, which are cranking out high quality models and figuring out how to serve them on ridiculously low powered machines. We should expect to see the open weight models improve in quality and decrease in cost (on a quality adjusted basis), putting pressure on the margins of the largest labs. As a real-time example, Together came out with a hosted version of Mixtral that is 70% cheaper than Mistral’s own version.

We should thus expect a bifurcated market: at the high end, more expensive, higher quality models; at the low end, lower quality, less expensive ones. For open weight models, we should expect prices to converge to the price of GPUs plus electricity (and, as competition increases in the GPU market, perhaps just to the price of electricity).
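That cost floor is easy to estimate on the back of an envelope: amortized hardware plus electricity, divided by serving throughput. A sketch, where every input figure is an illustrative assumption rather than a real quote:

```python
def floor_price_per_million_tokens(
    gpu_hourly_cost: float,      # $/GPU-hour, amortized hardware cost
    power_kw: float,             # GPU power draw in kW
    electricity_per_kwh: float,  # $/kWh
    tokens_per_second: float,    # serving throughput per GPU
) -> float:
    """Cost floor per million tokens: (hardware + electricity) / throughput."""
    hourly_cost = gpu_hourly_cost + power_kw * electricity_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1e6

# Illustrative inputs: $2/hr amortized GPU, 0.7 kW at $0.10/kWh, 1000 tok/s.
with_hardware = floor_price_per_million_tokens(2.0, 0.7, 0.10, 1000.0)

# As GPU amortization goes to zero, only the electricity term remains.
electricity_only = floor_price_per_million_tokens(0.0, 0.7, 0.10, 1000.0)

print(with_hardware, electricity_only)
```

Under these made-up numbers the floor is well under a dollar per million tokens, and the electricity-only term is a small fraction of that, which is the limiting case the paragraph above describes.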

The question, then, is what the buyer for these APIs looks like. If we were to rank the economically valuable tasks that LLMs can perform from most complex to least complex, how many of them require high-end capability? At some point there’s a threshold above which GPT-4 is required, but it’s hard to imagine that threshold staying static. The open weight models will continue their inexorable climb up the list, biting at the margins of the large labs. As tooling makes it easier to switch between model APIs, developers will switch to whatever the lowest cost model is that accomplishes their task. If you’re using an LLM for, say, short code completion, do you need the biggest and best model? Probably not!
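That switching behavior amounts to a simple routing rule: pick the cheapest model whose capability clears the task’s bar. A minimal sketch, in which the model names, prices, and capability scores are all hypothetical placeholders:

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    price_per_million_tokens: float  # hypothetical prices in $
    capability: int                  # higher = handles more complex tasks

# A stylized bifurcated market: cheap open-weight, mid-tier, and frontier.
MODELS = [
    Model("open-weight-7b", 0.20, 3),
    Model("mid-tier", 1.00, 6),
    Model("frontier", 30.00, 10),
]

def route(task_complexity: int) -> Model:
    """Return the cheapest model whose capability meets the task's complexity."""
    capable = [m for m in MODELS if m.capability >= task_complexity]
    if not capable:
        raise ValueError("no available model can handle this task")
    return min(capable, key=lambda m: m.price_per_million_tokens)

print(route(2).name)  # simple task: cheapest model wins
print(route(9).name)  # only the frontier model clears the bar
```

The economic pressure on the large labs falls out of the same rule: every task whose complexity drops below the open-weight capability score gets routed away from them.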

Moreover, the companies with the biggest success in the consumer marketplace will inevitably balk at paying a significant share of their profits to another company, and will start to train their own models. Companies like Harvey and Cursor, which were among the earliest to have access to GPT-4, are already hiring research scientists and engineers, giving them the talent required to train their own foundation models. As API fees are probably the biggest expense for these companies, it seems natural that they will do everything they can to drive those costs down.

If you’re building your own models, you can raise a round of investment, trading a one-time capital expenditure for higher ongoing margins. This is the justification for Google’s TPU program, for example: by spending billions of dollars on custom silicon, Google avoids paying Nvidia’s Danegeld.

The conclusion, then, is that we will see the market for LLM APIs converge to one of lowest cost as long as your task is simple enough to be solved by open weight models. If your task is so complex that it requires the best model, you’re stuck paying OpenAI. For everyone else, there’s finetuned Mistral 7B.

