Fortune | FORTUNE October 10, 00:21
Small AI Models: An Innovative Solution to AI's Cost and Energy Consumption

In San Sebastián, Spain, Multiverse Computing is developing small AI models to address the high computational cost and energy consumption of today's large language models (LLMs). Using compression techniques, the company enables AI models to run on ordinary CPUs, significantly reducing energy use and cost, and has already applied them in fields including defense, finance, and public services. Despite questions about model performance, Multiverse's approach has attracted many clients and may open a more sustainable path for AI development amid a growing AI energy crisis.

💡 **Smaller, more efficient AI models**: Multiverse Computing focuses on compressing large language models (LLMs) into smaller, more energy-efficient models (SLMs). These models can run on ordinary CPUs, greatly reducing the need for expensive GPUs and thereby cutting energy consumption and operating costs. This "super-frugal" AI design is especially suited to scenarios that require continuous monitoring and data transmission, such as military drone communications.

💰 **Cost effectiveness and broad applications**: Through compression, Multiverse's SLMs can run on ordinary computers and smart devices, dramatically lowering the cost and barriers to deploying AI. Its clients span manufacturers, financial-services firms, utilities, and defense contractors, and the company has rebuilt the customer service system of companies such as Spain's Telefonica, sharply cutting their LLM costs. In the future, such small AI could also power smart home devices, such as refrigerators.

🌍 **Tackling AI's energy crisis and environmental impact**: The enormous energy consumption of large AI models has become an urgent problem that could worsen environmental harm. Multiverse CEO Lizaso argues that many tech companies care more about cost than the environment, but the two concerns are converging, because "green means cheaper." By promoting small, low-energy AI models, Multiverse aims to ease the growing electricity demands of AI data centers and could help address the global energy crisis.

🔬 **Quantum-inspired technical innovation**: Multiverse's technology draws its inspiration from quantum computing, and its products are described as "quantum-inspired" models. The company uses quantum-physics algorithms to train ordinary computers, enabling faster, smarter operations than traditional programming. These algorithms can compress LLMs with billions of parameters down to just millions, drastically shrinking model size: its "Slim" series improved the energy efficiency of Meta's Llama models by 84% with only a 2%-3% loss in accuracy, and its Superfly model can even be downloaded and run on a phone.

The city of San Sebastián, in Spain’s Basque region, is a relaxed surfers’ haven that feels a world removed from any war. Yet atop a pine-forested hill overlooking the city, engineers in a conference room at Multiverse Computing are training their focus on combat of the kind raging at the other end of Europe, in Ukraine. They’re demonstrating one of their latest creations: a small AI model designed to help drones communicate from high above a chaotic battlefield.

On a laptop, the engineers demonstrate how a drone can pinpoint precisely what is in its sights. Using the ordinary, workhorse computer processors known as CPUs, the device can identify encroaching enemy tanks and soldiers, for example, and zip only that information back to military units, using a compressed AI model that’s vastly cheaper and more energy-efficient than the behemoth large language models that power chatbots like ChatGPT. “You need an AI system that is super-frugal,” says Enrique Lizaso Olmos, Multiverse’s CEO and one of four cofounders, as the program quickly picks out a tank. “The drones use very, very, very little energy,” he adds, even when monitoring a situation that “is getting more and more complex.”

Multiverse, like its AI models, is currently small—predicted sales this year are a modest $25 million. But it’s on to a big idea. Its work focuses on compressing large language models, or LLMs, into smaller ones, in the belief that most consumers and business customers can do just fine with thoughtfully designed AI that needs less power and fewer chips to run.

Some experts question how well compressed AI models can truly perform. But the concept has plenty of believers. Multiverse’s clients include manufacturers, financial-services companies, utilities, and defense contractors, among them big names like Bosch, Moody’s, and Bank of Canada. The company recently redesigned the customer service system for Spanish mobile operator Telefonica, drastically cutting the cost of the LLM it had been using. Lizaso and his team envision their SLMs—small language models—being used for “smart” appliances, like a refrigerator that can tell owners instantly what food needs replacing.

More recently, Multiverse has begun collaborating with Deloitte and Intel on running public services in the U.S., including a state Medicaid platform, using its SLMs. “There are tons and tons of applications where to a user you will not see any big difference,” says Burnie Legette, AI senior solutions architect for Intel’s government technologies group. But the savings to taxpayers are potentially huge. “To run an LLM is very, very expensive,” he says.

By focusing on creating super-small, affordable AI, Multiverse is tackling head-on an issue that has become increasingly urgent in Silicon Valley and in corporate C-suites. In the scramble to ramp up AI capabilities, many have begun wondering whether the giant investments AI requires will pay off—or whether the costs that LLMs’ power demands inflict on the environment will outweigh the benefits. (For its potential in addressing the latter issue, Multiverse earned a spot on Fortune’s 2025 Change the World list.)

“There is a big problem with the way we are doing AI,” says Román Orús, 42, Multiverse’s chief scientific officer. “It is fundamentally wrong.” He and Lizaso see an opportunity to get it right, while it’s still early days for the technology.

Quantum computing brought the founders together

As far back as 2023, OpenAI CEO Sam Altman predicted that giant AI models would eventually fade, given the dizzying expenditures involved. Nvidia CEO Jensen Huang has estimated that a single AI data center could cost $50 billion, of which $35 billion alone goes to acquiring the GPU chips, the category that Nvidia dominates. As engineers race to create next-generation AI models capable of reasoning, the ever-increasing tab is becoming more evident, as are the voracious electricity and water needs of AI data centers.

Orús and Lizaso believe that the AI arms race is foolish. They argue that the great majority of AI users have constrained needs that could be met with small, affordable, less energy-hungry models. In their view, millions are unnecessarily downloading giant LLMs like ChatGPT to perform simple tasks like booking air tickets or solving arithmetic problems. 

Multiverse’s founders came to AI in a roundabout way. Lizaso, now 62, originally trained as a doctor, then worked in banking. But he found his “true passion” as a tech entrepreneur in his mid-50s, after joining a WhatsApp group of Spaniards debating an esoteric question: how financial firms could benefit from quantum computing. The group, whose members came from different generations and professions, eventually published an academic paper in 2018, arguing that quantum computers could price derivatives and analyze risks far more accurately and quickly than regular computers.

The paper was, and still is, largely theoretical, since quantum computing hasn’t yet seen wide commercial deployment. Still, the response was immediate. “We started getting phone calls and realized we were on to something,” recalls Orús, a quantum physicist and seasoned academic. The University of Toronto’s Creative Destruction Lab invited the authors to an accelerator bootcamp in 2019. There, they discovered that VC firms and others had distributed their paper to prospective startups, suggesting they jump on their idea; they nicknamed their work “the Goldman paper,” since it had caught the attention of Goldman Sachs execs. “We were famous,” Orús laughs. The friends quit their jobs, and Multiverse was born.

Six years after its launch, Multiverse now calls its products “quantum inspired”: The team uses quantum-physics algorithms to train regular computers, a combination they say enables faster, smarter operations than traditional programming does. These algorithms enable Multiverse to create SLMs—models that can operate on a few million parameters, rather than the billions found in LLMs.

Multiverse’s core business is compressing open-source LLMs with such extreme shrinkage that most of its versions can run on CPUs, or central processing units, of the kind used in smartphones and regular computers, rather than GPUs, or graphics processing units. Because it works with open-source models, it doesn’t need the cooperation of the LLMs’ creators to do the shrinking.
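The mechanics of that shrinkage can be illustrated with a simple low-rank factorization. The sketch below is not Multiverse's actual CompactifAI algorithm (which the company describes as quantum-inspired, using tensor-network methods); it is a toy stand-in showing the general idea: replacing one large weight matrix with two small factors cuts the parameter count severalfold while approximately preserving the layer's outputs.

```python
import numpy as np

# Toy illustration only: build a "layer" whose weights are mostly low-rank
# plus a little noise, then compress it with a truncated SVD.
rng = np.random.default_rng(0)
L_factor = rng.standard_normal((1024, 64))
R_factor = rng.standard_normal((64, 1024))
W = L_factor @ R_factor + 0.02 * rng.standard_normal((1024, 1024))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
rank = 64
A = U[:, :rank] * s[:rank]   # 1024 x 64 factor (absorbs singular values)
B = Vt[:rank, :]             # 64 x 1024 factor

original = W.size            # 1,048,576 parameters
compressed = A.size + B.size # 131,072 parameters -- an 8x reduction

# The compressed layer computes A @ (B @ x) instead of W @ x.
x = rng.standard_normal(1024)
rel_err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
print(original, compressed, rel_err)
```

Tensor-network approaches generalize this idea, factorizing weight tensors into chains of small tensors rather than a single pair of matrices, which is one route to the billions-to-millions parameter reductions described above.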

The company has so far raised $290 million in two funding rounds, for a valuation of over $500 million. It’s hardly a household name, although Lizaso confidently predicts it could grow to the size of Anthropic, which projects $5 billion in revenue this year. 

Last April Multiverse rolled out its “Slim” series of compressed AI models, including versions of three of Meta’s Llama models and one from France’s Mistral AI, using an algorithm Multiverse developed known as CompactifAI. The company says its versions increase energy efficiency by 84%, compared to the original, with only a 2% to 3% loss in accuracy, and that they drastically cut compute costs. Its so-called Superfly model compressed an open-source AI model on the Hugging Face platform to such a great degree that the whole model could be downloaded onto a phone.

In August, the company launched another product in its “model zoo,” called ChickenBrain, a compressed version of Meta’s Llama 3.1 model that includes some reasoning capabilities. Intel’s senior principal Stephen Phillips, a computer engineer, says Intel has chosen to work with Multiverse among others because “its models did not appear to lose accuracy when compressed, as SLMs often do.”

‘The energy crisis is coming’

The sense that something is going “wrong,” as Orús puts it, has been echoed even by some leading AI scientists. One consequence is already clear: The potential environmental cost to the planet. U.S. data centers now use about 4.4% of the country’s electricity supply, and globally, data centers’ electricity consumption will more than double by 2030, according to the International Energy Agency. By that date, according to the IEA, America’s data centers will use more electricity than the production of aluminum, steel, chemicals and all other energy-intensive manufacturing combined.

Switching AI applications to small, CPU-based models might stem that trend, according to Multiverse. Lizaso believes tech companies are less concerned about the environment than the costs. But the two issues are converging. “If green means cheaper, they are fully green,” he says. “The energy crisis is coming.”

Some experts question Multiverse’s claim that, for most users, its compressed models are just as good as LLMs running on GPUs. “That’s a big statement that no one has proven yet,” says Théo Alves Da Costa, AI sustainability head at Ekimetrics, an AI solutions company in Paris. “When you use that kind of compression, it is always at the cost of something.” He says he has not found a small language model capable of working in French as well as an LLM, for example, and that his own tests found that models slowed down markedly when switching to CPUs. It’s also generally the case that open-source models of the kind that Multiverse compresses don’t perform quite as well as proprietary LLMs.

Multiverse’s argument that compressed models significantly cut energy use might also not hold up over time, because cheaper, more accessible AI models will likely attract billions more users. That conundrum is already playing out. In August, Google AI reported that the energy consumed for each prompt on its Gemini AI platform was 33 times smaller than one year before. Nonetheless, power consumption at Google data centers more than doubled between 2020 and 2024, according to an analysis of Google’s report by news site carboncredits.com, because so many more people are now using Google AI.
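The arithmetic behind that rebound effect is simple. The figures below are invented for illustration (they are not Google's actual numbers): if per-prompt energy falls 33-fold but query volume grows 70-fold, total consumption still more than doubles.

```python
# Toy rebound-effect calculation with invented numbers.
energy_per_prompt_before = 33.0   # arbitrary energy units per prompt
energy_per_prompt_after = 1.0     # "33x smaller" per-prompt footprint
prompts_before = 1_000_000        # baseline query volume
prompts_after = 70_000_000        # usage grows faster than efficiency

total_before = energy_per_prompt_before * prompts_before
total_after = energy_per_prompt_after * prompts_after
# Total consumption more than doubles despite a 33x per-prompt improvement.
print(total_before, total_after)
```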

For now, Multiverse says it is determined to shrink the biggest open-source models to a size that saves both energy and money. One of the next Multiverse models, expected to roll out imminently, is a version of DeepSeek, the Chinese generative AI model that shook the tech industry last year, when its creators announced that they had trained its LLM at a fraction of the cost of competitors like ChatGPT.

Multiverse says that, thanks to compression, its version will be cheaper still. And true to its desire to “challenge the status quo,” Multiverse has tweaked DeepSeek in another way, too, removing government-imposed censorship. Unlike on the original LLM, users will be able to gain access to information about politically charged events like the 1989 massacre of protesters in Beijing’s Tiananmen Square. “We have removed its filters,” says Lizaso—another parameter stripped away. 


Related tags

Small AI Models, SLM, AI Costs, Energy Efficiency, Multiverse Computing, AI Sustainability, Quantum-Inspired