Fortune | FORTUNE October 10, 00:21
Small AI Models: An Innovative Solution to AI's Cost and Energy Consumption

In San Sebastián, Spain, Multiverse Computing is developing small AI models to address the high computational cost and energy consumption of today's large language models (LLMs). Using compression techniques, the company enables AI models to run on ordinary CPUs, significantly reducing energy use and cost, and has already applied them in fields including defense, finance, and public services. Despite questions about model performance, Multiverse's approach has attracted many clients and may open a more sustainable path for AI development amid a growing AI energy crisis.

💡 **Smaller, more efficient AI models**: Multiverse Computing focuses on compressing large language models (LLMs) into smaller, more energy-efficient models (SLMs). These models can run on ordinary CPUs, greatly reducing the need for expensive GPUs and thereby cutting energy consumption and operating costs. This "super-frugal" AI design is especially suited to scenarios that require continuous monitoring and data transmission, such as military drone communications.

💰 **Cost effectiveness and broad applications**: Through compression, Multiverse's SLMs can run on ordinary computers and smart devices, dramatically lowering the cost and barriers to deploying AI. Its clients span manufacturers, financial-services firms, utilities, and defense contractors, and the company has rebuilt the customer service system of companies such as Spain's Telefonica, sharply cutting their LLM costs. In the future, such small AI could also power smart home devices, such as refrigerators.

🌍 **Tackling AI's energy crisis and environmental impact**: The enormous energy consumption of large AI models has become an urgent problem that could worsen environmental harm. Multiverse CEO Lizaso argues that many tech companies care more about cost than the environment, but the two concerns are converging, because "green means cheaper." By promoting small, low-energy AI models, Multiverse aims to ease the growing electricity demands of AI data centers and could help address the global energy crisis.

🔬 **Quantum-inspired technical innovation**: Multiverse's technology draws its inspiration from quantum computing, and its products are described as "quantum-inspired" models. The company uses quantum-physics algorithms to train ordinary computers, enabling faster, smarter operations than traditional programming. These algorithms can compress LLMs with billions of parameters down to just millions, drastically shrinking model size: its "Slim" series improved the energy efficiency of Meta's Llama models by 84% with only a 2%-3% loss in accuracy, and its Superfly model can even be downloaded and run on a phone.

The city of San Sebastián, in Spain’s Basque region, is a relaxed surfers’ haven that feels a world removed from any war. Yet atop a pine-forested hill overlooking the city, engineers in a conference room at Multiverse Computing are training their focus on combat of the kind raging at the other end of Europe, in Ukraine. They’re demonstrating one of their latest creations: a small AI model designed to help drones communicate from high above a chaotic battlefield.

On a laptop, the engineers demonstrate how a drone can pinpoint precisely what is in its sights. Using the ordinary, workhorse computer processors known as CPUs, the device can identify encroaching enemy tanks and soldiers, for example, and zip only that information back to military units, using a compressed AI model that’s vastly cheaper and more energy-efficient than the behemoth large language models that power chatbots like ChatGPT. “You need an AI system that is super-frugal,” says Enrique Lizaso Olmos, Multiverse’s CEO and one of four cofounders, as the program quickly picks out a tank. “The drones use very, very, very little energy,” he adds, even when monitoring a situation that “is getting more and more complex.”

Multiverse, like its AI models, is currently small—predicted sales this year are a modest $25 million. But it’s on to a big idea. Its work focuses on compressing large language models, or LLMs, into smaller ones, in the belief that most consumers and business customers can do just fine with thoughtfully designed AI that needs less power and fewer chips to run.

Some experts question how well compressed AI models can truly perform. But the concept has plenty of believers. Multiverse’s clients include manufacturers, financial-services companies, utilities, and defense contractors, among them big names like Bosch, Moody’s, and Bank of Canada. The company recently redesigned the customer service system for Spanish mobile operator Telefonica, drastically cutting the cost of the LLM it had been using. Lizaso and his team envision their SLMs—small language models—being used for “smart” appliances, like a refrigerator that can tell owners instantly what food needs replacing.

More recently, Multiverse has begun collaborating with Deloitte and Intel on running public services in the U.S., including a state Medicaid platform, using its SLMs. “There are tons and tons of applications where to a user you will not see any big difference,” says Burnie Legette, AI senior solutions architect for Intel’s government technologies group. But the savings to taxpayers are potentially huge. “To run an LLM is very, very expensive,” he says.

By focusing on creating super-small, affordable AI, Multiverse is tackling head-on an issue that has become increasingly urgent in Silicon Valley and in corporate C-suites. In the scramble to ramp up AI capabilities, many have begun wondering whether the giant investments AI requires will pay off—or whether the costs that LLMs’ power demands inflict on the environment will outweigh the benefits. (For its potential in addressing the latter issue, Multiverse earned a spot on Fortune’s 2025 Change the World list.)

“There is a big problem with the way we are doing AI,” says Román Orús, 42, Multiverse’s chief scientific officer. “It is fundamentally wrong.” He and Lizaso see an opportunity to get it right, while it’s still early days for the technology.

Quantum computing brought the founders together

As far back as 2023, OpenAI CEO Sam Altman predicted that giant AI models would eventually fade, given the dizzying expenditures involved. Nvidia CEO Jensen Huang has estimated that a single AI data center could cost $50 billion, of which $35 billion alone goes to acquiring the GPU chips, the category that Nvidia dominates. As engineers race to create next-generation AI models capable of reasoning, the ever-increasing tab is becoming more evident, as are the voracious electricity and water needs of AI data centers.

Orús and Lizaso believe that the AI arms race is foolish. They argue that the great majority of AI users have constrained needs that could be met with small, affordable, less energy-hungry models. In their view, millions are unnecessarily downloading giant LLMs like ChatGPT to perform simple tasks like booking air tickets or solving arithmetic problems. 

Multiverse’s founders came to AI in a roundabout way. Lizaso, now 62, originally trained as a doctor, then worked in banking. But he found his “true passion” as a tech entrepreneur in his mid-50s, after joining a WhatsApp group of Spaniards debating an esoteric question: how financial firms could benefit from quantum computing. The group, whose members came from different generations and professions, eventually published an academic paper in 2018, arguing that quantum computers could price derivatives and analyze risks far more accurately and quickly than regular computers.

The paper was, and still is, largely theoretical, since quantum computing hasn’t yet seen wide commercial deployment. Still, the response was immediate. “We started getting phone calls and realized we were on to something,” recalls Orús, a quantum physicist and seasoned academic. The University of Toronto’s Creative Destruction Lab invited the authors to an accelerator bootcamp in 2019. There, they discovered that VC firms and others had distributed their paper to prospective startups, suggesting they jump on their idea; they nicknamed their work “the Goldman paper,” since it had caught the attention of Goldman Sachs execs. “We were famous,” Orús laughs. The friends quit their jobs, and Multiverse was born.

Six years after its launch, Multiverse now calls its products “quantum inspired”: The team uses quantum-physics algorithms to train regular computers, a combination they say enables faster, smarter operations than traditional programming does. These algorithms enable Multiverse to create SLMs—models that can operate on a few million parameters, rather than the billions found in LLMs.

Multiverse’s core business is compressing open-source LLMs with such extreme shrinkage that most of its versions can run on CPUs, or central processing units, of the kind used in smartphones and regular computers, rather than GPUs, or graphics processing units. Because it works with open-source models, it doesn’t need the cooperation of the LLMs’ creators to do the shrinking.
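The mechanics of that shrinkage can be illustrated with a simple low-rank factorization. The sketch below is not Multiverse's actual CompactifAI algorithm (which the company describes as quantum-inspired, using tensor-network methods); it is a toy stand-in showing the general idea: replacing one large weight matrix with two small factors cuts the parameter count severalfold while approximately preserving the layer's outputs.

```python
import numpy as np

# Toy illustration only: build a "layer" whose weights are mostly low-rank
# plus a little noise, then compress it with a truncated SVD.
rng = np.random.default_rng(0)
L_factor = rng.standard_normal((1024, 64))
R_factor = rng.standard_normal((64, 1024))
W = L_factor @ R_factor + 0.02 * rng.standard_normal((1024, 1024))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
rank = 64
A = U[:, :rank] * s[:rank]   # 1024 x 64 factor (absorbs singular values)
B = Vt[:rank, :]             # 64 x 1024 factor

original = W.size            # 1,048,576 parameters
compressed = A.size + B.size # 131,072 parameters -- an 8x reduction

# The compressed layer computes A @ (B @ x) instead of W @ x.
x = rng.standard_normal(1024)
rel_err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
print(original, compressed, rel_err)
```

Tensor-network approaches generalize this idea, factorizing weight tensors into chains of small tensors rather than a single pair of matrices, which is one route to the billions-to-millions parameter reductions described above.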

The company has so far raised $290 million in two funding rounds, for a valuation of over $500 million. It’s hardly a household name, although Lizaso confidently predicts it could grow to the size of Anthropic, which projects $5 billion in revenue this year. 

Last April Multiverse rolled out its “Slim” series of compressed AI models, including versions of three of Meta’s Llama models and one from France’s Mistral AI, using an algorithm Multiverse developed known as CompactifAI. The company says its versions increase energy efficiency by 84%, compared to the original, with only a 2% to 3% loss in accuracy, and that they drastically cut compute costs. Its so-called Superfly model compressed an open-source AI model on the Hugging Face platform to such a great degree that the whole model could be downloaded onto a phone.

In August, the company launched another product in its “model zoo,” called ChickenBrain, a compressed version of Meta’s Llama 3.1 model that includes some reasoning capabilities. Intel’s senior principal Stephen Phillips, a computer engineer, says Intel has chosen to work with Multiverse among others because “its models did not appear to lose accuracy when compressed, as SLMs often do.”

‘The energy crisis is coming’

The sense that something is going “wrong,” as Orús puts it, has been echoed even by some leading AI scientists. One consequence is already clear: The potential environmental cost to the planet. U.S. data centers now use about 4.4% of the country’s electricity supply, and globally, data centers’ electricity consumption will more than double by 2030, according to the International Energy Agency. By that date, according to the IEA, America’s data centers will use more electricity than the production of aluminum, steel, chemicals and all other energy-intensive manufacturing combined.

Switching AI applications to small, CPU-based models might stem that trend, according to Multiverse. Lizaso believes tech companies are less concerned about the environment than the costs. But the two issues are converging. “If green means cheaper, they are fully green,” he says. “The energy crisis is coming.”

Some experts question Multiverse’s claim that, for most users, its compressed models are just as good as LLMs running on GPUs. “That’s a big statement that no one has proven yet,” says Théo Alves Da Costa, AI sustainability head at Ekimetrics, an AI solutions company in Paris. “When you use that kind of compression, it is always at the cost of something.” He says he has not found a small language model capable of working in French as well as an LLM, for example, and that his own tests found that models slowed down markedly when switching to CPUs. It’s also generally the case that open-source models of the kind that Multiverse compresses don’t perform quite as well as proprietary LLMs.

Multiverse’s argument that compressed models significantly cut energy use might also not hold up over time, because cheaper, more accessible AI models will likely attract billions more users. That conundrum is already playing out. In August, Google AI reported that the energy consumed for each prompt on its Gemini AI platform was 33 times smaller than one year before. Nonetheless, power consumption at Google data centers more than doubled between 2020 and 2024, according to an analysis of Google’s report by news site carboncredits.com, because so many more people are now using Google AI.
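The arithmetic behind that rebound effect is simple. The figures below are invented for illustration (they are not Google's actual numbers): if per-prompt energy falls 33-fold but query volume grows 70-fold, total consumption still more than doubles.

```python
# Toy rebound-effect calculation with invented numbers.
energy_per_prompt_before = 33.0   # arbitrary energy units per prompt
energy_per_prompt_after = 1.0     # "33x smaller" per-prompt footprint
prompts_before = 1_000_000        # baseline query volume
prompts_after = 70_000_000        # usage grows faster than efficiency

total_before = energy_per_prompt_before * prompts_before
total_after = energy_per_prompt_after * prompts_after
# Total consumption more than doubles despite a 33x per-prompt improvement.
print(total_before, total_after)
```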

For now, Multiverse says it is determined to shrink the biggest open-source models to a size that saves both energy and money. One of the next Multiverse models, expected to roll out imminently, is a version of DeepSeek, the Chinese generative AI model that shook the tech industry last year, when its creators announced that they had trained its LLM at a fraction of the cost of competitors like ChatGPT.

Multiverse says that, thanks to compression, its version will be cheaper still. And true to its desire to “challenge the status quo,” Multiverse has tweaked DeepSeek in another way, too, removing government-imposed censorship. Unlike on the original LLM, users will be able to gain access to information about politically charged events like the 1989 massacre of protesters in Beijing’s Tiananmen Square. “We have removed its filters,” says Lizaso—another parameter stripped away. 


Related tags

Small AI Models, SLM, AI Costs, Energy Efficiency, Multiverse Computing, AI Sustainability, Quantum-Inspired