a16z · the day before yesterday, 23:25
ElevenLabs: A European Perspective on How AI Companies Build Unique Advantages and Handle Challenges

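This article examines how the AI company ElevenLabs stands out in a fiercely competitive field by drawing on its European origins, global team, and culture of innovation. It looks at how ElevenLabs builds strong voice technology by understanding the vocal nuances that matter to non-native English speakers, how it balances research progress against customer needs, and how it works with the creative industry rather than against it. It also covers the challenges and strategies involved in product development, team expansion, and the move from the consumer market to enterprise services, such as finding the balance point between research and product and designing incentives that support scaling. ElevenLabs’ experience offers other AI companies useful insight into building trust, staying distinctive, and growing sustainably.

🌍 **A global perspective and the European edge**: ElevenLabs’ international, distributed team, and its European roots in particular, give it a deep understanding of the cultural differences and nuances in how voice conveys meaning. Unlike a team based solely in the US, the European team had stronger cross-cultural sensitivity from the start, which is essential for building truly natural voice technology and is an advantage rather than a handicap for expanding the product globally.

💡 **A pragmatic balance between research and product**: ElevenLabs takes a pragmatic approach to balancing frontier research with market demand. When research cannot solve a problem in the short term (for example, within three months), the team quickly turns to the product layer for a solution. This “product first” mindset, even when it means adding a feature as simple as a speech-speed slider, greatly improves user experience and usability and lets the company respond quickly to customer needs.

🤝 **Building the future with the creative industry**: ElevenLabs chose to work closely with the creative industry rather than try to replace it. By learning what artists, producers, and labels need, ElevenLabs built platforms such as the Voice Marketplace, where creators can clone, license, and monetize their voices, and has paid millions of dollars to the community. This collaborative model has earned trust and pushed forward new uses of AI in fields such as music.

🚀 **Strategy for moving from consumer to enterprise**: In moving from the consumer market to enterprise services, ElevenLabs recognized the differences in time perception and sales cycles. The company invested in hybrid roles that put 80% of their effort into sales and 20% into engineering to better understand enterprise customers, and built strong integration and pipeline capabilities. Being able to handle both rapid iteration and long deployments lets it serve industries including healthcare, telecom, and entertainment.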

San Francisco is the hub of AI. However, not all model companies should be built in the same ten square miles atop the peninsula. ElevenLabs is one of my favorite examples of why having a unique origin story (born out of frustration with Polish single-narrator voiceovers), a unique set of geographical constraints and opportunities, and a unique company culture gives you a differentiated advantage from day one. It’s also why ElevenLabs – because they’re a little different from every other AI company – can help illuminate many of the classic challenges these companies face: like how to earn trust from users and creatives; or how to scale a team without losing what makes you unique. I hope you enjoy these lessons from an unusual and special company.

The outside view inside the challenge

When I first met Mati and Piotr, ElevenLabs was two people in London pursuing a deceptively simple goal: give machines a voice as natural as our own. Three years later, they’ve become one of the fastest-scaling AI companies, building a full creative ecosystem across voice, music, and now conversational agents. Realistic speech and voice have been goals in the AI industry for decades, so what makes ElevenLabs different? Is it just about mastering voice, or are other things at play here too?

Over the course of my conversation with ElevenLabs founder Mati Staniszewski at Runtime, something became clear: in addition to their voice models, the team has also mastered two other domains: space and time. They’ve built out a truly global team (which you need in a field like voice) and have an intensely strategic approach to everything from research and product timelines, to balancing feedback loops from consumer and enterprise customers alike:

- Being in Europe is an edge for building a voice company and expanding globally. There are unique pains and opportunities that only non-native English speakers can appreciate.
- At an AI lab, research comes first. But you can’t always expect research progress to align with customer demands. So be prepared to deliver: at ElevenLabs, if research can’t solve a problem in 3 months, the team finds a way to build a product that bridges the gap in a shorter timeframe.
- Work with the creative industry instead of around it. Creative people are curious about AI, and want to understand where it can aid and accelerate them. Ask people questions: what parts of the production process benefit from AI? Where is it actually helpful?
- Having a product that reaches both consumer and enterprise users means learning how to live on faster and slower timelines. The product feedback loop with consumers is rapid; deployment times with enterprises can be longer. You need to build an organization that’s comfortable with both working styles.

Europe as an edge, not a handicap

ElevenLabs is international and distributed, and particularly attuned to all of the ways in which voice conveys meaning, and the ways that meaning can get (literally and figuratively) lost in translation. Tone, inflection, and other vocal nuances not only carry much of the real “meaning” in a spoken conversation; that meaning can also vary tremendously across cultures. You won’t really master this as a product company unless your team is international by default and starts from different assumptions than a US team based entirely in San Francisco.

If you live in the United States or another English-speaking country, you might take it for granted that movies, podcasts, audiobooks, and pretty much all voice-based content you encounter has a variety of emotions, intonations, voices, styles, and even accents contained within the English language. It’s the lattice of those things, combined with words, that creates meaning. If you encounter that same content dubbed in a different language, you may not enjoy that same variety.

ElevenLabs wouldn’t exist if they weren’t based in Europe. In Poland, where much of the original team is from, all foreign films are dubbed with one actor, who performs lines in monotone for both genders (I personally felt this pain too growing up in China, but at least the production teams responsible for dubbing splurged for two actors!). This frustration partially led to the recognition that the world needed a strong text-to-speech offering.

As the team scaled beyond 30 people (ElevenLabs now has ~350 employees), they decided to build out office hubs in London, Warsaw, and San Francisco. There are benefits to this beyond the wisdom of building team culture in person: in the case of ElevenLabs, international teams understand that voice is an extremely flexible abstraction layer.

“We realized that if we wanted the best people[…] we needed to hire wherever they were. We couldn’t lock ourselves to just San Francisco or the West coast.”

Research vs Product: when to ship and when to wait

In an earlier session at Runtime, Jeetu Patel (President & Chief Product Officer at Cisco) shared his candid opinion about what kind of companies will do well in coming years: the integrated product-model companies. As he put it: “I think the combination of a model working very closely with the product, and the model getting better as there’s feedback in the product, is gonna be super important.”

Building a model together with a product that makes use of it presents obvious advantages: the more people use the product, the better you can fine-tune the underlying model. In addition, when there are functionalities research alone cannot address yet, the product can cover the gap and give research time to catch up.

Tiny differences in the product – which in this case, could mean vocal output itself (in all its international nuance), or the interface presented to teams putting ElevenLabs to work in their own services – can have huge compounding impacts on what the model and the product become. So the designed product interface needs to stay just ahead of what your model “could do on its own”.

ElevenLabs felt this in the early days of their text-to-speech product, when one of their most commonly heard requests from customers was the desire for a slider to adjust the speed of voices. Initially, Mati and other members of the team were reluctant: they didn’t want to have sliders, toggles, or any kind of product that would make them seem too similar to previous generations of tooling.

The team then spent about nine months trying to solve this problem on the research side, not the product side. In the meantime, customers still wanted sliders. Eventually, ElevenLabs capitulated and built them directly into the product. Now the team has a heuristic: if something will take more than three months to solve with research, they try to solve it on the product layer instead.

“We didn’t want to become another generation of editing tools with endless sliders and toggles. So we tried to solve it through research — letting the model decide how fast a voice should speak. After nine months, we couldn’t crack it. A simple product fix did. Now our rule is clear: if research takes more than three months, product moves ahead.”

Design incentives to scale

My colleague Martin Casado likes to say that companies go through three phases: a product phase, a sales phase, and a scaling phase. ElevenLabs now has 350 people and has gotten a taste of the growing pains that come with navigating through each of these phases. I asked Mati about the biggest challenges that came with this evolution. His answer was:

“In the early days, everyone ran on passion and instinct. But as we scaled, it became obvious – building a real machine means designing the right incentives.”

Mati relayed a recent shift that was motivated by the observation that “quota and commissions are a lagging indicator, strategy is a leading indicator.” Explaining further, Mati spoke about a recent negotiation that would have seen a major foundation lab licensing and distributing ElevenLabs’ voice models in demos. For the sales team involved in that deal, it would have resulted in a big commission. But it also may not have been the best move strategically for the company as a whole: as we all know, this is an extremely competitive space.

So Mati arrived at a solution that some may consider unorthodox: sales teams can still see commission on a deal that gets killed. Sometimes it’s smarter to forgo shorter-term revenue wins in favor of keeping research and models proprietary.

Working with the creative industry instead of against it

When ElevenLabs was starting out in the creative space, the environment wasn’t necessarily friendly to the generative models pitching themselves to be used in production. ElevenLabs chose collaboration over disruption. Mati described spending time with artists, producers, and labels to deeply understand their priorities and incentives. He wanted to appreciate how AI could enhance rather than replace creative work. That meant learning from figures like Jarre about where AI adds value in the production process, and where human expression should remain untouched.

That philosophy led to the Voice Marketplace, where creators can clone, license, and monetize their voices. Mati shared that they now have almost 10,000 voices and have paid $10 million back to the community. One of the earliest voices, a deep Spanish tone that initially underperformed in Spain, became one of the top three voices globally once made available in English. The marketplace turned talent discovery into a global, multilingual phenomenon.

It’s a cool example of how AI can distill a voice as a medium: in a prior era, that person’s voice would have been confined to his own language. Today, voice is more like software that can “run” in any language.

ElevenLabs applied the same approach to music, partnering with all major labels in the Big Four and other labels like Kobalt to build a licensed music model. “It took us 18 months to find an agreement that worked,” Mati said. At every step, transparency and engagement were key. ElevenLabs spent time with artists and label members to demystify the technology and avoid the “knee-jerk reaction that AI is bad.” The result is rare in generative AI: a company working with the creative industry to build the future, not bulldoze its past.

Transitioning from Consumer to Enterprise

When a company begins to scale from consumer to enterprise, something funny happens to everyone’s perception of time and urgency: the number of high-stakes deals and customers begins to accelerate, but the actual sales cycles and deployment times can stretch across months. While feedback from a creator on a voice model can arrive in a matter of minutes, iterating with a large customer takes much longer. You need to start getting comfortable with fast mode and slow mode.

When ElevenLabs launched, they had a ton of inbound from enterprises. At the time the organization was composed mostly of engineers and researchers, with no salespeople to speak of. This resulted in a temporary misstep, when Mati asked engineers (who as we know are not always the most sales-oriented folks in an organization) to handle sales. Per Mati, “At first we thought we could do it all with engineers — no salespeople, just product.”

That idealistic approach quickly met reality. Mati decided to invest in roles that were divided into 80% sales and 20% engineering. This became an important way to better understand customers, what they care about, and then proactively build product offerings that address those needs.

To serve hospitals, telcos, and global media platforms, the company had to build not just models, but pipelines and integrations. Today, ElevenLabs’ voice and agent platforms power applications in healthcare, customer experience, and entertainment, turning what started as a creator tool into a robust enterprise stack.

“It’s easy to do a demo, but how do you actually build it to production? How do you test it, do version control, evaluate, monitor, fine tune, based on the results?”

At this point, ElevenLabs has around 20 product teams of 5-10 people each, which support both high-velocity shipping and enterprise-level discipline. Some teams work on verticals that are post-product-market fit. For those teams, the stakes are well-understood: give enterprise customers and smaller creators a quality experience with no downtime. But there are also product teams working on newer initiatives at ElevenLabs that operate more like fast-moving micro-startups. These teams are working on pre-product-market fit initiatives, and the stakes are more existential: they have six months to prove a product can resonate and get traction with customers, otherwise the product gets axed.

ElevenLabs’ early team thrived on product velocity and instant feedback from creators. Enterprise sales, by contrast, meant long cycles and patience. “Some of our team had never worked in enterprise: they were skeptical about waiting six or twelve months to see results,” Mati admitted. “In the early days, we had to shield them from that and just say, trust us, it’ll work.”

And you know what? It did.

