TechCrunch News, October 15, 2024
Gladia believes real-time processing is the next frontier of audio transcription APIs

 

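French startup Gladia, which offers a speech-recognition API, has raised $16 million in a Series A round. Its API turns audio files into text quickly and with high accuracy, outperforming some products from large cloud vendors. It supports many languages and accents and is used by more than 600 companies. Gladia wants to use the new funding to simplify the pipeline and tackle latency, and it believes audio applications are about to have a ChatGPT-like moment.

💬 Gladia's speech-recognition API offers high accuracy and low turnaround time, converting audio files to text quickly and accurately, and it outperforms some products from large vendors such as Amazon, Microsoft and Google.

🌐 The API supports 100 languages and a variety of accents and has been adopted by more than 600 companies, including several meeting recorders and note-taking assistants, with positive user feedback.

📈 With the new funding, Gladia plans to simplify the pipeline by integrating audio intelligence and LLM-based tasks in a single API call, and to solve the latency problem in real-time processing.

🎯 Gladia believes audio applications are about to have a major ChatGPT-like moment: as Apple, Google and others build transcription models into their operating systems, developers will pay more attention to audio features, and API providers like Gladia will play an important role.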

French startup Gladia, which offers a speech-recognition application programming interface (API), has raised $16 million in a Series A funding round. Essentially, Gladia’s API lets you turn any audio file into text with a high level of accuracy and low turnaround time.

While Amazon, Microsoft and Google all offer speech-to-text APIs as part of their cloud-hosting product suites, they don’t perform as well as newer models offered by specialized startups.

There has been tremendous progress in this field over the past couple of years, especially after the release of Whisper by OpenAI. Gladia competes with other well-funded companies in the space, such as AssemblyAI, Deepgram and Speechmatics.

Gladia originally offered a fine-tuned version of Whisper’s speech-to-text model with some much-needed improvements. For instance, the startup supports diarization out of the box: it can detect when there are multiple speakers in a conversation and separate the recording, and the transcribed text, depending on who’s talking.
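To make diarization concrete, here is a minimal sketch of what an application might do with speaker-labeled output. The `Segment` shape and the `merge_turns` helper are assumptions for illustration only; they are not Gladia's actual response format or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical diarized segment; not Gladia's real response schema."""
    speaker: int   # speaker label assigned by the diarization step
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcribed text for this span


def merge_turns(segments: List[Segment]) -> List[Segment]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Same speaker keeps talking: extend the current turn.
            turns[-1] = Segment(last.speaker, last.start, seg.end,
                                f"{last.text} {seg.text}")
        else:
            # A different speaker starts a new turn.
            turns.append(seg)
    return turns


segments = [
    Segment(0, 0.0, 1.5, "Hello."),
    Segment(0, 1.5, 3.0, "How are you?"),
    Segment(1, 3.0, 4.0, "Fine, thanks."),
]
turns = merge_turns(segments)
for turn in turns:
    print(f"Speaker {turn.speaker} [{turn.start:.1f}s-{turn.end:.1f}s]: {turn.text}")
```

Grouping by turn rather than by raw segment is what lets a meeting recorder render a readable, speaker-attributed transcript.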

Gladia supports 100 languages and a wide variety of accents. This reporter can confirm that it works, as we’ve been using Gladia to transcribe some interviews, and accents weren’t an issue.

The startup offers its speech-to-text model as a hosted API that users can leverage in their own applications and services. Over 600 companies use Gladia, including several meeting recorders and note-taking assistants like Attention, Circleback, Method Financial, Recall, Sana and Veed.io.

That particular use case is interesting, because many companies have to chain API calls. They first turn speech into text, which they then feed into a large language model (LLM), such as GPT-4o or Claude 3.5 Sonnet, to extract knowledge from large walls of text.
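The chained pattern described here can be sketched as two sequential calls. In the sketch below, `transcribe_stub` and `summarize_stub` are hypothetical stand-ins that return canned results; a real integration would replace them with calls to a speech-to-text provider and an LLM provider, whose actual client libraries are not shown.

```python
from typing import Callable


def transcribe_stub(audio: bytes) -> str:
    """Hypothetical stand-in for a speech-to-text API call."""
    return "Alice: we ship on Friday. Bob: I will update the changelog."


def summarize_stub(transcript: str) -> str:
    """Hypothetical stand-in for an LLM call; it just keeps the first sentence."""
    return transcript.split(". ")[0].rstrip(".") + "."


def chained_pipeline(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
) -> str:
    """Run the two steps in sequence: audio -> transcript -> summary."""
    transcript = stt(audio)   # first network round trip in a real system
    return llm(transcript)    # second round trip, often to a different vendor


summary = chained_pipeline(b"raw audio bytes", transcribe_stub, summarize_stub)
print(summary)
```

Each step in a real deployment means a separate vendor, a separate bill and a separate failure point, which is the overhead a single combined call would remove.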

With the new funding, Gladia wants to simplify that pipeline by integrating audio intelligence and LLM-based tasks in a single API call. For instance, a customer could get a conversation summary generated from a handful of bullet points without having to rely on a third-party LLM API.

The other issue that Gladia is looking to solve is latency. You may have seen some demos of real-time audio conversations with an AI-based calling agent (11x has a good demo on its website), and these systems have to be able to transcribe in near real time to make such conversations sound as human-like as possible.

“We realized that real time wasn’t very good in terms of quality in the market in general. And people had a weird use case. They were doing real-time processing, and then they were grabbing the audio and running it in batch. We wondered: ‘Why are you doing this?’ They told us: ‘The quality isn’t good in real-time processing, so we transcribe it in batch afterwards,’” co-founder and CEO Jean-Louis Quéguiner (pictured above; right) told TechCrunch.

Gladia chose to tackle this problem, and it can currently transcribe a live conversation with a latency of under 300 milliseconds. The company claims that the real-time processing is now more or less as good as the default, asynchronous batch transcription API, but it’s hard for us to judge without some proper testing. As Quéguiner says, the startup is aiming for “batch quality with real-time capabilities.”

AI calling agents aside, you could imagine a call center using those real-time capabilities to help calling agents find relevant information in the middle of a call. “Our single API is compatible with all existing tech stacks and protocols, including SIP, VoIP, FreeSwitch and Asterisk,” co-founder and CTO Jonathan Soto (pictured above; left) said in a statement.

XAnge is leading the Series A funding round. Illuminate Financial, XTX Ventures, Athletico Ventures, Gaingels, Mana Ventures, Motier Ventures, Roosh Ventures and Soma Capital also participated.

Gladia believes we are on the brink of a “ChatGPT moment” for audio applications. GPT technology has been around for years, but ChatGPT really popularized LLMs with its consumer chat-like interface.

As Apple or Google start including transcription models within iOS or Android, consumers will start to understand the value of automated transcription within the apps they use. Developers will likely then integrate audio features in their products, and that’s where API providers like Gladia will come in.
