VentureBeat, October 3, 20:42
Databricks acquires Mooncake to simplify PostgreSQL data analytics

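Databricks has acquired Mooncake, a startup focused on bridging PostgreSQL and lakehouse formats, with the aim of eliminating the need for ETL data pipelines entirely. The move is intended to make operational data immediately available for analytics and AI workloads and to significantly improve the performance of data movement operations. Mooncake's technology, including the 'pgmooncake' extension and the 'moonlink' component, enables real-time transformation between row-oriented PostgreSQL data and columnar analytical formats without a traditional ETL process, which could save enterprises substantial cost and improve efficiency, particularly in an era when AI agents rapidly generate and deploy applications.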

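🚀 **Eliminating ETL pipelines so data is usable immediately:** The core goal of Databricks' acquisition of Mooncake is to change how enterprises handle PostgreSQL data. With Mooncake's technology, operational data no longer has to pass through slow, error-prone ETL (Extract, Transform, Load) pipelines before it can be used for analytics or AI models. Data can be put to use as soon as it is generated, which shortens the delay before it becomes available and makes decision-making and model training more efficient.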

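⚡ **10x to 100x performance gains and lower infrastructure costs:** Mooncake's technology, in particular the 'moonlink' component, enables real-time conversion between row-oriented PostgreSQL data and columnar lakehouse formats such as Delta Lake or Iceberg. This removes the cost of building and maintaining complex ETL infrastructure and speeds up common data movement operations by 10 to 100 times, a substantial saving for enterprises that run large numbers of PostgreSQL instances.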

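🤖 **Empowering AI agents and accelerating application development:** In an era when AI agents can generate and deploy applications at machine speed, traditional ETL pipelines can no longer keep up with the pace of data change. Mooncake's approach gives AI agents uniform access to data so they can iterate quickly and work with real-time data, which speeds up the development and deployment of next-generation AI-driven applications and addresses a bottleneck in agentic AI infrastructure.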

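💡 **A unified data architecture with simpler management:** Mooncake's technology lets Databricks offer an integrated platform that supports both operational (OLTP) and analytical (OLAP) data models. Enterprise data platform teams can shift their effort from maintaining complex cross-system pipelines to data governance, access control and workload optimization, which simplifies infrastructure management and reduces the likelihood of data quality issues and pipeline failures.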

Many enterprises running PostgreSQL databases for their applications face the same expensive reality. When they need to analyze that operational data or feed it to AI models, they build ETL (Extract, Transform, Load) data pipelines to move it into analytical systems. Those pipelines require dedicated data engineering teams, break frequently and create delays measured in hours or days between when data is written to a database and when it becomes available for analytics.

For companies with large numbers of PostgreSQL instances, this infrastructure tax is massive. More critically, it wasn't designed for a world where AI agents generate and deploy applications at machine speed, creating new tables, events and workflows faster than any data engineering team can keep up.

Databricks is making a bet that this architecture is fundamentally broken. The company is acquiring Mooncake, an early-stage startup focused on bridging PostgreSQL with lakehouse formats, to eliminate the need for ETL pipelines entirely. Financial terms of the deal are not being publicly disclosed. The technology promises to make operational data instantly available for analytics and AI workloads, with performance improvements ranging from 10x to 100x faster for common data movement operations.

The acquisition, announced today, comes just months after Databricks acquired Neon, a serverless PostgreSQL provider. But the speed of this second deal reveals something more urgent. Nikita Shamgunov, who joined Databricks as VP of Engineering after leading Neon, used his literal first day at the company to tell Databricks co-founder and chief architect Reynold Xin that Databricks should buy Mooncake.

"On day one, when we closed, Nikita said, 'Hey, we're gonna buy this company,'" Xin told VentureBeat in an exclusive interview. "And then I was like, 'Hey, you don't know where the bathrooms are yet.' And then over time... I got to know more about what the company does. Like, wow, Nikita was 100% correct. It's such a no-brainer to do it."

The agent infrastructure gap

What made Mooncake urgent is the continued acceleration of agentic AI. 

"Eighty percent of Neon customers' databases were already created by agents," Xin said, describing the shift happening across Neon's platform. "And that's actually supported by the separation of storage and compute architecture that the Neon team pioneered."

This creates a fundamental problem. Agents that build applications expect to work with PostgreSQL, which is a transactional database. However, when those same applications need to run analytics, the data must exist in columnar formats optimized for analytical queries. Historically, this required building and maintaining ETL pipelines. These are expensive, brittle systems that break frequently and require dedicated data engineering teams.
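
To make the row-versus-column distinction concrete, here is a minimal Python sketch. The table, column names and values are invented for illustration, and the code shows only the general layout idea, not how PostgreSQL or any lakehouse format stores data internally.

```python
# Hypothetical order records, stored row by row as a transactional
# database would hand them to an application.
rows = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 30.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: a single-record lookup reads one complete row.
order_two = rows[1]

# Column layout: an aggregate scans one column and ignores the rest.
total_amount = sum(columns["amount"])
```

The transactional side favors the first access pattern and the analytical side favors the second, which is why the same data has traditionally been copied from one layout into the other.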

"I don't think it's fair for agents to do the ETL part as part of building those next-generation applications," Shamgunov told VentureBeat. "I think what agents expect now is the ability to iterate very quickly, and then the infrastructure should give agents fairly uniform access to data."

Beyond the cake, the Moonlink architecture advantage

Mooncake has several technologies in its portfolio. There is the 'pgmooncake' extension that enables analytical workloads to run on PostgreSQL. Then there is the moonlink component that Shamgunov describes as an acceleration tier. It enables real-time transformation between row-oriented PostgreSQL data and columnar analytical formats without traditional ETL pipelines.

"Moonlink allows you to basically create a mirror of your OLTP data in a columnar representation in Iceberg and Delta," Shamgunov explained. "Moonlink also supports an acceleration tier as well. So in many places, you accumulate latencies when you query the data lake by metadata lookups, or S3 both on the way in and on the way out."
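
The mirroring idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Moonlink's implementation: the `ColumnarMirror` class, the event tuples and the column names are all hypothetical, and a real system would write Iceberg or Delta files rather than in-memory lists.

```python
class ColumnarMirror:
    """Toy in-memory columnar copy of a row-oriented table (hypothetical)."""

    def __init__(self, column_names, key):
        self.key = key
        self.columns = {name: [] for name in column_names}
        self.position = {}  # primary key -> index into each column list

    def apply(self, event):
        """Apply one change event: ("insert" | "update" | "delete", row)."""
        op, row = event
        pk = row[self.key]
        if op == "insert":
            self.position[pk] = len(self.columns[self.key])
            for name, values in self.columns.items():
                values.append(row.get(name))
        elif op == "update":
            index = self.position[pk]
            for name, value in row.items():
                self.columns[name][index] = value
        elif op == "delete":
            index = self.position.pop(pk)
            for values in self.columns.values():
                values.pop(index)
            # Rows stored after the deleted one shift down by one slot.
            for other, other_index in self.position.items():
                if other_index > index:
                    self.position[other] = other_index - 1
        else:
            raise ValueError(f"unknown operation: {op}")


# A hypothetical stream of changes coming from the transactional side.
changes = [
    ("insert", {"id": 1, "status": "new", "amount": 10.0}),
    ("insert", {"id": 2, "status": "new", "amount": 20.0}),
    ("update", {"id": 1, "status": "paid"}),
    ("delete", {"id": 2}),
    ("insert", {"id": 3, "status": "new", "amount": 5.0}),
]

mirror = ColumnarMirror(["id", "status", "amount"], key="id")
for change in changes:
    mirror.apply(change)
```

Because the mirror is updated as each change arrives, an analytical query can read the column lists directly without waiting for a separate batch job to copy the table.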

The performance implications are dramatic. For operations such as moving data from the data lake into what Databricks calls Lakebase (its OLTP database category), Shamgunov said the improvements range from "10 to 100 times faster" to "almost unlimited times faster" for embarrassingly parallel operations, like data format transformations.

For Xin, it's all about expanding the size of the pipe.

"Imagine, in the past, OLTP databases always had a single little pipe, the pipe might be a JDBC driver and it's very narrow. It's fast, but it's very low throughput," Xin explained as an analogy. "With Mooncake and the other things we're developing, we now can create an infinite number of pipes, and those pipes are far larger than a single-threaded JDBC thing."
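
Xin's pipe analogy, together with Shamgunov's point about embarrassingly parallel transformations, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the partitioning scheme, the function names and the sample data do not come from Databricks or Mooncake. The point is only that when partitions are independent, each can be converted by its own worker.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, partition_count):
    """Deal rows round-robin into independent partitions."""
    partitions = [[] for _ in range(partition_count)]
    for index, row in enumerate(rows):
        partitions[index % partition_count].append(row)
    return partitions


def rows_to_columns(partition):
    """Convert one partition from row tuples to column lists."""
    ids = [row[0] for row in partition]
    amounts = [row[1] for row in partition]
    return {"id": ids, "amount": amounts}


def convert_in_parallel(rows, partition_count=4):
    """Convert every partition on its own worker: one 'pipe' per partition."""
    partitions = split_into_partitions(rows, partition_count)
    with ThreadPoolExecutor(max_workers=partition_count) as pool:
        # map() returns results in partition order, whichever finishes first.
        return list(pool.map(rows_to_columns, partitions))


# Hypothetical (id, amount) rows.
sample_rows = [(i, float(i) * 1.5) for i in range(10)]
column_chunks = convert_in_parallel(sample_rows, partition_count=4)
total_rows = sum(len(chunk["id"]) for chunk in column_chunks)
```

Threads keep the sketch short. In standard CPython builds, a CPU-bound conversion would need separate processes or native code to see a real speedup, but the structure of the work stays the same: no partition depends on any other, so adding workers adds throughput.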

How Databricks stacks up against other PostgreSQL providers

The Mooncake acquisition positions Databricks directly against cloud providers' managed PostgreSQL offerings, particularly Google's AlloyDB and Amazon's Aurora. 

All three systems offer separation of storage and compute, but Databricks executives argue their architecture provides fundamental advantages. Shamgunov emphasized that Databricks combines analytics and operational models, with both ends controlled and deeply integrated, leading to faster data movement and lower latency.

The competitive dynamic is nuanced. Databricks competes with all three major cloud providers while also partnering with them. 

"We also work with Google closely to get data from the lakehouse joint customers doing that," Xin noted. "So it's not just a competitive situation. I mean, we compete with all the CSPs at the same time."

Under Databricks' ownership, Neon is now also competing aggressively on price. Before the acquisition, the cheapest paid monthly Neon service cost $25; it has since dropped to $5.

"We did the opposite of what maybe a lot of people thought we would do, which is after the acquisition, we lowered the price," Xin said.

What this means for enterprises

For organizations managing thousands of operational PostgreSQL databases alongside data lakes, the immediate impact is clear. Development teams won't need to wait for data engineering to build and maintain pipelines before accessing operational data for analytics or AI workloads. 

For data platform teams, this means rethinking infrastructure management from the ground up. 

Instead of maintaining complex pipeline orchestration between operational and analytical systems, teams can focus on governance, access controls and workload optimization across a unified platform. That shift frees up engineering resources while reducing the surface area for data quality issues and pipeline failures. For enterprises building agent-driven applications, the unified architecture removes the dependency on data engineering as a prerequisite for launching new workloads.

"I think what agents expect now is the ability to iterate very quickly, and then the infrastructure should give agents fairly uniform access to data," Shamgunov said.
