Databricks acquires Mooncake to simplify PostgreSQL data analytics

Many enterprises running PostgreSQL databases for their applications face the same expensive reality. When they need to analyze that operational data or feed it to AI models, they build ETL (Extract, Transform, Load) data pipelines to move it into analytical systems. Those pipelines require dedicated data engineering teams, break frequently and create delays measured in hours or days between when data is written to a database and when it becomes available for analytics.

For companies with large numbers of PostgreSQL instances, this infrastructure tax is massive. More critically, it wasn't designed for a world where AI agents generate and deploy applications at machine speed, creating new tables, events and workflows faster than any data engineering team can keep up.

Databricks is making a bet that this architecture is fundamentally broken. The company is acquiring Mooncake, an early-stage startup focused on bridging PostgreSQL with lakehouse formats, to eliminate the need for ETL pipelines entirely. Financial terms of the deal are not being publicly disclosed. The technology promises to make operational data instantly available for analytics and AI workloads, with common data movement operations running 10x to 100x faster.

The acquisition, announced today, comes just months after Databricks acquired Neon, a serverless PostgreSQL provider. But the speed of this second deal reveals something more urgent. Nikita Shamgunov, who joined Databricks as VP of Engineering after leading Neon, told Databricks co-founder and chief architect Reynold Xin on his very first day at the company that Databricks should buy Mooncake.

"On day one, when we closed, Nikita said, 'Hey, we're gonna buy this company,'" Xin told VentureBeat in an exclusive interview. "And then I was like, 'Hey, you don't know where the bathrooms are yet.' And then over time... I got to know more about what the company does. Like, wow, Nikita was 100% correct. It's such a no-brainer to do it."

The agent infrastructure gap

What made Mooncake urgent is the continued acceleration of agentic AI.

"Eighty percent of Neon customers' databases were already created by agents," Xin said, describing the shift happening across Neon's platform. "And that's actually supported by the separation of storage and compute architecture that the Neon team pioneered."

This creates a fundamental problem. Agents that build applications expect to work with PostgreSQL, which is a transactional database. However, when those same applications need to run analytics, the data must exist in columnar formats optimized for analytical queries. Historically, this required building and maintaining ETL pipelines. These are expensive, brittle systems that break frequently and require dedicated data engineering teams.
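To make the row-versus-column distinction concrete, here is a minimal Python sketch. The sample table, its field names and the `to_columnar` helper are hypothetical illustrations, not part of PostgreSQL, Neon or Mooncake; the sketch only shows why the same data is laid out differently for transactions and for analytics.

```python
from typing import Any

# Hypothetical order records, stored one complete record at a time,
# the way a transactional (row-oriented) database holds them.
rows: list[dict[str, Any]] = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 30.0},
]


def to_columnar(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row dicts into one list per column.

    Assumes every record has the same keys, as rows of one table would.
    """
    columns: dict[str, list[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: the aggregate walks every record to pick out one field.
total_from_rows = sum(record["amount"] for record in rows)

# Columnar layout: the same aggregate reads a single list of values.
total_from_columns = sum(columns["amount"])
```

Both totals are identical; what differs is how much unrelated data a scan has to pass over, which is the reason analytical engines prefer the columnar form.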

"I don't think it's fair for agents to do the ETL part as part of building those next-generation applications," Shamgunov told VentureBeat. "I think what agents expect now is the ability to iterate very quickly, and then the infrastructure should give agents fairly uniform access to data."

Beyond the cake, the Moonlink architecture advantage

Mooncake has several technologies in its portfolio. There is the pg_mooncake extension, which enables analytical workloads to run on PostgreSQL. Then there is the Moonlink component, which Shamgunov describes as an acceleration tier. It enables real-time transformation between row-oriented PostgreSQL data and columnar analytical formats without traditional ETL pipelines.

"Moonlink allows you to basically create a mirror of your OLTP data in a columnar representation in Iceberg and Delta," Shamgunov explained. "Moonlink also supports an acceleration tier as well. So in many places, you accumulate latencies when you query the data lake by metadata lookups, or S3 both on the way in and on the way out."

The performance implications are dramatic. For operations such as moving data from the data lake into what Databricks calls Lakebase (its OLTP database category), Shamgunov said the improvements range from "10 to 100 times faster" to "almost unlimited times faster" for embarrassingly parallel operations, like data format transformations.

For Xin, it's all about expanding the size of the pipe.

"Imagine, in the past, OLTP databases always had a single little pipe, the pipe might be a JDBC driver and it's very narrow. It's fast, but it's very low throughput," Xin explained as an analogy. "With Mooncake and the other things we're developing, we now can create an infinite number of pipes, and those pipes are far larger than a single-threaded JDBC thing."
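The "many pipes" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Mooncake's or Databricks' implementation: the `to_columnar`, `chunked` and `convert_parallel` helpers, the chunk size and the worker count are all hypothetical. It shows why format conversion is embarrassingly parallel: each chunk of rows can be converted without knowing anything about the others.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any

Row = dict[str, Any]
Columns = dict[str, list[Any]]


def to_columnar(records: list[Row]) -> Columns:
    """Pivot row dicts into one list per column (hypothetical helper)."""
    columns: Columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def chunked(records: list[Row], size: int) -> list[list[Row]]:
    """Split the rows into independent, fixed-size chunks."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def convert_parallel(records: list[Row], chunk_size: int = 2, workers: int = 4) -> Columns:
    """Convert each chunk on its own worker, then concatenate the results."""
    chunks = chunked(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so row order is preserved.
        parts = list(pool.map(to_columnar, chunks))
    merged: Columns = {}
    for part in parts:
        for name, values in part.items():
            merged.setdefault(name, []).extend(values)
    return merged


rows = [{"id": i, "value": i * 10} for i in range(7)]
result = convert_parallel(rows)
```

In CPython, threads mainly help when each chunk is I/O-bound, such as reading from object storage, so this sketch is about the structure of the work rather than a speedup claim.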

How Databricks stacks up against other PostgreSQL providers

The Mooncake acquisition positions Databricks directly against cloud providers' managed PostgreSQL offerings, particularly Google's AlloyDB and Amazon's Aurora.

All three systems offer separation of storage and compute, but Databricks executives argue their architecture provides fundamental advantages. Shamgunov emphasized that Databricks combines analytics and operational models, with both ends controlled and deeply integrated, leading to faster data movement and lower latency.

The competitive dynamic is nuanced. Databricks competes with all three major cloud providers while also partnering with them.

"We also work with Google closely to get data from the lakehouse joint customers doing that," Xin noted. "So it's not just a competitive situation. I mean, we compete with all the CSPs at the same time."

Databricks is now also competing aggressively on price through Neon. Before the acquisition, Neon's cheapest paid monthly plan cost $25; it has since dropped to $5.

"We did the opposite of what maybe a lot of people thought we would do, which is after the acquisition, we lowered the price," Xin said.

What this means for enterprises

For organizations managing thousands of operational PostgreSQL databases alongside data lakes, the immediate impact is clear. Development teams won't need to wait for data engineering to build and maintain pipelines before accessing operational data for analytics or AI workloads.

For data platform teams, this means rethinking infrastructure management from the ground up.

Instead of maintaining complex pipeline orchestration between operational and analytical systems, teams can focus on governance, access controls and workload optimization across a unified platform. That shift frees up engineering resources while reducing the surface area for data quality issues and pipeline failures. For enterprises building agent-driven applications, the unified architecture removes the dependency on data engineering as a prerequisite for launching new workloads.

"I think what agents expect now is the ability to iterate very quickly, and then the infrastructure should give agents fairly uniform access to data," Shamgunov said.
