Announced at the Databricks Data & AI Summit 2025 on 2025-06-11.
Essentially an OLTP database (Postgres) for the Lakehouse, powered by [Neon]. Although it might only be the compute: a fully managed Postgres, nothing based on object storage.
It’s just a managed, scalable Postgres inside the Databricks environment.
This is going to get better, or is already the result of the recent acquisition of Neon. See Data Engineering Acquisitions.
“Databricks Lakebase is a fully-managed PostgreSQL OLTP engine that lives inside the Databricks Data Intelligence Platform. You provision it as a database instance (a new compute type) and get Postgres semantics—row-level transactions, indexes, JDBC/psql access—while the storage and scaling are handled for you by Databricks.” - docs
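To make the "Postgres semantics" part concrete, here is a minimal sketch of what row-level transactions and indexes look like over a plain psycopg2 connection. The host, user, password, and the `orders` table are placeholders I made up for illustration; how to get the real endpoint and credential is shown in the Integration section below.

```python
import psycopg2

# Placeholder connection details -- in practice, host and password come from
# the database instance and a generated credential (see Integration below).
conn = psycopg2.connect(
    host="<instance read/write DNS>",
    dbname="databricks_postgres",
    user="<your user>",
    password="<generated token>",
    sslmode="require",
)

with conn:  # commits on success, rolls back on exception
    with conn.cursor() as cur:
        # Hypothetical table, purely for illustration
        cur.execute(
            "CREATE TABLE IF NOT EXISTS orders (id serial PRIMARY KEY, status text)"
        )
        cur.execute("CREATE INDEX IF NOT EXISTS idx_orders_status ON orders (status)")
        cur.execute("INSERT INTO orders (status) VALUES (%s) RETURNING id", ("open",))
        order_id = cur.fetchone()[0]
        cur.execute("UPDATE orders SET status = %s WHERE id = %s", ("shipped", order_id))

conn.close()
```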

Image from Lakebase | Databricks
# Architecture
Image from Data + AI Summit 2025 - Keynote Recap - YouTube
# Key Capabilities
From Daniel Beach's Lakebase from Databricks:
- Postgres-compatible: standard drivers, psql, extensions roadmap.
- Managed change-data-capture into Delta Lake, so OLTP data stays in sync with BI models (a sketch of querying a synced table follows this list).
- Unified governance via Unity Catalog roles & privileges.
- Lakehouse hooks: can feed Feature Engineering & Serving, SQL Warehouses, Databricks Apps, and RAG pipelines out of the same rows (docs.databricks.com).
- Elastic scale: separate storage and compute lets you grow read/write throughput without dumping and re-importing data.
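To illustrate the CDC/Lakehouse-hooks point: once Lakebase rows are synced into a Unity Catalog table, you can query them from a SQL Warehouse with the standard databricks-sql-connector. This is only a sketch; the warehouse HTTP path and the three-level table name `my_catalog.public.orders` are assumptions on my side, not something from the Lakebase docs.

```python
from databricks import sql  # pip install databricks-sql-connector

# Hypothetical identifiers: the warehouse HTTP path and the synced table's
# three-level Unity Catalog name are assumptions for this sketch.
with sql.connect(
    server_hostname="<workspace-host>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal access token>",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT status, count(*) FROM my_catalog.public.orders GROUP BY status")
        for status, cnt in cur.fetchall():
            print(status, cnt)
```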
Source: Lakebase from Databricks. - by Daniel Beach
This is a solution for Databricks only, and it doesn't matter whether it's Postgres, Spark, Photon Engine, Modern OLAP Systems, or anything else, because as a user of Databricks, using their UI or clusters, it doesn't look any different. It's the Compute and Storage Separation.
It only makes a difference if you self-host your SQL Query Engine, but that's not what Lakebase is. It's a closed-source Postgres as far as I can see. It's an abstraction: less complexity for Lakehouse users, with more of it handled inside Databricks.
It really goes to show how Declarative Data Stacks are the future by abstracting complexity away. The Databricks lakehouse is a declarative data stack as well, but a closed-source one. And as we learned, with declarative data stacks we can exchange the compute, the engine. Lakebase is just another compute for your Lakehouse, if you will, one that is much less complex, as it's based on Postgres.
Actually Daniel Beach agrees with me on that one:
It’s clear that Databricks, as per normal, has integrated this well into their Platform. Right now a “Databricks instance” is just a new type of compute you can select.
Essentially a managed Postgres with data retention, high availability (HA) and other features out of the box.
# In the End
It's just a managed Postgres as part of Databricks? Is it any different from running a Postgres service on Azure?
I guess the users of Databricks don't know/mind which compute they use (Photon, Spark, Postgres), as long as it's cheap and fast :)
But I’m not sure why they didn’t call it “Managed Postgres”.
It's not open-source, even though they say they focus on "Openness". Yes, Postgres is open-source, but running it and integrating it is proprietary to Databricks; therefore, a "managed Postgres". Am I missing something? Bsky
Please let me know what I'm missing; this is what I have observed at first glance.
# Limitations
As of 2025-06-13, per Databricks:
- A workspace allows a maximum of ten instances.
- Each instance supports up to 1000 concurrent connections (a pooling sketch follows this list).
- The logical size limit across all databases in an instance is 2 TB.
- Database instances are scoped to a single workspace. Users with the required Unity Catalog permissions can see these tables in Catalog Explorer from other workspaces attached to the same metastore, but they cannot access the table contents.
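The 1000-connection ceiling per instance is the limit most likely to matter for application workloads; the usual Postgres answer is connection pooling on the client side. A minimal sketch with psycopg2's built-in pool, reusing the same placeholder connection details as above:

```python
from psycopg2 import pool

# Keep a small, bounded pool per application instance so the sum of all
# clients stays well under the per-instance connection limit.
pg_pool = pool.SimpleConnectionPool(
    minconn=1,
    maxconn=10,
    host="<instance read/write DNS>",
    dbname="databricks_postgres",
    user="<your user>",
    password="<generated token>",
    sslmode="require",
)

conn = pg_pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
finally:
    pg_pool.putconn(conn)

pg_pool.closeall()
```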
# Integration
# Python
```python
import uuid

import psycopg2
from databricks.sdk import WorkspaceClient

# Name of an existing Lakebase database instance (placeholder)
instance_name = "<YOUR INSTANCE NAME>"

w = WorkspaceClient()
instance = w.database.get_database_instance(name=instance_name)
cred = w.database.generate_database_credential(
    request_id=str(uuid.uuid4()), instance_names=[instance_name]
)

# Connection parameters
conn = psycopg2.connect(
    host=instance.read_write_dns,
    dbname="databricks_postgres",
    user="<YOUR USER>",
    password=cred.token,
    sslmode="require",
)

# Execute query
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    version = cur.fetchone()[0]
    print(version)

conn.close()
```
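The generated credential also works with other standard Postgres drivers and ORMs. As a sketch (my addition, not from the Databricks docs), the same instance DNS and token could be reused with SQLAlchemy:

```python
from urllib.parse import quote_plus

from sqlalchemy import create_engine, text

# instance.read_write_dns and cred.token come from the SDK calls above; the
# user is a placeholder. The token is URL-encoded because it may contain
# characters that are not valid in a connection URL.
url = (
    "postgresql+psycopg2://"
    f"<YOUR USER>:{quote_plus(cred.token)}@{instance.read_write_dns}/databricks_postgres"
)
engine = create_engine(url, connect_args={"sslmode": "require"})

with engine.connect() as conn:
    print(conn.execute(text("SELECT current_database()")).scalar())
```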
# References
It sounds similar to the recently announced DuckLake, where they use a relational database like Postgres to manage the metadata of the lake and the catalog. But looking at it more closely, it really isn't.
Origin: Lakebase from Databricks. - by Daniel Beach
References:
Created 2025-06-13
