Databricks Lakebase: Managed PostgreSQL OLTP Engine for the Lakehouse

 


Announced at the Databricks Data & AI Summit 2025 on 2025-06-11.

Essentially an OLTP (Postgres) engine for the Lakehouse, powered by [Neon]. Although it might only be the compute: a fully managed Postgres, nothing based on object storage.

It’s just a managed, scalable Postgres inside the Databricks environment.

This is going to get better, or is already the result of the recent acquisition of Neon. See Data Engineering Acquisitions.

“Databricks Lakebase is a fully-managed PostgreSQL OLTP engine that lives inside the Databricks Data Intelligence Platform. You provision it as a database instance (a new compute type) and get Postgres semantics—row-level transactions, indexes, JDBC/psql access—while the storage and scaling are handled for you by Databricks.” - docs
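Concretely, "Postgres semantics with JDBC/psql access" means any standard Postgres client can talk to it. As a minimal sketch (the hostname and helper function below are made up for illustration; only `sslmode=require` and the `databricks_postgres` default database come from the example in the docs):

```python
# Hypothetical helper: assemble a libpq-style connection string (DSN)
# for a Lakebase instance. Lakebase requires SSL, and the generated
# database credential token is used as the password.
def lakebase_dsn(host: str, token: str, user: str,
                 dbname: str = "databricks_postgres") -> str:
    return (
        f"host={host} dbname={dbname} user={user} "
        f"password={token} sslmode=require"
    )

# Any standard Postgres client (psql, psycopg2, JDBC) could consume this.
print(lakebase_dsn("instance.example.databricks.com", "<token>", "me"))
```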


Image from Lakebase | Databricks

# Architecture

https://www.ssp.sh/brain/img_Lakebase_1750251162045.webp
Data + AI Summit 2025 - Keynote Recap - YouTube

# Key Capabilities

From Lakebase from Databricks. - by Daniel Beach:

- Postgres-compatible: standard drivers, psql, extensions roadmap.
- Managed change-data-capture into Delta Lake so OLTP data stays in sync with BI models.
- Unified governance via Unity Catalog roles & privileges.
- Lakehouse hooks: can feed Feature Engineering & Serving, SQL Warehouses, Databricks Apps, and RAG pipelines out of the same rows (docs.databricks.com).
- Elastic scale: separate storage and compute lets you grow read/write throughput without dumping and re-importing data.
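The managed change-data-capture point is worth a sketch: conceptually, row-level changes captured on the OLTP side are replayed as upserts and deletes against the analytical copy. A toy illustration in plain Python (purely conceptual; Lakebase manages this sync into Delta Lake for you, and the change records below are made up):

```python
# Simulated stream of row-level change events from the OLTP side.
oltp_changes = [
    {"op": "insert", "id": 1, "row": {"id": 1, "name": "alice"}},
    {"op": "insert", "id": 2, "row": {"id": 2, "name": "bob"}},
    {"op": "update", "id": 1, "row": {"id": 1, "name": "alicia"}},
    {"op": "delete", "id": 2, "row": None},
]

replica = {}  # stands in for the synced analytical table
for change in oltp_changes:
    if change["op"] == "delete":
        replica.pop(change["id"], None)
    else:
        # inserts and updates are both upserts keyed on the primary key
        replica[change["id"]] = change["row"]

print(replica)  # → {1: {'id': 1, 'name': 'alicia'}}
```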

# Source
Lakebase from Databricks. - by Daniel Beach

This is a Databricks-only solution, and it doesn't matter whether it's Postgres, Spark, the Photon Engine, modern OLAP systems, or anything else, because as a user of Databricks, working through their UI or clusters, it doesn't look any different. It's Compute and Storage Separation.

It only makes a difference if you self-host your SQL query engine, but that's not what Lakebase is. It's a closed-source Postgres as far as I can see. It's an abstraction that means less complexity for Lakehouse users, but more for Databricks.

It really goes to show how Declarative Data Stacks are the future by abstracting complexity away. The Databricks lakehouse is a declarative data stack as well, but a closed-source one. And as we learned, with declarative data stacks we can exchange the compute, the engine. Lakebase is just another compute for your Lakehouse, if you will, that is much less complex, as it's based on Postgres.

Actually Daniel Beach agrees with me on that one:

It's clear that Databricks, as per usual, has integrated this well into their platform. Right now a "database instance" is just a new type of compute you can select.

Essentially a managed Postgres with data retention, high availability (HA) and other features out of the box.

# In the End

So it's just a managed Postgres as part of Databricks? Is it different from running a managed Postgres service on Azure?

I guess the users of Databricks don't know/mind which compute they use (Photon, Spark, Postgres), as long as it's cheap and fast :)

But I’m not sure why they didn’t call it “Managed Postgres”.

It's not open-source, even though they say they focus on "Openness". Yes, Postgres is open-source, but running it and integrating it is proprietary to Databricks; therefore, a "managed Postgres". Am I missing something? Bsky

Please let me know what I'm missing; this is what I have observed at first glance.

# Limitations

As of 2025-06-13 as per Databricks:

- A workspace allows a maximum of ten instances.
- Each instance supports up to 1,000 concurrent connections.
- The logical size limit across all databases in an instance is 2 TB.
- Database instances are scoped to a single workspace. Users from other workspaces attached to the same metastore can see these tables in Catalog Explorer if they have the required Unity Catalog permissions, but they cannot access the table contents.
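Given those limits, client-side connection pools have to be sized so that all consumers together stay under the 1,000-connection ceiling per instance. A trivial sizing check as a sketch (the service names and pool sizes below are made up for illustration):

```python
# Documented per-instance limit of concurrent connections.
MAX_CONNECTIONS = 1000

# Hypothetical client-side pool sizes per service hitting one instance.
pools = {"api": 200, "workers": 300, "dashboards": 150}

total = sum(pools.values())
headroom = MAX_CONNECTIONS - total

print(f"pooled connections: {total}, headroom: {headroom}")
# → pooled connections: 650, headroom: 350
```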

# Integration

# Python

```python
import uuid

import psycopg2
from databricks.sdk import WorkspaceClient

instance_name = "<YOUR INSTANCE NAME>"  # placeholder; set to your instance

w = WorkspaceClient()
instance = w.database.get_database_instance(name=instance_name)
cred = w.database.generate_database_credential(
    request_id=str(uuid.uuid4()), instance_names=[instance_name]
)

# Connection parameters
conn = psycopg2.connect(
    host=instance.read_write_dns,
    dbname="databricks_postgres",
    user="<YOUR USER>",
    password=cred.token,
    sslmode="require",
)

# Execute query
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    version = cur.fetchone()[0]
    print(version)

conn.close()
```

Source

# References

It sounds similar to the recently announced DuckLake, where they use a relational database like Postgres to manage the lake's metadata and catalog. But looking at it more closely, it really isn't.


Origin: Lakebase from Databricks. - by Daniel Beach
Created 2025-06-13
