Second Brain: Crafted, Curated, Connected, Compounded on October 2
Data Mesh: Bridging Data Silos, Empowering Domain Experts

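Data Mesh is an emerging data management philosophy that aims to solve the challenges of data silos and cross-team data collaboration within organizations. It delegates data ownership and management to domain experts, encouraging decentralized ownership, while a unified, common infrastructure keeps everything interoperable. This approach promotes data sharing and understanding across teams, streamlines organizational workflows, and offers a more flexible and scalable way to manage data. Data Mesh traces its origins to 2019 and has been adopted and put into practice by several well-known technology companies.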

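📊 **Decentralized data ownership and domain orientation**: At the core of Data Mesh is handing ownership and management of data back to the business-domain experts who produce it. This removes the bottleneck of a traditional centralized data team, lets each domain manage and serve its data products independently and efficiently, and keeps data closely tied to the business.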

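🔗 **Interconnected infrastructure and data products**: Although it advocates decentralization, Data Mesh is not chaos. It relies on a unified, common infrastructure layer that supports the discovery, access, and governance of data products. Data products are treated as the standard unit of delivery within the organization, which keeps data consistent and trustworthy.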

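💡 **Bridging team gaps and fostering collaboration**: Data Mesh aims to close the gaps between different data teams. By establishing clear data interfaces and shared models, it promotes a common understanding and effective use of data across the organization. This collaborative model improves the overall data workflow and helps the organization realize value from its data more efficiently.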

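🚀 **Evolution and practice**: The concept of Data Mesh was introduced by Zhamak Dehghani in 2019 and refined over the following years. Today it has been adopted by many companies, including Netflix and Intuit, and has become an important methodology for tackling complex data challenges. It continues to evolve, for example through comparison and convergence with related concepts such as Data Fabric.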
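The idea that data products are the standard unit of delivery, and that shared infrastructure keeps them consistent and trustworthy, can be illustrated with a simple data contract check. This is a minimal, hypothetical sketch: the `DataProduct` class, its `schema` contract, and the `publish` method are illustrative assumptions, not part of any specific Data Mesh tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical data product: a domain-owned dataset plus its contract."""
    name: str
    owner_domain: str
    schema: dict[str, type]  # column name -> expected Python type
    rows: list[dict] = field(default_factory=list)

    def validate(self, row: dict) -> list[str]:
        """Return the contract violations for one row (empty list if valid)."""
        errors = []
        missing = self.schema.keys() - row.keys()
        extra = row.keys() - self.schema.keys()
        if missing:
            errors.append(f"missing columns: {sorted(missing)}")
        if extra:
            errors.append(f"unexpected columns: {sorted(extra)}")
        for column, expected in self.schema.items():
            if column in row and not isinstance(row[column], expected):
                errors.append(f"{column}: expected {expected.__name__}")
        return errors

    def publish(self, row: dict) -> None:
        """Accept a row only if it satisfies the published contract."""
        errors = self.validate(row)
        if errors:
            raise ValueError(f"{self.name}: " + "; ".join(errors))
        self.rows.append(row)


orders = DataProduct(
    name="orders",
    owner_domain="checkout",
    schema={"order_id": str, "amount_eur": float},
)
orders.publish({"order_id": "A-1", "amount_eur": 19.99})
try:
    orders.publish({"order_id": "A-2", "amount_eur": "free"})
except ValueError as exc:
    print(exc)  # orders: amount_eur: expected float
```

Here the domain team owns the data and the contract, while the validation step stands in for the kind of check a shared platform could enforce for every product.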

In the evolving landscape of data management, Data Mesh has emerged as a key concept that tries to bridge the gaps between isolated data teams. Its core value lies in fostering a shared understanding and use of data across the diverse teams within an organization.

By interlinking platforms, a Data Mesh makes it easier to move data between teams and improves the organizational workflow. It counters the typical disconnect in data handling by balancing decentralized resources with a unifying common infrastructure, and it empowers domain experts by giving them ownership and control over their data domains.
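The balance between decentralized ownership and a common infrastructure can be sketched as a shared catalog: each domain team may only publish into its own domain, while every team can discover products across all domains. This is a hypothetical illustration; `MeshCatalog`, `ProductEntry`, and the `warehouse.*` locations are made-up names, not a real product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductEntry:
    """Catalog entry for a hypothetical data product."""
    domain: str
    name: str
    location: str  # where consumers read it, e.g. a table name


class MeshCatalog:
    """Hypothetical shared catalog, standing in for the common infrastructure.

    Each domain registers and updates only its own products (decentralized
    ownership), while any team can discover products across all domains.
    """

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}  # domain -> owning team
        self._entries: dict[tuple[str, str], ProductEntry] = {}

    def register_domain(self, domain: str, team: str) -> None:
        if domain in self._owners:
            raise ValueError(f"domain {domain!r} is already owned")
        self._owners[domain] = team

    def publish(self, team: str, entry: ProductEntry) -> None:
        # Ownership check: only the owning team may publish into its domain.
        if self._owners.get(entry.domain) != team:
            raise PermissionError(f"{team!r} does not own domain {entry.domain!r}")
        self._entries[(entry.domain, entry.name)] = entry

    def discover(self, keyword: str = "") -> list[ProductEntry]:
        # Discovery is global: any consumer can search every domain.
        return sorted(
            (e for e in self._entries.values() if keyword in e.name),
            key=lambda e: (e.domain, e.name),
        )


catalog = MeshCatalog()
catalog.register_domain("checkout", team="payments-team")
catalog.register_domain("marketing", team="growth-team")
catalog.publish("payments-team", ProductEntry("checkout", "orders_daily", "warehouse.checkout.orders_daily"))
catalog.publish("growth-team", ProductEntry("marketing", "campaign_orders", "warehouse.marketing.campaign_orders"))

for entry in catalog.discover("orders"):
    print(entry.domain, entry.name)
```

The design choice to check ownership at publish time, but not at discovery time, mirrors the split between domain autonomy and organization-wide findability.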

For deeper insights, consider exploring the foundational paper on this topic, the succinct explanation in What the Heck is a Data Mesh?! or the visually engaging Data Mesh Architecture. Practical applications and perspectives are well-articulated in Data Mesh in Practice.


A nice illustration of the problem of a central data platform team, shared on LinkedIn by Ole Olesen-Bagneux.

# History

The term data mesh was first defined by Zhamak Dehghani in 2019 while she was working as a principal consultant at the technology company Thoughtworks. After introducing the term, she described its principles and logical architecture in greater detail throughout 2020. The approach was predicted to be a “big contender” for companies in 2022. Data meshes have been implemented by companies such as Zalando, Netflix, Intuit, VistaPrint, PayPal and others.

In 2022, Dehghani left Thoughtworks and founded Nextdata Technologies, which focuses on decentralized data. (Source)

# Fleeting Thoughts

From the Netflix Technology blog:

> I agree that naming is confusing, but it’s just unfortunate timing: the development of the “Data Mesh” (DM) platform at Netflix started around the same time Zhamak Dehghani first defined the term in the “Beyond the lake” talk in 2018. As we state, “we define Data Mesh as a general purpose data movement and processing platform for moving data between Netflix systems at scale”, nothing more, nothing less.

RW Data Mesh — A Data Movement and Processing Platform @ Netflix, by Netflix Technology Blog (Netflix TechBlog)

From Reddit:

> “Data Mesh: a Data Warehouse that has surpassed Dunbar’s number.”
>
> “Data Lake: a collection of data that has reached its lowest point.”

More on this in Why data pipeline should not be outside of data product.

# Demystifying Data Mesh

Is Data Mesh essential for everyone? Probably not. For a critical perspective, see Behind the Hype: Why Data Mesh Is Not Right For You (YouTube).

Alternatives to Data Mesh include Software-Defined Asset by Dagster. For a comprehensive overview, refer to Data Mesh Architecture and the various other perspectives listed below.

The concept has its skeptics. For instance, Max expressed confusion and doubt on Twitter about putting a label on the chaotic, distributed data environments in modern organizations:

> I’m with you. It feels like a bit of everything trendy served in a cocktail of buzzwords. I read Kimball and Inmon 2 decades ago, been a practitioner at many very modern data forward companies (FB, Airbnb, Lyft), and I feel very disoriented reading about data mesh.
>
> I don’t think it’s because I don’t get it, but because putting a label on the distributed chaos going on in these organizations doesn’t make it an intelligible concept.

The most practical insights into Data Mesh might be found in Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake (YouTube), though the talk may not fully capture the claimed paradigm shift. A related tweet puts it bluntly:

> It falls very short of the paradigm shift data mesh claims to be. Infra described there seemed median-ish on the data maturity curve at best compared to 100+ companies I connected within the context of my work on Airflow & Superset.

Matthew Darwin’s insights on the Firebolt approach to architectures in Firebolt and Data Mesh (Firebolt) are also noteworthy.
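Dagster's software-defined assets declare each data asset as a function together with the upstream assets it depends on. The sketch below imitates that idea using only the Python standard library; it is not Dagster's API, and the `asset` decorator, the `ASSETS` registry, and `materialize_all` are hypothetical names chosen for illustration. Dependencies are inferred from parameter names, and assets are computed in topological order.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical registry of declared assets: asset name -> function.
ASSETS: dict[str, Callable[..., Any]] = {}


def asset(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as an asset; its parameter names are its upstream assets."""
    ASSETS[fn.__name__] = fn
    return fn


def materialize_all() -> dict[str, Any]:
    """Compute every registered asset after its upstream dependencies."""
    graph = {
        name: set(inspect.signature(fn).parameters)
        for name, fn in ASSETS.items()
    }
    results: dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        fn = ASSETS[name]
        upstream = {dep: results[dep] for dep in inspect.signature(fn).parameters}
        results[name] = fn(**upstream)
    return results


@asset
def raw_orders() -> list[float]:
    return [19.99, 5.00, 42.50]


@asset
def order_count(raw_orders: list[float]) -> int:
    return len(raw_orders)


@asset
def revenue(raw_orders: list[float]) -> float:
    return round(sum(raw_orders), 2)


print(materialize_all())
```

The point of the asset-centric view is that the dependency graph is derived from declarations of *what* data should exist, rather than from a separately maintained pipeline definition.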

# Is Data Mesh Just Microservices?

Is Data Mesh an adaptation of microservices architectural principles to data management?

Microservices vs Data Mesh
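One way to read the microservices analogy: a microservice hides its database behind an API, and a data product likewise hides its internal storage and pipeline behind an output port with a stable contract, keeping the pipeline inside the data product. This is a minimal, hypothetical sketch; `OrdersDataProduct`, `read_output_port`, and the sample events are illustrative assumptions, not taken from any framework.

```python
from datetime import date


class OrdersDataProduct:
    """Hypothetical data product that owns its pipeline and internal storage.

    Like a microservice that hides its database behind an API, consumers
    only use the output port; raw events and transformations stay private.
    """

    def __init__(self) -> None:
        # Internal, domain-private storage: raw events in the source format.
        self._raw_events = [
            {"id": "A-1", "ts": "2021-10-27", "cents": 1999},
            {"id": "A-2", "ts": "2021-10-28", "cents": 500},
            {"id": "A-3", "ts": "2021-10-28", "cents": 4250},
        ]

    def _pipeline(self) -> dict[date, float]:
        # The transformation lives inside the product, not in a central team's job.
        totals: dict[date, float] = {}
        for event in self._raw_events:
            day = date.fromisoformat(event["ts"])
            totals[day] = totals.get(day, 0.0) + event["cents"] / 100
        return totals

    def read_output_port(self) -> list[dict]:
        """Public contract: one row per day with revenue in euros."""
        return [
            {"day": day.isoformat(), "revenue_eur": round(total, 2)}
            for day, total in sorted(self._pipeline().items())
        ]


for row in OrdersDataProduct().read_output_port():
    print(row)
```

Because consumers depend only on `read_output_port`, the owning domain can change its raw storage or pipeline without breaking anyone, which is the same decoupling argument made for microservices.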

# Additional Resources

# Data Fabric

Gartner uses the term Data Fabric for a related concept and has its own reasons for not calling it Data Mesh.

Gartner explains the differences in more detail in Data fabric and data mesh: same or different:

> The total cost to deliver either one may ultimately be similar relative to design and deployment. However, the more augmented data management capabilities included in a Data Fabric improve the cost model for ongoing improvement and maintenance. Data Mesh and Data Fabric benefit from one another, either adapting to or leveraging best practices. Both Data Fabric and Data Mesh materialized from mature data management practices and are based on over 50 years of data management technology advances.

Ole Olesen-Bagneux presented the Meta Grid as the third wave of decentralization, the first being microservices and the second being data mesh.


References: RW How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, Reverse ETL
Last Modified: 2021-10-28
Created: 2021-10-28
