# Data Warehouse Modeling: Comparing the Inmon and Kimball Approaches

This article takes an in-depth look at two mainstream data warehouse modeling approaches: Bill Inmon's Enterprise Data Warehouse (EDW) and Ralph Kimball's dimensional modeling (Star Schema). The Inmon approach emphasizes building a highly normalized, enterprise-wide data model to eliminate data redundancy, followed by department-oriented data marts. The Kimball approach instead focuses on business processes, building data marts as star schemas to keep analytical queries flexible and easy to use. The article also mentions other modeling techniques such as Data Vault and One Big Table, and compares the two main approaches in terms of logical design, physical implementation, team collaboration, and performance optimization, as a reference for choosing the most suitable data warehouse architecture.

⭐ **Inmon's Enterprise Data Warehouse (EDW) approach**: This approach centers on building a highly normalized, enterprise-wide data model, aiming to minimize data redundancy and ensure data integration and consistency. On top of that foundation, analysis-oriented data marts are created for the needs of specific business departments, making the data easy to access and query.

⭐ **Kimball's dimensional modeling (Star Schema) approach**: This approach focuses on identifying and supporting key business processes and analytical requirements. It organizes data as a star schema, consisting of a central fact table surrounded by multiple dimension tables, which is well suited to OLAP (online analytical processing) queries and is easy to understand and use. (A minimal sketch of such a schema follows this summary.)

⭐ **Comparing and combining the two approaches**: The article compares the Inmon and Kimball approaches in terms of degree of normalization, data redundancy, design focus (enterprise-wide integration vs. business process support), implementation complexity, and ease of use. It also notes that the strengths of both can be combined, for example in the hybrid approach described in Data Warehouse Blueprints, which pairs Inmon's centralized integration with Kimball's dimensional data marts.

⭐ **Other modeling techniques**: Beyond Inmon and Kimball, the article briefly mentions other data warehouse architectures and modeling techniques such as Data Vault modeling and One Big Table (OBT), giving readers a broader view of different strategies for organizing and managing data.

⭐ **Core principles and practical application**: Whichever approach is chosen, data modeling needs to consider consistency (conformed dimensions), extensibility (adding new data without breaking existing reports), maintainability, and flexibility for user queries. Kimball's approach in particular relies on dimensional modeling to simplify analysis and to keep data consistent across business departments.
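
To make the star schema described above concrete, here is a minimal sketch in Python/pandas; all table and column names are hypothetical, invented for this note: a central fact table holding only keys and measures, two dimension tables, and an OLAP-style query that slices revenue by dimension attributes.

```python
import pandas as pd

# Hypothetical dimension tables: one row per customer and per calendar day.
dim_customer = pd.DataFrame({
    "customer_key": [1, 2],
    "customer_name": ["Acme", "Globex"],
    "region": ["EMEA", "AMER"],
})
dim_date = pd.DataFrame({
    "date_key": [20230101, 20230102],
    "month": ["2023-01", "2023-01"],
})

# Fact table: one row per sale, carrying only foreign keys and measures.
fact_sales = pd.DataFrame({
    "customer_key": [1, 1, 2],
    "date_key": [20230101, 20230102, 20230101],
    "revenue": [100.0, 250.0, 80.0],
})

# Typical analytical query: join the fact to its dimensions and aggregate.
report = (
    fact_sales
    .merge(dim_customer, on="customer_key")
    .merge(dim_date, on="date_key")
    .groupby(["region", "month"], as_index=False)["revenue"]
    .sum()
)
print(report)
```

The same pattern scales to any number of dimensions: the fact table stays narrow while the dimensions carry the descriptive attributes users filter and group by.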

- **Inmon**: Emphasizes a highly normalized structure. Begins directly with the Core, which remains normalized and closely aligned with the source system. Denormalization occurs in the Data Marts, adhering more to a Snowflake Schema.
- **Kimball**: Initiates with a staging area, and the Core is already denormalized; here, the data mart effectively becomes the data warehouse. Known as a bottom-up approach, it focuses on creating a Star Schema. Kimball is the creator of Dimensional Modeling. (A small sketch of the structural difference follows this list.)
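
To illustrate that structural difference, here is a small, hypothetical example (product data invented for this note): the normalized, snowflake-style form keeps the category in its own table referenced by key, while the Kimball-style star folds the same attributes into one wide dimension table.

```python
import pandas as pd

# Snowflake / normalized form: category lives in its own table and is
# referenced by key, staying close to the source-system structure.
category = pd.DataFrame({
    "category_id": [10, 20],
    "category_name": ["Bikes", "Helmets"],
})
product = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["Road 150", "Trail 90", "Sport 100"],
    "category_id": [10, 10, 20],
})

# Star / denormalized form: one wide dimension table, redundant but
# simpler for analysts to query directly.
dim_product = (
    product.merge(category, on="category_id")
           .drop(columns="category_id")
)
print(dim_product)
```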

Although both methodologies have nuanced differences, they ultimately share many similarities. Personally, I lean towards Kimball’s approach. It allows for staging data and then normalizing from that point, avoiding the replication of the OLTP source system’s structure beyond the staging phase.

Drawing from the early Trivadis best-practice architecture (as detailed in Data Warehouse Blueprints: Business Intelligence in der Praxis, by Dani Schnider, Claus Jordan, Peter Welker, and Joachim Wehner), I’ve found that leveraging the strengths of both Inmon’s and Kimball’s methods can be highly effective.

# Core of Kimball according to People from the Field

The “core” of Kimball from Josh Andrews:

1. Users will need to query data in any number of different ways, but fully normalized data models are way too complex for non-developers to deal with. So we need a simpler way to model data for analytics.
2. We need to think about dimensions in a conformed way. If you are slicing a bunch of facts by, say, customer, it needs to be the same, common list of customers no matter how many different things you slice it by. Otherwise users across the business eventually start getting inconsistent results, and then everybody loses faith in the accuracy of the data in the warehouse. And no, denormalizing all of your dimensions down into every fact is not a good solution when you have 30 facts and 30 dimensions in a DW of respectable size. (See the sketch after this list.)
3. We need to model data in such a way that when we add new data it doesn’t break existing reports and analytics. This means making sure we think thoroughly about the grain of all things ahead of time. If we change the existing columns or the grain, especially the grain, it’s going to break a lot of existing stuff. Well, maybe you can change things if you work in a small shop, but in an Enterprise Data Warehouse with hundreds or thousands of users, getting everybody to change their existing analytics takes literally years.
4. But, following #3, we are also going to constantly find new data and reports. So we have to make sure that we have modeled in a way that allows us to add new data.
5. We need a consistent conceptual way to get people to think about the events we analyze (facts) versus the things we are analyzing them by, so that we aren’t reinventing the wheel every time we go to build data models. I can tell every time I see a DW or analytics solution built by somebody who doesn’t understand this and doesn’t follow some approach (whether Kimball or Inmon), because it’s invariably some crazy denormalization approach that is internally inconsistent and impossible to explain in an organized way. Or even worse, they’ve just imported all the fully normalized datasets from the source and point the reports right at the source tables.
6. Users will invariably want to use multiple different BI and query tools to access the same dataset. You can try to herd them in a certain direction (like Looker), but you’ll never get them all. So the DW should be built so that, to the best extent possible, users can still query with the tool of their choice. Which means not relying on BI tools, like Looker, to perform major further transformations of the data in the BI layer. Minor transformations, documentation, labels, etc.? Sure.
7. Database performance is much better now than it was when Kimball was written, but Kimball’s model of conformed dimensional design was not primarily concerned with database performance. I will admit it is helpful in managing performance, and he discusses performance and storage space, but that was a secondary benefit, and regardless of that the primary use cases for it are still relevant.
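
Point 2 is easiest to see with a tiny example. The sketch below (hypothetical tables, not from the original post) shares one conformed customer dimension across two fact tables, so slicing either fact "by customer" produces the same customer list and consistent results.

```python
import pandas as pd

# One conformed customer dimension, shared by every fact table.
dim_customer = pd.DataFrame({
    "customer_key": [1, 2, 3],
    "customer_name": ["Acme", "Globex", "Initech"],
})

# Two facts at different grains, both referencing the same customer_key.
fact_orders = pd.DataFrame({
    "customer_key": [1, 2, 2],
    "order_amount": [500.0, 120.0, 75.0],
})
fact_tickets = pd.DataFrame({
    "customer_key": [1, 1, 3],
    "tickets_opened": [2, 1, 4],
})

# Because both facts conform to the same dimension, "by customer" means
# the same thing in both reports, and the numbers stay consistent.
orders_by_customer = (fact_orders.merge(dim_customer, on="customer_key")
                                 .groupby("customer_name")["order_amount"].sum())
tickets_by_customer = (fact_tickets.merge(dim_customer, on="customer_key")
                                   .groupby("customer_name")["tickets_opened"].sum())
print(orders_by_customer)
print(tickets_by_customer)
```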

Follow-up from Justin Lane on Slack:

I find it useful to recognise the difference between logical and physical data modelling.  Developing a Kimball model can be very beneficial to help organise thinking/design activity around the concepts of dimensions and facts.  This is especially useful when you have a team of people working on the development/maintenance of your warehouse.  The logical model can help to establish a shared understanding and facilitate discussion. Separating dimensions from facts helps to achieve this.  The physical implementation can be optimised for the target platform(s) and you may end up with flat/wide/denormalised structures.  These are (should be) the physical implementation of your logical model.
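
A rough illustration of that logical/physical split, with made-up tables: the logical model stays dimensional (separate fact and dimension), while the physical table delivered to the target platform can be a flat, wide, pre-joined structure derived from it.

```python
import pandas as pd

# Logical model: fact and dimension kept as separate concepts.
dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "product_name": ["Road 150", "Trail 90"],
    "category": ["Bikes", "Bikes"],
})
fact_sales = pd.DataFrame({
    "product_key": [1, 1, 2],
    "quantity": [3, 1, 5],
})

# Physical implementation: one flat/wide table optimised for the target
# platform, but still derived directly from the logical star model.
sales_wide = fact_sales.merge(dim_product, on="product_key")
print(sales_wide)
```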

And a follow-up from Kosti Hokkanen here:

See also: Classical Architecture of Data Warehouse

# Comparative Insights:

Key comparisons include:

- Enterprise Data Warehouse Modeling in the Bill Inmon style vs. Star Schema in the Ralph Kimball style
- Data Vault Modeling (a rough sketch follows below)
- One Big Table (OBT)

Source: originally youtu.be/IdCmMkQLvGA?
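
Data Vault is only named above, so here is a rough, hypothetical sketch of its three core table types (hubs for business keys, satellites for descriptive attributes, links for relationships); the hashing scheme and names are illustrative, not taken from the source. One Big Table, by contrast, corresponds to the flat, pre-joined shape sketched earlier.

```python
import hashlib
import pandas as pd

def hash_key(value: str) -> str:
    # Data-Vault-style surrogate key derived from the business key.
    return hashlib.md5(value.encode()).hexdigest()

# Hub: one row per business key (e.g. a customer number).
hub_customer = pd.DataFrame({"customer_bk": ["C-7", "C-9"]})
hub_customer["customer_hk"] = hub_customer["customer_bk"].map(hash_key)

# Satellite: descriptive, slowly changing attributes attached to the hub.
sat_customer = pd.DataFrame({
    "customer_hk": [hash_key("C-7"), hash_key("C-9")],
    "customer_name": ["Acme", "Globex"],
    "load_date": ["2023-03-01", "2023-03-01"],
})

# Link: relationships between hubs (order hub omitted for brevity).
link_customer_order = pd.DataFrame({
    "customer_hk": [hash_key("C-7")],
    "order_bk": ["1001"],
})

print(hub_customer.merge(sat_customer, on="customer_hk"))
```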

Highlight from RW, Difference Between Kimball and Inmon:

- **Kimball**: Introduced by Ralph Kimball, this approach prioritizes identifying business processes and the pertinent questions a Data Warehouse must address. Documentation and analysis of these elements precede data consolidation from various sources (data marts) into a staging area. The data is then transformed into an OLAP cube via ETL processes. (A minimal staging-to-star sketch follows this list.)
- **Inmon**: Bill Inmon’s method begins with a comprehensive corporate data model, pinpointing critical areas like customers, products, and vendors. This model informs the creation of a detailed logical model, which then leads to a physically normalized model with minimized data redundancy. To make this complex model more accessible for business applications, department-specific data marts are developed.
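
As a minimal sketch of the Kimball flow described above, assuming an invented order extract in a staging area: a surrogate-keyed dimension is built first, then the fact is loaded at a declared grain of one row per order.

```python
import pandas as pd

# Hypothetical staged extract from an OLTP source (Kimball-style staging area).
staging_orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "order_date": ["2023-03-01", "2023-03-01", "2023-03-02"],
    "customer_code": ["C-7", "C-9", "C-7"],
    "amount": [120.0, 75.0, 200.0],
})

# Build the dimension with a warehouse-owned surrogate key.
dim_customer = (staging_orders[["customer_code"]]
                .drop_duplicates()
                .reset_index(drop=True))
dim_customer["customer_key"] = dim_customer.index + 1

# Build the fact at a declared grain (one row per order), referencing the
# dimension only through its surrogate key.
fact_orders = (staging_orders
               .merge(dim_customer, on="customer_code")
               [["customer_key", "order_date", "amount"]])
print(fact_orders)
```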

# Another comparison

Chapter 2 of Adamson's Star Schema from


Origin:
References:
Snowflake vs. Star-Schema auf Core
Created 2023-03-23
