A data vault is a Data Modeling approach used to build a data warehouse for enterprise-scale analytics. The data vault has three types of entities: hubs, links, and satellites.
Hubs represent core business concepts (identified by their business keys), links represent relationships between hubs, and satellites store the descriptive attributes, and their history, for both hubs and links.
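To make the three entity types concrete, here is a minimal Python sketch of how hub, link, and satellite rows might be shaped. The field names (`hub_key`, `load_date`, `record_source`) and the hypothetical `hash_key` helper (MD5 over trimmed, upper-cased parts joined by `||`) are illustrative assumptions, not a prescribed standard.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


def hash_key(*parts: str) -> str:
    """Hypothetical helper: MD5 over trimmed, upper-cased parts joined by '||'."""
    normalized = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Hub:
    hub_key: str            # hash of the business key
    business_key: str       # e.g. a customer number
    load_date: datetime     # when the key was first seen
    record_source: str      # which source system delivered it


@dataclass(frozen=True)
class Link:
    link_key: str               # hash of the combined business keys
    hub_keys: tuple[str, ...]   # the hubs this relationship connects
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class Satellite:
    parent_key: str         # hub_key or link_key that this satellite describes
    load_date: datetime
    record_source: str
    attributes: dict        # descriptive, time-varying attributes


# Illustrative rows: a customer, an order, the relationship, and customer details.
now = datetime(2024, 1, 1)
customer = Hub(hash_key("C-001"), "C-001", now, "crm")
order = Hub(hash_key("O-42"), "O-42", now, "shop")
placed = Link(hash_key("C-001", "O-42"), (customer.hub_key, order.hub_key), now, "shop")
details = Satellite(customer.hub_key, now, "crm", {"name": "Ada", "city": "London"})
```

Note that hubs and links carry only keys and load metadata; everything descriptive lives in satellites, so new attributes never force a change to a hub or link.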
# Features
The Data Vault methodology offers a flexible approach to managing Big Data and a changing set of source systems and integration points in a Data Warehouse. Recently, there has been a significant shift towards using Data Vaults as governed Data Lakes. This shift addresses the key challenges we’ve identified in Data Warehousing:
- Adapting to changing business environments
- Handling massive data sets
- Reducing the complexities of Data Warehouse design
- Enhancing accessibility for business users by modeling close to the business domain
- Allowing seamless integration of new data sources without affecting the existing architecture
This method is proving highly effective and efficient, making Data Warehouses easier to design, build, populate, and modify. Because hubs, links, and satellites follow a small set of repeatable patterns, this is also where Data Warehouse Automation can be particularly beneficial.
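As a rough illustration of why automation fits so well, the sketch below generates hub DDL from a small metadata description. `HubSpec`, `hub_ddl`, the `hub_<name>_hk` column naming, and the column types are all hypothetical choices for this example; real automation tools use their own metadata models and templates.

```python
from dataclasses import dataclass


@dataclass
class HubSpec:
    """Hypothetical metadata describing one hub."""
    name: str
    business_key: str


def hub_ddl(spec: HubSpec) -> str:
    # Every hub has the same shape, so its DDL can be produced from a template.
    return (
        f"CREATE TABLE hub_{spec.name} (\n"
        f"    hub_{spec.name}_hk CHAR(32) NOT NULL PRIMARY KEY,\n"
        f"    {spec.business_key} VARCHAR(255) NOT NULL,\n"
        f"    load_date TIMESTAMP NOT NULL,\n"
        f"    record_source VARCHAR(100) NOT NULL\n"
        f");"
    )


# Adding a new hub is a metadata change, not hand-written SQL.
specs = [HubSpec("customer", "customer_number"), HubSpec("product", "sku")]
for spec in specs:
    print(hub_ddl(spec))
```

The same idea extends to links, satellites, and the load statements themselves, which is what makes Data Vault a common target for automation.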
# Why Data Vault 2.0?
Data Vault 2.0 is presented as a prescriptive, industry-standard methodology for turning raw data into actionable intelligence that leads to tangible business outcomes. It offers a proven, repeatable recipe for transforming raw data into the information your business finds most valuable.
Video about “Behind the Hype: Should you ever build a Data Vault in a Lakehouse?”
Write-optimized approach (as opposed to a snowflake schema, which is optimized for querying).
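One reason Data Vault is called write-optimized is that loads are insert-only: history is never updated in place. The sketch below shows a satellite load that appends a row only when the descriptive attributes actually changed, detected by comparing a hash of the attributes (often called a hash diff). The function names, the in-memory list standing in for a table, and the `key=value` hashing convention are assumptions for illustration.

```python
import hashlib
from datetime import datetime


def hash_diff(attributes: dict) -> str:
    # Hypothetical convention: hash "key=value" pairs in sorted-key order.
    payload = "||".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list[dict], hub_key: str, attributes: dict,
                   load_date: datetime, record_source: str) -> bool:
    """Append a new row only if the attributes changed; never update or delete."""
    new_diff = hash_diff(attributes)
    history = [row for row in satellite if row["hub_key"] == hub_key]
    if history:
        latest = max(history, key=lambda row: row["load_date"])
        if latest["hash_diff"] == new_diff:
            return False  # unchanged since the last load: nothing to write
    satellite.append({
        "hub_key": hub_key,
        "load_date": load_date,
        "record_source": record_source,
        "hash_diff": new_diff,
        **attributes,
    })
    return True
```

For example, loading `{"city": "London"}` twice for the same key writes one row, and a later `{"city": "Paris"}` appends a second row while the London row stays untouched as history.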
# When to Use
- Managing numerous disparate data sources
- Accommodating frequent schema changes (DDL) in source OLTP databases
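A common way to absorb a source schema change is to route newly added columns into a separate, new satellite instead of altering the existing ones, so existing tables and load jobs keep working. The sketch below splits one source row by a column-to-satellite mapping; `split_row`, the mapping format, and the satellite names are hypothetical.

```python
def split_row(row: dict, key_column: str,
              satellite_columns: dict[str, set[str]],
              new_satellite: str) -> dict[str, dict]:
    """Split one source row into per-satellite attribute sets.

    Columns already assigned to a satellite go there; any column the model
    has not seen (e.g. added by an ALTER TABLE in the source) is routed to a
    separate, new satellite so existing satellites stay unchanged.
    """
    assigned = {key_column}  # the business key belongs to the hub, not a satellite
    result: dict[str, dict] = {}
    for sat_name, columns in satellite_columns.items():
        result[sat_name] = {c: row[c] for c in columns if c in row}
        assigned |= columns
    extra = {c: v for c, v in row.items() if c not in assigned}
    if extra:
        result[new_satellite] = extra
    return result


# Hypothetical mapping, plus a row before and after the source gained a column.
satellite_columns = {"sat_customer_crm": {"name", "city"}}
before = {"customer_number": "C-001", "name": "Ada", "city": "London"}
after = {**before, "loyalty_tier": "gold"}
```

With this approach the `sat_customer_crm` output is identical for both rows, and only the new `loyalty_tier` column lands in the new satellite.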
# Layers
- Landing Zone (LZN)
- Raw Data Vault (RDV)
- Business Data Vault (BDV)
- Universal Data Model (UDM)
# Difference between 1.0 and 2.0
Data Vault 1.0, introduced by Dan Linstedt in the early 2000s, established the core principles:
- Hub, Link, and Satellite structure
- Business keys in Hubs
- Relationships captured in Links
- Descriptive data in Satellites
- Focus on historical tracking and auditability
Data Vault 2.0, released around 2013, built upon 1.0 by adding:
- Integration with big data platforms and NoSQL databases
- Support for unstructured and semi-structured data
- Hash keys (computed from business keys) replacing sequence-generated surrogate keys, which improves performance and removes load dependencies
- More emphasis on parallel loading and scalability
- Incorporation of virtualization concepts
- Methodologies for handling real-time data streams
- Introduction of point-in-time and bridge tables as first-class citizens
- More formal governance and documentation requirements
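The link between hash keys and parallel loading can be sketched as follows: because a hash key is derived from the business key alone, a hub load and a link load can each compute the same key independently, with no sequence lookup and no need to wait for one another. The `hash_key` normalization (trim, upper-case, `||` delimiter, MD5), the sample batch, and the loader functions are assumptions for illustration; the threads simply stand in for independent load jobs.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

DELIMITER = "||"  # hypothetical delimiter convention for composite keys


def hash_key(*business_keys: str) -> str:
    # Normalize (trim + upper-case) so "c-001 " and "C-001" map to the same key.
    normalized = DELIMITER.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Hypothetical source batch: (customer_number, order_number)
orders = [("C-001", "O-42"), ("c-001 ", "O-43"), ("C-002", "O-44")]


def load_customer_hub(batch):
    # Hub rows keyed by hash key; duplicates collapse to one entry.
    return {hash_key(cust): cust.strip().upper() for cust, _ in batch}


def load_order_link(batch):
    # Link rows: (link key, customer hub key, order hub key)
    return [(hash_key(cust, order), hash_key(cust), hash_key(order))
            for cust, order in batch]


with ThreadPoolExecutor() as pool:
    hub_future = pool.submit(load_customer_hub, orders)
    link_future = pool.submit(load_order_link, orders)
    hub_rows, link_rows = hub_future.result(), link_future.result()

# Every customer key referenced by a link row was also produced by the hub
# load, even though neither job depended on the other's output.
```

With sequence-generated surrogate keys, the link load would instead have to look up keys the hub load had already assigned, forcing the two to run in order.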
# Dan Linstedt vs. Hans Hultgren
Dan Linstedt is the original creator of the Data Vault methodology. His approach tends to be more focused on:
- Technical implementation details
- Performance optimization
- Strict adherence to core Data Vault principles
- Enterprise scalability
- Integration with modern data platforms
Hans Hultgren has been a significant contributor to Data Vault evolution, with his approach emphasizing:
- Business alignment and modeling practices
- Practical implementation guidance
- More flexible interpretation of some Data Vault rules
- Focus on teaching and making concepts accessible
- Integration with agile methodologies
# Raw vs. Business Vault
Raw Vault is the first layer where data is loaded from source systems, following strict Data Vault modeling principles:
- It maintains full history and auditability of source data
- Data is stored in its original form without business transformations
- Uses Hubs (unique business keys), Links (relationships), and Satellites (descriptive attributes)
- Focuses on capturing and preserving source data exactly as received
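A Raw Vault hub load can be sketched as an idempotent, insert-only step: business keys not yet in the hub are added with their load date and record source, values are kept exactly as received, and existing rows are never modified. The dictionary standing in for the hub table and the `hash_key`/`load_hub` names are assumptions for this example.

```python
import hashlib
from datetime import datetime


def hash_key(business_key: str) -> str:
    # Hypothetical normalization: trim and upper-case before hashing.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def load_hub(hub: dict[str, dict], business_keys: list[str],
             load_date: datetime, record_source: str) -> int:
    """Insert business keys not yet in the hub; existing rows are never touched.

    Returns the number of newly inserted keys.
    """
    inserted = 0
    for bk in business_keys:
        hk = hash_key(bk)
        if hk not in hub:
            # Store the key as received, stamped with when and where it came from.
            hub[hk] = {"business_key": bk,
                       "load_date": load_date,
                       "record_source": record_source}
            inserted += 1
    return inserted
```

Re-running the same batch inserts nothing, and a key first delivered by one source keeps its original `record_source` even when another system later sends it again, which preserves the audit trail.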
Business Vault serves as a transformation layer that:
- Can be implemented logically (e.g., as views) rather than as physical database objects
- Contains derived business rules and calculations
- Implements data quality rules and business definitions
- May combine data from multiple Raw Vault entities
- Creates business-friendly views and structures
- Can include Point-in-Time (PIT) and Bridge tables for easier querying
- Sometimes implements slowly changing dimension (SCD) logic
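A Point-in-Time table records, for each hub key and snapshot date, which row of each satellite was current at that moment. Queries can then join satellites on exact key and date instead of working out date ranges. The sketch below builds PIT rows from satellite load dates; the input shape (satellite name, then hub key, then list of load dates), the `build_pit` name, and the `<satellite>_load_date` column naming are assumptions.

```python
from bisect import bisect_right
from datetime import date


def build_pit(satellites: dict[str, dict[str, list[date]]],
              hub_keys: list[str], snapshots: list[date]) -> list[dict]:
    """For each (hub key, snapshot), record the latest load_date per satellite
    that falls on or before the snapshot, or None if no row existed yet."""
    pit = []
    for hk in hub_keys:
        for snap in snapshots:
            row = {"hub_key": hk, "snapshot_date": snap}
            for sat_name, loads in satellites.items():
                dates = sorted(loads.get(hk, []))
                i = bisect_right(dates, snap)  # count of loads on or before snap
                row[f"{sat_name}_load_date"] = dates[i - 1] if i else None
            pit.append(row)
    return pit


# Hypothetical load history for one customer across two satellites.
sats = {
    "sat_customer_crm": {"hk1": [date(2024, 1, 1), date(2024, 3, 1)]},
    "sat_customer_web": {"hk1": [date(2024, 2, 15)]},
}
pit = build_pit(sats, ["hk1"], [date(2024, 1, 31), date(2024, 3, 31)])
```

Here the January snapshot points at the first CRM row and has no web row yet, while the March snapshot points at the second CRM row and the February web row.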
Origin: Data Modeling Techniques
References: Dimensional Modeling
