# The Role of Transaction Logs in Modern Open Table Formats

Transaction logs are the fundamental backbone of modern Open Table Formats, enabling ACID transactions, metadata management, time travel capabilities, and concurrent operations on Data Lakes. This note explores how transaction logs are implemented across three major open table formats: Delta Lake, Apache Hudi, and Apache Iceberg.

# Core Concepts of Transaction Logs

A transaction log is an ordered record of all operations performed on a table since its creation. It serves as:

- Single Source of Truth: The definitive record of a table's state and history
- ACID Transaction Enabler: Ensures atomicity, consistency, isolation, and durability
- Metadata Management System: Tracks schema, partitioning, and file information
- Concurrency Controller: Manages multiple simultaneous reads and writes
- Time Travel Facilitator: Enables querying historical table states
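
To make the replay idea behind these roles concrete, here is a minimal, format-agnostic sketch (all names are hypothetical) that reconstructs a table's current file set by folding over an ordered action log:

```python
# Minimal, format-agnostic sketch: replaying an ordered action log
# reconstructs the current table state. All names are hypothetical.

# Each log entry records one action; order matters.
log = [
    {"op": "add", "file": "part-0001.parquet"},
    {"op": "add", "file": "part-0002.parquet"},
    {"op": "remove", "file": "part-0001.parquet"},  # e.g. after a rewrite
    {"op": "add", "file": "part-0003.parquet"},
]

def replay(entries):
    """Fold the log into the set of currently live data files."""
    live = set()
    for entry in entries:
        if entry["op"] == "add":
            live.add(entry["file"])
        elif entry["op"] == "remove":
            live.discard(entry["file"])
    return live

print(replay(log))      # {'part-0002.parquet', 'part-0003.parquet'}

# Time travel falls out for free: replay only a prefix of the log.
print(replay(log[:2]))  # {'part-0001.parquet', 'part-0002.parquet'}
```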

# Examples

# Delta Lake Transaction Log

Delta Lake Transaction Log Structure:

```
my_table/
├── _delta_log/            # Transaction log directory
│   ├── 00000000000000000000.json  # First commit
│   ├── 00000000000000000001.json  # Second commit
│   ├── 00000000000000000002.json  # Third commit
│   ├── ...
│   ├── 00000000000000000010.checkpoint.parquet  # Checkpoint file (every 10 commits)
│   └── ...
├── date=2019-01-01/       # Optional partition directories
│   └── file-1.parquet     # Data files
└── ...
```
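
As a rough illustration of how a reader consumes this directory, the sketch below replays the JSON commit files of a local Delta table using only the standard library. It assumes a table at `my_table/` and handles only `add` and `remove` actions; real readers also process `metaData`, `protocol`, and checkpoint Parquet files:

```python
import json
from pathlib import Path

# Hedged sketch: replay a local Delta Lake _delta_log with the stdlib.
# Assumes a table at my_table/; ignores checkpoints, metaData, protocol.
log_dir = Path("my_table/_delta_log")

live_files = set()
# Commit files are zero-padded, so lexicographic sort == commit order.
for commit in sorted(log_dir.glob("*.json")):
    for line in commit.read_text().splitlines():
        action = json.loads(line)  # one JSON action per line
        if "add" in action:
            live_files.add(action["add"]["path"])
        elif "remove" in action:
            live_files.discard(action["remove"]["path"])

print(f"Current snapshot has {len(live_files)} data files")
```

In practice a reader starts from the latest checkpoint Parquet file and only replays the JSON commits after it, which is exactly why checkpoints exist.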

See the detailed deep dive in: Transaction Log (Delta Lake).

# Apache Iceberg Transaction Log

Structure and Implementation

- Layered Architecture: Comprises catalog layer, metadata layer, and data layer
- Metadata Files: Store global table metadata (schema, partitioning, properties)
- Snapshots: Represent table state at specific points in time
- Manifest Files: Track data files, including their locations, sizes, and statistics
- Atomic Swaps: Table state updates create new metadata files that are swapped in atomically

Key Functions

- Catalog Operations: Atomic operations at the catalog level ensure transaction correctness
- Optimistic Concurrency: Uses sequence numbers to maintain consistency with concurrent transactions; see the sketch below
- Metadata Logging: Records history of metadata changes for rollback capabilities
- Schema Evolution: Supports schema changes without table rewrites
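
The combination of atomic swaps and optimistic concurrency can be pictured as a compare-and-swap loop. The following toy model (the `Catalog` class is hypothetical, not the actual Iceberg API) shows a commit succeeding only if the catalog pointer still references the metadata version the writer started from:

```python
# Toy model of optimistic concurrency via atomic compare-and-swap.
# The Catalog class is hypothetical, not the actual Iceberg API.
import threading

class Catalog:
    """Holds a single pointer to the current metadata version."""
    def __init__(self):
        self._lock = threading.Lock()
        self.current_version = 0

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Atomically advance the pointer only if it is still `expected`."""
        with self._lock:
            if self.current_version != expected:
                return False  # someone committed first; caller must retry
            self.current_version = new
            return True

def commit_with_retries(catalog: Catalog, max_retries: int = 3) -> int:
    for _ in range(max_retries):
        base = catalog.current_version  # read the current table state
        new_version = base + 1          # write a new metadata file
        if catalog.compare_and_swap(base, new_version):
            return new_version          # commit succeeded
        # conflict: re-read state, re-validate, and try again
    raise RuntimeError("too many concurrent commits")

catalog = Catalog()
print(commit_with_retries(catalog))  # 1
```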

Apache Iceberg Transaction Log Structure:

```
my_table/
├── metadata/              # Metadata directory
│   ├── version-hint.text  # Points to latest metadata file
│   ├── v1.metadata.json   # First version metadata file
│   ├── v2.metadata.json   # Second version metadata file
│   ├── snap-<uuid>.avro   # Manifest list for first snapshot
│   ├── snap-<uuid>.avro   # Manifest list for second snapshot
│   └── <uuid>.avro        # Manifest file with data file details
├── data/                  # Data files directory
│   └── <uuid>.parquet     # Actual data file
└── ...
```
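
Reading such a table starts at `version-hint.text` and walks down the layers. A minimal stdlib sketch, assuming the layout above and field names as in the Iceberg table spec (error handling omitted):

```python
import json
from pathlib import Path

# Hedged sketch: resolve the current snapshot of a local Iceberg table.
# Field names follow the Iceberg table spec; error handling omitted.
meta_dir = Path("my_table/metadata")

# 1. Catalog-layer stand-in: version-hint.text names the latest metadata.
version = meta_dir.joinpath("version-hint.text").read_text().strip()
metadata = json.loads(meta_dir.joinpath(f"v{version}.metadata.json").read_text())

# 2. Metadata layer: find the current snapshot among all recorded ones.
current_id = metadata["current-snapshot-id"]
snapshot = next(s for s in metadata["snapshots"]
                if s["snapshot-id"] == current_id)

# 3. The snapshot points at a manifest list (Avro), which points at
#    manifest files, which in turn point at the data files.
print("Current snapshot:", current_id)
print("Manifest list:", snapshot["manifest-list"])
```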


# Apache Hudi Transaction Log

Structure and Implementation

- Timeline-Based Architecture: Organizes transactions as actions along a timeline
- File Organization: Uses directory-based approach with timestamped files and log files tracking changes
- Metadata Table: Tracks file information for query optimization (default since v0.11.0)
- Commit Files: Uses files with naming convention [timestamp].[transaction state] to track transaction states; see the parsing sketch below
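
Because each action is encoded in the file name, the timeline can be rebuilt by listing the `.hoodie` directory. A rough sketch using only the standard library (the action set is abbreviated; real Hudi supports more action types):

```python
from pathlib import Path

# Hedged sketch: rebuild the Hudi timeline by parsing .hoodie file names
# of the form [timestamp].[action] or [timestamp].[action].[state].
ACTIONS = {"commit", "deltacommit", "rollback", "clean", "compaction"}

def read_timeline(table_path: str):
    instants = []
    for f in Path(table_path, ".hoodie").iterdir():
        parts = f.name.split(".")
        if len(parts) >= 2 and parts[1] in ACTIONS:
            timestamp, action = parts[0], parts[1]
            # no third suffix means the action has completed
            state = parts[2] if len(parts) > 2 else "completed"
            instants.append((timestamp, action, state))
    return sorted(instants)  # timeline order == timestamp order

for instant in read_timeline("my_table"):
    print(instant)  # e.g. ('20230101130000', 'commit', 'inflight')
```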

Key Functions

- Record-Level Index: Maintains mapping between record keys and file groups
- Optimistic Concurrency: File-level, log-based concurrency control based on instant times
- Asynchronous Operations: Supports background operations like compaction without blocking ingestion
- Copy-on-Write vs. Merge-on-Read: Offers two table types with different performance characteristics; see the read-side merge sketch below
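
The Copy-on-Write vs. Merge-on-Read distinction comes down to when updates are merged: CoW rewrites base files at write time, while MoR appends changes to log files and merges them at read time. A toy sketch of the read-side merge, with dicts standing in for Parquet base files and log blocks:

```python
# Toy sketch of Merge-on-Read: base file records are overlaid with
# newer records from delta log files at query time, keyed by record key.
base_file = {          # stand-in for file2.parquet
    "key-1": {"value": 10},
    "key-2": {"value": 20},
}
log_files = [          # stand-ins for file2.log.1, file2.log.2 (in order)
    {"key-2": {"value": 21}},                 # update
    {"key-3": {"value": 30}, "key-1": None},  # insert + delete
]

def merge_on_read(base, logs):
    merged = dict(base)
    for log in logs:                    # apply log blocks in commit order
        for key, record in log.items():
            if record is None:
                merged.pop(key, None)   # tombstone: delete the record
            else:
                merged[key] = record    # upsert: newest record wins
    return merged

print(merge_on_read(base_file, log_files))
# {'key-2': {'value': 21}, 'key-3': {'value': 30}}
```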

Apache Hudi Transaction Log Structure:

```
my_table/
├── .hoodie/               # Metadata directory
│   ├── hoodie.properties  # Table configuration
│   ├── 20230101120000.commit            # Commit metadata (successful)
│   ├── 20230101130000.commit.requested  # Transaction state: requested
│   ├── 20230101130000.commit.inflight   # Transaction state: in progress
│   ├── 20230101140000.deltacommit       # Delta commit for MOR tables
│   ├── 20230101150000.rollback          # Failed transaction rollback
│   ├── 20230101160000.clean             # Cleaning operation
│   ├── 20230101170000.compaction        # Compaction operation
│   ├── metadata/          # Metadata table (since v0.11.0)
│   ├── aux/               # Auxiliary files
│   └── .heartbeat/        # Heartbeat management
├── partition=value/       # Partition directories
│   ├── file1_v1.parquet   # Base file (COW table)
│   ├── file1_v2.parquet   # Updated base file after update
│   ├── file2.parquet      # Another base file
│   ├── file2.log.1        # Delta log file (MOR table)
│   └── file2.log.2        # Another delta log file
└── ...
```

More details:

- Apache Hudi Concepts - Official documentation explaining Hudi's timeline, file organization, and table structure.

# DuckLake Transaction Table

More recently, we also have DuckLake, which stores table metadata in a SQL database instead of in metadata files on disk.
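
A minimal sketch of what this looks like from DuckDB's Python client, loosely following the DuckLake announcement examples; the extension name, attach syntax, and paths should be treated as assumptions to verify against the current docs:

```python
import duckdb

# Hedged sketch following the DuckLake announcement examples; assumes
# the ducklake extension is available. Paths and names are placeholders.
con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# The transaction "log" lives in a SQL database (here: a DuckDB file),
# while data files are written as Parquet under DATA_PATH.
con.execute(
    "ATTACH 'ducklake:metadata.ducklake' AS my_lake (DATA_PATH 'lake_data/')"
)
con.execute("CREATE TABLE my_lake.demo (i INTEGER)")
con.execute("INSERT INTO my_lake.demo VALUES (42)")

# Commits, schema, and snapshots are now rows in catalog tables instead
# of JSON/Avro files on disk.
print(con.execute("SELECT * FROM my_lake.demo").fetchall())
```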

# DuckLake Table Structure

The data model as of 2025-06-05 is shown as a schema diagram in the DuckLake documentation.

Check out more at DuckLake.

# Comparison of Transaction Log Approaches

| Feature | Delta Lake | Apache Hudi | Apache Iceberg |
| --- | --- | --- | --- |
| Concurrency Control | Optimistic concurrency control with mutual exclusion and retry mechanism | File-level, log-based concurrency control ordered by start instant times | Sequence number-based optimistic concurrency control |
| Metadata Management | JSON log files with Parquet checkpoints every 10 commits | Timeline-based approach with metadata table for query optimization | Layered approach with catalog pointing to metadata files |
| Update Handling | Breaks operations into atomic commits recorded sequentially | Offers Copy-on-Write and Merge-on-Read approaches for different performance needs | Supports eager data file rewrites or delete deltas for faster updates |
| Performance Characteristics | Efficient for append-heavy workloads with Spark integration | Excels at update-heavy workloads with upserts and record-level indexing | Offers strong query performance with optimized metadata handling |
| Time Travel | Supported via transaction log processing | Supported via timeline-based snapshots | Supported via versioned metadata and snapshots |
| Origins | Developed by Databricks | Developed by Uber | Developed by Netflix |
| Primary Integration | Apache Spark | Multiple engines with Spark, Flink, and Hive focus | Multi-engine with strong Spark, Flink, Trino support |
| Schema Evolution | Supported with column additions/deletions | Supported with schema tracking | Extensive support with in-place evolution |
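
To make the time-travel row concrete, here is a hedged sketch using the `deltalake` Python package (delta-rs), which resolves a historical version by replaying the transaction log up to that commit; it assumes a local table at `my_table/` with at least three commits:

```python
from deltalake import DeltaTable  # pip install deltalake (delta-rs)

# Hedged sketch: assumes a local Delta table at my_table/ with >= 3 commits.
latest = DeltaTable("my_table")
print("Current version:", latest.version())

# Time travel: load the table as of an earlier commit in the log.
as_of_v2 = DeltaTable("my_table", version=2)
print(as_of_v2.to_pandas().head())
```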

See also Open Table Formats, The Open Table Format Revolution, and Composable Open Data Platform: How Iceberg and Open Table Formats Are Reshaping Data Architecture.


Origin: Data Lake Table Format
References:
Created 2025-04-29
