# The Role of Transaction Logs in Modern Open Table Formats

Transaction logs are the fundamental backbone of modern Open Table Formats, enabling ACID transactions, metadata management, time travel capabilities, and concurrent operations on Data Lakes. This note explores how transaction logs are implemented across three major open table formats: Delta Lake, Apache Hudi, and Apache Iceberg.

# Core Concepts of Transaction Logs

A transaction log is an ordered record of all operations performed on a table since its creation. It serves as:

- Single Source of Truth: The definitive record of a table's state and history
- ACID Transaction Enabler: Ensures atomicity, consistency, isolation, and durability
- Metadata Management System: Tracks schema, partitioning, and file information
- Concurrency Controller: Manages multiple simultaneous reads and writes
- Time Travel Facilitator: Enables querying historical table states
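
To make the replay idea behind these roles concrete, here is a minimal, format-agnostic sketch (all names are hypothetical) that reconstructs a table's current file set by folding over an ordered action log:

```python
# Minimal, format-agnostic sketch: replaying an ordered action log
# reconstructs the current table state. All names are hypothetical.

# Each log entry records one action; order matters.
log = [
    {"op": "add", "file": "part-0001.parquet"},
    {"op": "add", "file": "part-0002.parquet"},
    {"op": "remove", "file": "part-0001.parquet"},  # e.g. after a rewrite
    {"op": "add", "file": "part-0003.parquet"},
]

def replay(entries):
    """Fold the log into the set of currently live data files."""
    live = set()
    for entry in entries:
        if entry["op"] == "add":
            live.add(entry["file"])
        elif entry["op"] == "remove":
            live.discard(entry["file"])
    return live

print(replay(log))      # {'part-0002.parquet', 'part-0003.parquet'}

# Time travel falls out for free: replay only a prefix of the log.
print(replay(log[:2]))  # {'part-0001.parquet', 'part-0002.parquet'}
```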

# Examples

# Delta Lake Transaction Log

Delta Lake Transaction Log Structure:

```
my_table/
├── _delta_log/            # Transaction log directory
│   ├── 00000000000000000000.json  # First commit
│   ├── 00000000000000000001.json  # Second commit
│   ├── 00000000000000000002.json  # Third commit
│   ├── ...
│   ├── 00000000000000000010.checkpoint.parquet  # Checkpoint file (every 10 commits)
│   └── ...
├── date=2019-01-01/       # Optional partition directories
│   └── file-1.parquet     # Data files
└── ...
```
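
As a rough illustration of how a reader consumes this directory, the sketch below replays the JSON commit files of a local Delta table using only the standard library. It assumes a table at `my_table/` and handles only `add` and `remove` actions; real readers also process `metaData`, `protocol`, and checkpoint Parquet files:

```python
import json
from pathlib import Path

# Hedged sketch: replay a local Delta Lake _delta_log with the stdlib.
# Assumes a table at my_table/; ignores checkpoints, metaData, protocol.
log_dir = Path("my_table/_delta_log")

live_files = set()
# Commit files are zero-padded, so lexicographic sort == commit order.
for commit in sorted(log_dir.glob("*.json")):
    for line in commit.read_text().splitlines():
        action = json.loads(line)  # one JSON action per line
        if "add" in action:
            live_files.add(action["add"]["path"])
        elif "remove" in action:
            live_files.discard(action["remove"]["path"])

print(f"Current snapshot has {len(live_files)} data files")
```

In practice a reader starts from the latest checkpoint Parquet file and only replays the JSON commits after it, which is exactly why checkpoints exist.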

See the detailed deep dive in: Transaction Log (Delta Lake).

# Apache Iceberg Transaction Log

Structure and Implementation

- Layered Architecture: Comprises catalog layer, metadata layer, and data layer
- Metadata Files: Store global table metadata (schema, partitioning, properties)
- Snapshots: Represent table state at specific points in time
- Manifest Files: Track data files, including their locations, sizes, and statistics
- Atomic Swaps: Table state updates create new metadata files that are swapped in atomically

Key Functions

- Catalog Operations: Atomic operations at the catalog level ensure transaction correctness
- Optimistic Concurrency: Uses sequence numbers to maintain consistency with concurrent transactions; see the sketch below
- Metadata Logging: Records history of metadata changes for rollback capabilities
- Schema Evolution: Supports schema changes without table rewrites
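
The combination of atomic swaps and optimistic concurrency can be pictured as a compare-and-swap loop. The following toy model (the `Catalog` class is hypothetical, not the actual Iceberg API) shows a commit succeeding only if the catalog pointer still references the metadata version the writer started from:

```python
# Toy model of optimistic concurrency via atomic compare-and-swap.
# The Catalog class is hypothetical, not the actual Iceberg API.
import threading

class Catalog:
    """Holds a single pointer to the current metadata version."""
    def __init__(self):
        self._lock = threading.Lock()
        self.current_version = 0

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Atomically advance the pointer only if it is still `expected`."""
        with self._lock:
            if self.current_version != expected:
                return False  # someone committed first; caller must retry
            self.current_version = new
            return True

def commit_with_retries(catalog: Catalog, max_retries: int = 3) -> int:
    for _ in range(max_retries):
        base = catalog.current_version  # read the current table state
        new_version = base + 1          # write a new metadata file
        if catalog.compare_and_swap(base, new_version):
            return new_version          # commit succeeded
        # conflict: re-read state, re-validate, and try again
    raise RuntimeError("too many concurrent commits")

catalog = Catalog()
print(commit_with_retries(catalog))  # 1
```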

Apache Iceberg Transaction Log Structure:

```
my_table/
├── metadata/              # Metadata directory
│   ├── version-hint.text  # Points to latest metadata file
│   ├── v1.metadata.json   # First version metadata file
│   ├── v2.metadata.json   # Second version metadata file
│   ├── snap-<uuid>.avro   # Manifest list for first snapshot
│   ├── snap-<uuid>.avro   # Manifest list for second snapshot
│   └── <uuid>.avro        # Manifest file with data file details
├── data/                  # Data files directory
│   └── <uuid>.parquet     # Actual data file
└── ...
```
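
Reading such a table starts at `version-hint.text` and walks down the layers. A minimal stdlib sketch, assuming the layout above and field names as in the Iceberg table spec (error handling omitted):

```python
import json
from pathlib import Path

# Hedged sketch: resolve the current snapshot of a local Iceberg table.
# Field names follow the Iceberg table spec; error handling omitted.
meta_dir = Path("my_table/metadata")

# 1. Catalog-layer stand-in: version-hint.text names the latest metadata.
version = meta_dir.joinpath("version-hint.text").read_text().strip()
metadata = json.loads(meta_dir.joinpath(f"v{version}.metadata.json").read_text())

# 2. Metadata layer: find the current snapshot among all recorded ones.
current_id = metadata["current-snapshot-id"]
snapshot = next(s for s in metadata["snapshots"]
                if s["snapshot-id"] == current_id)

# 3. The snapshot points at a manifest list (Avro), which points at
#    manifest files, which in turn point at the data files.
print("Current snapshot:", current_id)
print("Manifest list:", snapshot["manifest-list"])
```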


# Apache Hudi Transaction Log

Structure and Implementation

- Timeline-Based Architecture: Organizes transactions as actions along a timeline
- File Organization: Uses directory-based approach with timestamped files and log files tracking changes
- Metadata Table: Tracks file information for query optimization (default since v0.11.0)
- Commit Files: Uses files with naming convention [timestamp].[transaction state] to track transaction states; see the parsing sketch below
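
Because each action is encoded in the file name, the timeline can be rebuilt by listing the `.hoodie` directory. A rough sketch using only the standard library (the action set is abbreviated; real Hudi supports more action types):

```python
from pathlib import Path

# Hedged sketch: rebuild the Hudi timeline by parsing .hoodie file names
# of the form [timestamp].[action] or [timestamp].[action].[state].
ACTIONS = {"commit", "deltacommit", "rollback", "clean", "compaction"}

def read_timeline(table_path: str):
    instants = []
    for f in Path(table_path, ".hoodie").iterdir():
        parts = f.name.split(".")
        if len(parts) >= 2 and parts[1] in ACTIONS:
            timestamp, action = parts[0], parts[1]
            # no third suffix means the action has completed
            state = parts[2] if len(parts) > 2 else "completed"
            instants.append((timestamp, action, state))
    return sorted(instants)  # timeline order == timestamp order

for instant in read_timeline("my_table"):
    print(instant)  # e.g. ('20230101130000', 'commit', 'inflight')
```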

Key Functions

- Record-Level Index: Maintains mapping between record keys and file groups
- Optimistic Concurrency: File-level, log-based concurrency control based on instant times
- Asynchronous Operations: Supports background operations like compaction without blocking ingestion
- Copy-on-Write vs. Merge-on-Read: Offers two table types with different performance characteristics; see the read-side merge sketch below
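
The Copy-on-Write vs. Merge-on-Read distinction comes down to when updates are merged: CoW rewrites base files at write time, while MoR appends changes to log files and merges them at read time. A toy sketch of the read-side merge, with dicts standing in for Parquet base files and log blocks:

```python
# Toy sketch of Merge-on-Read: base file records are overlaid with
# newer records from delta log files at query time, keyed by record key.
base_file = {          # stand-in for file2.parquet
    "key-1": {"value": 10},
    "key-2": {"value": 20},
}
log_files = [          # stand-ins for file2.log.1, file2.log.2 (in order)
    {"key-2": {"value": 21}},                 # update
    {"key-3": {"value": 30}, "key-1": None},  # insert + delete
]

def merge_on_read(base, logs):
    merged = dict(base)
    for log in logs:                    # apply log blocks in commit order
        for key, record in log.items():
            if record is None:
                merged.pop(key, None)   # tombstone: delete the record
            else:
                merged[key] = record    # upsert: newest record wins
    return merged

print(merge_on_read(base_file, log_files))
# {'key-2': {'value': 21}, 'key-3': {'value': 30}}
```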

Apache Hudi Transaction Log Structure:

```
my_table/
├── .hoodie/               # Metadata directory
│   ├── hoodie.properties  # Table configuration
│   ├── 20230101120000.commit            # Commit metadata (successful)
│   ├── 20230101130000.commit.requested  # Transaction state: requested
│   ├── 20230101130000.commit.inflight   # Transaction state: in progress
│   ├── 20230101140000.deltacommit       # Delta commit for MOR tables
│   ├── 20230101150000.rollback          # Failed transaction rollback
│   ├── 20230101160000.clean             # Cleaning operation
│   ├── 20230101170000.compaction        # Compaction operation
│   ├── metadata/          # Metadata table (since v0.11.0)
│   ├── aux/               # Auxiliary files
│   └── .heartbeat/        # Heartbeat management
├── partition=value/       # Partition directories
│   ├── file1_v1.parquet   # Base file (COW table)
│   ├── file1_v2.parquet   # Updated base file after update
│   ├── file2.parquet      # Another base file
│   ├── file2.log.1        # Delta log file (MOR table)
│   └── file2.log.2        # Another delta log file
└── ...
```

More details:

- Apache Hudi Concepts - Official documentation explaining Hudi's timeline, file organization, and table structure.

# DuckLake Transaction Table

More recently, we also have DuckLake, which stores table metadata in a SQL database instead of in metadata files on disk.
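
A minimal sketch of what this looks like from DuckDB's Python client, loosely following the DuckLake announcement examples; the extension name, attach syntax, and paths should be treated as assumptions to verify against the current docs:

```python
import duckdb

# Hedged sketch following the DuckLake announcement examples; assumes
# the ducklake extension is available. Paths and names are placeholders.
con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# The transaction "log" lives in a SQL database (here: a DuckDB file),
# while data files are written as Parquet under DATA_PATH.
con.execute(
    "ATTACH 'ducklake:metadata.ducklake' AS my_lake (DATA_PATH 'lake_data/')"
)
con.execute("CREATE TABLE my_lake.demo (i INTEGER)")
con.execute("INSERT INTO my_lake.demo VALUES (42)")

# Commits, schema, and snapshots are now rows in catalog tables instead
# of JSON/Avro files on disk.
print(con.execute("SELECT * FROM my_lake.demo").fetchall())
```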

# DuckLake Table Structure

The data model as of 2025-06-05 is shown as a schema diagram in the DuckLake documentation.

Check out more at DuckLake.

# Comparison of Transaction Log Approaches

| Feature | Delta Lake | Apache Hudi | Apache Iceberg |
| --- | --- | --- | --- |
| Concurrency Control | Optimistic concurrency control with mutual exclusion and retry mechanism | File-level, log-based concurrency control ordered by start instant times | Sequence number-based optimistic concurrency control |
| Metadata Management | JSON log files with Parquet checkpoints every 10 commits | Timeline-based approach with metadata table for query optimization | Layered approach with catalog pointing to metadata files |
| Update Handling | Breaks operations into atomic commits recorded sequentially | Offers Copy-on-Write and Merge-on-Read approaches for different performance needs | Supports eager data file rewrites or delete deltas for faster updates |
| Performance Characteristics | Efficient for append-heavy workloads with Spark integration | Excels at update-heavy workloads with upserts and record-level indexing | Offers strong query performance with optimized metadata handling |
| Time Travel | Supported via transaction log processing | Supported via timeline-based snapshots | Supported via versioned metadata and snapshots |
| Origins | Developed by Databricks | Developed by Uber | Developed by Netflix |
| Primary Integration | Apache Spark | Multiple engines with Spark, Flink, and Hive focus | Multi-engine with strong Spark, Flink, Trino support |
| Schema Evolution | Supported with column additions/deletions | Supported with schema tracking | Extensive support with in-place evolution |
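
To make the time-travel row concrete, here is a hedged sketch using the `deltalake` Python package (delta-rs), which resolves a historical version by replaying the transaction log up to that commit; it assumes a local table at `my_table/` with at least three commits:

```python
from deltalake import DeltaTable  # pip install deltalake (delta-rs)

# Hedged sketch: assumes a local Delta table at my_table/ with >= 3 commits.
latest = DeltaTable("my_table")
print("Current version:", latest.version())

# Time travel: load the table as of an earlier commit in the log.
as_of_v2 = DeltaTable("my_table", version=2)
print(as_of_v2.to_pandas().head())
```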

See also Open Table Formats, The Open Table Format Revolution, and Composable Open Data Platform: How Iceberg and Open Table Formats Are Reshaping Data Architecture.


Origin: Data Lake Table Format
References:
Created 2025-04-29
