ByteByteGo · November 4, 23:33
How Datadog Built Monocle, Its High-Performance Time Series Database

 

This article takes a deep look at how the Datadog engineering team built Monocle, a custom time series storage engine for its real-time metrics platform, from the ground up. Facing the challenge of processing data at massive scale, Datadog split storage into long-term and real-time halves, and made the unusual choice of storing data (timestamps and values) separately from metadata (tags). By using Apache Kafka for data distribution, write-ahead logging, and replication, and by building Monocle in Rust with a thread-per-core concurrency model and an LSM-Tree storage structure, Datadog dramatically improved query speed and system stability. The article also covers the Admission Control and Cost-Based Scheduling mechanisms designed to cope with highly concurrent queries, and looks ahead to co-locating points and tags in a single system.

📈 **Architectural split and Kafka at the core**: To handle metrics at massive scale, Datadog splits its storage into a "long-term metrics store" and a "real-time metrics store," with the latter serving 99% of queries. The novel part is storing the data (timestamps and values) and the metadata (tags) in separate databases. The entire real-time system is built around Apache Kafka, which distributes data, acts as a write-ahead log (WAL) for durability, and handles replication, delivering stability and speed while avoiding complex coordination between database nodes.

🚀 **The Monocle engine, written in Rust**: Datadog built its own storage engine, Monocle, in Rust on the Tokio framework. Monocle's core design hashes a series' full tag set into a single number for fast lookups. It uses a "thread-per-core" concurrency model in which each CPU core processes its own data independently, eliminating locks and coordination overhead. The storage structure is based on an LSM-Tree, with optimizations such as an arena allocator and time-based file pruning to sustain heavy write loads and speed up queries.

🛡️ **Handling high concurrency, and what comes next**: Because a "thread-per-core" query can be held up by a single slow thread, Datadog introduced Admission Control to cap system load and a cost-based scheduling system (using the CoDel algorithm) to manage query latency, keeping the system responsive under pressure. Next, Datadog plans to consolidate points and tags into a single database and move to a columnar storage format to further improve query performance and analytics.

Build a more sustainable on-call experience (Sponsored)

Keeping your systems reliable shouldn’t come at the expense of your team. This practical guide from Datadog shows how to design sustainable on-call processes that reduce burnout and improve response.

Get step-by-step best practices for reducing burnout and improving response.

Get the guide


Disclaimer: The details in this post have been derived from the details shared online by the Datadog Engineering Team and the P99 Conference Organizers. All credit for the technical details goes to the Datadog Engineering Team. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

In the world of cloud monitoring, Datadog operates at a massive scale. The company’s platforms must ingest billions of data points every single second from millions of hosts around the globe.

This constant flow of information creates an immense engineering challenge: how do you store, manage, and query this data in a way that is not only lightning-fast but also cost-effective?

For the Datadog Engineering Team, the answer was to build their own solution from the ground up.

In this article, we will look at how the Datadog engineering team built Monocle, the custom time series storage engine that powers their real-time metrics platform, and analyze the technical decisions and clever optimizations behind it.


Move faster with AI: Write code you can trust (Sponsored)

AI is speeding things up, but all that new code creates a bottleneck — who’s verifying the quality and security? Don’t let new technical debt and security risks slip past. Sonar’s automated review gives you the trust you need in every line of code, human- or AI-written.

With SonarQube, your team can build that trust into every line of code.

Get started with SonarQube today to fuel AI-enabled development and build trust into all code.

Learn more


High-Level Metrics Platform Architecture

Before diving into the custom database engine, it is important to understand where it fits.

Their custom engine, named Monocle, is just one specialized component within a much larger “Metrics Platform.” This platform is the entire system responsible for collecting, processing, storing, and serving all of Datadog’s customer metrics.

The journey of a single data point begins at the “Metrics Edge.” This component acts as the front door, receiving the flood of data from millions of customer systems. From there, it is passed to a “Storage Router.” Just as the name suggests, this router’s main job is to process the incoming data and intelligently decide where it needs to be stored.

This is where Datadog’s first major design decision becomes clear.

The Datadog Engineering Team recognized that not all data queries are the same. An engineer asking for a performance report from last year has very different needs than an automated alert checking for a failure in the last 30 seconds. To serve both, they split their storage into two massive, specialized systems: a long-term metrics store for historical data, and a real-time metrics store that holds recent data and serves 99% of all queries.

A time series data point has two parts:

- The data itself: a timestamp and the measured value.
- The metadata: the set of tags that identifies which series the point belongs to.

The Datadog Engineering Team made the critical decision to store these two parts in separate, specialized databases: a points store that holds only timestamps and values, and an Index Database that holds only the tags and resolves which series match a query.

Using Kafka

Perhaps the most unique architectural decision the Datadog Engineering Team made is how their database clusters are organized. In many traditional distributed databases, the server nodes (the individual computers in the cluster) constantly talk to each other. They “chatter” to coordinate who is doing what, to copy data between themselves (a process called replication), and to figure out what to do if one of them fails.

Datadog’s RTDB (real-time database) nodes do not do this.

Instead, the entire system is designed around Apache Kafka. Here, Kafka acts as a central place where all new data is written first before it even touches the database. This Kafka-centric design is the key to the cluster’s stability and speed.

See the diagram below that shows an RTDB cluster and the role of Kafka:

The Datadog Engineering Team uses Kafka to perform three critical functions that the database nodes would otherwise have to do themselves:

- **Data distribution**: Kafka partitions the incoming stream and delivers each partition to the node that owns it.
- **Write-ahead log (WAL)**: every point is durably recorded in Kafka before it reaches a database node, so a node that crashes can rebuild its state by replaying the log.
- **Replication**: replicas are simply additional consumers of the same partitions, so nodes never need to copy data between themselves.
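To make the write-ahead-log role concrete, here is a minimal Rust sketch that models a Kafka partition as an in-memory append-only log. `PartitionLog`, `Record`, and `replay_from` are invented names for the illustration, not Datadog's or Kafka's actual API:

```rust
// Minimal sketch of the Kafka-as-WAL idea: a node never writes to disk
// first; it consumes an ordered log and can replay from any offset.
// `PartitionLog` stands in for a real Kafka partition.

#[derive(Clone, Debug)]
struct Record {
    offset: u64,
    payload: Vec<u8>, // an encoded batch of (timestamp, value) points
}

#[derive(Default)]
struct PartitionLog {
    records: Vec<Record>,
}

impl PartitionLog {
    /// Producers (the Storage Router) append; the log assigns offsets.
    fn append(&mut self, payload: Vec<u8>) -> u64 {
        let offset = self.records.len() as u64;
        self.records.push(Record { offset, payload });
        offset
    }

    /// A database node that restarts (or a fresh replica) rebuilds its
    /// state by replaying every record at or after its committed offset.
    fn replay_from(&self, committed: u64) -> impl Iterator<Item = &Record> {
        self.records.iter().filter(move |r| r.offset >= committed)
    }
}

fn main() {
    let mut log = PartitionLog::default();
    log.append(b"batch-1".to_vec());
    log.append(b"batch-2".to_vec());
    // A replica catches up from offset 1 without talking to other nodes.
    for rec in log.replay_from(1) {
        println!("replaying offset {}", rec.offset);
    }
}
```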

Monocle: The Custom-Built Engine in Rust

At the heart of each RTDB node is Monocle, Datadog’s custom-built storage engine. See the diagram below:

This is where the team’s pursuit of performance gets truly impressive. While earlier versions of the platform used RocksDB, a popular and powerful open-source database engine, the team ultimately decided to build its own. By creating Monocle from scratch, they could tailor every single decision to their specific needs, unlocking a new level of efficiency.

Monocle is written in Rust, a modern programming language famous for its safety guarantees and “C-like” performance. It is built on Tokio, a popular framework in the Rust ecosystem for writing high-speed, asynchronous applications that can handle many tasks at once without getting bogged down.

The Core Data Model: Hashing Tags

Monocle’s key innovation is its simple data model. A time series is identified by its metric name, like “system.cpu.user”, plus a set of tags, like “host:web-01” and “env:prod”. This combination is what makes a series unique. However, these tag sets can be long and complex to search.

The Datadog Engineering Team simplified this dramatically. Instead of working with these complex strings, Monocle hashes the entire set of tags for a series, turning it into a single, unique number. The database then just stores data in a simple map:

(Organization, Metric Name, Tag Hash) -> (A list of [Timestamp, Value] pairs)

This design is incredibly fast because finding all the data for any given time series becomes a direct and efficient lookup using that single hash. The separate Index Database is responsible for the human-friendly part: it tells the system that a query for env:prod corresponds to a specific list of tag hashes.
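Here is a minimal, self-contained Rust sketch of that data model. The `tag_hash` helper and the `BTreeMap` store are illustrative stand-ins for Monocle's internals; a production engine would also pick an explicit, stable hash function rather than the standard library's `DefaultHasher`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Key shape described in the article: (Organization, Metric Name, Tag Hash).
type SeriesKey = (u64, String, u64);
type Points = Vec<(u64, f64)>; // [Timestamp, Value] pairs

/// Hash the full tag set into one number. Tags are sorted first so that
/// "host:web-01,env:prod" and "env:prod,host:web-01" hash identically.
fn tag_hash(tags: &[&str]) -> u64 {
    let mut sorted: Vec<&str> = tags.to_vec();
    sorted.sort_unstable();
    let mut h = DefaultHasher::new(); // stand-in for a stable production hash
    sorted.hash(&mut h);
    h.finish()
}

fn main() {
    let mut store: BTreeMap<SeriesKey, Points> = BTreeMap::new();
    let key = (
        42, // organization id
        "system.cpu.user".to_string(),
        tag_hash(&["host:web-01", "env:prod"]),
    );
    store.entry(key.clone()).or_default().push((1_700_000_000, 0.73));
    // Lookup is a direct keyed read: no string matching on tags.
    println!("{:?}", store.get(&key));
}
```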

Inside Monocle

Monocle’s speed comes from two main areas: its concurrency model and its storage structure.

Monocle uses what is known as a “thread-per-core” or “shared-nothing” architecture. You can imagine each CPU core in the server has its own dedicated worker, which operates in total isolation. Each worker has its own data, its own cache, and its own memory. They do not share anything.

When new data comes in from Kafka, it is hashed. The system then sends that data to the specific queue for the one worker that “owns” that hash. Since each worker is the only one that can ever access its own data, there is no need for locks, no coordination, and no waiting. This eliminates a massive performance bottleneck common in traditional databases, where different threads often have to wait for each other to access the same piece of data.
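The routing idea can be sketched with standard-library threads and channels. Monocle itself pins workers to cores and runs on Tokio, so this is only an illustration of hash-based ownership, with all names invented for the example:

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Shared-nothing sketch: each worker owns a shard of the keyspace and a
// private map; routing is hash(series) % workers, so no locks are needed.
fn main() {
    let workers = 4;
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    for id in 0..workers {
        let (tx, rx) = mpsc::channel::<(u64, u64, f64)>(); // (series_hash, ts, value)
        senders.push(tx);
        handles.push(thread::spawn(move || {
            // This map is owned by exactly one thread: no Mutex, no contention.
            let mut shard: HashMap<u64, Vec<(u64, f64)>> = HashMap::new();
            for (series, ts, value) in rx {
                shard.entry(series).or_default().push((ts, value));
            }
            println!("worker {id} owns {} series", shard.len());
        }));
    }

    // The router sends each point to the single worker that owns its hash.
    for series in [11u64, 29, 36, 47] {
        let owner = (series % workers as u64) as usize;
        senders[owner].send((series, 1_700_000_000, 1.0)).unwrap();
    }
    drop(senders); // close the channels so workers drain their queues and exit
    for h in handles {
        h.join().unwrap();
    }
}
```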

See the diagram below:

Monocle’s storage layout is a Log-Structured Merge-Tree (LSM-Tree). This is a design that is extremely efficient for write-heavy workloads like Datadog’s.

Here are the main concepts associated with LSM-Trees:

- **Memtable**: new writes land in a sorted in-memory buffer, which keeps ingestion extremely fast.
- **SSTables**: when the memtable fills up, it is flushed to disk as an immutable, sorted file.
- **Compaction**: background processes periodically merge these files so reads stay efficient.

The Datadog Engineering Team added two critical optimizations to this design:

- **Arena allocation**: rather than making many small heap allocations, the memtable carves its memory out of large contiguous arenas, keeping allocation cheap under heavy write load.
- **Time-based file pruning**: each file on disk records the time range it covers, so queries can skip entire files whose range does not overlap the query window, as sketched below.
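Here is a minimal sketch of the time-based pruning idea, assuming each on-disk file records the minimum and maximum timestamp it contains; the struct and field names are hypothetical, not Monocle's internals:

```rust
// Every flushed file records the time range it covers, so a query can
// discard whole files without opening them.
struct SsTableMeta {
    path: &'static str,
    min_ts: u64,
    max_ts: u64,
}

/// Keep only the files whose [min_ts, max_ts] range overlaps the query window.
fn prune<'a>(files: &'a [SsTableMeta], from: u64, to: u64) -> Vec<&'a SsTableMeta> {
    files
        .iter()
        .filter(|f| f.max_ts >= from && f.min_ts <= to)
        .collect()
}

fn main() {
    let files = [
        SsTableMeta { path: "sst-001", min_ts: 0, max_ts: 999 },
        SsTableMeta { path: "sst-002", min_ts: 1_000, max_ts: 1_999 },
        SsTableMeta { path: "sst-003", min_ts: 2_000, max_ts: 2_999 },
    ];
    // An alert querying a recent window touches one file instead of three.
    for f in prune(&files, 2_100, 2_200) {
        println!("scan {}", f.path);
    }
}
```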

Staying Fast Under Pressure

Handling so many queries with the “thread-per-core” design creates a unique challenge.

Since a query is fanned out to all workers, it is only as fast as its slowest worker. If one worker is busy performing a background task, like a heavy data compaction, it can stall the entire query for all the other workers. This is a classic computer science problem known as head-of-line blocking.

To solve this, the team built a two-layer system to manage the query load and stay responsive:

- **Admission Control** limits how much work the node accepts at once, turning away new queries when the system is already saturated.
- **Cost-Based Scheduling**, built on the CoDel algorithm, watches queueing delay and sheds queries that have waited too long, keeping latency bounded under pressure.
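The two layers can be condensed into a small sketch. Real CoDel tracks sustained queueing delay over an interval before it starts shedding load; this version compresses that idea into a per-query check, and every constant and name here is illustrative:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

const MAX_IN_FLIGHT: usize = 8; // illustrative admission limit
const TARGET_DELAY: Duration = Duration::from_millis(5); // illustrative latency target

struct Query {
    enqueued_at: Instant,
}

struct Scheduler {
    queue: VecDeque<Query>,
    in_flight: usize,
}

impl Scheduler {
    /// Layer 1, admission control: refuse new work when the node is saturated.
    fn admit(&mut self, q: Query) -> bool {
        if self.in_flight + self.queue.len() >= MAX_IN_FLIGHT {
            return false; // caller retries elsewhere or backs off
        }
        self.queue.push_back(q);
        true
    }

    /// Layer 2, CoDel-flavored scheduling: if a query already waited past
    /// the latency target, drop it instead of letting it stall the queue.
    fn next(&mut self) -> Option<Query> {
        while let Some(q) = self.queue.pop_front() {
            if q.enqueued_at.elapsed() > TARGET_DELAY {
                continue; // shed this query
            }
            self.in_flight += 1;
            return Some(q);
        }
        None
    }

    /// Called when a query completes, freeing admission capacity.
    fn finish(&mut self) {
        self.in_flight -= 1;
    }
}

fn main() {
    let mut s = Scheduler { queue: VecDeque::new(), in_flight: 0 };
    assert!(s.admit(Query { enqueued_at: Instant::now() }));
    if let Some(_q) = s.next() {
        println!("running query; {} in flight", s.in_flight);
        s.finish();
    }
}
```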

Conclusion

The Datadog Engineering Team’s work on Monocle is far from over. They are already planning the next evolution of their platform, which involves two major changes: co-locating points and tags in a single database instead of two, and moving that database to a columnar storage format.

In a columnar database, data is stored by columns instead of rows. This means a query can read only the specific tags and values it needs, which is a massive speedup for analytics.
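A small Rust sketch makes the difference visible: the same three fields laid out row-wise versus column-wise, with an aggregation that touches only one column. This is a generic illustration of columnar layout, not Datadog's planned format:

```rust
// Row layout: each point carries all of its fields together, so any scan
// strides across every field of every row.
struct RowPoint {
    timestamp: u64,
    value: f64,
    tag_hash: u64,
}

// Columnar layout: one contiguous vector per field. A query that only
// aggregates `values` never touches timestamps or tags, and the CPU scans
// one dense array.
struct ColumnBatch {
    timestamps: Vec<u64>,
    values: Vec<f64>,
    tag_hashes: Vec<u64>,
}

impl ColumnBatch {
    fn sum_values(&self) -> f64 {
        self.values.iter().sum() // reads exactly one column
    }
}

fn main() {
    let batch = ColumnBatch {
        timestamps: vec![1, 2, 3],
        values: vec![0.5, 0.7, 0.9],
        tag_hashes: vec![42, 42, 42],
    };
    let _rows = [RowPoint { timestamp: 1, value: 0.5, tag_hash: 42 }];
    println!("sum = {}", batch.sum_values());
}
```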

This is a complex undertaking that will likely require a complete redesign of their “thread-per-core” model, but it highlights Datadog’s drive to push the boundaries of performance.

References:


SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing sponsorship@bytebytego.com.
