ByteByteGo · September 25, 18:01
Netflix Tudum Architecture Evolution: From CQRS + Kafka to RAW Hollow

 

Netflix's Tudum platform originally used a CQRS architecture with Kafka connecting the write path and the read path, but editorial previews suffered from delays. To solve this, engineers introduced RAW Hollow, an in-memory object store that loads data directly into application memory, enabling near-real-time previews and a simpler architecture. The new architecture eliminated the external key-value store, cache, and Kafka dependency on the read path, cut homepage construction time from 1.4 seconds to 0.4 seconds, made editorial previews nearly instant, and significantly reduced memory usage.

🔧 Tudum initially adopted a CQRS architecture, with Kafka connecting the editorial tools (write path) and the visitor-facing interface (read path), but the read path's cache refresh mechanism delayed editorial updates by several minutes, hurting collaboration.

🧠 RAW Hollow is an in-memory object store developed by Netflix. It features distributed deployment, data compression (about 130 MB for three years of data), and per-request consistency control. Designed for low-latency scenarios, it replaced the previous external Cassandra store and cache layer.

⚡ The new architecture turns the read path from a multi-hop network pipeline into a memory-local lookup model: after an editor publishes an update, RAW Hollow syncs the data into the memory of every service instance, so services access the latest data in O(1) time with no network calls or cache refreshes, enabling instant previews.

💾 The memory-first design delivers three gains: homepage construction time dropped from 1.4 seconds to 0.4 seconds, editorial previews became near-real-time, and the full three years of data occupies only about 130 MB of memory (roughly a quarter of its size in a traditional format), greatly reducing storage cost and scaling difficulty.

Build MCP servers that perform with Gram by Speakeasy (Sponsored)

AI agents get confused by MCP servers that include too many tools, lack crucial context, or are simple API mirrors. Without extra development, your carefully designed APIs cause agent headaches.

Gram fixes this. It's an open source platform where you can curate tools: add context, design multi-step tools, and deploy your MCP server in minutes.

Transform your APIs into agent-ready infrastructure that is ready to scale with OAuth 2.1 support, centralized management, and hosted infrastructure.

Start building your MCP server →


Disclaimer: The details in this post have been derived from the official documentation shared online by the Netflix Engineering Team. All credit for the technical details goes to the Netflix Engineering Team.  The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

When Netflix launched Tudum as its official home for behind-the-scenes stories, fan interviews, and interactive experiences, the engineering challenge was clear: deliver fresh, richly formatted content to millions of viewers at high speed, while giving editors a seamless way to preview updates in real time. 

The initial architecture followed a classic CQRS (Command Query Responsibility Segregation) pattern, separating the “write path” for editorial tools from the “read path” for visitors. Kafka connected these paths, pushing read-optimized data into backend services for page construction. 

The approach was scalable and reliable, but not without trade-offs.

As Tudum grew, editors noticed a frustrating delay between saving an update and seeing it live in previews. The culprit was a chain of sequential processes and cache refresh cycles that, while suitable for production visitors, slowed down the creative workflow. 

To solve this, Netflix engineers replaced the read-path’s external key-value store and per-request I/O with RAW Hollow: a compressed, distributed, in-memory object store embedded directly in the application. 

The result was near-instant editorial preview, simpler infrastructure, and a major drop in page construction time for end users. In this article, we will look at the evolution of this design decision and how Netflix went about implementing it.

Early Design

Netflix’s Tudum platform had to support two fundamentally different workflows:

- An editorial workflow (the write path), where writers and editors create, update, and preview richly formatted content.
- A visitor-facing workflow (the read path), where fully rendered pages are served to millions of viewers at low latency.

To keep these workflows independent and allow each to scale according to its needs, Netflix adopted a CQRS (Command Query Responsibility Segregation) architecture. 

See the diagram below for a general overview of CQRS.

The write store contains the raw editorial data (internal CMS objects with IDs, metadata, and references), while the read store contains a fully “render-ready” version of the same data, such as resolved movie titles instead of IDs, CDN-ready image URLs instead of internal asset references, and precomputed layout elements.
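To make the split concrete, here is a minimal sketch of the two shapes. The type and field names are hypothetical, chosen only to mirror the description above: raw IDs and references on the write side, resolved titles and CDN URLs on the read side.

```java
import java.util.Map;

// Hypothetical write-side CMS object: raw IDs and references, not renderable as-is.
record CmsPageElement(String elementId,
                      String movieId,                // internal ID, must be resolved
                      String assetRef,               // internal asset reference
                      Map<String, String> metadata) {}

// Hypothetical read-side entity: fully "render-ready".
record RenderReadyElement(String elementId,
                          String movieTitle,         // resolved from movieId
                          String imageUrl,           // CDN-ready URL
                          String layoutHtml) {}      // precomputed layout element
```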

As mentioned, Kafka served as the bridge between the two paths. When an editor made a change, the CMS emitted an event to Tudum’s ingestion layer. This ingestion pipeline performed the following steps:

- Hydrated internal references: movie IDs were resolved into display titles, and internal asset references became CDN-ready image URLs.
- Transformed the result into read-optimized, render-ready page elements.
- Published the processed content to a Kafka topic.

A Data Service Consumer subscribed to this topic, reading each new or updated page element. It wrote this data into a Cassandra-backed read store, structured for fast retrieval. Finally, an API layer exposed these read-optimized entities to downstream consumers such as the Page Construction Service (which assembles full pages for rendering), personalization services, and other internal tools.
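As an illustration, the consumer's loop could look roughly like this using the plain Kafka client API. The topic name, group id, and the read-store write are hypothetical stand-ins; the post does not show Tudum's actual code.

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

// Minimal sketch of the read-path consumer described above.
public class DataServiceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "tudum-read-store-writer"); // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("tudum.page-elements")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                // Each record is a new or updated, render-ready page element.
                records.forEach(r -> upsertIntoReadStore(r.key(), r.value()));
            }
        }
    }

    static void upsertIntoReadStore(String key, String renderReadyJson) {
        // Write into the Cassandra-backed read store, structured for fast retrieval (omitted).
    }
}
```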

See the diagram below:

This event-driven design ensured that editorial changes would eventually appear on the Tudum website without impacting write-side performance, while also allowing Netflix to scale the read and write paths independently.

The Pain of Eventual Consistency

While the CQRS-with-Kafka design was robust and scalable, it introduced a workflow bottleneck that became increasingly visible as Tudum’s editorial output grew.

Every time an editor made a change in the CMS, that change had to travel through a long chain before it appeared in a preview environment or on the live site. Here is a quick look at the various steps involved:

- The CMS emitted a change event to Tudum’s ingestion layer.
- The ingestion pipeline processed the content and published it to Kafka.
- The Data Service Consumer read the event and wrote the data into the Cassandra-backed read store.
- The Page Data Service’s near-cache had to refresh before downstream services saw the new value.
- Only then could the Page Construction Service assemble a preview containing the update.

This near-cache was a key contributor to the delay. Technically speaking, the near-cache is a small, per-instance, in-memory layer that sits in front of the read store. Rather than refreshing instantly on every update, it operated on a scheduled per-key refresh policy: each key has a timer, and when the timer fires, the instance refreshes that key from the backing store. While this approach was designed for production traffic efficiency, it meant that fresh edits often waited for the next scheduled refresh cycle before appearing.
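A minimal sketch of such a scheduled per-key refresh policy might look like this. All names are hypothetical, and the five-minute interval is made up for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a near-cache with a scheduled per-key refresh policy as described above.
class NearCache {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    String get(String key) {
        // Serve whatever is in memory; it may be stale until the key's next refresh tick.
        return values.computeIfAbsent(key, k -> {
            scheduleRefresh(k);
            return loadFromReadStore(k);
        });
    }

    private void scheduleRefresh(String key) {
        // Refresh this key on a fixed timer, regardless of when it last changed upstream.
        scheduler.scheduleAtFixedRate(
                () -> values.put(key, loadFromReadStore(key)),
                5, 5, TimeUnit.MINUTES);
    }

    private String loadFromReadStore(String key) {
        return "..."; // network call to the Cassandra-backed read store (omitted)
    }
}
```

Note how an edit written to the backing store stays invisible until each key's timer fires: that interval is exactly the preview delay editors experienced.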

As content volume and the number of page elements increased, these refresh cycles stretched longer. A page is assembled from multiple fragments, each with its own key and timer, and they do not refresh together. The more elements a page had, the more staggered the refresh completion became, leading to inconsistent preview states: some elements were updated while others remained stale.

The result was that editors sometimes had to wait minutes to see their changes reflected in a preview, even though the system had already processed and stored the update.

For a platform like Tudum, where timing is critical for publishing stories tied to new releases and events, this delay disrupted editorial flow and complicated collaboration between writers, editors, and designers.

The Solution: RAW Hollow

To eliminate the bottlenecks in Tudum’s read path, Netflix engineers turned to RAW Hollow: a compressed, distributed, in-memory object store designed for scenarios where datasets are small-to-medium in size, change infrequently, and must be served with extremely low latency.

Unlike the earlier setup, where read services fetched data from an external Cassandra-backed key-value store (with network calls, cache layers, and refresh cycles), RAW Hollow keeps the entire dataset loaded directly into the memory of every application instance that needs it. This means all lookups happen in-process, avoiding the I/O and cache-invalidation complexities of the old approach.

The key characteristics of RAW Hollow in the Tudum context are as follows:

- In-memory and in-process: the full dataset lives inside each service instance, so lookups are local O(1) operations with no network hop.
- Compressed: roughly three years of Tudum content fits in about 130 MB, around a quarter of its size in a traditional representation.
- Distributed: updates propagate automatically to every subscribed instance, keeping all in-memory copies in sync.
- Per-request consistency control: callers can choose eventual consistency for speed or stronger consistency where it matters.
For Tudum, adopting RAW Hollow meant removing the Page Data Service, its near-cache, the external key-value store, and even Kafka from the read path. Instead, the Hollow client was embedded directly inside each microservice that needed content. This collapsed the number of sequential operations, tightened the feedback loop for editors, and simplified the architecture by removing multiple moving parts.
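RAW Hollow’s client API is not shown in the post, so the following is a purely illustrative stand-in that captures the access pattern just described: the embedded client holds the whole dataset in process memory, reads are local map lookups, and a replication feed applies editors’ updates to every instance’s copy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for an embedded, replicated in-memory store.
// All names are hypothetical; this is not RAW Hollow's actual API.
class EmbeddedContentStore {
    // The full dataset lives in process memory (RAW Hollow additionally compresses it).
    private final Map<String, String> renderReadyById = new ConcurrentHashMap<>();

    // Reads are local O(1) lookups: no network call, no cache to refresh.
    String get(String elementId) {
        return renderReadyById.get(elementId);
    }

    // Called by the replication feed whenever an editor publishes a change;
    // every service instance applies the same update to its own in-memory copy.
    void applyUpdate(String elementId, String renderReadyContent) {
        renderReadyById.put(elementId, renderReadyContent);
    }
}
```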

The result was a big shift: instead of “store → fetch → cache → refresh,” the system now operates on “load once into memory → serve instantly → propagate changes.”

The New Tudum Design

After adopting RAW Hollow, Netflix rebuilt Tudum’s read path to remove the layers that were slowing down editorial previews and adding unnecessary complexity. 

The new design still follows the CQRS principle (separating the editorial content creation from the visitor-facing content), but the way data moves through the read side is now radically simplified.

See the diagram below:

Here’s what changed in the architecture:

- The Page Data Service and its near-cache were removed.
- The external Cassandra-backed key-value store was removed from the read path.
- Kafka was removed from the read path.
- A Hollow client was embedded directly inside each microservice that needs content.

The new flow works as follows:

- An editor publishes a change in the CMS.
- The ingestion layer processes the content into its render-ready form and writes it into RAW Hollow.
- RAW Hollow propagates the update to the in-memory copy inside every subscribed service instance.
- Services such as the Page Construction Service read the latest data through local, O(1), in-memory lookups, with no network calls or cache refreshes in between.

This re-architecture shifted Tudum’s read path from a multi-hop network-bound pipeline to a memory-local lookup model. In essence, Netflix took the scalability and separation of CQRS but stripped away the read path’s I/O-heavy plumbing, replacing it with a memory-first, embedded data model.
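As a closing illustration, page assembly on the new read path might look like the sketch below, building on the hypothetical EmbeddedContentStore above. The point is that the hot path contains no I/O at all:

```java
import java.util.List;

// Hypothetical page assembly on the memory-local read path:
// every fragment lookup is an in-process read against the embedded store.
class PageAssembler {
    private final EmbeddedContentStore store;

    PageAssembler(EmbeddedContentStore store) {
        this.store = store;
    }

    String assemble(List<String> elementIds) {
        StringBuilder page = new StringBuilder();
        for (String id : elementIds) {
            String fragment = store.get(id); // local O(1) read, no network call
            if (fragment != null) {
                page.append(fragment);
            }
        }
        return page.toString();
    }
}
```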

Conclusion

The shift from a Kafka-and-Cassandra read path with a near-cache to a RAW Hollow in-memory model produced immediate and measurable improvements for Tudum. Some of the key benefits were as follows:

- Homepage construction time dropped from roughly 1.4 seconds to 0.4 seconds.
- Editorial previews became near-instant, since a published change propagates straight into service memory rather than waiting on cache refresh cycles.
- About three years of content fits in roughly 130 MB of memory, around a quarter of its size in a traditional format.
- The read path lost several moving parts: no external key-value store, no near-cache, and no Kafka hop left to operate and debug.

Netflix’s re-architecture of Tudum’s read path shows how rethinking data access patterns can yield outsized gains in performance, simplicity, and developer experience.

By combining the scalability of CQRS with the speed of an in-memory, compressed object store like RAW Hollow, they created a system that serves both editorial agility and end-user responsiveness.

The lessons here are broadly applicable:

- Eventual consistency that is acceptable for end users can still be a serious bottleneck for internal workflows such as editorial preview.
- When a dataset is small-to-medium, changes infrequently, and must be served fast, replicating it wholesale into application memory can beat a networked store fronted by caches.
- Removing layers (caches, external stores, message hops) often yields bigger wins than tuning them.

References:


ByteByteGo Technical Interview Prep Kit

Launching the all-in-one interview prep kit. We’re making all the books available on the ByteByteGo website.

What's included:

Launch sale: 50% off


SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing sponsorship@bytebytego.com.
