ByteByteGo · 00:41, two days ago
How Databricks Optimized Kubernetes Service Load Balancing

The Databricks team ran into the limits of Kubernetes' default load balancing for high-traffic, long-lived-connection workloads: uneven traffic, high tail latency, and poor resource utilization. To solve this, they built a client-side load balancing system that moves balancing decisions from the infrastructure layer into the client. A lightweight control plane (Endpoint Discovery Service) discovers service instances in real time; clients fetch the list of healthy instances directly and make intelligent per-request routing decisions instead of relying on connection-level round-robin. The approach markedly improved traffic distribution, reduced latency, raised resource utilization, and enabled advanced strategies such as Power of Two Choices and zone affinity. The system has been integrated into the internal RPC client framework and extended to Envoy for ingress, unifying the management of internal and external traffic.

🚦 **Limitations of Kubernetes' default load balancing**: For long-lived HTTP/2 services such as gRPC, Kubernetes' ClusterIP, CoreDNS, and kube-proxy machinery picks a backend pod by round-robin only once, when the connection is established. Traffic then stays pinned to a few pods for the lifetime of each connection, causing traffic skew, elevated tail latency, and uneven resource utilization, and making high availability and high performance hard to achieve.

💡 **The client-side load balancing solution**: Databricks moved the balancing logic into the client. A custom RPC client subscribes to a real-time Endpoint Discovery Service (EDS) control plane, fetches and maintains an in-memory list of healthy backend pods, and routes intelligently per request, bypassing the DNS-resolution and kube-proxy bottlenecks for finer-grained, more up-to-date traffic management.

🚀 **Advanced load balancing strategies**: The system supports several intelligent routing strategies, such as "Power of Two Choices" (pick two random healthy endpoints and send the request to the less loaded one) and zone-affinity routing, which prefers pods in the caller's own zone to reduce latency. These strategies adjust traffic dynamically based on live service state, significantly improving performance and reliability.

🌐 **Unified ingress traffic management**: Databricks extended the EDS control plane to Envoy, unifying load balancing for internal service-to-service communication and external ingress traffic. Envoy receives live endpoint information through EDS, so external traffic is likewise steered to healthy backend pods, giving the whole platform more consistent and efficient routing.

📈 **Measurable gains and lessons learned**: With client-side load balancing in place, Databricks achieved uniform traffic distribution across pods, cut pod counts by 20 percent, and made P90 and tail latencies far more predictable. The rollout also exposed cold-start problems (mitigated with a slow-start mechanism) and the limitations of routing on CPU/memory metrics (the team switched to health signals).



Disclaimer: The details in this post have been derived from the details shared online by the Databricks Engineering Team. All credit for the technical details goes to the Databricks Engineering Team. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Kubernetes has become the standard platform for running modern microservices. It simplifies how services talk to each other through built-in networking components like ClusterIP services, CoreDNS, and kube-proxy. These primitives work well for many workloads, but they start to show their limitations when traffic becomes high-volume, persistent, and latency-sensitive.

Databricks faced exactly this challenge. Many of their internal services rely on gRPC, which runs over HTTP/2 and keeps long-lived TCP connections between clients and servers. Under Kubernetes’ default model, this leads to uneven traffic distribution, unpredictable scaling behavior, and higher tail latencies.

By default, Kubernetes routes traffic through ClusterIP services, CoreDNS, and kube-proxy (iptables/IPVS/eBPF): CoreDNS resolves the service name to a virtual ClusterIP, and kube-proxy picks a backend pod, typically in round-robin fashion, when each TCP connection is established.

Since the selection happens only once per TCP connection, the same backend pod keeps receiving traffic for the lifetime of that connection. For short-lived HTTP/1 connections, this is usually fine. However, for persistent HTTP/2 connections, the result is traffic skew: a few pods get overloaded while others stay idle.
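To see why per-connection balancing skews load, here is a small, self-contained simulation (illustrative only, not Databricks code). A few long-lived connections are pinned to pods round-robin, but clients send very different request volumes over their connections, so request load concentrates on whichever pods happen to hold the busy connections. Deciding per request instead spreads the load evenly.

```python
import random

def simulate(pods=4, connections=8, per_request=False, total_requests=10_000):
    """Count requests per pod under connection-level vs per-request routing."""
    load = [0] * pods
    # kube-proxy style: each long-lived connection is pinned to a pod round-robin.
    conn_to_pod = [c % pods for c in range(connections)]
    # Heavier clients send disproportionately more requests over their connection.
    weights = [2 ** c for c in range(connections)]
    for _ in range(total_requests):
        conn = random.choices(range(connections), weights=weights)[0]
        pod = random.randrange(pods) if per_request else conn_to_pod[conn]
        load[pod] += 1
    return load

random.seed(0)
print("connection-level:", simulate())                   # skewed toward pods holding busy connections
print("per-request:     ", simulate(per_request=True))   # close to uniform
```

Even with round-robin connection placement, the request counts per pod diverge sharply in the connection-level case, which mirrors the traffic skew described above.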

For Databricks, this created several operational issues: a handful of hot pods absorbed most of the traffic while others sat idle, tail latencies rose unpredictably, scaling behavior became erratic because load signals were skewed, and overall resource utilization was poor.

The Databricks Engineering Team needed something smarter: a Layer 7, request-level load balancer that could react dynamically to real service conditions instead of relying on connection-level routing decisions.

In this article, we will learn how they built such a system and the challenges they faced along the way.

The Core Solution

To overcome the limitations of the default Kubernetes routing, the Databricks Engineering Team shifted the load balancing responsibility from the infrastructure layer to the client itself. Instead of depending on kube-proxy and DNS to make connection-level routing decisions, they built a client-side load balancing system supported by a lightweight control plane that provides real-time service discovery.

This means the application client no longer waits for DNS to resolve a service or for kube-proxy to pick a backend pod. Instead, it already knows which pods are healthy and available. When a request is made, the client can choose the best backend at that moment based on up-to-date information.

Here’s how the default Kubernetes load balancing compares with Databricks’ client-side approach:

| Aspect | Default Kubernetes LB | Databricks client-side LB |
| --- | --- | --- |
| Routing decision | Once per TCP connection | Per request |
| Decision maker | kube-proxy (iptables/IPVS/eBPF) | The client itself |
| Service discovery | DNS (CoreDNS) | Real-time EDS control plane |
| Load awareness | None (round-robin at connect time) | Live health and load signals |
| Long-lived HTTP/2 | Traffic pinned to a few pods | Even distribution across pods |

By removing DNS from the critical path, the system gives each client a direct and current view of available endpoints. This allows smarter, per-request routing decisions instead of static, per-connection routing. The result is more even traffic distribution, lower latency, and better use of resources across pods.

This approach also gives Databricks greater flexibility to fine-tune how traffic flows between services, something that is difficult to achieve with the default Kubernetes model.

Custom Control Plane - Endpoint Discovery Service

A key part of the intelligent load balancing system is its custom control plane. This component is responsible for keeping an accurate, real-time view of the services running inside the Kubernetes cluster. Instead of depending on DNS lookups or static routing, the control plane continuously monitors the cluster and provides live endpoint information to clients.

At a high level, the control plane watches the Kubernetes API for pod and endpoint changes, maintains a live registry of healthy endpoints for each service, and streams updates to subscribed clients. A client receives an initial snapshot when it subscribes and incremental updates as pods come and go.

This design has several benefits: DNS resolution drops out of the critical request path, clients learn about unhealthy or terminated pods within moments instead of waiting for stale lookups to expire, and every routing decision is made against current cluster state.
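The contract between the control plane and its clients can be sketched as follows. This is a hypothetical Python sketch of the idea, not Databricks' Scala implementation; names like `EndpointRegistry` and `subscribe` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Endpoint:
    address: str
    zone: str
    healthy: bool = True

@dataclass
class EndpointRegistry:
    """Minimal stand-in for an EDS-style control plane: it holds the live
    endpoint set per service and pushes every change to subscribers."""
    _endpoints: Dict[str, List[Endpoint]] = field(default_factory=dict)
    _subscribers: Dict[str, List[Callable]] = field(default_factory=dict)

    def subscribe(self, service: str, callback: Callable) -> None:
        self._subscribers.setdefault(service, []).append(callback)
        callback(self._endpoints.get(service, []))   # initial snapshot

    def update(self, service: str, endpoints: List[Endpoint]) -> None:
        # In production this would be driven by a Kubernetes API watch.
        self._endpoints[service] = endpoints
        for cb in self._subscribers.get(service, []):
            cb(endpoints)

class Client:
    """Client-side view: an in-memory list of healthy endpoints, refreshed
    by pushes from the control plane — no DNS lookup on the request path."""
    def __init__(self, registry: EndpointRegistry, service: str):
        self.healthy: List[Endpoint] = []
        registry.subscribe(service, self._on_update)

    def _on_update(self, endpoints: List[Endpoint]) -> None:
        self.healthy = [e for e in endpoints if e.healthy]

registry = EndpointRegistry()
client = Client(registry, "billing")
registry.update("billing", [Endpoint("10.0.0.1:8080", "us-east-1a"),
                            Endpoint("10.0.0.2:8080", "us-east-1b", healthy=False)])
print([e.address for e in client.healthy])   # only the healthy pod is routable
```

The key property is that the client's `healthy` list is always current at the moment a request is made, which is what enables per-request routing.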

Client Integration with RPC Frameworks

For any load-balancing system to work at scale, it has to be easy for application teams to adopt. Databricks solved this by directly integrating the new load-balancing logic into their shared RPC client framework, which is used by most of their internal services.

Since many Databricks services are written in Scala, the engineering team was able to build this capability once and make it available to all services without extra effort from individual teams.

Here is how the integration works: when a service client is created through the shared framework, it subscribes to the EDS control plane for the target service, keeps the returned endpoint list in memory, and applies the configured load-balancing strategy to every outgoing request. Application teams get this behavior simply by using the standard client, without writing any load-balancing code themselves.

Advanced Load Balancing Strategies

One of the biggest advantages of the client-side load balancing system at Databricks is its flexibility. Since the routing happens inside the client and is based on real-time data, the system can use more advanced strategies than the basic round-robin or random selection used by kube-proxy.

These strategies allow the client to make smarter routing decisions for every request, improving performance, reliability, and resource efficiency.

Power of Two Choices (P2C)

The Power of Two Choices algorithm is simple but powerful. When a request comes in, the client picks two healthy endpoints at random, compares their current load (for example, the number of outstanding requests), and sends the request to the less loaded of the two.

This approach avoids both random traffic spikes and overloaded pods. It balances traffic more evenly than round-robin while keeping the logic lightweight and fast. Databricks found that P2C works well for the majority of its services.
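The selection step fits in a few lines of Python. This is a generic sketch of the algorithm, not Databricks' code; `inflight` is an illustrative load signal, and a real client would also decrement the counter when each response completes.

```python
import random

def pick_p2c(endpoints, inflight):
    """Power of Two Choices: sample two distinct endpoints at random and
    route to whichever currently has fewer outstanding requests."""
    a, b = random.sample(endpoints, 2)
    return a if inflight[a] <= inflight[b] else b

random.seed(42)
endpoints = ["pod-a", "pod-b", "pod-c", "pod-d"]
inflight = {ep: 0 for ep in endpoints}
for _ in range(10_000):
    chosen = pick_p2c(endpoints, inflight)
    inflight[chosen] += 1   # a real client would decrement on completion
print(inflight)             # load stays very close to uniform across pods
```

Sampling just two endpoints keeps the decision O(1) per request while avoiding the worst-case imbalance of purely random selection, which is why P2C is such a common default.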

Zone-Affinity Routing

In large, distributed Kubernetes clusters, network latency can increase when traffic crosses zones.

To minimize this, the team uses zone-affinity routing: the client prefers healthy endpoints in its own zone and falls back to endpoints in other zones only when the local zone lacks healthy capacity. This keeps most traffic local and reduces cross-zone latency.
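The preference-with-fallback logic can be sketched like this (a hypothetical illustration; field names such as `zone` and `healthy` are assumptions, not Databricks' actual schema):

```python
import random

def pick_zone_affine(endpoints, local_zone):
    """Prefer healthy endpoints in the caller's zone; fall back to any
    healthy endpoint when the local zone has none available."""
    healthy = [e for e in endpoints if e["healthy"]]
    local = [e for e in healthy if e["zone"] == local_zone]
    return random.choice(local or healthy)

endpoints = [
    {"addr": "10.0.0.1", "zone": "us-east-1a", "healthy": True},
    {"addr": "10.0.0.2", "zone": "us-east-1b", "healthy": True},
    {"addr": "10.0.0.3", "zone": "us-east-1a", "healthy": False},
]
print(pick_zone_affine(endpoints, "us-east-1a")["addr"])  # stays in-zone: 10.0.0.1
print(pick_zone_affine(endpoints, "us-east-1c")["addr"])  # no local pods, falls back across zones
```

In practice this selection would be combined with a strategy like P2C within the preferred set, so zone affinity narrows the candidates rather than replacing load-aware routing.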

Pluggable Strategies

The architecture is designed to be extensible: the team can add new load-balancing strategies, such as weighted routing for specialized backends, without changing the overall system.

xDS Integration with Envoy for Ingress Traffic

The Databricks Engineering Team didn’t limit its intelligent load balancing system to internal traffic. They also extended their Endpoint Discovery Service (EDS) control plane to work with Envoy, which manages external ingress traffic. This means that both internal service-to-service communication and traffic coming into the cluster from outside follow the same set of routing rules.

Here’s how this works: Envoy consumes the same EDS control plane through the xDS API, so its view of healthy endpoints stays in sync with what internal clients see. External requests arriving at the ingress layer are therefore routed to healthy backend pods using the same live endpoint data, giving the platform one consistent routing model for all traffic.

Conclusion

The shift to client-side load balancing brought measurable benefits to Databricks’ infrastructure. After deploying the new system, the traffic distribution across pods became uniform, eliminating the issue of a few pods being overloaded while others sat idle.

This led to stable latency profiles, with P90 and tail latencies becoming much more predictable, and a 20 percent reduction in pod count across multiple services.

The improved balance meant Databricks could achieve better performance without over-provisioning resources.

The rollout also surfaced some important lessons. Newly started pods were initially hit with a full share of traffic before they had warmed up, a cold-start problem the team mitigated with a slow-start mechanism that ramps traffic to new pods gradually. And routing on CPU and memory metrics proved to be an unreliable load signal, so the team moved to routing based on health signals instead.
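One common slow-start scheme, shown here as a hypothetical sketch rather than Databricks' exact mechanism, ramps a new endpoint's routing weight linearly with its age, so a cold pod receives a small trickle of traffic at first and a full share only once it has warmed up:

```python
import time

def slow_start_weight(started_at, now, window=60.0, floor=0.1):
    """Ramp an endpoint's weight linearly from `floor` to 1.0 over `window`
    seconds after startup, shielding cold pods from a full traffic share."""
    age = now - started_at
    if age >= window:
        return 1.0
    return floor + (1.0 - floor) * (age / window)

now = time.time()
print(slow_start_weight(now, now))        # 0.1  — just started
print(slow_start_weight(now - 30, now))   # ≈0.55 — halfway through the ramp
print(slow_start_weight(now - 120, now))  # 1.0  — fully warmed up
```

A weighted selection strategy would then multiply each endpoint's base probability by this weight, which is how slow start composes with algorithms like P2C.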

Looking ahead, Databricks is working on cross-cluster and cross-region load balancing to scale this system globally using flat L3 networking and multi-region EDS clusters. The team is also exploring advanced AI-aware strategies, including weighted load balancing for specialized backends. These future improvements are aimed at handling even larger workloads, supporting AI-heavy applications, and maintaining high reliability as their platform grows.

Through this architecture, Databricks has demonstrated a practical way to overcome the limitations of the default Kubernetes load balancing and build a flexible, efficient, and scalable traffic management system.

References:



