Mashable, September 22
Cloudflare service disruption caused by a failure in its own API

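Network services provider Cloudflare recently suffered a service outage triggered by a failure in its own API. The incident was far less widespread than the outage this past June, when sites that rely on Cloudflare's services, including Spotify, Google, and Snapchat, became temporarily unreachable. In a blog post, Tom Lianza, Cloudflare's vice president of engineering, and Joaquin Madruga, vice president of engineering for the developer platform, explained that the root cause was a bug in the dashboard that caused repeated calls to the Tenant Service API. A "problematic object" was mistakenly added to a dependency array, so the API call ran over and over. The Tenant Service eventually became overloaded, which disrupted other APIs and the dashboard and produced 5xx errors.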

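⚠️ Cloudflare recently experienced a service outage whose root cause was a bug in its own dashboard. The bug caused repeated, unnecessary calls to the Tenant Service API, so the API call ran many times during a single render instead of once.

💥 The immediate trigger was a "problematic object" that was mistakenly added to a dependency array. Each time the object was recreated it was treated as new, which re-ran the API call and eventually overloaded the Tenant Service API.

🌐 The overloaded Tenant Service API in turn disrupted other APIs and the Cloudflare dashboard. Tenant Service is a key part of the API request authorization logic, so when it failed, API requests returned 5xx errors and a broad outage followed.

😔 Cloudflare apologized for the disruption and said it will continue to investigate the issue and improve its systems and processes to prevent similar incidents.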

Cloudflare, a platform that provides network services, was the victim of a DDoS attack last week. It was also accidentally the cause of it.

You might remember Cloudflare was linked to a massive outage in June of this year. When Cloudflare went down, so did sites like Spotify, Google, Snapchat, Discord, Character.ai, and more, all of which rely on Cloudflare's services. That time, the disruption was sparked by a Google Cloud outage. Earlier this month, Cloudflare had another blunder, albeit much less disruptive than its outage from the summer — but this time, it did it to itself.

"We had an outage in our Tenant Service API which led to a broad outage of many of our APIs and the Cloudflare Dashboard," Tom Lianza, the vice president of engineering for Cloudflare, and Joaquin Madruga, the vice president of engineering for the developer platform at Cloudflare, wrote in a Sept. 13 blog post. "The incident’s impact stemmed from several issues, but the immediate trigger was a bug in the dashboard."

The bug, according to Lianza and Madruga, caused "repeated, unnecessary calls to the Tenant Service API." By accident, Cloudflare included a "problematic object in its dependency array." Because that object was recreated and treated as new each time, the call re-ran, and eventually the "API call executed many times during a single dashboard render instead of just once."
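
For readers curious how a bug like this plays out, here is a minimal sketch. The term "dependency array" matches React's useEffect hook, but the framework is not named in the passages quoted here, so React-like semantics are an assumption. `EffectModel`, `countApiCalls`, and the `accountId` field are hypothetical names, and a simple counter stands in for the Tenant Service call.

```typescript
// Simplified model of an effect hook: the effect re-runs whenever any
// dependency differs from the previous render. Dependencies are compared by
// identity with ===, a close approximation of the Object.is check that React
// documents for useEffect (assumption: React-like semantics).
type Deps = ReadonlyArray<unknown>;

function depsChanged(prev: Deps | undefined, next: Deps): boolean {
  if (prev === undefined || prev.length !== next.length) return true;
  return next.some((dep, i) => dep !== prev[i]);
}

class EffectModel {
  private prevDeps: Deps | undefined = undefined;

  // Called once per render; runs the effect only if the deps changed.
  render(deps: Deps, effect: () => void): void {
    if (depsChanged(this.prevDeps, deps)) {
      effect();
    }
    this.prevDeps = deps;
  }
}

// Hypothetical helper: counts how often the stand-in API call fires
// across a number of renders.
function countApiCalls(renders: number, stableDependency: boolean): number {
  let apiCalls = 0;
  const model = new EffectModel();
  const stable = { accountId: "example" }; // created once, same identity every render

  for (let i = 0; i < renders; i++) {
    // Buggy variant: a fresh object per render is never identical to the
    // previous one, even though its contents are the same.
    const dep = stableDependency ? stable : { accountId: "example" };
    model.render([dep], () => {
      apiCalls++; // stand-in for a call to the Tenant Service API
    });
  }
  return apiCalls;
}
```

Under these assumptions, `countApiCalls(5, false)` fires the call on all five renders, while `countApiCalls(5, true)` fires it only on the first. Keeping the dependency's identity stable, or depending on a primitive value instead of an object, is the usual way to avoid the extra calls.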

"When the Tenant Service became overloaded, it had an impact on other APIs and the dashboard because Tenant Service is part of our API request authorization logic. Without Tenant Service, API request authorization can not be evaluated. When authorization evaluation fails, API requests return 5xx status codes," the blog reads.

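The dependency described in the blog post can be pictured with a small sketch. It is an illustration only: the function names are hypothetical, and the specific codes 503 and 403 are assumptions, since the post says only that requests return 5xx status codes when authorization cannot be evaluated.

```typescript
// Outcome of an authorization check that depends on an upstream service.
type AuthzOutcome = "allow" | "deny" | "unavailable";

// Hypothetical check: without the tenant service, no decision can be made.
function evaluateAuthorization(tenantServiceUp: boolean, hasPermission: boolean): AuthzOutcome {
  if (!tenantServiceUp) return "unavailable";
  return hasPermission ? "allow" : "deny";
}

// Hypothetical request handler: maps the authorization outcome to an HTTP status.
function handleApiRequest(tenantServiceUp: boolean, hasPermission: boolean): number {
  const outcome = evaluateAuthorization(tenantServiceUp, hasPermission);
  if (outcome === "allow") return 200;
  if (outcome === "deny") return 403; // assumption: a normal denial is a 4xx
  return 503; // assumption: one possible 5xx when authorization cannot be evaluated
}
```

In this model, a request that would normally succeed still fails with a 5xx whenever the tenant service is down, which is how the overload of one service can surface as errors across otherwise unrelated APIs.
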
Everything is back on track at Cloudflare for now.

"We’re very sorry about the disruption," the blog post reads. "We will continue to investigate this issue and make improvements to our systems and processes."
