cs.AI updates on arXiv.org 10月23日 12:21
云服务故障管理中的根本原因分析综述
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文综述了云服务故障管理中的根本原因分析(RCA),探讨了不同研究对RCA任务的界定差异,提出了一种基于目标驱动的框架,对135篇RCA论文进行分类和整合,并讨论了RCA的开放挑战和未来方向。

arXiv:2510.19593v1 Announce Type: cross Abstract: Root Cause Analysis (RCA) is a crucial aspect of incident management in large-scale cloud services. While the term root cause analysis or RCA has been widely used, different studies formulate the task differently. This is because the term "RCA" implicitly covers tasks with distinct underlying goals. For instance, the goal of localizing a faulty service for rapid triage is fundamentally different from identifying a specific functional bug for a definitive fix. However, previous surveys have largely overlooked these goal-based distinctions, conventionally categorizing papers by input data types (e.g., metric-based vs. trace-based methods). This leads to the grouping of works with disparate objectives, thereby obscuring the true progress and gaps in the field. Meanwhile, the typical audience of an RCA survey is either laymen who want to know the goals and big picture of the task or RCA researchers who want to figure out past research under the same task formulation. Thus, an RCA survey that organizes the related papers according to their goals is in high demand. To this end, this paper presents a goal-driven framework that effectively categorizes and integrates 135 papers on RCA in the context of cloud incident management based on their diverse goals, spanning the period from 2014 to 2025. In addition to the goal-driven categorization, it discusses the ultimate goal of all RCA papers as an umbrella covering different RCA formulations. Moreover, the paper discusses open challenges and future directions in RCA.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

云服务 故障管理 根本原因分析 目标驱动 RCA
相关文章