The GitHub Blog, October 9, 10:22
GitHub September Service Performance Review

 

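GitHub experienced three incidents in September that degraded service performance. On September 15, a faulty feature flag deployment degraded the availability of most Copilot features for 25 minutes. On September 23 and 24, email delivery was delayed for about 130 minutes in total, with delays peaking at about 50 minutes, because a traffic surge caused resource contention on outbound email servers. On September 29, the Copilot API returned intermittent 404 responses for 67 minutes after an upgrade of an internal dependency exposed a misconfiguration. GitHub has taken steps to improve system stability and monitoring.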

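🚦 **Copilot degraded availability**: On September 15, a feature flag deployment degraded the availability of most Copilot features for 25 minutes, and users received 403 errors. The root cause was an undetected edge case in the rate limiting logic: the flag was meant to scale down rate limiting for a subset of users, but it put the rate limiting configuration into an invalid state, which limited 100% of requests. Reverting the feature flag resolved the issue, and GitHub is adding traffic anomaly monitors and expanding rate limit scaling tests to improve system resilience.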

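📧 **Email delivery delays**: On September 23 and 24, GitHub's email delivery was significantly delayed. The two incidents had a combined impact of about 130 minutes, and the longest delay customers experienced was about 50 minutes. The cause was an unusually high volume of traffic, which led to resource contention on some outbound email servers. In response, GitHub updated its configuration to allocate capacity better and is improving its monitors to detect such problems sooner.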

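**Copilot API 404 errors**: On September 29, the Copilot API was partially degraded for 67 minutes, causing intermittent 404 responses for GitHub MCP server requests and affecting about 2% of requests at peak. The issue stemmed from an upgrade of an internal dependency that exposed a misconfiguration in the service. The incident was resolved by rolling back the upgrade and fixing the configuration, and GitHub will improve its documentation and rollout process to prevent similar issues.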

In September, we experienced three incidents that resulted in degraded performance across GitHub services.

September 15 17:55 UTC (lasting 25 minutes)

On September 15, 2025, between 17:55 and 18:20 UTC, Copilot experienced degraded availability for the majority of its features. This was due to a partial deployment of a feature flag to a global rate limiter. The flag triggered behavior that unintentionally limited 100% of requests, returning 403 errors. The issue was resolved by reverting the feature flag, which resulted in immediate recovery.

The root cause of the incident was an undetected edge case in our rate limiting logic. The flag was meant to scale down rate limiting for a subset of users, but it unintentionally put our rate limiting configuration into an invalid state.
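The report does not say what the edge case was. Purely as an illustration of how a scaled rate limit can end up in an invalid state, here is a minimal Python sketch that validates the result before applying it and falls back to the last known-good configuration. The names `RateLimitConfig`, `scale_limit`, and `apply_flag` are hypothetical, as is the assumption that a limit below one request per minute counts as invalid; none of this reflects GitHub's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitConfig:
    """Hypothetical rate limit settings; not GitHub's actual schema."""
    requests_per_minute: int


def scale_limit(config: RateLimitConfig, factor: float) -> RateLimitConfig:
    """Scale a limit by `factor`, rejecting results that would block all traffic."""
    if not (0 < factor <= 1):
        raise ValueError(f"scale factor must be in (0, 1], got {factor}")
    scaled = int(config.requests_per_minute * factor)
    if scaled < 1:
        # A limit of zero would reject every request, so treat it as invalid.
        raise ValueError("scaled limit would reject every request")
    return RateLimitConfig(requests_per_minute=scaled)


def apply_flag(config: RateLimitConfig, factor: float) -> RateLimitConfig:
    """Return the scaled config, or the unchanged config if scaling is invalid."""
    try:
        return scale_limit(config, factor)
    except ValueError:
        return config
```

With these assumptions, `apply_flag(RateLimitConfig(100), 0.5)` halves the limit, while a factor small enough to round the limit down to zero leaves the original configuration in place instead of blocking all requests.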

The issue has been resolved, and we are enhancing system resilience by adding traffic anomaly monitors for early issue detection and increasing coverage of rate limit scaling tests to strengthen pre-production validation.
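The report does not describe how the traffic anomaly monitors work. As a rough sketch of the general idea, the Python class below tracks the share of a given status code over a sliding window of recent responses and signals when that share crosses a threshold. The class name `ErrorRateMonitor`, the window size, and the threshold are all assumptions chosen for illustration.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window monitor for one HTTP status code."""

    def __init__(self, window: int = 1000, threshold: float = 0.05, status: int = 403):
        self.window = deque(maxlen=window)  # True where a response matched `status`
        self.threshold = threshold
        self.status = status

    def record(self, status_code: int) -> bool:
        """Record one response; return True if the alert condition is met."""
        self.window.append(status_code == self.status)
        # Wait for a full window so a few early errors do not trigger an alert.
        if len(self.window) < self.window.maxlen:
            return False
        return sum(self.window) / len(self.window) > self.threshold
```

A monitor like this would fire quickly in a case where every request starts returning 403, since the matching share of the window climbs past any reasonable threshold within a fraction of the window length.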

September 24 14:02 UTC (lasting 50 minutes)

On September 23, 2025, between 15:29 UTC and 17:38 UTC, and again on September 24, 2025, between 14:02 UTC and 15:12 UTC, email delivery was degraded, resulting in significant delays for most types of email notifications. While the overall impact of the two incidents totaled ~130 minutes, the peak delay experienced by customers was ~50 minutes. This occurred due to an unusually high volume of traffic, which caused resource contention on some of our outbound email servers.

We have updated the configuration to better allocate capacity when there is a high volume of traffic and are also updating our monitors to improve our detection capabilities.

September 29 16:26 UTC (lasting 67 minutes)

On September 29, 2025, between 16:26 UTC and 17:33 UTC, the Copilot API experienced a partial degradation, causing intermittent erroneous 404 responses for an average of 0.2% of GitHub MCP server requests, peaking at times around 2% of requests. The issue stemmed from an upgrade of an internal dependency, which exposed a misconfiguration in the service.
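The paragraph above quotes both an average and a peak error rate. One way such figures could be derived from per-interval request counts is sketched below in Python. The function name `error_rates` and the bucket format are assumptions, and the report does not say how GitHub computed its numbers.

```python
def error_rates(buckets):
    """Return (average, peak) error rates.

    `buckets` is a list of (total_requests, error_responses) pairs, one per
    time interval. The average is errors over all requests in the whole
    period; the peak is the highest rate seen in any single interval.
    """
    total = sum(t for t, _ in buckets)
    errors = sum(e for _, e in buckets)
    average = errors / total if total else 0.0
    peak = max((e / t for t, e in buckets if t), default=0.0)
    return average, peak
```

Because the average is taken over the whole period, a short burst of errors can produce a peak many times higher than the average, which is the pattern the incident figures show.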

We resolved the incident by rolling back the upgrade to address the misconfiguration. We have since fixed the configuration issue and will improve our documentation and rollout process to prevent similar issues.


Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.

The post GitHub Availability Report: September 2025 appeared first on The GitHub Blog.
