Mashable, October 23, 16:39
Amazon AWS outage triggers widespread internet failures

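On Monday (U.S. Eastern Time), Amazon's AWS cloud service suffered a major outage, causing spikes in user-reported problems at many sites and services, including United Airlines, Snapchat, McDonald's, Verizon, and Venmo. The initial problem appeared to involve the Domain Name System (DNS) and affected access to data stored in AWS databases. Although AWS said the DNS issue had been mitigated, network connectivity problems followed and further delayed recovery. The outage once again highlighted how heavily modern internet infrastructure depends on a handful of large technology companies, and how widespread the impact can be when those key pillars fail. It even prompted debate over whether Big Tech companies are "too big."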

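⚠️ Amazon's AWS cloud service suffered a serious outage on Monday. It began with a Domain Name System (DNS) problem that caused failures at many internet services and websites that depend on AWS, leaving users unable to access them normally.

📉 Although Amazon said it had mitigated the initial DNS problem, it later reported network connectivity issues affecting multiple AWS services in the US-EAST-1 Region. This suggests the repair was complex and may have triggered new problems, making recovery slow and unstable.

🌐 The outage was wide-reaching: well-known services including United Airlines, Snapchat, McDonald's, Verizon, and Venmo all saw spikes in user-reported problems, exposing how heavily the modern internet ecosystem depends on a few core cloud providers.

🗣️ The repeated AWS outages sparked discussion about the resilience of internet infrastructure and the market share of Big Tech. Some argued that a single company able to cause large-scale internet failures may be too big and should be broken up to reduce the risk.

🛡️ Cybersecurity experts said the outage was not caused by a cyberattack or other malicious activity and was more likely an internal technical failure. They also noted that in a system as large and complex as AWS, even a small problem can set off a serious chain reaction.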

UPDATE Tuesday, 1:41 p.m. ET: With Amazon's AWS issues fully resolved, the online world was left to parse through the postmortem on Tuesday.

It's concerning and, yet, unsurprising to see how fragile the internet's ecosystem can prove. When a central pillar like AWS goes down, it topples large chunks of the internet with it. We've seen it before with Google Cloud, Microsoft, CrowdStrike, and others.

The modern internet is vast but delicate. As many news outlets pointed out, a few big tech companies hold vast market share, and when those services go down, the downstream effects can be troubling. And that's exactly how it played out on Monday.

UPDATE Tuesday, 9:30 a.m. ET: While Amazon's AWS services were fully restored by Tuesday, the fallout of the massive outage is still becoming clear.

Issues with a single service caused major disruptions to the basic things that make our lives functional. Canvas crashed, disrupting learning nationwide. Lloyds Bank customers lost access to their accounts. Some United Airlines flyers couldn't check in or view their reservations. People's alarms didn't go off. There are too many examples to list — it was a full meltdown.

To some, Monday was an example of Big Tech being too big. If an AWS outage can cause such widespread issues, that may be a problem.

"If a company can break the entire internet, they are too big. Period," wrote Democratic Sen. Elizabeth Warren on X. "It's time to break up Big Tech."

UPDATE Monday, 8:20 p.m. ET: Amazon provided more updates on how it repaired its AWS services and noted, "By 3:01 PM [PT, or 6:01 p.m. ET], all AWS services returned to normal operations. Some services such as AWS Config, Redshift, and Connect continue to have a backlog of messages that they will finish processing over the next few hours. We will share a detailed AWS post-event summary."

UPDATE Monday, 5:05 p.m. ET: The latest updates from Amazon indicated its AWS services were progressing toward full resolution.

"Service recovery across all AWS services continues to improve," the company wrote. It noted it was continuing to "reduce throttles" on certain affected tools.

UPDATE Monday, 3:41 p.m. ET: Amazon indicated its AWS services were well on the way to fully recovering.

"We continue to observe recovery across all AWS services," the company wrote. It did note customers may still face "intermittent function errors" with Lambda, its serverless compute service.

AWS saw a major outage in the early hours of Monday morning, a temporary recovery, and then further issues as the East Coast neared midday. You can read the full explanation of the outages in both the original story and our regular updates to this article, but, in short, any problem with AWS means major issues for large swaths of the internet. Sites and services such as United Airlines, Snapchat, McDonald's, Verizon, Venmo, and countless others all saw spikes in user-reported issues on Downdetector.

While the internet is vast, there are a few pillars — AWS perhaps chief among them — that can lead to large, disruptive downstream effects should they experience problems.

UPDATE Monday, 3:01 p.m. ET: Amazon said its continued efforts to remedy issues with its AWS services appeared to be working, noting it saw "decreasing networking connectivity issues," according to the most recent update on its status page.

Users still reported a relatively high number of issues with AWS on Downdetector, though many third-party services apparently affected by the AWS outage appeared to be recovering.

It's been a tremendously turbulent Monday for AWS. The popular cloud platform saw a major outage in the early morning hours, briefly recovered, and then experienced new problems around midday.

(Disclosure: Downdetector is owned by Ziff Davis, the same parent company as Mashable.)

UPDATE Monday, 2:15 p.m. ET: Amazon said its efforts to fix its connectivity issues appear to be working. Its widely popular AWS cloud platform suffered renewed issues starting around midday, just hours after a major outage during the early hours of Monday morning.

The company wrote its "mitigations to resolve launch failures" were progressing and that it expected "launch errors and network connectivity issues to subside" as it worked to apply fixes more widely.

UPDATE Monday, 1:15 p.m. ET: Amazon wrote it was working to fix connectivity issues that arose midday Monday ET, hours after a major outage in the early hours of the day.

"We continue to apply mitigation steps for network load balancer health and recovering connectivity for most AWS services," read the latest update from the AWS status page.

Mike Chapple, an IT professor at the University of Notre Dame, said it is not necessarily surprising for further issues to surface after an initial outage.

"While this is disruptive, it isn't unusual. The process of fixing a serious IT infrastructure issue often creates new problems, and fixes often need to be rolled out across a large number of systems over time," Chapple said in an emailed statement to Mashable. "As engineers work to steady the system, operations slowly stabilize and things return to normal. Think of it like a utility outage that occurs in a large city. The power might flicker on and off a few times as repair crews do their work. We're seeing something similar now with AWS."

UPDATE Monday, 12:15 p.m. ET: Amazon said it was homing in on the underlying issue that caused renewed issues with AWS on Monday.

"We have narrowed down the source of the network connectivity issues that impacted AWS Services," read the latest update from the AWS status page. "The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers."

It was not yet clear when outages and issues would be fully resolved.

UPDATE Monday, 11:45 a.m. ET: Amazon confirmed AWS was experiencing more issues late Monday morning, just hours after the issue was apparently resolved. The company wrote it was investigating "the root cause for the network connectivity issues that are impacting AWS services such as DynamoDB, SQS, and Amazon Connect," in its most recent update to the AWS status page.

Meanwhile, widespread service disruptions across the internet continued. User-reported issues spiked for a number of popular services, according to Downdetector, including FanDuel, Snapchat, Apple Music, Asana, Verizon, and many more. The renewed AWS problems appeared to be significant and were once again causing problems for large numbers of users.


A service disruption at Amazon Web Services (AWS), Amazon's popular cloud hosting and data service, caused massive problems for internet users starting their workweek on Monday. Since AWS powers huge portions of the internet, the list of services and sites that suffered outages on Monday was pretty staggering.

According to user-reported issues at the site Downdetector, affected services include United Airlines, AT&T, Fortnite, Disney+, HBO Max, Signal, Snapchat, McDonald's, Verizon, Venmo, and many more. (Disclosure: Downdetector is owned by Ziff Davis, the same parent company as Mashable.) Amazon services like Prime and Alexa were affected, too. In short: Almost anyone could've been affected in some way.

Nearly everything we own is internet-connected (our fridges are WiFi-enabled billboards), meaning an AWS outage can disrupt large swaths of our lives.

Nearing midday, it appeared the issue was over. But then Amazon's AWS Health Dashboard indicated problems had resurfaced.

"We have confirmed multiple AWS services experienced network connectivity issues in the US-EAST-1 Region," read an update around 10:30 a.m. ET. "We are seeing early signs of recovery for the connectivity issues and are continuing to investigate the root cause."

It appeared AWS was seeing issues again, though not on the scale of the outage in the earlier hours. Some services, such as Venmo and Boost Mobile, saw a corresponding jump in user-reported issues on Downdetector.

Amazon had previously said the problem was either fully resolved or resolving. Mashable reached out for comment and was directed to the AWS Health Dashboard. At about 6:35 a.m. ET, the AWS Health Dashboard indicated the main issue was resolved, though it warned problems might persist as services came back up. That could, perhaps, hint at the new problems that surfaced.

"The underlying DNS issue has been fully mitigated, and most AWS Service operations are succeeding normally now," the 6:35 a.m. ET update read. "Some requests may be throttled while we work toward full resolution."

What caused the AWS outage?

The exact reason AWS initially went down remains unknown, but we have an idea. Services using AWS were unable to access DynamoDB, an Amazon-run database, because the Domain Name System (DNS) had a problem. DNS effectively translates website names into IP addresses, so a service that cannot resolve a name cannot reach the server behind it. When Amazon wrote on its Health Dashboard that the DNS issue had been "fully mitigated," it was saying the underlying problem had been fixed.
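To make the DNS explanation concrete, here is a minimal toy sketch in Python. It is not how AWS, DynamoDB, or real DNS are implemented; the hostname, IP address, and stored record are all hypothetical. It only illustrates the general idea that a client looks up a name first and contacts the server second, so a failed lookup cuts off data that is still safely stored.

```python
# Toy model only: every hostname, IP address, and record here is hypothetical.


class ToyDNS:
    """Maps hostnames to IP addresses, like a tiny lookup table."""

    def __init__(self, records):
        self.records = dict(records)

    def resolve(self, hostname):
        if hostname not in self.records:
            raise LookupError(f"cannot resolve {hostname}")
        return self.records[hostname]


def fetch(dns, servers, hostname, key):
    """Resolve the hostname first, then read the key from that server."""
    ip = dns.resolve(hostname)  # step 1: name -> IP address
    return servers[ip][key]     # step 2: read from the server at that IP


# The "database" is keyed by IP address and holds its data the whole time.
servers = {"10.0.0.5": {"user:42": "Ada"}}
dns = ToyDNS({"db.example.internal": "10.0.0.5"})

# Normal operation: the lookup succeeds and the data comes back.
before = fetch(dns, servers, "db.example.internal", "user:42")

# Simulate the name record going missing; the stored data is untouched.
del dns.records["db.example.internal"]
try:
    fetch(dns, servers, "db.example.internal", "user:42")
    during = "ok"
except LookupError as err:
    during = f"failed: {err}"

# The record is still sitting on the server, just unreachable by name.
data_still_stored = servers["10.0.0.5"]["user:42"] == "Ada"

print(before, "|", during, "|", data_still_stored)
```

In this sketch the request fails at the lookup step even though the record never left the server, which is the situation the DNS explanation describes.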

"Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data," Mike Chapple, an IT professor at University of Notre Dame, told CNN. "It's as if large portions of the internet suffered temporary amnesia."


Rafe Pilling, the director of threat intelligence at the cybersecurity firm Sophos, told The Guardian that the incident didn't appear to be a cyberattack or anything nefarious, which is aligned with Amazon's statements.

"When anything like this happens the concern that it’s a cyber incident is understandable," he told the U.K. outlet. "AWS has a far-reaching and intricate footprint, so any issue can cause a major upset."

It's likely Amazon will, at a later time, explain further what happened Monday. It's unclear how the 10:35 a.m. ET "network connectivity issues" are related, if at all, to the initial DNS issue, though it feels reasonable to assume issues could arise as services worked to return to normal.

Why is an AWS outage such a big deal?

In short: AWS is a central pillar of the modern internet. Without it, things crash. As a few major companies gobbled up market share, the internet's infrastructure became remarkably fragile: an issue with AWS, Google, Microsoft, or CrowdStrike means issues for tons of users.

Advocates even argue that such reliance on these big players is a free speech issue.

"We urgently need diversification in cloud computing," said Dr. Corinne Cath-Speth, head of digital at human rights organization Article 19, according to The Guardian. "The infrastructure underpinning democratic discourse, independent journalism, and secure communications cannot be dependent on a handful of companies."

The long and short of it: If something goes wrong with AWS, a lot goes wrong everywhere else.
