Amazon AWS outage triggers widespread internet disruptions

UPDATE Tuesday, 1:41 p.m. ET: With Amazon's AWS issues fully resolved, the online world was left to parse through the postmortem on Tuesday.

It's concerning and, yet, unsurprising to see how fragile the internet's ecosystem can prove. When a central pillar like AWS goes down, it topples large chunks of the internet with it. We've seen it before with Google Cloud, Microsoft, CrowdStrike, and others.

The modern internet is vast but delicate. As many news outlets pointed out, a few big tech companies hold vast market share, and when those services go down, the downstream effects can be troubling. And that's exactly how it played out on Monday.

UPDATE Tuesday, 9:30 a.m. ET: While Amazon's AWS services were fully restored by Tuesday, the fallout of the massive outage is still becoming clear.

Issues with a single service caused major disruptions to the basic things that make our lives functional. Canvas crashed, disrupting learning nationwide. Lloyds Bank customers lost access to their accounts. Some United Airlines flyers couldn't check in or view their reservations. People's alarms didn't go off. There are too many examples to list — it was a full meltdown.

To some, Monday was an example of Big Tech being too big. If an AWS outage can cause such widespread issues, that may be a problem.

"If a company can break the entire internet, they are too big. Period," wrote Democratic Sen. Elizabeth Warren on X. "It's time to break up Big Tech."

UPDATE Monday, 8:20 p.m. ET: Amazon provided more updates on how it repaired its AWS services and noted, "By 3:01 PM [PT, or 6:01 p.m. ET], all AWS services returned to normal operations. Some services such as AWS Config, Redshift, and Connect continue to have a backlog of messages that they will finish processing over the next few hours. We will share a detailed AWS post-event summary."

UPDATE Monday, 5:05 p.m. ET: The latest updates from Amazon indicated its AWS services were progressing toward full resolution.

"Service recovery across all AWS services continues to improve," the company wrote. It noted it was continuing to "reduce throttles" on certain affected tools.

UPDATE Monday, 3:41 p.m. ET: Amazon indicated its AWS services were well on the way to fully recovering.

"We continue to observe recovery across all AWS services," the company wrote. It did note customers may still face "intermittent function errors" with Lambda, its serverless compute service.

AWS saw a major outage in the early hours of Monday morning, a temporary recovery, and then further issues as the East Coast neared midday. You can read the full explanation of the outages in both the original story and our regular updates to this article, but, in short, any problem with AWS means major issues for large swaths of the internet. Sites and services such as United Airlines, Snapchat, McDonald's, Verizon, Venmo, and countless others all saw spikes in user-reported issues on Downdetector.

While the internet is vast, there are a few pillars — AWS perhaps chief among them — that can lead to large, disruptive downstream effects should they experience problems.

UPDATE Monday, 3:01 p.m. ET: Amazon said its continued efforts to remedy issues with its AWS services appeared to be working, noting it saw "decreasing networking connectivity issues," according to the most recent update on its status page.

Users still reported a relatively high number of issues with AWS on Downdetector, though many third-party services apparently affected by the AWS outage appeared to be recovering.

It's been a tremendously turbulent Monday for AWS. The popular cloud platform saw a major outage in the early morning hours, briefly recovered, and then experienced new problems around midday.

(Disclosure: Downdetector is owned by Ziff Davis, the same parent company as Mashable.)

UPDATE Monday, 2:15 p.m. ET: Amazon said its efforts to fix its connectivity issues appear to be working. Its widely popular AWS cloud platform suffered renewed issues starting around midday, just hours after a major outage during the early hours of Monday morning.

The company wrote its "mitigations to resolve launch failures" were progressing and that it expected "launch errors and network connectivity issues to subside" as it worked to apply fixes more widely.

UPDATE Monday, 1:15 p.m. ET: Amazon wrote it was working to fix connectivity issues that arose midday Monday ET, hours after a major outage in the early hours of the day.

"We continue to apply mitigation steps for network load balancer health and recovering connectivity for most AWS services," read the latest update from the AWS status page.

Mike Chapple, an IT professor at the University of Notre Dame, said that further issues surfacing after the initial outage is not necessarily a surprising development.

"While this is disruptive, it isn't unusual. The process of fixing a serious IT infrastructure issue often creates new problems, and fixes often need to be rolled out across a large number of systems over time," Chapple said in an emailed statement to Mashable. "As engineers work to steady the system, operations slowly stabilize and things return to normal. Think of it like a utility outage that occurs in a large city. The power might flicker on and off a few times as repair crews do their work. We're seeing something similar now with AWS."

What caused the AWS outage?

The exact reason AWS initially went down remains unknown, but we have an idea. Services using AWS were unable to access DynamoDB, an Amazon-run database, because of a problem with the Domain Name System (DNS), which effectively translates website names into IP addresses. So when Amazon wrote on its Health Dashboard that the DNS issue had been "fully mitigated," it was saying the underlying problem was fixed.
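For readers curious what a DNS failure looks like from an app's side, here is a minimal Python sketch. It is an illustration only, not Amazon's code or a reconstruction of the actual fault: the `resolve` helper and its `lookup` parameter are hypothetical names introduced here, and the only real API used is the standard library's `socket.getaddrinfo`.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the unique IP addresses for hostname, or [] if the DNS lookup fails.

    `lookup` is a hypothetical injection point so the failure case can be
    simulated without touching the network; by default it is the real resolver.
    """
    try:
        results = lookup(hostname, 443)
    except socket.gaierror:
        # The name could not be translated into an address. The service and
        # its data may be perfectly healthy, but the client cannot find them.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as a string.
    return sorted({info[4][0] for info in results})


# Example call (performs a real lookup, so the result depends on your network):
# resolve("example.com")
```

When the lookup fails, the app gets an empty list even though the service behind the name may be running normally, which is roughly the situation Amazon's DNS update describes.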

"Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data," Mike Chapple, an IT professor at University of Notre Dame, told CNN. "It's as if large portions of the internet suffered temporary amnesia."

Rafe Pilling, the director of threat intelligence at the cybersecurity firm Sophos, told The Guardian that the incident didn't appear to be a cyberattack or anything nefarious, which is aligned with Amazon's statements.

"When anything like this happens the concern that it’s a cyber incident is understandable," he told the U.K. outlet. "AWS has a far-reaching and intricate footprint, so any issue can cause a major upset."

It's likely Amazon will explain further what happened Monday at a later time. It's unclear how the 10:35 a.m. ET "network connectivity issues" are related, if at all, to the initial DNS issue, though it seems reasonable to assume issues could arise as services worked to return to normal.

Why is an AWS outage such a big deal?

In short: AWS is a central pillar of the modern internet. Without it, things crash. As major companies gobbled up market share, the internet's infrastructure became remarkably fragile: an issue with AWS, Google, Microsoft, or CrowdStrike means issues for huge numbers of users.

Advocates even argue that such reliance on these big players is a free speech issue.

"We urgently need diversification in cloud computing," said Dr. Corinne Cath-Speth, head of digital human rights organization Article 19, according to The Guardian. "The infrastructure underpinning democratic discourse, independent journalism, and secure communications cannot be dependent on a handful of companies."

The long and short of it: If something goes wrong with AWS, a lot goes wrong everywhere else.
