An AWS outage knocked out your internet while you slept, and the problems continue

The Internet started the week the way many of us often want to: by refusing to go to work. An Amazon Web Services outage left large portions of the Internet unavailable Monday morning. Sites and services including Snapchat, Fortnite, Venmo, PlayStation Network, and, as expected, Amazon, were intermittently unavailable through the start of the day.

The outage began shortly after midnight PT, and it took Amazon about 3.5 hours to fully resolve. Social networks and streaming services were among more than 1,000 businesses affected, and vital services such as online banking were also halted.

The issues appeared to be largely resolved as the US East Coast came online, but they spiked again after 8 a.m. PT as the workday began on the West Coast.

AWS, a cloud services provider owned by Amazon, powers large parts of the Internet. So when it went down, it took many of the services we know and love with it. As with the Fastly and CrowdStrike outages of the past few years, the AWS outage shows how much of the Internet relies on the same infrastructure, and how quickly our access to the sites and services we rely on can be revoked when something goes wrong.

Relying on a small number of major companies to support the Internet is like putting all our eggs in a small handful of baskets. When it works, it’s great, but it only takes one thing to go wrong and the Internet is brought to its knees in a matter of minutes.

How common are AWS outages?

Just after midnight PT on October 20, AWS first recorded an issue on its service status page, saying it was “investigating increased error rates and response times for multiple AWS services in the US East region.” At around 2 a.m. PT, it said it had identified the potential root cause of the issue. Within half an hour, it had begun implementing mitigation measures that showed significant signs of recovery.

“The underlying DNS issue has been fully mitigated, and most AWS service operations are now succeeding normally,” AWS said at 3:35 a.m. PT. The company did not respond to a request for further comment beyond referring us back to the AWS Health Dashboard.

But as of 8:43 a.m. PT, many services were still affected, and the AWS status page showed the severity as “degraded.” In a post at the time, AWS noted: “We are limiting new EC2 instance launch requests to aid recovery and actively working on mitigations.”

Chart showing Amazon Web Services outages reported on Downdetector — The AWS outage first peaked before dawn on Monday in the US, then declined, then rose again around midday.

Downdetector/screenshot by CNET

Around the time AWS says it first noticed elevated error rates, Downdetector saw reports begin to rise across several online services, including banks, airlines and phone companies. As AWS worked to resolve the issue, reports for some of these services declined, while others have not yet returned to normal. (Disclosure: Downdetector is owned by the same parent company as CNET, Ziff Davis.)

At around 4 a.m. PT, Reddit was still down, while services including Ring, Verizon, and YouTube were still seeing a large number of reported issues. Reddit finally came back online at around 4:30 a.m. PT, according to its status page, which we then verified.

In total, Downdetector saw more than 6.5 million reports, with 1.4 million coming from the US, 800,000 from the UK, and the rest largely spread across Australia, Japan, the Netherlands, Germany and France. Downdetector added that more than 1,000 businesses in total were affected.

“This type of outage, where a basic Internet service takes down a wide range of online services, only happens a few times a year,” Daniel Ramirez, Ookla’s Downdetector product manager, told CNET. “It’s probably becoming a little more frequent as companies are encouraged to rely entirely on cloud services and their data architectures are designed to get the most out of a particular cloud platform.”

What caused the AWS outage?

AWS did not immediately share full details about why the Internet fell off a cliff this morning. Then at 8:43 a.m. PT, it provided this brief description: “The root cause is the underlying internal subsystem responsible for monitoring the health of our network load balancers.”

Earlier today, it attributed the outage to a “DNS issue.” DNS stands for Domain Name System and refers to the service that translates human-readable Internet addresses (for example, CNET.com) into machine-readable IP addresses that connect browsers to websites.

A screenshot of the Downdetector page showing an AWS outage affecting sites and services including Reddit, Snapchat, Ring, Roblox, and Fortnite. — The internet has been brought to its knees as several sites reported outages early Monday, according to Downdetector.

Downdetector/screenshot by CNET

When a DNS error occurs, the translation cannot be performed, so the connection fails. DNS errors are common on the Internet, but they usually occur on a small scale, affecting individual sites or services. Given how widely AWS is used, though, a DNS error there can have equally widespread consequences.
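To make the translation step concrete, here is a minimal Python sketch of a DNS lookup and of what a failed lookup looks like to a program. The function name `resolve_hostname` and its `lookup` parameter are illustrative assumptions, not anything AWS or the affected services use; the lookup itself relies on the standard library's `socket.getaddrinfo`.

```python
import socket
from typing import Callable, List


def resolve_hostname(hostname: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the IP addresses for a hostname, or an empty list if the lookup fails.

    `lookup` is a hypothetical injection point so the failure path can be
    exercised without a real network; by default it is socket.getaddrinfo.
    """
    try:
        results = lookup(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be translated, so there is no address to connect to.
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in results})
```

Calling `resolve_hostname("cnet.com")` on a working connection should return one or more IP addresses. When the resolver cannot answer, the function returns an empty list, which is roughly the situation an app is in when the name it depends on stops resolving.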

According to Amazon, the issue is geographically rooted in the US-East-1 region, which refers to an area in northern Virginia where many… data centers are based. It is an important site for Amazon, as well as many other Internet companies, and supports services that extend across the United States and Europe.

“The lesson here is flexibility,” said Luke Kehoe, an industry analyst at Ookla. “Many organizations still concentrate critical workloads in a single cloud region. Distributing critical applications and data across multiple regions and availability zones can significantly reduce the blast radius of future incidents.”
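As a rough illustration of what spreading a workload across regions can mean in practice, the following Python sketch tries a list of regions in order and uses the first one that responds. The function `fetch_with_failover` and the `fetch` callable are hypothetical names invented for this example, and the region names in the usage note are only sample values; real multi-region setups also have to replicate data and route traffic, which this sketch does not attempt.

```python
from typing import Callable, Sequence, Tuple


def fetch_with_failover(regions: Sequence[str], fetch: Callable[[str], str]) -> Tuple[str, str]:
    """Try each region in order and return (region, response) from the first success.

    `fetch` is a caller-supplied function (an assumption of this sketch) that
    contacts the service in the given region and raises an exception on failure.
    """
    errors = []
    for region in regions:
        try:
            return region, fetch(region)
        except Exception as exc:  # real code would catch narrower network errors
            errors.append(f"{region}: {exc}")
    # Every region failed, so surface all the collected errors at once.
    raise RuntimeError("all regions failed: " + "; ".join(errors))
```

With a call such as `fetch_with_failover(["us-east-1", "us-west-2"], fetch)`, a failure in the first region falls through to the second instead of taking the application down, which is the smaller blast radius Kehoe describes.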

Was the AWS outage due to a cyberattack?

DNS issues can be caused by malicious actors, but there is no evidence at this point to suggest that this is the case for the AWS outage.

However, technical failures can pave the way for hackers to find and exploit vulnerabilities as companies roll back changes and defenses falter, according to Marijus Briedis, CTO at NordVPN. “This is as much a cybersecurity issue as it is a technical issue,” he said in a statement. “True online security isn’t just about keeping hackers out, it’s also about ensuring you can stay connected and protected when systems fail.”

Briedis added that in the coming hours, people should be on the lookout for scammers hoping to take advantage of public awareness of the outage. Be very wary of phishing attacks and of emails asking you to change your password to protect your account.
