What the massive AWS outage reveals about the internet


A massive cloud outage stemming from Amazon Web Services’ key US-EAST-1 region, located near the US capital in northern Virginia, caused widespread disruption of websites and platforms around the world on Monday morning. Amazon’s main e-commerce platform and other properties, including Ring doorbells and the Alexa smart assistant, suffered outages and interruptions throughout the morning, as did Meta’s communications platform WhatsApp, OpenAI’s ChatGPT, PayPal’s Venmo payment platform, multiple web services from Epic Games, several British government websites, and many more.

The outage stemmed from Amazon’s DynamoDB database APIs in the US-EAST-1 region, AWS said in status updates, and the problem was specifically related to DNS resolution issues. The Domain Name System is a foundational internet service that essentially acts as an automatic phone book lookup, translating web addresses such as “www.wired.com” into numeric server IP addresses so that users’ web browsers load the correct content. DNS “resolution” problems occur when DNS servers don’t connect these dots accurately and, to extend the phone book analogy, provide the wrong number for a given name, or vice versa.
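The phone book analogy can be made concrete with a small Python sketch. This is only an illustration: the `TOY_RECORDS` table, the `toy_resolve` and `real_resolve` helper names, and the example hostnames and addresses are all made up for this example and say nothing about how AWS or any real DNS server is built. The one real library call is `socket.getaddrinfo`, which asks the operating system's resolver to perform an actual lookup.

```python
import socket

# Hypothetical "phone book": hostnames mapped to made-up addresses
# from the range reserved for documentation (192.0.2.0/24).
TOY_RECORDS = {
    "www.example.com": "192.0.2.10",
    "db.example.com": "192.0.2.20",
}


class ResolutionError(Exception):
    """Raised when a name cannot be translated to an address."""


def toy_resolve(hostname, records=TOY_RECORDS):
    """Look up a hostname in an in-memory table, mimicking a DNS query."""
    # DNS names are case-insensitive and may carry a trailing dot.
    key = hostname.lower().rstrip(".")
    try:
        return records[key]
    except KeyError:
        raise ResolutionError(f"no record for {hostname!r}") from None


def real_resolve(hostname):
    """Ask the operating system's resolver for a hostname's addresses."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each result ends with a socket address whose first item is the IP.
    return sorted({info[4][0] for info in infos})
```

A client that calls `toy_resolve("db.example.com")` gets an address it can connect to. If the record is missing or wrong, the client fails before it ever reaches the server, even when the server itself is healthy.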

“Based on our investigation, the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1,” AWS wrote in status updates on Monday. Shortly after, the company added: “If you are still having trouble resolving DynamoDB service endpoints in US-EAST-1, we recommend clearing your DNS cache.”
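Why clearing a DNS cache can help is easier to see with a toy cache. The sketch below is a generic illustration, not a description of what happened inside AWS. The `DnsCache` class, its `ttl_seconds` and `clock` parameters, and the addresses are all hypothetical. It shows that a cached answer keeps being served until it expires or the cache is flushed, even after the upstream record has been corrected.

```python
import time


class DnsCache:
    """A minimal time-to-live cache in front of a resolver function."""

    def __init__(self, resolver, ttl_seconds=60.0, clock=time.monotonic):
        self._resolver = resolver   # callable: hostname -> address
        self._ttl = ttl_seconds     # how long an answer stays valid
        self._clock = clock         # injectable so the demo is deterministic
        self._entries = {}          # hostname -> (address, expiry time)

    def lookup(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and now < entry[1]:
            return entry[0]         # still fresh: serve the cached answer
        address = self._resolver(hostname)
        self._entries[hostname] = (address, now + self._ttl)
        return address

    def flush(self):
        """Forget every cached answer so the next lookup goes upstream."""
        self._entries.clear()


# Demo with a made-up upstream table that starts out wrong.
upstream = {"db.example.com": "192.0.2.66"}
cache = DnsCache(upstream.__getitem__, ttl_seconds=300, clock=lambda: 0.0)

before_fix = cache.lookup("db.example.com")   # caches the bad answer
upstream["db.example.com"] = "192.0.2.20"     # upstream record is corrected
still_stale = cache.lookup("db.example.com")  # cache still holds the old one
cache.flush()
after_flush = cache.lookup("db.example.com")  # fresh lookup sees the fix
```

In this sketch the client only picks up the corrected record after `flush()`, because the clock is frozen and the entry never expires on its own. With a real clock, the same thing would happen once the time-to-live ran out.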

An AWS spokesperson did not immediately respond when asked for details about the nature of the failure. DNS resolution issues can be caused maliciously, in an attack known as DNS hijacking, but there is no indication that Monday’s AWS outage was malicious.

“When the system couldn’t correctly determine which server to connect to, cascading failures disrupted online services,” says Davi Ottenheimer, a longtime security operations and compliance director and vice president at the data infrastructure firm Inrupt. “Today’s AWS outage is a classic availability issue, and we need to start seeing it as a failure of data integrity.”

The problems started around 3 a.m. ET. By 5:22 a.m. ET, AWS had implemented “initial mitigations” that were taking effect. At 6:35 a.m. ET, Amazon said it had fully addressed the underlying technical issues but that “some services will have a backlog, which may take additional time to fully address.”
