Mashable: What We Can Learn From Amazon’s Cloud Collapse

Call it Cloudgate, Cloudpocalypse or whatever you’d like, but the extended collapse of Amazon Elastic Compute Cloud (EC2) is both a setback for cloud computing and an opportunity for us to figure out how to stop it from happening again.

Amazon may be best-known for its online shopping site, but it also has a substantial cloud computing business. It provides a scalable, flexible and particularly efficient solution for companies to store and deliver massive amounts of content. Its model of only paying for what you consume was a radical innovation when it launched in 2006.

In fact, Amazon Web Services has been so affordable and reliable that thousands of companies from Foursquare to Netflix use the company’s cloud computing technology and servers to run their businesses. They put their faith in Amazon’s cloud because there was no reason to think that it would falter. One of cloud computing’s key tenets is reliability through redundancy of both servers and data centers.

Then on Wednesday, Amazon’s northern Virginia data center started experiencing problems that caused major latency and connectivity issues. The trouble was apparently due to excessive re-mirroring of its Elastic Block Store (EBS) volumes. This essentially created countless new backups of the EBS volumes, which took up Amazon’s storage capacity and triggered a cascading effect that caused downtime on hundreds (or more likely thousands) of websites for almost 24 hours.

More of the Mashable article from Ben Parr

Alex Carroll

Managing Member at Lifeline Data Centers
Alex, co-owner, is responsible for all real estate, construction and mission critical facilities: hardened buildings, power systems, cooling systems, fire suppression, and environmentals. Alex also manages relationships with the telecommunications providers and has an extensive background in IT infrastructure support, database administration and software design and development. Alex architected Lifeline’s proprietary GRCA system and is hands-on every day in the data center.