Hurricane-like storms knocked an Amazon data center in Ashburn, Virginia, offline Friday night, and a chunk of the internet felt it. The incident temporarily cut off a number of popular internet services, including Netflix, Pinterest, Heroku, and Instagram.
I have written often about the advantages of cloud computing, including lots of redundancy and high uptimes, so this news is especially interesting. In theory, big outages like this aren’t supposed to happen. Amazon is supposed to keep the data centers up and running all the time. In fact, Amazon has done a good job of keeping things running, but not a perfect job. Friday's outage proves that even Amazon has a few chinks left in the armor.
In reality, all cloud-based data centers have outages all the time. After all, these cloud-based data centers run the same hardware and (mostly) the same software as any other data centers. Hardware and software failures can happen. In fact, in any data center with tens of thousands of servers online, several hardware failures will occur every day. Amazon is no different from anyone else when it comes to failures. Amazon even tells its customers to plan for this to happen, and to be ready to roll over to a new data center whenever there’s an outage.
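A rough back-of-the-envelope sketch shows why daily hardware failures are a matter of arithmetic rather than bad luck. The fleet size and the annual failure rate below are illustrative assumptions, not figures published by Amazon.

```python
# Back-of-the-envelope estimate of hardware failures per day in a large fleet.
# Both inputs are illustrative assumptions, not measured Amazon figures.

def expected_failures_per_day(servers: int, annual_failure_rate: float) -> float:
    """Average number of servers expected to fail on any given day."""
    return servers * annual_failure_rate / 365


fleet_size = 50_000           # assumed: "tens of thousands of servers"
annual_failure_rate = 0.04    # assumed: 4% of servers fail in a given year

per_day = expected_failures_per_day(fleet_size, annual_failure_rate)
print(f"Expected failures per day: {per_day:.1f}")
```

Under these assumed numbers the average works out to between five and six failed servers every day, which is why a large operator has to treat failure as routine.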
What is different in cloud computing, however, is the redundancy available. Every server should have multiple other servers in other locations running as "hot spares," ready to go on short notice. While outages can and will occur, they should be very brief and invisible to most users. A power failure Friday night was not surprising in itself. What was surprising is that the switch to backup servers in other locations did not work properly.
It looks like an Amazon Elastic Load Balancing (ELB) service, designed to spread processing loads across data centers, failed during the outage. Without that ELB service working properly, the Netflix and Pinterest services hosted by Amazon crashed. The "hot spares" were available and ready to go, but the Elastic Load Balancing service never notified them to start operating.
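The failure mode described above can be illustrated with a minimal sketch of health-check-based failover. Everything here is hypothetical (the `Backend` and `LoadBalancer` classes, the server names, the `working` flag); it is a toy model of the general idea, not a description of how Amazon's Elastic Load Balancing is actually built.

```python
# Toy model of failover through a load balancer. All names are hypothetical;
# this is not how Amazon's Elastic Load Balancing is implemented.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    name: str
    healthy: bool = True


class LoadBalancer:
    def __init__(self, backends: List[Backend], working: bool = True) -> None:
        self.backends = backends
        self.working = working  # False simulates a failed balancer

    def route(self) -> Optional[str]:
        """Return the first healthy backend's name, or None if nothing can be routed."""
        if not self.working:
            # Healthy spares may exist, but nothing is directing traffic to them.
            return None
        for backend in self.backends:
            if backend.healthy:
                return backend.name
        return None


primary = Backend("virginia-primary")
spare = Backend("spare-location")
balancer = LoadBalancer([primary, spare])

print(balancer.route())   # the primary serves traffic
primary.healthy = False   # a power failure takes out the primary
print(balancer.route())   # traffic shifts to the hot spare
balancer.working = False  # the balancer itself fails
print(balancer.route())   # None: the spare is healthy but never used
```

The last step is the interesting one: redundancy in the servers does not help if the component that decides where traffic goes is itself a single point of failure.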
Perhaps the silliest comment I later read was from a little-known blogger who stated that corporations are going to need to rethink their use of the cloud. He claimed the cloud is not reliable.
Not reliable? Compared to what?
No doubt the cloud will fail again. However, the present cloud has already proven to be more reliable than any corporation's private data center. Even with a rare outage or two, Amazon's uptime remains at ninety-nine-point-nine-nine-nine-something-or-other percent. Did your company's privately owned data center do better than Amazon?
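To put uptime percentages in perspective, here is a small sketch that converts an availability figure into the downtime it allows per year. The percentages are the conventional "three, four, and five nines" levels, chosen for illustration; they are not measured Amazon numbers.

```python
# Convert an availability percentage into allowed downtime per year.
# The percentages below are illustrative, not measured Amazon uptime.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.999 percent, the allowance is a little over five minutes a year, so a multi-hour outage uses up many years of that budget. The comparison that matters is still the one posed above: what figure does a privately run data center achieve?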
Republishing of this article in newsletters, blogs, and elsewhere is allowed and encouraged, with a few minor restrictions. Details may be found at http://goo.gl/hoHH1.
