• Home
  • Latest
  • Fortune 500
  • Finance
  • Tech
  • Leadership
  • Lifestyle
  • Rankings
  • Multimedia
FinanceFrom the Crowd

Learning the Right Lessons from the Amazon Outage

Fortune Editors
By
Fortune Editors
Fortune Editors
Down Arrow Button Icon
Fortune Editors
By
Fortune Editors
Fortune Editors
Down Arrow Button Icon
April 26, 2011, 2:20 PM ET

Post Author: Brad Feld. Bio:

As most nerds know, Skynet gained self-awareness last week and decided as its first act to mess with Amazon Web Services, creating havoc for anyone that wanted to check-in on the Internet to their current physical location. In hindsight Skynet eventually figured out this was a bad call on its part as it actually wants to know where every human is at any given time. However, Skynet is still trying to get broader adoption of Xbox Live machines, so the Sony Playstation Network appears to still be down.

After all the obvious “oh my god, AWS is down” articles followed by the “see – I told you the cloud wouldn’t work” articles, some thoughtful analysis and suggestions have started to appear. Over the weekend, Dave Jilk, the CEO of Standing Cloud (I’m on the board) asked if I was going to write something about this and – if not – did I want him to write a guest post for me. Since I’ve used my weekend excess of creative energy building a Thing-O-Matic 3D Printer in an effort to show the machines that I come in peace, I quickly took him up on his offer.

Following are Dave’s thoughts on learning the right lessons from the Amazon outage.

Much has already been written about the recent Amazon Web Services outage that has caused problems for a few high-profile companies. Nevertheless, at Standing Cloud we live and breathe the infrastructure-as-a-service (IaaS) world every day, so I thought I might have something useful to add to the discussion. In particular, some media and naysayers are emphasizing the wrong lessons to be learned from this incident.


Wrong lesson #1: The infrastructure cloud is either not ready for prime time, or never will be.

Those who say this simply do not understand what the infrastructure cloud is. At bottom, it is just a way to provision virtual servers in a data center without human involvement. It is not news to anyone who uses them that virtual servers are individually less reliable than physical servers; furthermore, those virtual servers run on physical servers inside a physical data center. All physical data centers have glitches and downtime, and this is not the first time Amazon has had an outage, although it is the most severe.

What is true is that the infrastructure cloud is not and never will be ready to be used exactly like a traditional physical data center that is under your control. But that is obvious after a moment’s reflection. So when you see someone claiming that the Amazon outage shows that the cloud is not ready, they are just waving an ignorance flag.


Wr

ong lesson #2: Amazon is not to be trusted.

On the contrary, the AWS cloud has been highly reliable on the whole. They take downtime seriously and given the volume of usage and the amount of time they have been running it (since 2006), it is not surprising that they would eventually have a major outage of some sort. Enterprises have data center downtime, and back in the day when startups had to build their own, so did they. Some data centers are run better than others, but they all have outages.

What is of more concern are rumors I have heard that Amazon does not actually use AWS for Amazon.com. That doesn’t affect the quality of their cloud product directly, but given that they have lured customers with the claim that they do use it, this does impact our trust in relation to their marketing integrity. Presumably we will eventually find out the truth on that score. In any case, this issue is not related to the outage itself.

Having put the wrong lessons to rest, here are some positive lessons that put the nature of this outage into perspective, and help you take advantage of IaaS in the right way and at the right time.


Right lesson #1: Amazon is not infallible, and the cloud is not magic.

This is just the flip side of the “wrong lessons” discussed above. If you thought that Amazon would have 100% uptime, or that the infrastructure cloud somehow eliminates concerns about downtime, then you need to look closer at what it really is and how it works. It’s just a way to deploy somewhat less reliable servers, quickly and without human intervention. That’s all. Amazon (and other providers) will have more outages, and cloud servers will fail both individually and en masse.

Your application and deployment architecture may not be ready for this. However, I would claim that if it is not, you are assuming that your own data center operators are infallible. The architectural changes required to accommodate the public IaaS cloud are a good idea even if you never move the application there. That’s why smart enterprises have been virtualizing their infrastructure, building private clouds, and migrating their applications to operate in that environment. It’s not just a more efficient use of hardware resources, it also increases the resiliency of the application.


Right lesson #2: Amazon is not the only IaaS provider, and your application should be able to run on more than one.

This requires a bias alert: cloud portability is one of the things Standing Cloud enables for the applications it manages. If you build/deploy/manage an application using our system, it will be able to run on many different cloud providers, and you can move it easily and quickly.

We built this capability, though, because we believed that it was important for risk mitigation. As I have already pointed out, no data center is infallible and outages are inevitable. Further, It is not enough to have access to multiple data centers – the Amazon outage, though focused on one data center, created cascading effects (due to volume) in its other data centers. This, too, was predictable.

Given the inevitability of outages, how can one avoid downtime? My answer is that an application should be able to run on more than one, or many, different public cloud services. This answer has several implications:

  • You should avoid reliance on unique features of a particular IaaS provider if they affect your application architecture. Amazon has built a number of features that other providers do not have, and if you are committed to Amazon they make it very easy to be “locked in.” There are two ways to handle this: first, use a least-common-denominator approach; second, find a substitution for each such feature on a “secondary” service.
  • Your system deployment must be automated. If it is not automated, it is likely that it will take you longer to re-deploy the application (either in a different data center or on a different cloud service) than it will take for the provider to bring their service back up. As we have seen, that can take days. I discuss automation more below.
  • Your data store must be accessible from outside your primary cloud provider. This is the most difficult problem, and how you accomplish it depends greatly on the nature of your data store. However, backups and redundancy are the key considerations (as usual!). All data must be in more than one place, and you need to have a way to fail over gracefully. As the Amazon outage has shown, a “highly reliable” system like their EBS (Elastic Block Storage) is still not reliable enough to avoid downtime.


Right lesson #3: Cloud deployments must be automated and should take cloud server reliability characteristics into account.

Even though I have seen it many times, I am still taken aback when I talk to a startup that has used Amazon just like a traditional data center using traditional methods. Their sysadmins go into the Amazon console, fire up some servers, manually configure the deployment architecture (often using Amazon features that save them time but lock them in), and hope for the best. Oh, they might burn an AMI and save it on S3, in case the server dies (which only works as long as nothing changes). If they need to scale up, they manually add another server and manually add it to the load balancer queue.

This type of usage treats IaaS as mostly a financing alternative. It’s a way to avoid buying capital equipment and conserving financial resources when you do not know how much computing infrastructure you will need. Even the fact that you can change your infrastructure resources rapidly really just boils down to not having to buy and provision those resources in advance. This benefit is a big one for capital-efficient lean startups, but on the whole the approach is risky and suboptimal. The Amazon outage illustrates this: companies that used this approach were stuck during the outage, but at another level they are still stuck with Amazon because their server configurations are implicit.

Instead, the best practice for deploying applications – in the cloud but also anywhere, is by automating the deployment process. There should be no manual steps in the deployment process. Although this can be done using scripts, even better is to use a tool like Chef, Puppet, or cfEngine to take advantage of abstractions in the process. Or use RightScale, Kaavo, CA Applogic, or similar tools to not only automate but organize your deployment process. If your application uses a standard N-tier architecture, you can potentially use Standing Cloud without having to build any automation scripts at all.

Automating an application deployment in the cloud is a best practice with numerous benefits, including:

  • Free redundancy. Instead of having an idle redundant data center (whether cloud or otherwise), you can simply re-deploy your application in another data center or cloud service using on-demand resources. Some of the resources (e.g., a replicated data store) might need to be available at all times, but most of the deployment can be fired up only when it is needed.
  • Rapid scalability. In theory you can get this using Amazon’s auto-scaling features, Elastic Beanstalk, and the like. But these require access to AMIs that are stored on S3 or EBS. We’ve learned our lesson about that, right? Instead, build a general purpose scalability approach that takes advantage of the on-demand resources but keeps it under your control.
  • Server failover can be treated just like scalability. Virtual servers fail more frequently than physical servers, and when they do, there is less ability to recover them. Consequently, a good automation procedure treats scalability and failover the same way – just bring up a new server.
  • Maintainability. A server configuration that is created manually and saved to a “golden image” has numerous problems. Only the person who built it knows what is there, and if that person leaves or goes on vacation, it can be very time consuming to reverse-engineer it. Even that person will eventually forget, and if there are several generations of manual configuration changes (boot the golden image, start making changes, create a new golden image), possibly by different people, you are now locked into that image. All these issues become apparent when you need to upgrade O/S versions or change to a new O/S distribution. In contrast, a fully automated deployment is not only a functional process with the benefits mentioned above, it also serves as documentation.

In summary, let the Amazon Web Services outage be a wake-up call… not to fear the IaaS cloud, but to know it, use it properly, and take advantage of its full possibilities.




Brad has been an early stage investor and entrepreneur for over twenty years. Prior to co-founding Foundry Group, he co-founded Mobius Venture Capital and, prior to that, founded Intensity Ventures, a company that helped launch and operate software companies. Brad is also a co-founder of TechStars.

About the Author
Fortune Editors
By Fortune Editors
See full bioRight Arrow Button Icon

Latest in Finance

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025

Most Popular

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Fortune Secondary Logo
Rankings
  • 100 Best Companies
  • Fortune 500
  • Global 500
  • Fortune 500 Europe
  • Most Powerful Women
  • World's Most Admired Companies
  • See All Rankings
  • Lists Calendar
Sections
  • Finance
  • Fortune Crypto
  • Features
  • Leadership
  • Health
  • Commentary
  • Success
  • Retail
  • Mpw
  • Tech
  • Lifestyle
  • CEO Initiative
  • Asia
  • Politics
  • Conferences
  • Europe
  • Newsletters
  • Personal Finance
  • Environment
  • Magazine
  • Education
Customer Support
  • Frequently Asked Questions
  • Customer Service Portal
  • Privacy Policy
  • Terms Of Use
  • Single Issues For Purchase
  • International Print
Commercial Services
  • Advertising
  • Fortune Brand Studio
  • Fortune Analytics
  • Fortune Conferences
  • Business Development
  • Group Subscriptions
About Us
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • Facebook icon
  • Twitter icon
  • LinkedIn icon
  • Instagram icon
  • Pinterest icon

Latest in Finance

trump
PoliticsWhite House
America’s paying more at the pump. Trump’s new Air Force One jet donated by Qatar is nearly ready
By Jonathan J. Cooper and The Associated PressMay 2, 2026
2 hours ago
croatia
Travel & Leisuretourism
War in Iran has Croatia’s tourist hotspot wondering: will Dubrovnik host another 4 million visitors in 2026?
By Darko Bandic and The Associated PressMay 2, 2026
2 hours ago
shoplift
EconomyGen Z
Gen Z is rebelling against the economy with ‘disillusionomics,’ tackling near 6-figure debt by turning life into a giant list of income streams
By Jacqueline MunisMay 2, 2026
2 hours ago
Suze Orman once said earning more than $800,000 would make her ‘sick to my stomach’—but that turning down Oprah Winfrey cured her self-doubt
SuccessHow I made my first million
Suze Orman once said earning more than $800,000 would make her ‘sick to my stomach’—but that turning down Oprah Winfrey cured her self-doubt
By Orianna Rosa RoyleMay 2, 2026
3 hours ago
Pope Leo XIV encourages wealthy U.S. Catholics to keep donating after Papal Foundation approves most grants in its history
PoliticsPope
Pope Leo XIV encourages wealthy U.S. Catholics to keep donating after Papal Foundation approves most grants in its history
By Nicole Winfield and The Associated PressMay 2, 2026
3 hours ago
Berkshire’s cash pile hits $397.4 billion as profit more than doubles, but annual meeting attendance falls sharply without Warren Buffett as CEO
InvestingBerkshire Hathaway
Berkshire’s cash pile hits $397.4 billion as profit more than doubles, but annual meeting attendance falls sharply without Warren Buffett as CEO
By Josh Funk and The Associated PressMay 2, 2026
3 hours ago

Most Popular

Scott Bessent on financial literacy: 'it drives me crazy' to see young men in blue-collar construction jobs playing the lottery
Personal Finance
Scott Bessent on financial literacy: 'it drives me crazy' to see young men in blue-collar construction jobs playing the lottery
By Fatima Hussein and The Associated PressMay 1, 2026
1 day ago
China dominates the world's lithium supply. The U.S. just found 328 years' worth in its own backyard
North America
China dominates the world's lithium supply. The U.S. just found 328 years' worth in its own backyard
By Jake AngeloApril 30, 2026
2 days ago
A Chick-fil-A worker got fired and then showed up behind the register to allegedly refund himself over $80,000 in mac and cheese
Law
A Chick-fil-A worker got fired and then showed up behind the register to allegedly refund himself over $80,000 in mac and cheese
By Catherina GioinoMay 1, 2026
24 hours ago
Current price of oil as of May 1, 2026
Personal Finance
Current price of oil as of May 1, 2026
By Joseph HostetlerMay 1, 2026
1 day ago
Apple cofounder Ronald Wayne—whose stake would be worth up to $400 billion had he not sold it in 1976—says that at 91, he has no regrets
Success
Apple cofounder Ronald Wayne—whose stake would be worth up to $400 billion had he not sold it in 1976—says that at 91, he has no regrets
By Preston ForeApril 27, 2026
5 days ago
The U.S. economy is booming — just not where 50 million Americans live
Commentary
The U.S. economy is booming — just not where 50 million Americans live
By Derek KilmerMay 1, 2026
1 day ago

© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.