Airbnb Didn’t Have to Fail
Today, part of Amazon Web Services failed, taking down a slew of startups that all run on Amazon’s cloud infrastructure. Airbnb was one of the biggest, but Heroku, Reddit, Minecraft, Flipboard and Coursera went down with it.
You might wonder what the heck happened, and why we should care. Well, Airbnb is a comparatively stress-free way for hosts to earn some money from their property, and Airbnb rentals are often cheaper than the alternatives, so an outage hurts hosts and guests alike.
In this article, we explain why Airbnb went down and what it can do to become more resilient. Without further ado, let’s get started!
Root Cause Of Why Airbnb Failed
AWS allows companies like Airbnb to build web applications and host them on servers owned and managed by Amazon. The so-called raw iron of this army of computing power sits in data centers. Each availability zone is made up of one or more data centers, and each AWS service region contains several zones. The regions include US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), South America (Sao Paulo), and AWS GovCloud.
Today, one of those data centers in the Northern Virginia region had a failure. What does this mean? Essentially, firms like Airbnb that hosted their applications ONLY in Northern Virginia experienced outages.
As it turns out, Amazon has a service level agreement (SLA) of 99.95% availability. We’ve long since said goodbye to the five nines. HA (high availability) is overrated.
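To put the SLA figure in perspective, the downtime it permits can be worked out with simple arithmetic. The sketch below assumes a 365-day year and does not model how Amazon actually measures availability for SLA purposes; the function name is made up for this illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service can be down and still meet the availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for label, percent in [("99.95% SLA", 99.95), ("five nines", 99.999)]:
        hours = allowed_downtime_hours(percent)
        print(f"{label}: {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

At 99.95% that comes to roughly 4.4 hours of permitted downtime a year, compared with about 5 minutes for five nines.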
How Can Airbnb Be Resilient?
Here are some steps Airbnb should consider to become more resilient:
1. Use Redundancy
Although there are lots of pieces and components to a web infrastructure, two big ones are web servers and database servers. Turns out Airbnb could make both of these tiers redundant. How do we do it?
On the database side, you can use Amazon’s Multi-AZ deployments or, alternatively, read replicas. Each has different service characteristics, so you’ll have to evaluate your application to figure out what will work for you.
Then there is the option to host MySQL or Percona directly on Amazon servers yourself and use replication.
By using redundant components, such as web servers and databases placed in multiple regions, Airbnb could have avoided an Amazon outage like Monday’s, which affected only Northern Virginia.
Once you’re using multiple zones and regions for your database, the hard work is done. Web servers can be hosted in different regions easily, and don’t require complicated replication to do it.
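The idea of falling back to a web server in another region can be sketched in a few lines. This is only an illustration, not how Airbnb or Amazon do it: the endpoint names are hypothetical placeholders, and the health check is passed in as a function so the example runs without touching the network.

```python
from typing import Callable, Iterable, Optional

# Hypothetical endpoints, one per region; the hostnames are placeholders.
ENDPOINTS = [
    "web.us-east-1.example.com",  # Northern Virginia
    "web.us-west-2.example.com",  # Oregon
    "web.eu-west-1.example.com",  # Ireland
]


def pick_endpoint(
    endpoints: Iterable[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first endpoint that passes the health check, or None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    # Simulate the outage: everything in Northern Virginia is unreachable.
    def northern_virginia_is_down(endpoint: str) -> bool:
        return "us-east-1" not in endpoint

    print(pick_endpoint(ENDPOINTS, northern_virginia_is_down))
```

In practice this decision is usually made by DNS or a load balancer with health checks rather than by application code, but the logic is the same: try the preferred region, then move on to the next one.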
2. Have A Browsing-Only Mode
Another step Airbnb can take to be resilient is to build a browsing-only mode into their application. Often, we hear about this option for performing maintenance without downtime. But it’s even more valuable during a situation like this. In a real outage, you don’t have control over how long it lasts or WHEN it happens. So, a browsing-only mode can provide real insurance.
For a site like Airbnb, this would mean nearly the entire website stayed up and operating. Customers could browse and view listings; only when they went to book a room would they encounter an error. This would affect a very small segment of their customers and be a much less painful PR problem.
Facebook has experienced intermittent outages of its service. People hardly notice because they’ll often only see a message when they are trying to comment on someone’s wall post, send a message, or upload a photo. The site is still operating, but not allowing changes. That’s what a browsing-only mode affords you.
A browsing-only mode can make a big difference, keeping most of the site up even when transactions or publishing are blocked.
Drupal, an open-source CMS that powers sites like Adweek.com, TheHollywoodReporter.com, and Economist.com, uses this technique. It supports a browsing-only mode out of the box. An Amazon outage like this one would only stop editors from publishing new stories temporarily. That is a huge win for sites that get 50 to 100 million with-an-m pageviews per month.
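A browsing-only mode can be approximated by letting read requests through and rejecting anything that would change data. The sketch below does this as a small WSGI middleware using only the Python standard library. The class, the stand-in application and the helper are all invented for this example; this is not how Drupal or Airbnb implement it.

```python
from wsgiref.util import setup_testing_defaults

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}  # requests that only read data


class BrowsingOnlyMiddleware:
    """Wrap a WSGI app and reject write requests while read-only mode is on."""

    def __init__(self, app, read_only=False):
        self.app = app
        self.read_only = read_only

    def __call__(self, environ, start_response):
        if self.read_only and environ["REQUEST_METHOD"] not in SAFE_METHODS:
            body = b"Bookings are temporarily unavailable. Please try again soon."
            headers = [("Content-Type", "text/plain"),
                       ("Content-Length", str(len(body)))]
            start_response("503 Service Unavailable", headers)
            return [body]
        return self.app(environ, start_response)


def listings_app(environ, start_response):
    """Stand-in for the real application."""
    body = b"listing page"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def request_status(app, method):
    """Send one fake request through the app and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    seen = []
    app(environ, lambda status, headers: seen.append(status))
    return seen[0]


if __name__ == "__main__":
    app = BrowsingOnlyMiddleware(listings_app, read_only=True)
    print("GET :", request_status(app, "GET"))
    print("POST:", request_status(app, "POST"))
```

With read-only mode switched on, browsing requests still reach the application while booking-style requests get a friendly error instead of taking the whole site down.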
3. Web Applications Need Feature Flags
Feature flags give you an on/off switch. Build them into heavy-duty parts of your site, and you can disable those in an emergency. Host components in multiple availability zones for extra peace of mind.
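As a rough sketch of that on/off switch, the example below keeps flags in memory and checks one before running an expensive part of a page. The flag names and the page function are made up for illustration, and a real site would more likely load flags from shared configuration so they can be flipped without a deploy.

```python
class FeatureFlags:
    """In-memory on/off switches (illustrative only)."""

    def __init__(self, **flags: bool):
        self._flags = dict(flags)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so a typo cannot switch a feature on.
        return self._flags.get(name, False)

    def disable(self, name: str) -> None:
        self._flags[name] = False


# Hypothetical flags for this example.
flags = FeatureFlags(search_suggestions=True, booking=True)


def render_search_page(query: str) -> str:
    page = f"results for {query!r}"
    if flags.is_enabled("search_suggestions"):
        page += " + suggestions"  # the heavy-duty part we may need to shed
    return page


if __name__ == "__main__":
    print(render_search_page("paris"))
    flags.disable("search_suggestions")  # emergency: turn the feature off
    print(render_search_page("paris"))
```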
4. Consider Netflix’s Simian Army
Netflix takes a very progressive approach to availability. They bake redundancy and automation right into all of their infrastructure. Then they run an app called Chaos Monkey, which essentially causes outages at random. If resilience from constantly falling and getting back up can’t make you stronger, I don’t know what can!
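The principle can be shown with a toy simulation: knock out instances at random, then check that automation brings the fleet back to full strength. This is not Netflix’s tool or its API. The function names, the kill probability and the fleet are all assumptions made up for this sketch.

```python
import itertools
import random

_replacement_ids = itertools.count()


def chaos_round(instances, rng, kill_probability=0.2):
    """Randomly 'terminate' instances and return the survivors."""
    return [name for name in instances if rng.random() >= kill_probability]


def heal(instances, desired_count):
    """Stand-in for automation that replaces lost instances."""
    missing = desired_count - len(instances)
    new_names = [f"replacement-{next(_replacement_ids)}" for _ in range(missing)]
    return instances + new_names


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a run can be repeated
    fleet = [f"web-{i}" for i in range(5)]
    for day in range(3):
        survivors = chaos_round(fleet, rng)
        print(f"day {day}: lost {len(fleet) - len(survivors)} instance(s)")
        fleet = heal(survivors, desired_count=5)
        assert len(fleet) == 5  # the fleet must recover after every round
```

The point of running something like this continuously is that recovery gets exercised every day, not only during a real outage.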
5. Use Multiple Cloud Providers
If all of the above isn’t enough for you, you can take it further, as Enstratus recommends, and use multiple cloud providers. Not being beholden to one company could help in more situations than just these types of service disruptions, too.
Basic EC2 Best Practices mean building redundancy into your infrastructure. Multiple cloud providers simply take that one step further.
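One way to picture the multi-provider approach is to hide each provider behind a common interface, so application code never talks to one vendor directly. The sketch below uses in-memory stand-ins instead of real cloud SDKs. Every class name here is hypothetical, and it is not Enstratus’s product or any provider’s API.

```python
from typing import Dict, List, Protocol


class BlobStore(Protocol):
    """The minimal interface the application depends on."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one provider's storage service."""

    def __init__(self, name: str):
        self.name = name
        self.available = True
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        self._data[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        return self._data[key]


class MultiProviderStore:
    """Write to every provider, read from the first one that answers."""

    def __init__(self, providers: List[BlobStore]):
        self.providers = providers

    def put(self, key: str, data: bytes) -> None:
        for provider in self.providers:
            provider.put(key, data)

    def get(self, key: str) -> bytes:
        last_error = None
        for provider in self.providers:
            try:
                return provider.get(key)
            except ConnectionError as error:
                last_error = error  # try the next provider
        raise ConnectionError("all providers are down") from last_error


if __name__ == "__main__":
    provider_a = InMemoryStore("provider-a")
    provider_b = InMemoryStore("provider-b")
    store = MultiProviderStore([provider_a, provider_b])
    store.put("listing-1", b"cozy loft")
    provider_a.available = False  # simulate one provider failing
    print(store.get("listing-1"))
```

Note that this simple version fails a write if any provider is down. A real system would have to decide how to queue or reconcile writes, which is where most of the extra cost of going multi-cloud comes from.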
Frequently Asked Questions (FAQs)
What Is The Failure Rate Of Airbnb?
According to Airbnb, fewer than 0.1% of stays result in a reported safety issue. However, with more than 200 million bookings a year, that is still a lot of trips with bad endings.
What Is The Biggest Issue With Airbnb?
According to research on more than 125,000 Airbnb complaints, 72% of the issues are related to poor customer service and 22% are related to scams.
Is Airbnb Not Profitable?
Yes, Airbnb is still profitable. The tourist market is HUGE, so there is a way for you to make money. All you need to do is pay attention to the location.
Conclusion
Airbnb keeps growing, so it is better to learn from previous mistakes in order to evolve in this ever-changing market. This article discussed the main reason why Airbnb failed and what it can do to build resilience. For any further queries regarding this topic, our comment box is always open for you. Thanks for reading!