21-05-2013, 11:54 AM
Amazon Web Services – Building Fault-Tolerant Applications on AWS
Introduction
Software has become a vital aspect of everyday life in nearly every part of the world. No matter where we are, we interact with software, whether by using a mobile phone, withdrawing money from an automated bank machine, or even just stopping at a traffic light.
Because software has become such an integral part of our daily lives, a great deal of work has to be done to ensure that this software remains operational and available.
Generally speaking, this area of study is known as fault tolerance: the ability of a system to remain in operation even if some of the components used to build it fail.
Although it’s true that essential systems must be available at all times, we also expect a much wider range of software to always be available to us. For example, we may want to visit an e-commerce site to purchase a product. Whether it is at 9:00am on a Monday morning or 3:00am on a holiday, we expect that the site will be available and ready to accept our purchase. The cost of not meeting these expectations can be crippling to many businesses. Even with very conservative assumptions, it is estimated that a busy e-commerce site could lose thousands of dollars for every minute it is unavailable. This is just one example of why businesses and organizations strive to develop software systems that can survive faults.
Amazon Web Services (AWS) provides a platform that is ideally suited for building fault-tolerant software systems. However, this attribute is not unique to our platform. Given enough resources and time, one can build a fault-tolerant software system on almost any platform. The AWS platform is unique because it enables you to build fault-tolerant systems that operate with a minimal amount of human interaction and up-front financial investment.
Failures Shouldn’t
When a server crashes or a hard disk runs out of room in an on-premises datacenter environment, administrators are notified immediately, because these are noteworthy events that require at least their attention, if not their intervention. The ideal state in a traditional, on-premises datacenter environment tends to be one where failure notifications are delivered reliably to a staff of administrators who are ready to spring into action to solve the problem. Many organizations are able to reach this state of IT nirvana. However, doing so typically requires extensive experience, up-front financial investment, and significant human resources.
This is not the case when using the platform provided by Amazon Web Services. Ideally, failures in an application built on our platform can be dealt with automatically by the system itself, and as a result, are fairly uninteresting events.
Amazon Web Services gives you access to a vast amount of IT infrastructure (computing, storage, and communications) that you can allocate automatically (or nearly automatically) to account for almost any kind of failure. You are only charged for resources that you actually use, so there is no up-front financial investment to be made.
Amazon Machine Images
Amazon Elastic Compute Cloud (Amazon EC2) is a web service within Amazon Web Services that provides computing resources (literally server instances) that you use to build and host your software systems. Amazon EC2 is a natural entry point to Amazon Web Services for your application development. You can build a highly reliable and fault-tolerant system using multiple EC2 instances together with tools and ancillary services such as Auto Scaling and Elastic Load Balancing.
On the surface, Amazon EC2 instances are very similar to traditional hardware servers. Amazon EC2 instances use familiar operating systems like Linux, Windows, or OpenSolaris. As such, they can accommodate nearly any kind of software that can run on those operating systems. Amazon EC2 instances have IP addresses, so the usual methods of interacting with a remote machine (e.g., SSH or RDP) can be used.
The template that you use to define your service instances is called an Amazon Machine Image (AMI). This template basically contains a software configuration (i.e., operating system, application server, and applications) and is applied to an instance type.
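The relationship between a template and the instances launched from it can be illustrated with a small model. This is only a sketch in plain Python and does not call any AWS API; the names MachineImage, Instance, and launch, as well as the image ID and instance type strings, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class MachineImage:
    """Hypothetical stand-in for an AMI: a fixed software configuration."""
    image_id: str
    operating_system: str
    application_server: str
    applications: tuple


@dataclass(frozen=True)
class Instance:
    """Hypothetical stand-in for a running server instance."""
    instance_id: str
    image_id: str
    instance_type: str


# Simple counter used to hand out unique, made-up instance IDs.
_next_id = count(1)


def launch(image, instance_type, how_many=1):
    """Apply one image to an instance type, producing identical instances."""
    return [
        Instance(f"i-{next(_next_id):04d}", image.image_id, instance_type)
        for _ in range(how_many)
    ]


image = MachineImage("ami-example", "Linux", "ExampleAppServer", ("storefront",))
fleet = launch(image, "example.small", how_many=3)
```

Every instance in fleet shares the same software configuration because all of them come from the same image; only the instance IDs differ.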
Auto Scaling
The concept of automatically provisioning and scaling compute resources is a crucial aspect of any well-engineered, fault-tolerant application running on the Amazon Web Services platform. Auto Scaling is a powerful option that you can very easily apply to your application.
Auto Scaling enables you to automatically scale your Amazon EC2 capacity up or down. You can define rules that determine when more (or fewer) server instances are needed, such as:
1. When the number of functioning server instances is above (or below) a certain number, launch (or terminate) server instances.
2. When the resource utilization (e.g., CPU, network, or disk) of the server instance fleet is above (or below) a certain threshold, launch (or terminate) server instances. These metrics are collected by the Amazon CloudWatch service, which monitors Amazon EC2 instances.
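The two kinds of rules above can be sketched as a single decision function. This is a minimal illustration in plain Python, not the Auto Scaling API: ScalingPolicy, desired_capacity, the thresholds, and the choice to add or remove one instance at a time are all assumptions made for the example. In a real deployment the utilization figure would come from a monitoring service such as Amazon CloudWatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy: fleet size bounds plus CPU thresholds (percent)."""
    min_instances: int
    max_instances: int
    scale_out_cpu: float
    scale_in_cpu: float


def desired_capacity(policy, healthy_instances, avg_cpu_percent):
    """Return how many instances the fleet should have after applying the rules."""
    # Rule 1: keep the number of functioning instances within the bounds.
    if healthy_instances < policy.min_instances:
        return policy.min_instances
    if healthy_instances > policy.max_instances:
        return policy.max_instances
    # Rule 2: react to fleet-wide utilization, one instance at a time.
    if avg_cpu_percent > policy.scale_out_cpu:
        return min(healthy_instances + 1, policy.max_instances)
    if avg_cpu_percent < policy.scale_in_cpu:
        return max(healthy_instances - 1, policy.min_instances)
    return healthy_instances


policy = ScalingPolicy(min_instances=2, max_instances=6,
                       scale_out_cpu=70.0, scale_in_cpu=20.0)
```

For example, with this policy a fleet that has dropped to one healthy instance is brought back to two regardless of load, while a fleet of three instances averaging 85% CPU grows to four.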