CLOUD COMPUTING A SEMINAR REPORT
Abstract
Computers have become an indispensable part of life. We need
computers everywhere, be it for work, research or in any such field. As the use
of computers in our day-to-day life increases, the computing resources that we
need also go up. For companies like Google and Microsoft, harnessing resources as
and when they are needed is not a problem. But for smaller enterprises, affordability
becomes a huge factor. Large infrastructure also brings problems such as machine
failures, hard drive crashes and software bugs, which can be a major headache for a
small organisation. Cloud computing offers a solution to this situation.
Cloud computing is a paradigm shift in which computing is moved away
from personal computers and even the individual enterprise application server
to a ‘cloud’ of computers. A cloud is a virtualized pool of servers that can provide
different computing resources to its clients. Users of this system
need only be concerned with the computing service being asked for. The
underlying details of how it is achieved are hidden from the user. The data and
the services provided reside in massively scalable data centers and can be
ubiquitously accessed from any connected device all over the world.
Introduction
The Greek myths tell of creatures plucked from the surface of the Earth and
enshrined as constellations in the night sky. Something similar is happening today in
the world of computing. Data and programs are being swept up from desktop PCs and
corporate server rooms and installed in “the compute cloud”. In general, there is a
shift in the geography of computation.
What is cloud computing exactly? As a beginning here is a definition
“An emerging computer paradigm where data and services
reside in massively scalable data centers in the cloud and
can be accessed from any connected devices over the
internet”
As with other topics of this kind, an understanding of the term cloud computing
requires an understanding of various other closely related terms. While precise
scientific definitions are lacking for many of these terms, general definitions can be
given.
Cloud Computing
A definition for cloud computing can be given as an emerging computer
paradigm where data and services reside in massively scalable data centers in the
cloud and can be accessed from any connected devices over the internet.
Cloud computing is a way of providing various services on virtual machines
allocated on top of a large physical machine pool which resides in the cloud. Cloud
computing comes into focus only when we think about what IT has always wanted - a
way to increase capacity or add capabilities to the current setup on the fly, without
investing in new infrastructure, training new personnel or licensing new software.
‘On the fly’ and ‘without new investment or training’ are the key phrases here, and
this is exactly what cloud computing offers.
We have lots of compute power and storage capabilities residing in the
distributed environment of the cloud. What cloud computing does is harness the
capabilities of these resources and make them available as a single entity that can be
adjusted to meet the current needs of the user. The basis of cloud
computing is to create a set of virtual servers on the available vast resource pool and
give it to the clients. Any web enabled device can be used to access the resources
through the virtual servers. Based on the computing needs of the client, the
infrastructure allotted to the client can be scaled up or down.
Need for Cloud Computing
What could we do with 1000 times more data and CPU power? One simple
question. That is all it took for the interviewers at Google to bewilder confident job
applicants. The question is relevant because the amount of data that an application
handles is increasing day by day and so is the CPU power that one can harness.
There are many answers to this question. With this much CPU power, we
could scale our businesses to 1000 times more users. Right now we are gathering
statistics about every user of an application. With such CPU power at hand, we
could monitor every single click and interaction, gathering complete statistics about
each user. We could improve the recommendation systems offered to users. We could
model better price plan choices. With this CPU power we could simulate a system
with, say, 100,000 users without any glitches.
There are lots of other things we could do with so much CPU power and data
capabilities. But what is holding us back? One of the reasons is that the large scale
architecture that comes with such systems is difficult to manage. There are many
different problems with the architecture that we have to deal with: machines may start
failing, hard drives may crash, the network may go down, and many other hardware
problems may appear. The hardware has to be designed so that the architecture is
reliable and scalable. Such large scale architecture also has a very expensive upfront
cost and high maintenance costs, requiring resources such as machines, power,
cooling and so on. Moreover, the system cannot scale as and when needed and so is
not easily reconfigurable.
MapReduce
Map Reduce is a software framework developed at Google in 2003 to support
parallel computations over large (multiple petabyte) data sets on clusters of
commodity computers. The framework is largely inspired by the ‘map’ and ‘reduce’
functions commonly used in functional programming, although its actual semantics
are not the same. It is a programming model and an associated implementation for
processing and generating large data sets. Many real-world tasks are expressible in
this model. MapReduce implementations have been written in
C++, Java and other languages.
Programs written in this functional style are automatically parallelized and
executed on the cloud. The run-time system takes care of the details of partitioning
the input data, scheduling the program’s execution across a set of machines, handling
machine failures, and managing the required inter-machine communication. This
allows programmers without any experience with parallel and distributed systems to
easily utilize the resources of a large distributed system.
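To make the programming model concrete, the following is a minimal, single-machine
sketch of the word-count task written in the MapReduce style in Java. The class and
method names here are illustrative only; a real MapReduce implementation would run
the map and reduce phases in parallel across a cluster and handle partitioning,
scheduling and failures itself.

import java.util.*;

// A toy, single-machine illustration of the MapReduce programming model.
// (Hypothetical names; not the actual Google implementation.)
public class WordCountModel {

    // Map phase: for each input record, emit intermediate (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values that share the same key.
    static int reduce(String key, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "the cloud is a pool of servers",
                "the cloud can scale up and down");

        // Shuffle step: group intermediate pairs by key (done by the framework).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        // Apply the reduce function to every key group and print the counts.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getKey(), e.getValue()));
        }
    }
}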
Google File System
Google File System (GFS) is a scalable distributed file system developed by
Google for data intensive applications. It is designed to provide efficient, reliable
access to data using large clusters of commodity hardware. It provides fault tolerance
while running on inexpensive commodity hardware, and it delivers high aggregate
performance to a large number of clients.
Files are divided into chunks of 64 megabytes, which are only extremely rarely
overwritten, or shrunk; files are usually appended to or read. It is also designed and
optimized to run on computing clusters, the nodes of which consist of cheap, "commodity" computers, which means precautions must be taken against the high
failure rate of individual nodes and the subsequent data loss. Other design decisions
select for high data throughputs, even when it comes at the cost of latency.
The nodes are divided into two types: one Master node and a large number of
Chunkservers. Chunkservers store the data files, with each individual file broken up
into fixed size chunks (hence the name) of about 64 megabytes, similar to clusters or
sectors in regular file systems. Each chunk is assigned a unique 64-bit label, and
logical mappings of files to constituent chunks are maintained. Each chunk is
replicated several times throughout the network, with a minimum of three replicas,
and more for files that are in high demand or need greater redundancy.
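As an illustration only (and not actual GFS code), the hypothetical Java sketch below
models the kind of metadata the Master maintains: each file name maps to an ordered
list of 64-bit chunk handles, and each chunk handle maps to the chunkservers holding
a replica, with a minimum of three replicas per chunk.

import java.util.*;

// A simplified, hypothetical model of a GFS-style master's metadata.
public class GfsMetadataSketch {
    static final long CHUNK_SIZE = 64L * 1024 * 1024;   // fixed 64 MB chunks
    static final int MIN_REPLICAS = 3;                   // minimum replication

    final Map<String, List<Long>> fileToChunks = new HashMap<>();
    final Map<Long, Set<String>> chunkToReplicas = new HashMap<>();
    final Random random = new Random();

    // Register a file of a given size: allocate one chunk handle per 64 MB
    // and place each chunk on MIN_REPLICAS distinct chunkservers.
    void createFile(String name, long sizeBytes, List<String> chunkservers) {
        int chunks = (int) ((sizeBytes + CHUNK_SIZE - 1) / CHUNK_SIZE);
        List<Long> handles = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            long handle = random.nextLong();             // unique 64-bit label
            List<String> servers = new ArrayList<>(chunkservers);
            Collections.shuffle(servers, random);
            chunkToReplicas.put(handle,
                    new HashSet<>(servers.subList(0, MIN_REPLICAS)));
            handles.add(handle);
        }
        fileToChunks.put(name, handles);
    }

    public static void main(String[] args) {
        GfsMetadataSketch master = new GfsMetadataSketch();
        List<String> servers = Arrays.asList("cs1", "cs2", "cs3", "cs4", "cs5");
        master.createFile("/logs/clicks.dat", 200L * 1024 * 1024, servers);
        System.out.println(master.fileToChunks);    // file -> chunk handles
        System.out.println(master.chunkToReplicas); // chunk handle -> replicas
    }
}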
Hadoop
Hadoop is a framework for running applications on large clusters built of
commodity hardware. The Hadoop framework transparently provides applications
both reliability and data motion. Hadoop implements the computation paradigm
named MapReduce which was explained above. The application is divided into many
small fragments of work, each of which may be executed or re-executed on any node
in the cluster. In addition, it provides a distributed file system that stores data on the
compute nodes, providing very high aggregate bandwidth across the cluster. Both
MapReduce and the distributed file system are designed so that the node failures are
automatically handled by the framework. Hadoop is implemented in Java. In Hadoop,
the combination of all the JAR files and classes needed to run a MapReduce program
is called a job. These components are collected into a single JAR, usually referred to
as the job file. To execute a job, it is submitted to the JobTracker, which schedules it
on the cluster and runs it.
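As a sketch of how such a job looks in practice, the classic word-count example below
is written against Hadoop's Java MapReduce API (the newer org.apache.hadoop.mapreduce
interface); the input and output paths are placeholders supplied on the command line
when the packaged job file is submitted.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The classic word-count job. Packaged as a job file, it could be submitted
// with a command such as: hadoop jar wordcount.jar WordCount <input> <output>
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);                 // emit (word, 1)
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));     // emit (word, total)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}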
Cloud Computing Services
Even though cloud computing is a pretty new technology, there are many
companies offering cloud computing services. Different companies like Amazon,
Google, Yahoo, IBM and Microsoft are all players in the cloud computing services
industry. But Amazon is the pioneer in the cloud computing industry with services
like EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service) dominating the
industry. Amazon has built up expertise in this area and holds a small advantage over
the others because of it. Microsoft has good knowledge of the fundamentals of cloud
science and is building massive data centers. IBM, the king of business computing
and traditional supercomputers, teams up with Google to get a foothold in the clouds.
Google is far and away the leader in cloud computing with the company itself built
from the ground up on hardware.
IBM Google University Academic Initiative
Google and IBM came up with an initiative to advance large-scale distributed
computing by providing hardware, software, and services to universities. The idea is
to prepare students "to harness the potential of modern computing systems" by giving
universities the hardware, software, and services needed for training in large-scale
distributed computing. The two companies aim to reduce the cost of distributed
computing research, thereby enabling academic institutions and their students to more
easily contribute to this emerging computing paradigm, Eric Schmidt, CEO of Google,
said in a statement.
Conclusion
Cloud computing is a powerful new abstraction for large scale data processing
systems which is scalable, reliable and available. In cloud computing, there are large
self-managed server pools available, which reduce overhead and eliminate
management headaches. Cloud computing services can also grow and shrink according
to need. Cloud computing is particularly valuable to small and medium businesses,
where effective and affordable IT tools are critical to helping them become more
productive without spending lots of money on in-house resources and technical
equipment. It is also an emerging architecture that is needed if the Internet is to
become the computing platform of the future.