Input Output behavior of supercomputing applications
INTRODUCTION
This paper describes the collection and analysis of supercomputer I/O traces and the characteristics they reveal. The study touches on conventional file systems, parallelism and multiprocessing, message-passing concepts, HPC, and application-tracing methods, as well as recent developments in the field. Over the last few years, CPUs have seen tremendous gains in performance. I/O systems and memory systems, however, have not enjoyed the same rate of increase. As a result, supercomputer applications are generating more data, but I/O systems are becoming less able to keep up with this huge volume of information. A supercomputer is a computer that performs at or near the highest operational rate currently achieved by computers. It is typically used for scientific and engineering applications that must handle very large databases or do a great amount of computation (or both). The work done by a supercomputer is known as supercomputing.
What is supercomputing?
Supercomputing usually refers to compute-intensive applications run on the world's fastest computers. Typical supercomputing applications include scientific programs such as simulations of the earth's climate for weather forecasting, molecular dynamics for drug design, and aerodynamics for car and aircraft design. Today's fastest supercomputer operates at a speed of about 35 teraflops, that is, 35 trillion floating-point operations per second (a floating-point operation is a mathematical operation, such as multiplication or division, on two high-precision numbers). 35 trillion (35,000,000,000,000) operations per second sounds like a lot, but you may be surprised that a new desktop computer is only about 1,000 times slower. That is, if you get 1,000 of them to work together efficiently, you have a supercomputer. And that is exactly what many research institutions have done: putting together clusters of Linux-based computers running on high-performance Intel processors to obtain an affordable supercomputer.
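As a back-of-the-envelope check on that ratio, here is a minimal sketch in Python (the figures are the illustrative ones from the paragraph above, not measurements):

    peak_cluster = 35e12             # ~35 teraflops, the fastest machine cited above
    desktop = peak_cluster / 1000    # a desktop is "only about 1000 times slower"
    print(f"per-desktop rate: {desktop / 1e9:.0f} GFLOPS")               # -> 35 GFLOPS
    print(f"desktops needed at perfect scaling: {peak_cluster / desktop:.0f}")  # -> 1000

In practice clusters never scale perfectly, so a real installation needs more than 1,000 nodes to reach the same aggregate rate.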
History of supercomputing
The history of supercomputing goes back to the 1960s, when Seymour Cray designed a series of computers at Control Data Corporation (CDC) that used innovative designs and parallelism to achieve superior computational peak performance. The CDC 6600, released in 1964, is generally considered the first supercomputer.
Cray left CDC in 1972 to form his own company. Four years later he delivered the 80 MHz Cray-1, which became one of the most successful supercomputers in history. The Cray-2, released in 1985, was an 8-processor liquid-cooled computer through which Fluorinert was pumped as it operated. It performed at 1.9 gigaflops and was the world's fastest until 1990.
While the supercomputers of the 1980s used only a few processors, in the 1990s machines with thousands of processors began to appear both in the United States and in Japan, setting new computational performance records. The Intel Paragon, which could have 1,000 to 4,000 Intel i860 processors in various configurations, was ranked the fastest in the world in 1993. Fujitsu's Numerical Wind Tunnel supercomputer took the top spot in 1994 using 166 vector processors, each with a peak speed of 1.7 gigaflops. The Hitachi SR2201 obtained a peak performance of 600 gigaflops in 1996 by using 2,048 processors connected via a fast three-dimensional crossbar network.
Why is supercomputing so important?
High-performance or super-computing offers much greater capacity for solving large single problems, most of which could not be solved using current, more conventional computer systems. These problems require a reliable and stable machine, so that a very large number of components can work flawlessly together during one 'run' to obtain a final answer. And whilst size is important, the machine must also have enough memory capacity (both random-access memory and disk space) to carry the load of the assigned task. Thus, a supercomputer is one that can deliver more computing capacity than most other systems currently achieve.
Because of their high-speed vector-processing ability, supercomputers are ideally suited to problems that require manipulation of large arrays of data, such as computational chemistry, computational fluid dynamics, structural dynamics, and seismology, to name a few. These problems all require large numbers of floating-point computations, which are usually vectorizable, over large data sets: from hundreds of megabytes up to tens or hundreds of gigabytes for some seismic computations. In most cases, the application performs multiple iterations over the data set, as when simulating a model through time.
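To make "vectorizable floating-point computations iterated over a data set" concrete, here is a toy sketch in Python/NumPy (a 1-D diffusion model; the sizes and constants are illustrative, not from the paper):

    import numpy as np

    n_cells, n_steps, alpha = 1_000_000, 100, 0.1
    u = np.zeros(n_cells)
    u[n_cells // 2] = 1.0                # initial spike of heat

    for step in range(n_steps):          # multiple iterations over the data set
        # One vector operation sweeps the whole array each time step.
        u[1:-1] += alpha * (u[:-2] - 2 * u[1:-1] + u[2:])

Each time step performs millions of floating-point operations expressed as whole-array operations, which is exactly the access pattern a vector processor exploits.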
Conventional File Systems
Multiprocessors exacerbate a long-standing problem: as the number of disks and tape drives in the I/O system increases, aggregate I/O bandwidth increases, but it does not usually scale up at the same rate as aggregate processing speed. Caching, the most effective method for reducing I/O bandwidth requirements, has been widely used in conventional file systems. It succeeds because of the properties commonly exhibited by many workstation and minicomputer applications. Prefetching (gathering items and bringing them to a particular location ahead of time) data into a cache also reduces the instantaneous demand on an I/O system by spreading out demand and by predicting I/O references. This has the effect of reducing the number of requests, though not necessarily the amount of data transferred. In one study of workstation file systems, sequential reads and writes accounted for over 90% of the accesses to files that were either read or written, but not both, and about 67% of total data transferred.
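A minimal read-ahead sketch in Python (hypothetical; the block size, the read-ahead depth, and the class name are my assumptions, not the traced systems'):

    BLOCK = 4096
    READ_AHEAD = 8   # blocks fetched per miss; a tunable parameter

    class PrefetchingReader:
        def __init__(self, f):
            self.f = f
            self.cache = {}              # block number -> bytes

        def read_block(self, blockno):
            if blockno not in self.cache:
                # One large request replaces several small sequential ones.
                self.f.seek(blockno * BLOCK)
                data = self.f.read(BLOCK * READ_AHEAD)
                for i in range(READ_AHEAD):
                    chunk = data[i * BLOCK:(i + 1) * BLOCK]
                    if chunk:
                        self.cache[blockno + i] = chunk
            return self.cache.get(blockno, b"")

A purely sequential reader now issues one underlying request per READ_AHEAD blocks: the number of requests drops by roughly that factor, while the amount of data transferred stays about the same, as noted above.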
Another method of reducing I/Os from cache to disk is delayed writes. Delayed writes require a write-behind cache policy, which allows a program to continue executing after writing data to the cache without waiting for the data to be written to disk. In Sprite, data is not written back to disk for 30 to 60 seconds: every 30 seconds, all data in the cache that is older than 30 seconds is written to disk, allowing the operating system to group the writes. This lets temporary files that exist for less than 30 seconds, such as those generated by compilers, be deleted and thus never written to disk. The 30-second delay is itself a file-system parameter, and balances the reduced bandwidth to disk against the risk of losing data by not writing it to disk immediately. By entirely eliminating the disk I/Os associated with such files, the required bandwidth from cache to disk is reduced further.
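A minimal sketch of this policy in Python (hypothetical; Sprite implements it inside the kernel, and the names here are mine):

    import time

    FLUSH_AGE = 30.0   # seconds; the Sprite-style delay parameter

    class WriteBehindCache:
        def __init__(self):
            self.dirty = {}              # path -> (data, time entered cache)

        def write(self, path, data):
            # The caller continues immediately; no disk I/O happens here.
            self.dirty[path] = (data, time.time())

        def delete(self, path):
            # A temporary file deleted before it ages out
            # never causes any disk I/O at all.
            self.dirty.pop(path, None)

        def flush_old(self, write_to_disk):
            # Run every FLUSH_AGE seconds: write back only entries older
            # than FLUSH_AGE, so the writes are grouped together.
            now = time.time()
            for path, (data, t) in list(self.dirty.items()):
                if now - t >= FLUSH_AGE:
                    write_to_disk(path, data)
                    del self.dirty[path]

Raising FLUSH_AGE saves more disk bandwidth but widens the window in which a crash loses unwritten data, which is exactly the trade-off the 30-second parameter balances.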
High-performance computing (HPC)
High-performance computing (HPC) is the use of parallel processing for running advanced application programs efficiently, reliably and quickly. The term applies especially to systems that function above a teraflop, or 10^12 floating-point operations per second. HPC is occasionally used as a synonym for supercomputing, although technically a supercomputer is a system that performs at or near the currently highest operational rate for computers. Some supercomputers work at more than a petaflop, or 10^15 floating-point operations per second.
The most common users of HPC systems are scientific researchers, engineers and academic institutions. Some government agencies, particularly the military, also rely on HPC for complex applications. High-performance systems often use custom-made components in addition to so-called commodity components. As demand for processing power and speed grows, HPC will likely interest businesses of all sizes, particularly for transaction processing and data warehouses. The occasional techno-fiend might even use an HPC system to satisfy a desire for advanced technology.
Applications Traced
This study was an analysis of the I/O patterns of actual applications. We gathered traces from a variety of applications running on Cray computers (Cray Inc. is a supercomputer manufacturer based in Seattle, Washington). We chose to trace applications with high I/O rates, both in megabytes per second and in I/Os per second. While many supercomputer applications do not perform large amounts of I/O, we decided to concentrate on those that do; applications that perform little I/O are easy to characterize, as will be shown with the two traces that had low levels of I/O.
The traces fell into several categories. Most were computational fluid dynamics (CFD) problems, which are concerned with modeling the flows of fluids such as water and air; however, each modeled different physical objects and used different algorithms. Several of the programs were climate models, while others modeled vortices around a moving blade. One program solved a structural dynamics problem, and one did polynomial factorization. In Table 1, we summarize some basic information about the applications. Running time is the amount of CPU time each program required; all of the other numbers are relative to this time, not elapsed wall-clock time. Total I/O done is the total amount of data the program read and wrote, and number of I/Os is the number of read and write calls the program made to the file system. The total size of the data set, which was the sum of the sizes of all the files the program accessed, is listed under total data size.
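The per-application numbers in Table 1 can be gathered with a small wrapper of the kind sketched below (hypothetical Python; the paper traced Cray systems with its own instrumentation, and TracedFile is my name, not the paper's):

    import os

    class TracedFile:
        """Counts the quantities summarized in Table 1: number of
        I/Os, total I/O done, and this file's share of data set size."""

        def __init__(self, path, mode="rb"):
            self.f = open(path, mode)
            self.n_ios = 0          # number of read and write calls
            self.bytes_moved = 0    # total I/O done
            # Contribution of this file to total data size.
            self.data_size = os.path.getsize(path) if os.path.exists(path) else 0

        def read(self, n=-1):
            data = self.f.read(n)
            self.n_ios += 1
            self.bytes_moved += len(data)
            return data

        def write(self, data):
            written = self.f.write(data)
            self.n_ios += 1
            self.bytes_moved += written
            return written

        def close(self):
            self.f.close()

Summing n_ios and bytes_moved across every file an application opens, charged against its CPU time, reproduces the column definitions the paragraph above gives for Table 1.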