30-04-2012, 01:09 PM
Data Placement Issues in Grid Environment
INTRODUCTION
We know that most IT departments are being forced to do more with less: budgets are tight, resources are thin, and skilled staff can be scarce or expensive. At the same time, almost every organization is sitting on top of enormous, widely distributed computing capacity that goes unused. Mainframes are idle 40% of the time, UNIX servers are actually serving requests less than 10% of the time, and most PCs do nothing for 95% of a typical day. Grid computing is an emerging technology that unites a pool of servers, PCs, storage systems and networks into one large system that delivers nontrivial qualities of service. To an end user or application, the grid looks like one big virtual computing system. In short, a grid is a network of computational resources that lets an organization solve problems by sharing the computing power of many machines.
While Grid job scheduling has received much attention in recent years, relatively few researchers have studied data placement issues. Although job management is important in Grid computing, data management and placement are likely to be among the most challenging issues for future Grid applications.
WHAT IS A GRID
The term Grid computing comes from an analogy between a computational Grid and the power grid, which provides electricity on demand through wall sockets. Users do not have to concern themselves with how or where the electricity is produced. In the overall Grid vision, a Grid is a system that provides access to computational resources on demand without requiring knowledge of how and where those resources are located. One of the pioneers in the field of Grid computing, Ian Foster, created a three-point checklist that defines a Grid as a system that:
1. coordinates resources that are not subject to centralized control;
2. uses standard, open, general-purpose protocols and interfaces; and
3. delivers nontrivial qualities of service.
Types of Grid
Grids are often categorized by the type of solutions they best address. The three primary types of grids are computational grids, scavenging grids and data grids. There are no hard boundaries between these types, and a grid is often a combination of two or more of them. The type of grid environment we are using will affect many of our decisions about the applications developed to run in it.
DATA PLACEMENT ISSUES AND SOLUTIONS
Data placement issues stem mostly from the distributed nature of the Grid and arise in the functionality commonly offered by data placement services. The issues discussed here are storage discovery, storage allocation, file name virtualization, data replication, file consistency control, reliable file transfer, job-aware data placement optimization and ensuring system state consistency. The issues and their possible solutions are formulated without loss of generality, so that the ideas can be adapted to systems other than the DPS. In the following sections, the term user refers to a human user of the system, whereas client refers to either a human user or a client application.
Storage Allocation
The motivating scenario for storage allocation is similar to the one for storage discovery: a user wants to upload one or more files to a set of storage elements (SEs), and those SEs must have enough space for them. Because multiple users can upload files to the same SE concurrently, the storage may fill up even though every user checked that free space was available before starting the upload. The problem is that a free-space check made before a transfer does not guarantee that the space remains available for the whole transfer, and the combined requirements of several concurrent uploads may exceed what the SE can provide. It should therefore be possible to allocate (reserve) space for a file transfer, both on a per-file basis and in bulk, where bulk allocation reserves space for a whole set of files at once instead of once per file.
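The reservation idea can be illustrated with a small, single-process Python sketch. Everything in it is hypothetical: `StorageElement`, `allocate`, `allocate_bulk`, `commit`, `release` and `AllocationError` are illustrative names, not the interface of the DPS or of any real SE, and the SE is modeled as an in-memory byte counter guarded by a lock. It shows how reserving space turns the racy "check free space, then upload" sequence into one atomic step, both for a single file and for a set of files.

```python
from __future__ import annotations

import threading


class AllocationError(Exception):
    """Raised when a storage element cannot reserve the requested space."""


class StorageElement:
    """Hypothetical in-memory model of a storage element (SE).

    Sizes are in bytes. The free-space check and the reservation happen
    under one lock, so two concurrent clients can never both be granted
    the same free bytes (no check-then-act race).
    """

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.used = 0                        # bytes held by completed uploads
        self._reserved: dict[int, int] = {}  # reservation id -> bytes reserved
        self._next_id = 0
        self._lock = threading.Lock()

    def _free_locked(self) -> int:
        # Caller must already hold self._lock (threading.Lock is not re-entrant).
        return self.capacity - self.used - sum(self._reserved.values())

    def free(self) -> int:
        """Space that is neither used nor reserved."""
        with self._lock:
            return self._free_locked()

    def allocate(self, size: int) -> int:
        """Per-file allocation: reserve space for one file."""
        return self.allocate_bulk([size])

    def allocate_bulk(self, sizes: list[int]) -> int:
        """Bulk allocation: reserve space for a set of files in one step.

        All-or-nothing: if the whole set fits, one reservation id is
        returned; otherwise AllocationError is raised and nothing is held.
        """
        total = sum(sizes)
        with self._lock:
            available = self._free_locked()
            if total > available:
                raise AllocationError(
                    f"{self.name}: need {total} B, only {available} B free")
            self._next_id += 1
            self._reserved[self._next_id] = total
            return self._next_id

    def commit(self, rid: int) -> None:
        """Upload finished: the reserved bytes become used bytes."""
        with self._lock:
            self.used += self._reserved.pop(rid)

    def release(self, rid: int) -> None:
        """Upload failed or was cancelled: return the reserved bytes."""
        with self._lock:
            self._reserved.pop(rid, None)


if __name__ == "__main__":
    se = StorageElement("se01", capacity=100)
    first = se.allocate(60)              # client A reserves before uploading
    try:
        se.allocate(60)                  # client B is refused up front
    except AllocationError as err:
        print(err)                       # se01: need 60 B, only 40 B free
    batch = se.allocate_bulk([10, 20])   # one reservation for two files
    se.commit(first)                     # A's upload completed
    se.release(batch)                    # the batch upload was cancelled
    print(se.used, se.free())            # 60 40
```

The essential design choice is that the check and the reservation happen inside the same critical section; calling `free()` and then starting an upload would reintroduce exactly the race described above. A real deployment would also need reservations to expire (for example, a lease with a timeout) so that a client that crashes mid-transfer does not hold space forever; the sketch deliberately omits this.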