27-03-2014, 11:29 AM
TEN ACTIONS WHEN GRID SCHEDULING
Abstract
In this chapter we present a general architecture or plan for scheduling on a
Grid. A Grid scheduler (or broker) must make resource selection decisions in an
environment where it has no control over the local resources, the resources are
distributed, and information about the systems is often limited or dated. The
scheduler's interactions with these resources are also closely tied to the
functionality of the Grid Information Service (GIS). This Grid scheduling
approach has three phases: resource discovery,
system selection, and job execution. We detail the steps involved in each phase.
INTRODUCTION
More applications are turning to Grid computing to meet their computational
and data storage needs. Single sites are simply no longer efficient for
meeting the resource needs of high-end applications, and using distributed
resources can give the application many benefits. Effective Grid computing is
possible, however, only if the resources are scheduled well.
Grid scheduling is defined as the process of making scheduling decisions
involving resources over multiple administrative domains. This process can
include searching multiple administrative domains to use a single machine or
scheduling a single job to use multiple resources at a single site or multiple
sites. We define a job to be anything that needs a resource – from a bandwidth
request, to an application, to a set of applications (for example, a parameter
sweep). We use the term resource to mean anything that can be scheduled:
a machine, disk space, a QoS network, and so forth. In general, for ease of
use, in this chapter we refer to resources in terms associated with compute
resources; however, nothing about the approach is limited in this way.
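The definitions above can be made concrete with a small data model. This is a minimal sketch and not part of the chapter: the class names Resource and Job, their fields, the helper spans_multiple_domains, and the example host and domain names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    """Anything that can be scheduled (hypothetical model)."""
    name: str
    kind: str    # e.g. "machine", "disk", "network"
    domain: str  # administrative domain that controls the resource


@dataclass(frozen=True)
class Job:
    """Anything that needs a resource (hypothetical model)."""
    name: str
    needs: List[str]  # kinds of resource the job requests


def spans_multiple_domains(candidates: List[Resource]) -> bool:
    """True when a decision over these candidates crosses administrative domains."""
    return len({r.domain for r in candidates}) > 1


# Example values are invented for illustration.
sweep = Job(name="parameter-sweep", needs=["machine", "disk"])
candidates = [
    Resource("cluster1", "machine", "site-a.example"),
    Resource("store1", "disk", "site-b.example"),
]
local_only = [Resource("cluster1", "machine", "site-a.example")]
```

Under this model, a decision over candidates is a Grid scheduling decision in the sense defined above, while a decision over local_only is ordinary single-site scheduling.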
GRID INFORMATION SERVICE
The decisions a scheduler makes are only as good as the information provided
to it. Many theoretical schedulers assume one has 100 percent of the
information needed, at an extremely fine level of detail, and that the
information is always correct. In Grid scheduling, this is far from our
experience. In general we have only the highest level of information. For
example, it may be known that an application needs to run on Linux, will
produce output files somewhere between 20 MB and 30 MB, and should take less
than three hours but might take as long as five.
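One way to work with information this coarse is to store ranges rather than single values and to plan against the upper bounds. The sketch below encodes the example from the text. The class CoarseJobInfo, its field names, and the fits check are assumptions made for illustration, not an interface defined by the chapter or by any Grid Information Service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoarseJobInfo:
    """Hypothetical record of the high-level facts known about a job."""
    required_os: str
    output_mb_min: float
    output_mb_max: float
    expected_hours: float
    worst_case_hours: float

    def fits(self, os_name: str, free_disk_mb: float, time_limit_hours: float) -> bool:
        """Conservative check: compare against the upper bounds, not the typical case."""
        return (
            os_name.lower() == self.required_os.lower()
            and free_disk_mb >= self.output_mb_max
            and time_limit_hours >= self.worst_case_hours
        )


# The example from the text: Linux, 20 to 30 MB of output,
# under three hours expected, up to five hours possible.
job = CoarseJobInfo("Linux", 20.0, 30.0, 3.0, 5.0)
```

Checking against the worst case is a design choice of this sketch. A scheduler could instead accept a resource with a four-hour limit and take the risk that the job is killed before it finishes.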
Authorization Filtering
The first step of resource discovery for Grid scheduling is to determine the
set of resources that the user submitting the job has access to. In this regard,
computing over the Grid is no different from remotely submitting a job to a
single site: without authorization to run on a resource, the job will not run. At
the end of this step the user will have a list of machines or resources to which
he or she has access. The main difference that Grid computing lends to this
problem is sheer numbers. It is now easier to get access to more resources,
although equally difficult to keep track of them. Also, with current GIS
implementations, a user can often find out the status of many more machines
than those on which he or she has accounts. As the number of resources grows,
it simply does not make sense to examine resources that are not authorized for use.
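The filtering step can be sketched as a simple intersection between the resources a user can see and the resources on which the user holds an account. The function name authorization_filter and the host names are hypothetical. A real deployment would take the first list from its information service and the second from its own account records.

```python
from typing import Iterable, List, Set


def authorization_filter(visible: Iterable[str], accounts: Set[str]) -> List[str]:
    """Keep only the visible resources on which the user has an account.

    The order of the visible list is preserved.
    """
    return [name for name in visible if name in accounts]


# Invented example: the information service reports four machines,
# but the user holds accounts on only two of them.
visible = [
    "hpc1.site-a.example",
    "hpc2.site-a.example",
    "big.site-b.example",
    "viz.site-c.example",
]
accounts = {"hpc2.site-a.example", "viz.site-c.example"}
authorized = authorization_filter(visible, accounts)
```

Running the filter first keeps every later step from querying machines the job could never use.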
System Selection
Given a group of possible resources (or a group of possible resource sets),
all of which meet the minimum requirements for the job, a single resource (or
single resource set) must be selected on which to schedule the job. This
selection is generally done in two steps: gathering detailed information and
making a decision. We discuss these two steps separately, but they are
inherently intertwined, as the decision process depends on the available information.
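The two steps can be sketched as a gather loop followed by a ranking. The chapter does not prescribe a ranking rule, so the one used here (smallest estimated queue wait plus run time) is an assumption, as are the function name select_system, the query_details callback, and the dictionary keys.

```python
from typing import Callable, Dict, Iterable, Optional

Details = Dict[str, float]


def select_system(
    candidates: Iterable[str],
    query_details: Callable[[str], Optional[Details]],
) -> Optional[str]:
    """Gather detailed information, then pick the candidate with the
    smallest estimated completion time (an assumed ranking rule)."""
    # Step 1: gather details, skipping resources that return nothing.
    gathered: Dict[str, Details] = {}
    for name in candidates:
        info = query_details(name)
        if info is not None:
            gathered[name] = info
    if not gathered:
        return None
    # Step 2: decide, using only what step 1 was able to collect.
    return min(
        gathered,
        key=lambda n: gathered[n]["queue_wait_hours"] + gathered[n]["est_run_hours"],
    )


# Invented data: resource "c" publishes no detailed information.
known = {
    "a": {"queue_wait_hours": 2.0, "est_run_hours": 3.0},
    "b": {"queue_wait_hours": 0.5, "est_run_hours": 4.0},
}
choice = select_system(["a", "b", "c"], known.get)
```

The sketch shows why the steps are intertwined: a resource that provides no details can never be chosen, whatever its real suitability.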
Preparation Tasks
The preparation stage may involve setup, staging, claiming a reservation, or
other actions needed to prepare the resource to run the application. One of the
first attempts at writing a scheduler to run over multiple machines at NASA
was considered unsuccessful because it did not address the need to stage files
automatically.
Most often, a user will run scp, ftp, or a large-file transfer protocol such
as GridFTP [ABB 02] to ensure that the data files needed are in place. In
a Grid setting, authorization issues, such as having different user names at
different sites or storage locations, as well as scalability issues, can complicate
this process.
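The user-name problem can be illustrated with a small helper that builds scp command lines for staging input files, looking up the account name for each site. This is a sketch under stated assumptions: the function build_stage_in_commands, the site_users mapping, and all host, user, and path names are hypothetical. The commands are only constructed, not executed.

```python
from typing import Dict, List


def build_stage_in_commands(
    files: List[str],
    host: str,
    remote_dir: str,
    site_users: Dict[str, str],
) -> List[List[str]]:
    """Build one scp argument list per input file for the given host."""
    if host not in site_users:
        # Without an account name for this site, staging cannot proceed.
        raise KeyError(f"no account name recorded for {host}")
    user = site_users[host]
    target = f"{user}@{host}:{remote_dir.rstrip('/')}/"
    return [["scp", path, target] for path in files]


# Invented example: the same person has a different login at each site.
site_users = {
    "cluster.site-a.example": "jdoe",
    "big.site-b.example": "doe_j",
}
commands = build_stage_in_commands(
    ["input.dat", "params.cfg"],
    "cluster.site-a.example",
    "/scratch/run1/",
    site_users,
)
```

Each inner list can be passed to subprocess.run once the user has confirmed that the paths and credentials are correct for the site.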
CONCLUSION
This chapter defines the steps a user currently follows to make a scheduling
decision across multiple administrative domains. This approach to scheduling
on a Grid comprises three main phases: (1) resource discovery, which generates
a list of potential resources; (2) system selection, which gathers information
and chooses a best set of resources; and (3) job execution, which includes file
staging and cleanup.
While many schedulers have begun to address the needs of a true Grid-level
scheduler, none of them currently supports the full range of actions required.
Throughout this chapter we have directed attention to complicating factors that
must be addressed for the next generation of schedulers to be more successful
in a complicated Grid setting.