31-01-2013, 04:53 PM
Reputation Based Scheduling in Unreliable Distributed Infrastructures
Introduction
Scheduling parallel applications in a distributed environment, such as a cluster of workstations, remains an important and open problem. One of the main research issues is how to exploit idle resources effectively and to timeshare the system fairly among the processes.
Local scheduling, where each workstation independently schedules its processes, is an attractive time-sharing option for its ease of construction, scalability, fault-tolerance, etc. Meanwhile, coordinated scheduling of parallel jobs across the nodes of a multiprocessor (co-scheduling) is also indispensable in a distributed system. Without co-scheduling, the processes constituting a parallel job may suffer high communication latencies because of processor thrashing. With coordination across cooperating processes, each local scheduler can make independent decisions that still tend to schedule the processes of a parallel application together across processors, in order to fully exploit the computing resources of a distributed system.
There are several on-going research projects in this area, such as the NOW (Network of Workstations) project at UC Berkeley, the HPVM (High Performance Virtual Machine) project at UCSD/UIUC, and Hector at Mississippi State University. In this paper, we will discuss the main scheduling and co-scheduling techniques in these projects, comment on their pros and cons, and then make our conclusion.
Overview of the Scheduling Mechanism
In general, job scheduling is composed of at least two interdependent steps: the allocation of processes to workstations (space-sharing) and the scheduling of the processes over time (time-sharing), along with several optional complementary steps that can further improve performance.
When a job is submitted to the system, job placement is performed, i.e., deciding which workstations will run the job cooperatively (space-sharing). Along with the job submission, a description of the attributes of the job is also submitted to the system in order to specify its resource requirements, such as memory size, expected CPU time, deadline, etc. In the meantime, the system maintains an information table, either distributed or centralized, recording the current resource status of each workstation, e.g., CPU load, free memory size, etc. A matchmaking framework then matches the job's requirements against this table to find the most suitable set of workstations. The job is then decomposed into small components, i.e., processes, which are distributed to the assigned workstations. On each individual workstation, the local scheduler allocates time-slices to the process based on some policy so as to meet the scheduling requirements, such as response time and fairness.
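The matchmaking step above can be sketched as follows. This is an illustrative example, not the actual matchmaking frame from any of the cited projects; the `Workstation` and `Job` fields and the "least-loaded first" policy are assumptions chosen to mirror the resource table and requirement description the text mentions.

```python
# Hypothetical sketch of matchmaking: match a job's resource requirements
# against a (centralized) table of workstation status.
from dataclasses import dataclass

@dataclass
class Workstation:
    name: str
    cpu_load: float      # current load average
    free_memory_mb: int  # free memory in MB

@dataclass
class Job:
    memory_mb: int       # required memory per process
    num_processes: int   # number of processes to place

def matchmake(job, workstations):
    """Return the job.num_processes least-loaded workstations that satisfy
    the job's memory requirement, or None if too few qualify."""
    candidates = [w for w in workstations if w.free_memory_mb >= job.memory_mb]
    candidates.sort(key=lambda w: w.cpu_load)  # least-loaded first
    if len(candidates) < job.num_processes:
        return None
    return candidates[:job.num_processes]

pool = [
    Workstation("ws1", 0.9, 2048),
    Workstation("ws2", 0.2, 1024),
    Workstation("ws3", 0.4, 4096),
]
print([w.name for w in matchmake(Job(memory_mb=1500, num_processes=2), pool)])
# -> ['ws3', 'ws1']
```

A real matchmaker (e.g., in a batch system) would evaluate richer constraint expressions, but the filter-then-rank structure is the same.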
Properties of a Good Scheduler
Research activities aimed at identifying these properties vary widely along a number of dimensions:
i. Support for heterogeneous resources.
ii. Placement of objective function(s).
iii. Scalability.
iv. Co-scheduling methods and assumptions about system configuration.
Based on the experience accumulated during these activities, it is believed that a good scheduler should have the following properties:
a. General purpose: a scheduling approach should make few assumptions about, and place few restrictions on, the types of applications that can be executed. Interactive jobs, distributed and parallel applications, as well as non-interactive batch jobs, should all be supported with good performance.
This property is straightforward to state but, to some extent, difficult to achieve, because different kinds of jobs have different attributes and their requirements on the scheduler may conflict. For example, a real-time job requiring short response times prefers space-sharing scheduling, while a non-interactive batch job requiring high throughput may prefer time-sharing scheduling. To remain general purpose, a trade-off may have to be made.
Local Scheduling
In a distributed system, local scheduling refers to how an individual workstation schedules the processes assigned to it in order to maximize overall performance. It may seem that local scheduling is the same as scheduling on a stand-alone workstation, but they differ in many aspects. In a distributed system, the local scheduler may need global information from other workstations to achieve the optimal overall performance of the entire system. For example, in the extended stride scheduling of clusters, the local schedulers need global ticket information in order to achieve fairness across all the processes in the system.
Types of local scheduling techniques:
a. Proportional-share scheduling, in which the resource consumption rights of each active process are proportional to the relative shares that it is allocated.
b. Predictive scheduling, which adapts to the CPU load and resource distribution of the distributed system.
Proportional-Share Scheduling
Traditional priority-based schedulers are difficult to understand and give more processing time to users with many jobs, which leads to unfairness among users. Much research has sought a scheduler that is easy to implement and can allocate resources to users fairly over time. Proportional-share scheduling was introduced to solve this problem: the resource consumption rights of each active process are proportional to the relative shares that it is allocated.
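One concrete form of proportional-share scheduling, and the one the stride-scheduling example above alludes to, can be sketched as follows. The constants and class names here are illustrative; in stride scheduling, each process holds tickets, advances a "pass" value by a stride inversely proportional to its tickets, and the process with the lowest pass runs next.

```python
# Minimal stride-scheduling sketch: time-slices are granted in
# proportion to each process's ticket allocation.
STRIDE1 = 1 << 20  # large constant so integer strides stay precise

class Process:
    def __init__(self, name, tickets):
        self.name = name
        self.stride = STRIDE1 // tickets  # fewer tickets -> larger stride
        self.pass_value = self.stride     # start one stride ahead

def run(processes, slices):
    """Simulate `slices` time-slices; return how many each process received."""
    counts = {p.name: 0 for p in processes}
    for _ in range(slices):
        p = min(processes, key=lambda q: q.pass_value)  # lowest pass runs
        p.pass_value += p.stride                        # advance its pass
        counts[p.name] += 1
    return counts

# A 3:1 ticket ratio yields a 3:1 allocation of time-slices.
print(run([Process("A", 3), Process("B", 1)], 100))
# -> {'A': 75, 'B': 25}
```

Because each decision is local to the pass values, extending this to a cluster mainly requires sharing global ticket totals, which is exactly the global information the text says local schedulers need.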
Predictive Scheduling
Predictive scheduling differs from other scheduling approaches in that it provides intelligence, adaptivity, and proactivity, so that a system implementing it can adapt automatically to new architectures, new algorithms, and environmental changes.
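As a small illustration of the predictive idea (not a technique from the paper; the exponentially weighted moving average and the sample data are assumptions), a scheduler can estimate each workstation's near-future CPU load from its recent load history and place work on the node with the lowest prediction:

```python
# Illustrative load prediction via an exponentially weighted moving
# average (EWMA): recent samples weigh more than older ones.
def ewma_predict(samples, alpha=0.5):
    """Fold a list of load samples into a single predicted load."""
    estimate = samples[0]
    for s in samples[1:]:
        estimate = alpha * s + (1 - alpha) * estimate
    return estimate

history = {
    "ws1": [0.2, 0.3, 0.9],  # load is rising
    "ws2": [0.8, 0.5, 0.3],  # load is falling
}
predicted = {name: ewma_predict(h) for name, h in history.items()}
best = min(predicted, key=predicted.get)
print(best)  # -> ws2: the falling trend gives the lower prediction
```

A purely reactive scheduler looking only at the latest sample would also pick ws2 here, but on a rising-load node the prediction reacts before the instantaneous load catches up, which is the proactive behaviour the text describes.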
Reputation-Based Scheduling
Here we present a different approach aimed at increasing scheduling efficiency. RBS (Reputation Based Scheduling) is a scheduling algorithm based on the continuously calculated efficiency of the worker nodes rather than on a static reputation. The advantage is that the efficiency of a node may change over time, so this approach gives a more up-to-date, real-time basis for scheduling decisions.
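The core of the idea above can be sketched as follows. The excerpt does not give the exact reputation formula, so this sketch makes an assumption: reputation is a running average of each node's observed task efficiency, updated after every completed task, and the next task goes to the node with the highest current reputation.

```python
# Hypothetical RBS sketch: reputation = running average of observed
# per-task efficiency (0..1), recomputed continuously, not fixed once.
class WorkerNode:
    def __init__(self, name):
        self.name = name
        self.reputation = 1.0  # optimistic initial reputation (assumption)
        self.completed = 0

    def report(self, efficiency):
        """Fold a newly observed task efficiency into the reputation,
        so the reputation tracks the node's current behaviour."""
        self.completed += 1
        self.reputation += (efficiency - self.reputation) / self.completed

def pick_worker(nodes):
    # Schedule the next task on the node with the highest reputation.
    return max(nodes, key=lambda n: n.reputation)

a, b = WorkerNode("a"), WorkerNode("b")
for e in (0.9, 0.8, 0.95):  # node a performs well
    a.report(e)
for e in (0.4, 0.5):        # node b performs poorly
    b.report(e)
print(pick_worker([a, b]).name)  # -> a
```

If node a later slowed down, subsequent `report` calls would pull its reputation down and work would shift to b, which is the contrast with static reputation that the text draws.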