01-01-2013, 04:28 PM
Design and Implementation of a Process Migration System for the
Linux Environment
Abstract
This paper reviews the field of process migration by
summarizing its key concepts, gives an overview of a
high-level process migration algorithm, and presents
our own alternative implementation for the Linux
environment. Design and implementation issues of
process migration are analyzed in general and used as
reference points in describing our implementation.
The primary aim of this work is to build a user-space
process migration tool that obviates the need for
kernel support, and to provide insight into the
difficult task of actual migration for performance gain.
Introduction
Process migration is the act of transferring an active
process between two machines and restoring it on the
selected destination node from the point where it left
off. Its purpose is to provide an enhanced degree of
dynamic load distribution, fault resilience, eased
system administration, and data-access locality. The
potential of process migration is great, especially in
a large network of personal workstations. A typical
working environment generally contains a large pool of
idle workstations whose power goes unharnessed if the
potential of process migration is not tapped. A
reliable and efficient migration module that helps
correct imbalances in load distribution and allows
effective use of the system's resource potential is
therefore highly desirable.
Related Work
The future of process migration is extremely encouraging.
Significant research and development has been conducted
in process migration and closely related areas, and the
different streams of development may well lead to a
wider deployment of process migration. [2] surveys the
field and highlights the importance of process migration,
in which a process is suspended on one machine and
resumed on another.
Remote Procedure Calls
Load Measurement Metrics
The load information is typically represented by means
of metrics such as CPU usage, memory availability, and
average turnaround time. A process's load is typically
characterised by its CPU usage, memory usage,
communication, and file usage. Load information
management uses these metrics to select the process to
be migrated and to choose the destination node.
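The two selection steps can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the field names, weights, and sample data are all assumptions.

```python
# Hypothetical sketch: pick a migration candidate from per-process
# metrics, and a destination from per-node loads.

def process_load(proc, cpu_weight=0.7, mem_weight=0.3):
    """Weighted load score for one process (weights are assumed)."""
    return cpu_weight * proc["cpu"] + mem_weight * proc["mem"]

def select_candidate(processes):
    """Pick the heaviest process as the migration candidate."""
    return max(processes, key=process_load)

def select_destination(nodes):
    """Pick the least loaded node as the destination."""
    return min(nodes, key=lambda n: n["load"])

# Illustrative data: normalised CPU/memory fractions per process and node.
procs = [
    {"pid": 101, "cpu": 0.9, "mem": 0.4},
    {"pid": 102, "cpu": 0.2, "mem": 0.1},
]
nodes = [{"host": "n1", "load": 0.8}, {"host": "n2", "load": 0.3}]

print(select_candidate(procs)["pid"], select_destination(nodes)["host"])
# → 101 n2
```

Any real policy would weight communication and file usage as well, as the metrics above suggest.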
Distributed Scheduling
This aspect mainly deals with the allocation of nodes
to processes. A plethora of strategies has been
proposed, of which a few are mentioned here:
• A sender-initiated policy is activated on a node
that is overloaded and wishes to off-load work to
other nodes. A sender-initiated policy is preferable
for low- and medium-loaded systems, which have few
overloaded nodes. This strategy is convenient for
remote-invocation strategies.
• A receiver-initiated policy is activated on
underloaded nodes willing to accept load from
overloaded ones. A receiver-initiated policy is
preferable for highly loaded systems, with many
overloaded nodes and few underloaded ones. Process
migration is particularly well suited to this strategy,
since only with migration can a process transfer be
initiated at an arbitrary point in time.
• A symmetric policy is the combination of the previous
two policies, in an attempt to take advantage
of the good characteristics of both of them.
It is suitable for a broader range of conditions
than either receiver-initiated or sender-initiated
strategies alone.
• A random policy chooses the destination node
randomly from all nodes in a distributed system.
This simple strategy can result in a significant
performance improvement.
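The policies above can be sketched as simple threshold tests. The thresholds and return values here are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch: which side initiates a transfer depends on a
# node's load relative to two thresholds (both values are assumed).

LOW, HIGH = 0.3, 0.8  # hypothetical underload/overload thresholds

def sender_initiated(load):
    """An overloaded node tries to off-load work."""
    return load > HIGH

def receiver_initiated(load):
    """An underloaded node offers to take on work."""
    return load < LOW

def symmetric_action(load):
    """Combine both policies: act whenever the node is far from balanced."""
    if sender_initiated(load):
        return "offer"    # push a process to a lighter node
    if receiver_initiated(load):
        return "request"  # pull a process from a heavier node
    return "idle"

print(symmetric_action(0.9), symmetric_action(0.1), symmetric_action(0.5))
# → offer request idle
```

A random policy would simply skip the load tests and pick any node, which is why it is so cheap to implement.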
The Virtualisation concept
To provide user-space process migration while
guaranteeing transparency, it is imperative that
processes do not realise they are working in a
migration-enabled environment. Primarily, this involves
building a uniform interface through which the processes
interact, which empowers the system to achieve the
objective of transparency. This interface is sometimes
termed the virtual interface, and the idea itself is
called virtualisation.
Load Calculator
The Load Calculator is a daemon that runs on every
node in the network and calculates the load at regular
intervals. The calculation is a parametric equation
over several parameters, of which memory and CPU usage
are the most significant. The total load on the system
is calculated as the sum of the loads of the individual
processes. The /proc file system has one directory per
process, containing the relevant memory and CPU usage
information for every process in the system. When the
load exceeds a certain threshold value, this module
sends an overload signal to the main server, which in
turn begins the process of preemption, checkpointing,
and scheduling.
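A minimal sketch of such a daemon is given below. The weights, normalisers, threshold, and the `notify_server` hook are all assumptions; only the /proc layout (utime/stime in `/proc/<pid>/stat`, resident pages in `/proc/<pid>/statm`) is standard Linux.

```python
import os
import time

CPU_WEIGHT, MEM_WEIGHT = 0.7, 0.3   # assumed weights of the parametric equation
THRESHOLD = 0.8                      # assumed overload threshold

def process_load(pid):
    """Per-process load from /proc/<pid>: CPU time (clock ticks) and RSS
    (pages), scaled by assumed normalisers so each term is roughly in [0, 1]."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Split after the last ')' since the comm field may contain ')'.
            fields = f.read().rsplit(")")[-1].split()
        cpu_ticks = int(fields[11]) + int(fields[12])  # utime + stime
        with open(f"/proc/{pid}/statm") as f:
            rss_pages = int(f.read().split()[1])       # resident set size
    except (FileNotFoundError, ProcessLookupError):
        return 0.0  # process exited between listing and reading
    return (CPU_WEIGHT * min(cpu_ticks / 1e6, 1.0)
            + MEM_WEIGHT * min(rss_pages / 1e6, 1.0))

def total_load():
    """Total load: sum of per-process loads over every PID directory."""
    try:
        pids = [d for d in os.listdir("/proc") if d.isdigit()]
    except FileNotFoundError:
        return 0.0  # not a Linux /proc environment
    return sum(process_load(pid) for pid in pids)

def run(notify_server, interval=5.0, once=False):
    """Daemon loop: recompute the load and signal the server on overload."""
    while True:
        if total_load() > THRESHOLD:
            notify_server()  # hypothetical overload signal to the main server
        if once:
            break
        time.sleep(interval)
```

For example, `run(lambda: print("overload"), once=True)` performs a single check; the actual tool would presumably send the signal over a socket to the main server.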
Conclusion
Despite these goals and ongoing research efforts, migration
has not achieved widespread use. One reason
for this is the complexity of adding transparent migration
to systems originally designed to run stand-alone,
since designing new systems with migration in mind
from the beginning is not a realistic option anymore.
Another reason is that there has not been a compelling
commercial argument for operating system vendors
to support process migration.