03-09-2012, 02:23 PM
Checkpointing using Mobile Agents in Distributed Systems
Checkpointing.pdf (Size: 144.04 KB / Downloads: 29)
Abstract
Traditional message passing based checkpointing and
rollback recovery algorithms perform well for tightly coupled
systems. In wide area distributed systems these algorithms
may suffer from large overhead due to message passing
delay and network traffic. Mobile agents offer an attractive
option for designing checkpointing schemes for wide
area distributed systems. Network topology is assumed to
be arbitrary. Processes are mobile agent enabled. When
a process wants to take a checkpoint, it just creates one
mobile agent. Concurrent initiations by multiple processes
are allowed. Synchronization and creation of a consistent
global state (CGS) for checkpointing is managed by the mobile
agent(s). In the worst case, for k concurrent initiations
among n processes, checkpointing algorithm requires a total
of O(kn) hops by all the mobile agents. A mobile agent
carries O(n/k) (on the average) size data.
Introduction
Traditional checkpointing and rollback recovery algorithms
work well in tightly coupled systems. But in wide
area distributed systems like internet, these algorithm degrade
system performance due to network traffic and message
passing delays. Mobile agents can improve system
performance for checkpointing algorithms on wide area distributed
systems like internet.
A mobile agent is an executable program that can automatically
migrate among processes as a messenger. It
halts execution of the host, executes at the host process,
takes necessary actions, and finally dispatches itself, together
with required information, to another host. In wide
area networks, a mobile agent can execute asynchronously
and automatically with its own control. If a user is temporarily
disconnected from the network, the agent can still
carry out the tasks. Mobile agents can reduce the network
congestion by reducing frequent remote communications
System Model
We consider a distributed system consisting of n processes,
one process per processor, denoted by {P0, P1, P2,
· · · , Pn−1} and a set of bidirectional channels on an arbitrary
network topology. There is no common clock, shared
memory or central coordinator. Message passing is the only
mode of communication between any pair of processes.
Any process can initiate checkpointing. We assume that
there is no link failure, only processesmay fail. The computation
is asynchronous; messages are exchanged with finite
but arbitrary delays.
In this paper, we consider logical checkpoint [12, 15],
which is a standard checkpoint (i.e., snapshot of the process)
plus a list of messages, which have been sent by this
process but are unacknowledged at the time of taking the
checkpoint. Message lists are updated continuously. Our algorithm
allows the generation of missing messages in case
the system has to roll back to its last checkpoint. At the
time of restart after a failure, processes retransmit their unacknowledged
messages (not all of whom may be missing
messages). There may be duplicatemessages after recovery
from a failure and that is handled using message identifiers.
Conclusion
In this paper, we have proposed a mobile agent-based
checkpointing protocol for arbitrary network topology. Processes
take logical checkpoints. The protocol can handle
multiple initiations of checkpointing. At most twocheckpoints
(one permanent and other temporary) have to
be saved in the stable storage of a process. Each initiator
creates a mobile agent. An agent moves along a DFS
tree rooted at the creator of the agent. A total of O(kn)
moves and O(n) time are required by all the agents in
the complete checkpointing process. This is an improvement
over O(n3) messages in Spezialetti-Kearns as well as
Prakash-Singhal algorithms. The Mandal-Mukhopadhyaya
algorithm also has O(n2) moves, but that works only for
Hamiltonian topologies.