Process Management in a Distributed Operating System


INTRODUCTION
Our goal in designing the process management primitives described in this paper
was to provide mechanisms that can do what process management primitives in
existing general-purpose operating systems can do and much more. The added
functionality has to do with the properties of the kinds of distributed systems we
are interested in: personal workstations, shared server machines and guest systems,
connected by a fast local-area network.

The workstations are normally used by a single person, but, when nobody is
using them, they are available as a computing resource to users of other workstations.
Together with processors dedicated to being allocated for the execution of
user programs, the idle workstations form a Processor Pool. Shared server
machines provide a distributed file system, name service, gateways to the internet,
access to printers, tape drives, etc. By 'guest systems' we mean traditional
operating systems that have become connected to the distributed system with
some software to allow the sharing of software between the 'new' and the 'old'
world. In the case of our system, UNIX systems are still used because of the
enormous body of software available to us there; software that is only slowly
replaced by equivalent or better in the distributed system.
We are building a general-purpose distributed system, so the programming
environment we design for is a heterogeneous one: many languages, several file
systems, existing software developed on other systems, possibly a wide variety of
hardware and different kinds of networks to connect the machines. The
designed process abstraction must allow running existing software. There must
thus be support for heavy-weight processes and emulation of foreign operating
system interfaces (with the possibility of providing binary compatibility: binaries
from the foreign system must run without modification).
In this environment, sufficient protection mechanisms must be implemented to
prevent one user's programs from disturbing another's. Programs from different
users will frequently share one physical processor, so they must run in separate
address spaces.
Not all machines can be expected to have a local file system, so programs will
have to be downloaded over the network. The mechanisms that do this must be
fast; some programs are several megabytes in size, so loading takes seconds, even
in the best of cases, and the user is often impatiently waiting at the terminal.
Distributed applications will rely heavily on fast interprocess communication.
In many distributed systems, the basic communication mechanism is the message
transaction, a message pair: a request message from a client process to a server,
followed by a reply message from the server back to the client. On top, remote
procedure call is often provided. When carefully designed and implemented, message
transactions form one of the most efficient communication protocols for
local-area networks, both in terms of delay and of throughput [6, 2]. In many
popular implementations, when a client process has sent a request, it blocks until
a reply arrives; when a server has asked for a request, it blocks until one arrives.
Using message transactions has several consequences for the design of the programming
environment. First, processes block once on each message transaction.
Two process switches thus occur: one when the process blocks to run
another process and one after the process has become unblocked again to run
the original again. If message transactions are to be very fast, process switching
had better be fast too.
Second, message transactions provide no parallelism: when the client runs, the
server waits for a request, and when the server runs, the client waits for a reply.


Only one process runs at a time, albeit on different machines. One solution
could be to implement non-blocking transactions, thus killing two birds with one
stone: process switches need not compete with message transactions in speed any
more and parallelism can be obtained by sending requests to many servers
simultaneously. This solution, however, introduces a whole new set of problems
[7]. One problem is that the interface between a process and the communications
substrate becomes more complicated: there must be handles for telling a
process when a message has arrived. Another is that the number of process
switches does not decrease at all: the communications software (which must
reside in a separate address space or in the kernel for protection) is invoked
upon requests to send, requests to receive, and upon receipt of a message from
the network. A third problem is that a non-blocking message transaction interface
is extremely hard to program and debug, because the order of events is no
longer specified.
Parallelism must be provided in some other way, and the way that was chosen
in Amoeba, as well as many other modern distributed systems, is to implement
light-weight processes, or threads of control. Many threads can share a single address
space; since much of the state of the light-weight processes is shared, thread
switching can be done blindingly fast. Using light-weight processes makes it possible
to implement servers by having one process serve a single client at a time;
many clients can be served simultaneously by creating many parallel lightweight
processes. Usually, a synchronization mechanism is provided to allow the
processes to share common data structures in shared memory (e.g., in the form of
semaphores).
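The pattern described above, one light-weight thread per client with shared data structures guarded by a semaphore, can be illustrated with a small Python sketch (the names `serve_client` and `counter` are invented for illustration; Amoeba's actual task interface is not shown here):

```python
import threading

# Shared data structure living in the common address space,
# protected by a binary semaphore.
counter = {"requests_served": 0}
sem = threading.Semaphore(1)

def serve_client(client_id):
    # Each light-weight thread serves exactly one client at a time;
    # many clients are served in parallel by spawning many threads.
    with sem:  # synchronize access to the shared data structure
        counter["requests_served"] += 1

threads = [threading.Thread(target=serve_client, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["requests_served"])  # 10
```

Because the threads share one address space, no message passing is needed for them to cooperate; only the semaphore is required to keep the shared counter consistent.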
Light-weight processes and blocking message transactions are used in many
distributed systems to simplify writing software that exploits parallelism [11, 1,7].
Mechanisms for migration of processes in distributed systems have been proposed
or implemented several times, but no algorithms have been proposed to
use migration for load-balancing. Given the time required to migrate a large
process (on the order of ten seconds), migration for load-balancing does not
appear to be very useful. It can be useful, however, in an environment of personal
workstations, where idle workstations are 'lent out' as a processing resource
for others and 'taken back' when their owners return.


THE AMOEBA DISTRIBUTED OPERATING SYSTEM
Amoeba is a distributed operating system, based on the popular paradigm of
client processes communicating with services via message transactions. Amoeba
uses capabilities to access services and the objects these services implement.
A capability is a 256-bit reference to an object; the first 64 bits, known as the
port, refer to the service managing the object; the next 64 bits are available to
the system for use as a location hint; the remaining 128 bits are allocated by the
service to identify the object. A capability is generated in such a way, and contains
sufficient bits, that the probability of an unauthorized user guessing an
object's capability is negligible.
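As an illustration only (Amoeba stores capabilities as packed binary structures, not Python integers, and the helper names here are invented), the 64/64/128-bit layout and the role of the random object field might be modeled like this:

```python
import secrets

PORT_BITS, HINT_BITS, OBJ_BITS = 64, 64, 128  # 256 bits total, per the text

def make_capability(port, hint=0):
    """Pack a capability: port | location hint | random object field.
    The 128 random object bits are what make guessing a valid
    capability negligibly likely."""
    obj = secrets.randbits(OBJ_BITS)
    return (port << (HINT_BITS + OBJ_BITS)) | (hint << OBJ_BITS) | obj

def port_of(cap):
    # The system extracts the port field to route a request
    # to the service that manages the object.
    return cap >> (HINT_BITS + OBJ_BITS)

cap = make_capability(port=0xCAFE)
assert port_of(cap) == 0xCAFE
assert cap.bit_length() <= 256
```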
These capabilities are used for protection, and also as the primary mechanism
for addressing requests to do operations on objects. When a client sends a
request, the system uses the port to determine which service should handle the
request. A server for that service is then found through a locate operation, e.g.,
implemented through broadcasting 'where-are-you' packets. The server uses the
private part of the capability to identify the object. After carrying out a request,
the server returns a reply.
Most services run in user space. The Amoeba Kernel provides only the bare
minimum of service: message-transaction facilities, process management, and
access paths to peripherals. File service, for instance, is a user-space service with
no special privileges, except knowledge of the capabilities to get to the disks
where the files are stored.
Message transactions are blocking, and the system provides no buffering.
When a server calls getrequest(port, capability, requestbuffer), (the port identifies the
server to the system), the server is blocked until a request arrives. The server
returns a reply with putreply(replybuffer), which doesn't block. When the client
calls trans(capability, requestbuffer, replybuffer), it blocks until the server's reply is
received.
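The blocking behavior of `getrequest`, `putreply`, and `trans` can be mimicked with a toy model in Python, using blocking queues in place of the network (this is a sketch of the semantics only; capabilities, ports as 64-bit values, and request buffers are simplified away):

```python
import queue
import threading

# One request queue per port; queue.get() blocks, mirroring how
# getrequest and trans block until a message arrives.
ports = {"file_service": queue.Queue()}

def server(port):
    # getrequest: block until a request arrives on our port.
    request, reply_box = ports[port].get()
    # putreply: hand back the reply; the call itself does not block.
    reply_box.put(request.upper())

def trans(port, request):
    # trans: send the request, then block until the server's reply arrives.
    reply_box = queue.Queue(maxsize=1)
    ports[port].put((request, reply_box))
    return reply_box.get()

threading.Thread(target=server, args=("file_service",)).start()
result = trans("file_service", "read block 7")
print(result)  # READ BLOCK 7
```

Note how the client is blocked inside `trans` for the entire transaction, which is exactly why, as discussed earlier, these transactions provide no parallelism by themselves.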
In case of a failure, the client is told that the server could not be reached, or
that no reply was received. In the former case, the client can safely retry; in the
latter case, the client will have to find out whether the failure occurred before,
during, or after execution of the request (unless the request was idempotent; in this
case the request can always be safely repeated). When a client fails during a
transaction, the reply is lost.
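The retry discipline described above (always safe to retry when the server was never reached; safe to retry a lost reply only for idempotent requests) can be sketched as follows. The exception names and the `flaky` transaction are invented for illustration:

```python
class ServerNotReached(Exception):
    pass  # request never executed: always safe to retry

class NoReply(Exception):
    pass  # request may or may not have executed

def transact_with_retry(do_trans, idempotent, attempts=3):
    """Retry policy from the text: retry freely if the server was never
    reached; retry a lost reply only when the request is idempotent."""
    for _ in range(attempts):
        try:
            return do_trans()
        except ServerNotReached:
            continue      # the request never ran, so retrying is safe
        except NoReply:
            if idempotent:
                continue  # e.g. 'read block N': repeating is harmless
            raise         # caller must find out how far the request got
    raise ServerNotReached

calls = {"n": 0}
def flaky():
    # Fails twice with an unreachable server, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ServerNotReached
    return "ok"

result = transact_with_retry(flaky, idempotent=True)
```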
A kernel request is just a request for an operation on an object maintained by
the kernel. A kernel request, or system call, is a transaction with the Kernel Service.
Thus, in principle, Amoeba could have only the system calls for doing message
transactions. In practice, however, it is more efficient to implement some of the
very frequent kernel service requests as ordinary system calls.
Since Amoeba transactions are blocking, they cannot be used to obtain
parallelism. Amoeba uses parallel processes to achieve that. Amoeba implements
light-weight parallel processes, called tasks. For efficiency, a number of
tasks can share an address space. An address space with a number of tasks in it
is a cluster. Because the term process could refer to either a task or a cluster, we
have avoided it as much as possible in the remainder of the paper.
To allow programmers to use separate tasks for small units of work (e.g., use a
separate task for each request received by a file server), tasks are cheap to
create, destroy and schedule. The current scheme for this is quite efficient, but
we believe it can be made more flexible and more efficient still. This paper
discusses a new design for task and cluster management.
For more information about Amoeba, see 'The Design of a Capability-Based
Distributed Operating System' [12, 9, 10, 8]. For details of the Amoeba protection
mechanism, see 'Protection and Resource Control in Distributed Systems'.



THE KERNEL SERVER
The Amoeba Kernel manipulates three kinds of basic objects to realize the process
abstraction in Amoeba. A cluster is a virtual address space consisting of a
number of segments and a number of threads of control, called tasks.
The reason for having tasks share an address space is one of efficiency: Tasks
can exchange information among each other more efficiently in shared memory,
and, since tasks have little context, task switching can be made very fast. The
concept of tasks is used in several modern distributed systems, notably, V [1],
Mesa [4], and Topaz.