DISTRIBUTED FILE SYSTEMS
Design Considerations
Since its introduction in 1985, the Sun Microsystems Network File System (NFS) has been widely used in industry and academia. In addition to its technical innovations, it has played a significant educational role in exposing a large number of users to the benefits of a distributed file system. Other vendors now support NFS, and a significant fraction of the user community perceives it to be a de facto standard.

Portability and heterogeneity are two considerations that have played a dominant role in the design of NFS. Although the original file system model was based on Unix, NFS has been ported to non-Unix operating systems such as PC-DOS. To facilitate portability, Sun makes a careful distinction between the NFS protocol and a specific implementation of an NFS server or client. The NFS protocol defines an RPC interface that allows a server to export local files for remote access. The protocol does not specify how the server should implement this interface, nor does it mandate how the interface should be used by a client. Design details such as caching, replication, naming, and consistency guarantees may vary considerably in different NFS implementations. In order to focus our discussion, we restrict our attention to the implementation of NFS provided by Sun for its workstations that run the SunOS flavor of Unix. Unless otherwise specified, the term "NFS" will refer to this implementation in the rest of this paper. The term "NFS protocol" will continue to refer to the generic interface specification.

SunOS defines a level of indirection in the kernel that allows file system operations to be intercepted and transparently routed to a variety of local and remote file systems. This interface, often referred to as the vnode interface after the primary data structure it exports, has been incorporated into many other versions of Unix.

With a view to simplifying crash recovery on servers, the NFS protocol is designed to be stateless. Consequently, servers are not required to maintain contextual information about their clients: each RPC request from a client contains all the information needed to satisfy the request. To some degree, functionality and Unix compatibility have been sacrificed to meet this goal. Locking, for instance, is not supported by the NFS protocol, since locks would constitute state information on a server. SunOS does, however, provide a separate lock server to perform this function.

Sun workstations are often configured without a local disk, and the ability to operate such workstations without significant performance degradation is another goal of NFS. Early versions of Sun workstations used a separate remote-disk network protocol to support diskless operation. This protocol is no longer necessary, since the kernel now transforms all its device operations into file operations.

A high-level overview of NFS is presented by Walsh. Details of its design and implementation are given by Sandberg. Kleiman describes the vnode interface, while Rosen comments on the portability of NFS.
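To make the vnode indirection concrete, the following is a minimal sketch in C of a vnode-style interface. The structure and operation names are illustrative assumptions, not the actual SunOS declarations: each file system supplies a table of operations, and the kernel dispatches generic file operations through it without knowing whether a file is local or remote.

/* Illustrative vnode-style dispatch; all names are hypothetical. */
struct vnode;

struct vnodeops {
    int (*vop_open)(struct vnode *vp, int flags);
    int (*vop_read)(struct vnode *vp, void *buf, long off, long len);
    int (*vop_write)(struct vnode *vp, const void *buf, long off, long len);
    int (*vop_close)(struct vnode *vp);
};

struct vnode {
    const struct vnodeops *v_op;  /* per-file-system operation table */
    void *v_data;                 /* private data of the underlying file system */
};

/* The kernel calls through the table: an NFS vnode's vop_read would
 * issue an RPC to the server, while a local vnode's would read the disk. */
int vn_read(struct vnode *vp, void *buf, long off, long len)
{
    return vp->v_op->vop_read(vp, buf, off, len);
}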
Naming and Location
The NFS paradigm treats workstations as peers, with no fundamental distinction between clients and servers. A workstation may be a server, exporting some of its files, and it may also be a client, accessing files on other workstations. In practice, however, installations are commonly configured so that a small number of nodes run as dedicated servers, while the others run as clients.

NFS clients are usually configured so that each sees a Unix file name space with a private root. Using an extension of the Unix mount mechanism, subtrees exported by NFS servers are individually bound to nodes of the root file system. This binding usually occurs when Unix is initialized, and remains in effect until explicitly modified. Since each workstation is free to configure its own name space, there is no guarantee that all workstations at an installation have a common view of shared files. Collaborating groups of users, however, usually configure their workstations to have the same name space. Location transparency is thus obtained by convention, rather than being a basic architectural feature of NFS.

Since name-to-site bindings are static, NFS does not require a dynamic file location mechanism. Each client maintains a table mapping remote subtrees to servers. The addition of new servers or the movement of files between servers renders the table obsolete, and there is no mechanism built into NFS to propagate information about such changes.
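The static binding can be pictured as a small client-side table consulted during pathname translation. The sketch below, in C, is a hypothetical illustration of such a table; the entries, structure, and lookup function are assumptions for exposition, not Sun's actual data structures.

/* Hypothetical client mount table: each entry binds a remote subtree,
 * named by server and exported path, to a node in the local name space.
 * Entries are installed at boot time and remain fixed thereafter. */
#include <string.h>

struct mount_entry {
    const char *local_path;   /* node in the local root file system */
    const char *server;       /* NFS server exporting the subtree   */
    const char *remote_path;  /* path of the subtree on that server */
};

static const struct mount_entry mount_table[] = {
    { "/usr/share", "server1", "/export/share" },  /* illustrative */
    { "/home",      "server2", "/export/home"  },  /* illustrative */
};

/* Return the entry whose local_path prefixes `path`, or NULL for a
 * purely local file.  (A real implementation would match the longest
 * component-wise prefix rather than a raw string prefix.) */
const struct mount_entry *lookup_mount(const char *path)
{
    for (size_t i = 0; i < sizeof mount_table / sizeof mount_table[0]; i++) {
        size_t n = strlen(mount_table[i].local_path);
        if (strncmp(path, mount_table[i].local_path, n) == 0)
            return &mount_table[i];
    }
    return NULL;
}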
Caching and Replication
NFS clients cache individual pages of remote files and directories in their main memory. They also cache the results of pathname-to-vnode translations. Local disks, even if present, are not used for caching.

When a client caches any block of a file, it also caches a timestamp indicating when the file was last modified on the server. To validate cached blocks of a file, the client compares its cached timestamp with the timestamp on the server. If the server timestamp is more recent, the client invalidates all cached blocks of the file and refetches them on demand. A validation check is always performed when a file is opened, and when the server is contacted to satisfy a cache miss. After a check, cached blocks are assumed valid for a finite interval of time, specified by the client when a remote file system is mounted. The first reference to any block of the file after this interval forces a validation check.

If a cached page is modified, it is marked as dirty and scheduled to be flushed to the server. The actual flushing is performed by an asynchronous kernel activity and will occur after some unspecified delay. However, the kernel does guarantee that all dirty pages of a file are flushed to the server before a close operation on the file completes.
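The validation logic lends itself to a compact sketch. The following C fragment illustrates the check described above; the structure fields and the nfs_getattr_mtime stub are hypothetical names, not the actual kernel code.

/* Sketch of timestamp-based cache validation.  Within the
 * revalidation interval the cache is trusted outright; afterwards
 * the server's modification time is compared with the cached one. */
#include <time.h>
#include <stdbool.h>

struct cached_file {
    time_t server_mtime;     /* last-modified time seen from the server */
    time_t last_validated;   /* when the server was last consulted      */
    int    revalidate_secs;  /* interval chosen at mount time, e.g. 3   */
};

/* Assumed RPC stub: fetch the file's current mtime from the server. */
time_t nfs_getattr_mtime(const struct cached_file *f);

/* Returns true if cached blocks may be used; false if they have been
 * invalidated and must be refetched on demand. */
bool cache_valid(struct cached_file *f)
{
    time_t now = time(NULL);

    /* Within the window: no server contact needed. */
    if (now - f->last_validated < f->revalidate_secs)
        return true;

    time_t mtime = nfs_getattr_mtime(f);
    f->last_validated = now;

    if (mtime > f->server_mtime) {   /* file changed on the server */
        f->server_mtime = mtime;
        return false;                /* drop cached blocks */
    }
    return true;
}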
Directories are cached for reading in a manner similar to files. Modifications to directories, however, are performed directly on the server. When a file is opened, a cache validation check is also performed on its parent directory. Files and directories can have different revalidation intervals, typical values being 3 seconds for files and 30 seconds for directories.

NFS performs network data transfers in large block sizes, typically 8 Kbytes, to improve performance. Read-ahead is employed to improve sequential access performance. Files corresponding to executable binaries are fetched in their entirety if they are smaller than a certain threshold.

As originally specified, NFS did not support data replication. More recent versions of NFS support replication via a mechanism called Automounter. Automounter allows remote mount points to be specified using a set of servers rather than a single server. The first time a client traverses such a mount point, a request is issued to each server, and the earliest to respond is chosen as the remote mount site. All further requests at the client that cross the mount point are directed to this server. Propagation of modifications to replicas has to be done manually. This replication mechanism is intended primarily for frequently-read and rarely-written files such as system binaries.
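The first-response selection performed by the Automounter can be summarized in a few lines. The C sketch below is hypothetical: rpc_ping_async and rpc_wait_first stand in for whatever probing the real implementation performs.

/* Hypothetical sketch of Automounter replica selection: probe every
 * candidate server and bind the mount point to the first responder. */
struct server;                              /* opaque server handle */

void rpc_ping_async(struct server *s);      /* assumed: send a null RPC */
struct server *rpc_wait_first(struct server **s, int n);  /* assumed */

struct server *choose_replica(struct server **candidates, int n)
{
    for (int i = 0; i < n; i++)
        rpc_ping_async(candidates[i]);      /* ask all replicas at once */
    /* The earliest responder serves all later requests that cross
     * this mount point. */
    return rpc_wait_first(candidates, n);
}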
Security
NFS uses the underlying Unix file protection mechanism on servers for access checks. Each RPC request from a client conveys the identity of the user on whose behalf the request is being made. The server temporarily assumes this identity, and file accesses that occur while servicing the request are checked exactly as if the user had logged in directly to the server. The standard Unix protection mechanism using user, group and world mode bits is used to specify protection policies on individual files and directories.

In the early versions of NFS, mutual trust was assumed between all participating machines. The identity of a user was determined by a client machine and accepted without further validation by a server. The level of security of an NFS site was effectively that of the least secure system in the environment. To reduce vulnerability, requests made on behalf of root (the Unix superuser) on a workstation were treated by the server as if they had come from a non-existent user, nobody. Root thus received the lowest level of privileges for remote files.

More recent versions of NFS can be configured to provide a higher level of security. DES-based mutual authentication is used to validate the client and the server on each RPC request. Since file data in RPC packets is not encrypted, NFS is still vulnerable to unauthorized release and modification of information if the network is not physically secure. The common DES key needed for mutual authentication is obtained from information stored in a publicly readable database. Stored in this database for each user and server is a pair of keys suitable for public key encryption. One key of the pair is stored in the clear, while the other is stored encrypted with the login password of the user. Any two entities registered in the database can deduce a unique DES key for mutual authentication. Taylor describes the details of this mechanism.
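The key database just described, with one key of each pair stored in the clear and the other encrypted with the user's password, corresponds to a Diffie-Hellman style exchange: each party combines its own secret key with the other party's public key, and both arrive at the same shared value, from which a DES key can be derived. The C program below illustrates only the arithmetic, with toy parameters chosen for readability; a real implementation uses far larger moduli and then reduces the shared value to a DES key.

/* Toy Diffie-Hellman exchange illustrating how two registered
 * entities can deduce a common secret from public information.
 * G and P are illustrative, not the parameters used by Sun. */
#include <stdio.h>
#include <stdint.h>

#define G 5UL              /* public generator (illustrative) */
#define P 2147483647UL     /* public prime modulus (illustrative) */

/* modular exponentiation: (base^exp) mod P */
static uint64_t modexp(uint64_t base, uint64_t exp)
{
    uint64_t r = 1;
    base %= P;
    while (exp) {
        if (exp & 1)
            r = (r * base) % P;
        base = (base * base) % P;
        exp >>= 1;
    }
    return r;
}

int main(void)
{
    uint64_t secret_user   = 123456789;  /* normally stored encrypted
                                            with the login password  */
    uint64_t secret_server = 987654321;

    uint64_t public_user   = modexp(G, secret_user);    /* stored in the clear */
    uint64_t public_server = modexp(G, secret_server);

    /* Each side combines its own secret with the other's public key;
     * both computations yield the same shared value. */
    uint64_t k_user   = modexp(public_server, secret_user);
    uint64_t k_server = modexp(public_user, secret_server);

    printf("user:   %llu\nserver: %llu\n",
           (unsigned long long)k_user, (unsigned long long)k_server);
    return 0;
}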
System Management
Sun provides two mechanisms to assist system managers. One of these, the Yellow Pages (YP), is a mechanism for maintaining key-value pairs. The keys and values are application-specific and are not interpreted by YP. A number of Unix databases, such as those mapping usernames to passwords, hostnames to network addresses, and network services to Internet port numbers, are stored in YP. YP provides read-only replication, with one master and many slaves. Lookups may be performed at any replica. Updates are performed at the master, which is responsible for propagating the changes to the slaves. YP thus provides a shared repository for system information that changes relatively infrequently and that does not require simultaneous updates at all replication sites. YP is usually in use at NFS installations, although this is not mandatory.

The Automounter, mentioned above in the context of read-only replication, is another mechanism for simplifying system management. It allows a client to lazy-evaluate NFS mount points, thus avoiding the need to mount all remote files of interest when the client is initialized. Automounter can be used in conjunction with YP to substantially reduce the administrative overhead of server reconfiguration.
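The master/slave discipline is easy to express in outline. The C sketch below is purely illustrative; the structures, the map contents, and the helpers store_locally and push_to_slave are assumptions, not the actual YP client library.

/* Sketch of the YP replication discipline: lookups at any replica,
 * updates only at the master, which pushes changes to the slaves. */
#include <string.h>

struct yp_entry { const char *key; const char *value; };

struct yp_replica {
    int is_master;
    struct yp_entry *entries;
    int n;
    struct yp_replica **slaves;   /* non-NULL only on the master */
    int nslaves;
};

void store_locally(struct yp_replica *r, struct yp_entry e);   /* assumed helper */
void push_to_slave(struct yp_replica *s, struct yp_entry e);   /* assumed helper */

/* Lookup: any replica may answer, e.g. the entry for "server1" in a
 * hostname-to-address map. */
const char *yp_match(const struct yp_replica *r, const char *key)
{
    for (int i = 0; i < r->n; i++)
        if (strcmp(r->entries[i].key, key) == 0)
            return r->entries[i].value;
    return NULL;
}

/* Update: rejected unless directed at the master, which is then
 * responsible for propagating the new entry to every slave. */
int yp_store(struct yp_replica *r, struct yp_entry e)
{
    if (!r->is_master)
        return -1;
    store_locally(r, e);
    for (int i = 0; i < r->nslaves; i++)
        push_to_slave(r->slaves[i], e);
    return 0;
}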