07-12-2012, 12:34 PM
Principles of Distributed Database Systems
Introduction
Distributed database system (DDBS) technology is the union of what appear to
be two diametrically opposed approaches to data processing: database system and
computer network technologies. Database systems have taken us from a paradigm
of data processing in which each application defined and maintained its own data
(Figure 1.1) to one in which the data are defined and administered centrally (Figure
1.2). This new orientation results in data independence, whereby the application
programs are immune to changes in the logical or physical organization of the data,
and vice versa.
One of the major motivations behind the use of database systems is the desire
to integrate the operational data of an enterprise and to provide centralized, and thus
controlled, access to that data. The technology of computer networks, on the other
hand, promotes a mode of work that goes against all centralization efforts. At first
glance it might be difficult to understand how these two contrasting approaches can
possibly be synthesized to produce a technology that is more powerful and more
promising than either one alone.
Distributed Data Processing
The term distributed processing (or distributed computing) is hard to define precisely.
Obviously, some degree of distributed processing goes on in any computer system,
even on single-processor computers where the central processing unit (CPU) and input/
output (I/O) functions are separated and overlapped. This separation and overlap
can be considered as one form of distributed processing. The widespread emergence
of parallel computers has further complicated the picture, since the distinction between
distributed computing systems and some forms of parallel computers is rather
vague.
In this book we define distributed processing in such a way that it leads to a
definition of a distributed database system. The working definition we use for a
distributed computing system states that it is a number of autonomous processing
elements (not necessarily homogeneous) that are interconnected by a computer
network and that cooperate in performing their assigned tasks. The “processing
element” referred to in this definition is a computing device that can execute a
program on its own. This definition is similar to those given in distributed systems
textbooks (e.g., [Tanenbaum and van Steen, 2002] and [Coulouris et al., 2001]).
What is a Distributed Database System?
We define a distributed database as a collection of multiple, logically interrelated
databases distributed over a computer network. A distributed database management
system (distributed DBMS) is then defined as the software system that permits the
management of the distributed database and makes the distribution transparent to the
users. Sometimes “distributed database system” (DDBS) is used to refer jointly to
the distributed database and the distributed DBMS. The two important terms in these
definitions are “logically interrelated” and “distributed over a computer network.”
They help rule out certain cases that have sometimes been accepted as representing a
DDBS.
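To make the two terms concrete, here is a minimal Python sketch, assuming a single logical relation (EMP) whose rows are split across two sites. The `Site` and `DistributedDBMS` classes, the site names, and the data are hypothetical and serve only to illustrate the idea: the rows are logically interrelated because together they form one relation, and the user's query never names the site that stores them.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical site: a named node holding part of the data locally."""
    name: str
    rows: list = field(default_factory=list)


class DistributedDBMS:
    """Toy distributed DBMS that hides which site stores which rows."""

    def __init__(self, sites):
        self.sites = list(sites)

    def select(self, predicate):
        # The user poses one query against the logical relation; the DBMS
        # evaluates it at every site and unions the partial results.
        result = []
        for site in self.sites:
            result.extend(row for row in site.rows if predicate(row))
        return result


# One logical EMP relation, split across two sites (names are hypothetical).
paris = Site("paris", [{"eno": "E1", "sal": 40000}, {"eno": "E2", "sal": 25000}])
montreal = Site("montreal", [{"eno": "E3", "sal": 50000}])
ddbms = DistributedDBMS([paris, montreal])

high_paid = ddbms.select(lambda row: row["sal"] > 30000)
print([row["eno"] for row in high_paid])  # ['E1', 'E3']
```

A real distributed DBMS would also optimize where each part of the query runs; this sketch only shows the transparency the definition asks for.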
Data Delivery Alternatives
In distributed databases, data are “delivered” from the sites where they are stored to
where the query is posed. We characterize the data delivery alternatives along three
orthogonal dimensions: delivery modes, frequency and communication methods. The
combinations of alternatives along each of these dimensions (that we discuss next)
provide a rich design space.
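Because the three dimensions are orthogonal, the design space is the Cartesian product of their alternatives. The Python sketch below enumerates it. The three delivery modes are the ones named in this section; the frequency and communication-method values are assumptions chosen for illustration.

```python
from enum import Enum
from itertools import product


class DeliveryMode(Enum):
    PULL_ONLY = "pull-only"
    PUSH_ONLY = "push-only"
    HYBRID = "hybrid"


# Assumed values for the other two dimensions (illustrative only).
class Frequency(Enum):
    PERIODIC = "periodic"
    CONDITIONAL = "conditional"
    AD_HOC = "ad-hoc"


class CommunicationMethod(Enum):
    UNICAST = "unicast"
    ONE_TO_MANY = "one-to-many"


# Orthogonal dimensions: every combination is a candidate design.
design_space = list(product(DeliveryMode, Frequency, CommunicationMethod))
print(len(design_space))  # 18
```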
The alternative delivery modes are pull-only, push-only and hybrid. In the pull-only
mode of data delivery, the transfer of data from servers to clients is initiated
by a client pull. When a client request is received at a server, the server responds by
locating the requested information.
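A minimal sketch of pull-only delivery follows, assuming an in-memory server; the `Server` and `Client` classes and the key format are hypothetical. It illustrates that the server does no work and transfers no data until a client request arrives.

```python
class Server:
    """Hypothetical data server; it never sends anything unsolicited."""

    def __init__(self, store):
        self.store = dict(store)
        self.requests_served = 0

    def handle_request(self, key):
        # Pull-only: work starts only when a client request arrives.
        self.requests_served += 1
        # Locate the requested information (None if it does not exist).
        return self.store.get(key)


class Client:
    def __init__(self, server):
        self.server = server

    def pull(self, key):
        # The client initiates every transfer.
        return self.server.handle_request(key)


server = Server({"EMP/E1": {"ename": "J. Doe"}})
client = Client(server)
print(client.pull("EMP/E1"))   # {'ename': 'J. Doe'}
print(server.requests_served)  # 1
```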
Promises of DDBSs
Many advantages of DDBSs have been cited in the literature, ranging from sociological
reasons for decentralization [D’Oliviera, 1977] to better economics. All of these can
be distilled to four fundamentals, which may also be viewed as promises of DDBS
technology: transparent management of distributed and replicated data, reliable
access to data through distributed transactions, improved performance, and easier
system expansion. In this section we discuss these promises and, in the process,
introduce many of the concepts that we will study in subsequent chapters.
Data Independence
Data independence is a fundamental form of transparency that we look for within a
DBMS. It is also the only type that is important within the context of a centralized
DBMS. It refers to the immunity of user applications to changes in the definition and
organization of data, and vice versa.
As is well known, data definition occurs at two levels. At one level the logical
structure of the data is specified, and at the other their physical structure. The
former is commonly known as the schema definition, whereas the latter is referred
to as the physical data description.
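The separation between the two levels is what lets applications ignore physical changes. The sketch below assumes, for illustration only, that an abstract Python class plays the role of the logical schema and that two concrete classes are alternative physical organizations (a sequentially scanned heap and a hash index). All class and function names are hypothetical; the sketch shows physical data independence only.

```python
from abc import ABC, abstractmethod


class EmployeeStorage(ABC):
    """Logical interface: what the application is allowed to see."""

    @abstractmethod
    def insert(self, eno, name): ...

    @abstractmethod
    def lookup(self, eno): ...


class HeapStorage(EmployeeStorage):
    """Physical organization 1: unordered list, scanned sequentially."""

    def __init__(self):
        self._rows = []

    def insert(self, eno, name):
        self._rows.append((eno, name))

    def lookup(self, eno):
        for key, name in self._rows:
            if key == eno:
                return name
        return None


class HashStorage(EmployeeStorage):
    """Physical organization 2: hash index keyed on the employee number."""

    def __init__(self):
        self._index = {}

    def insert(self, eno, name):
        self._index[eno] = name

    def lookup(self, eno):
        return self._index.get(eno)


def application(storage):
    # Written only against the logical interface, so it is unaffected
    # when the physical organization is swapped.
    storage.insert("E1", "J. Doe")
    storage.insert("E2", "M. Smith")
    return storage.lookup("E2")


print(application(HeapStorage()), application(HashStorage()))  # M. Smith M. Smith
```

Swapping `HeapStorage` for `HashStorage` changes how lookups are performed but requires no change to `application`.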
Replication Transparency
The issue of replicating data within a distributed database is introduced in Chapter
3 and discussed in detail in Chapter 13. At this point, let us just mention that for
performance, reliability, and availability reasons, it is usually desirable to be able
to distribute data in a replicated fashion across the machines on a network. Such
replication helps performance since diverse and conflicting user requirements can be
more easily accommodated. For example, data that are commonly accessed by one
user can be placed on that user’s local machine as well as on the machine of another
user with the same access requirements. This increases the locality of reference.
Furthermore, if one of the machines fails, a copy of the data is still available on
another machine on the network. Of course, this is a very simple-minded description
of the situation. In fact, the decision as to whether to replicate or not, and how many
copies of any database object to have, depends to a considerable degree on user
applications. We will discuss these in later chapters.
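The following Python sketch shows one way a system might hide replicas from its users, assuming a single data item copied at three hypothetical sites. Reads prefer the local copy and fall back to any other available copy when a site fails; writes use a naive write-all-available rule. This is an illustrative assumption, not a complete replica control protocol: a copy that was down during a write comes back stale, which a real protocol must handle.

```python
class Replica:
    """A hypothetical copy of one data item stored at one site."""

    def __init__(self, site, value):
        self.site = site
        self.value = value
        self.up = True


class ReplicatedItem:
    """Hides the set of copies: callers read and write one logical item."""

    def __init__(self, sites, value):
        self.replicas = [Replica(site, value) for site in sites]

    def read(self, local_site):
        live = [r for r in self.replicas if r.up]
        if not live:
            raise RuntimeError("no copy is available")
        # Prefer the caller's local copy (locality of reference) ...
        for r in live:
            if r.site == local_site:
                return r.value
        # ... otherwise fall back to any other live copy.
        return live[0].value

    def write(self, value):
        # Naive write-all-available: update every copy that is up.
        for r in self.replicas:
            if r.up:
                r.value = value


item = ReplicatedItem(["site1", "site2", "site3"], 100)
item.replicas[0].up = False  # site1 fails
print(item.read("site1"))    # 100, served by another copy
item.write(120)
print(item.read("site3"))    # 120
```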