13-09-2014, 03:19 PM
Parallel Database
ABSTRACT
Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries.
INTRODUCTION
Highly parallel database systems are beginning to displace traditional mainframe computers for the largest database and transaction processing tasks. The success of these systems refutes a 1983 paper predicting the demise of database machines. Ten years ago the future of highly parallel database machines seemed gloomy, even to their staunchest advocates. Most database machine research had focused on specialized, often trendy, hardware such as CCD memories, bubble memories, head-per-track disks, and optical disks. None of these technologies fulfilled its promise, so there was a sense that conventional CPUs, electronic RAM, and moving-head magnetic disks would dominate the scene for many years to come. At that time, disk throughput was predicted to double while processor speeds were predicted to increase by much larger factors. Consequently, critics predicted that multi-processor systems would soon be I/O limited unless a solution to the I/O bottleneck was found. While these predictions were fairly accurate about the future of hardware, the critics were certainly wrong about the overall future of parallel database systems. Over the last decade Teradata, Tandem, and a host of startup companies have successfully developed and marketed highly parallel database machines.
Relational
Relational databases such as MySQL, Microsoft SQL Server, and Oracle store data in a much more logical structure than a flat file does. Tables can be used to represent real-world objects, with each field acting like an attribute. For example, a table called books could have the columns title, author, and ISBN, which describe the details of each book; each row in the table is a separate book.
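As a minimal sketch of the books example, the following uses Python's built-in sqlite3 module with an in-memory database. SQLite stands in here for the servers named above, and the ISBNs, titles, and author names are made-up placeholders.

```python
import sqlite3

# In-memory SQLite database (an assumption for this sketch; any relational
# database would accept an equivalent table definition).
conn = sqlite3.connect(":memory:")

# Each column is an attribute of a book; each row is one book.
conn.execute(
    "CREATE TABLE books ("
    "isbn TEXT PRIMARY KEY, title TEXT NOT NULL, author TEXT NOT NULL)"
)

# Placeholder rows, invented purely for illustration.
sample_books = [
    ("ISBN-0001", "Example Title One", "A. Author"),
    ("ISBN-0002", "Example Title Two", "B. Writer"),
]
conn.executemany(
    "INSERT INTO books (isbn, title, author) VALUES (?, ?, ?)", sample_books
)

book_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
titles = [row[0] for row in conn.execute("SELECT title FROM books ORDER BY title")]
print(book_count, titles)
```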
Strictly speaking, a "relation" in the relational model is the table itself, but much of the model's practical strength comes from the fact that tables can be linked to each other. For example, the author of a book could be cross-referenced with the authors table (assuming there was one) to provide more information about the author. These kinds of links can be quite complex in nature, and would be hard to replicate in the standard flat-file format.
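The cross-reference between books and authors can be sketched as a join. This is only an illustration using Python's sqlite3 module; the authors table, its author_id and country columns, and all sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: books refers to authors through author_id.
conn.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        country   TEXT
    );
    CREATE TABLE books (
        isbn      TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(author_id)
    );
    INSERT INTO authors VALUES (1, 'A. Author', 'Nowhereland');
    INSERT INTO books VALUES ('ISBN-0001', 'Example Title One', 1);
    INSERT INTO books VALUES ('ISBN-0002', 'Example Title Two', 1);
""")

# Cross-reference each book with its author's details.
rows = conn.execute("""
    SELECT b.title, a.name, a.country
    FROM books AS b
    JOIN authors AS a ON a.author_id = b.author_id
    ORDER BY b.title
""").fetchall()

for title, name, country in rows:
    print(title, name, country)
```

The author's details are stored once in authors and reused by every book that refers to them, which is the kind of link a single flat file cannot express directly.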
One major advantage of the relational model is that, if a database is designed well, there should be no duplication of any data, which helps to maintain database integrity. This can also represent a large saving in file size, which is important when dealing with large volumes of data. Having said that, joining large tables to each other to get the data required for a query can be quite heavy on the processor, so in some cases, particularly when data is read-only, it can be beneficial to keep some duplicate data in a relational database.
Database Comparisons
In most cases, you would want your database to support various types of relations; such databases, particularly if designed correctly, can dramatically improve the speed of data retrieval and are easier to maintain. Ideally, you will want to avoid duplicating data within a database to keep a high level of integrity; otherwise a change to one copy of a value has to be repeated manually for every other copy.
While several flat files can be combined in such a way as to emulate some of the behaviours of a relational database, the result can prove to be slower in practice. A single connection to a relational database can access all the tables within that database, whereas a flat-file implementation of the same data would require a new file open operation for each table.
All the sorting for flat-file databases needs to be done at the script level. Relational databases have functions that can sort and filter the data, so the results that are sent to the script are pretty much what you need to work with. It is often quicker to sort the results before they are returned to the script than to have them sorted by the script. Few scripting languages are designed to filter data effectively, so the more functions a database supports, the less work a script has to do.
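To make the difference concrete, here is a small sketch in Python with the built-in sqlite3 module. The year column, the cut-off of 2000, and the sample rows are assumptions invented for the example. The first query lets the database filter and sort; the second fetches every row and does the same work at the script level, as a flat-file approach would have to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Gamma", "A. Author", 2001),
        ("Alpha", "B. Writer", 1995),
        ("Beta", "A. Author", 2010),
    ],
)

# Database-side: only the rows needed come back, already in order.
db_sorted = conn.execute(
    "SELECT title FROM books WHERE year >= ? ORDER BY title", (2000,)
).fetchall()

# Script-side: fetch everything, then filter and sort in the script.
all_rows = conn.execute("SELECT title, author, year FROM books").fetchall()
script_sorted = sorted(
    (title,) for title, _author, year in all_rows if year >= 2000
)

print(db_sorted)
print(script_sorted)
```

Both approaches give the same rows here; the database-side version simply transfers less data to the script and leaves less for it to do.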
If you are only working with a small amount of data that is rarely updated, then a full-blown relational database solution can be considered overkill. Flat-file databases are not as scalable as the relational model, so if you are looking for a database for more frequent and heavy use, then a relational database is probably more suitable.
ADVANTAGES
Organizations of every size benefit from databases because they improve the management of information. The database has a server, a specialized program that oversees all user requests for data and adheres to strict rules for security and system integrity. If an organization has a large user base and millions of records to process, it may turn to a parallel database approach. Parallel databases are fast, flexible and reliable.
DISADVANTAGES
Database systems are complex, difficult, and time-consuming to design. Initial training is required for all programmers and users. Suitable hardware and software carry start-up costs. Individual applications may take longer to run.
CONCLUSION
Like most applications, database systems want cheap, fast hardware. Today that means commodity processors, memories, and disks. Consequently, the hardware concept of a database machine built of exotic hardware is inappropriate for current technology. On the other hand, the availability of fast microprocessors and small inexpensive disks, packaged as standard inexpensive but fast computers, is an ideal platform for parallel database systems. A shared-nothing architecture is relatively straightforward to implement and, more importantly, has demonstrated both speedup and scaleup to hundreds of processors. Furthermore, shared-nothing architectures actually simplify the software implementation. If the software techniques of data partitioning, dataflow, and intra-operator parallelism are employed, the task of converting an existing database management system to a highly parallel one becomes relatively straightforward. Finally, there are certain applications (e.g., data mining in terabyte databases) that require the computational and I/O resources available only from a parallel architecture. While the successes of both commercial products and prototypes demonstrate the viability of highly parallel database machines, several open research issues remain unsolved, including techniques for mixing ad-hoc queries with online transaction processing without seriously limiting transaction throughput, improved optimizers for parallel queries, tools for physical database design, on-line database reorganization, and algorithms for handling relations with highly skewed data distributions. Some application domains are not well supported by the relational data model. It appears that a new class of database systems based on an object-oriented data model is needed. Such systems pose a host of interesting research problems that require further examination.
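The ideas of data partitioning and intra-operator parallelism mentioned above can be illustrated with a toy sketch in Python. It is not how any particular product works: the function names (hash_partition, local_count, parallel_count_by_author), the sample rows, and the choice of four partitions are all assumptions, and threads in one process merely stand in for the separate nodes of a shared-nothing system. Rows are hash-partitioned on the author column, the same counting operator runs independently on each partition, and the partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key_index, num_partitions):
    """Assign each row to a partition by hashing its key column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key_index]) % num_partitions].append(row)
    return partitions


def local_count(partition):
    """The operator run on one partition: count books per author."""
    return Counter(author for _title, author in partition)


def parallel_count_by_author(rows, num_partitions=4):
    # Partition on the author column (index 1), so all rows for one
    # author land in the same partition.
    partitions = hash_partition(rows, key_index=1, num_partitions=num_partitions)
    # Run the same operator on every partition concurrently.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(local_count, partitions))
    # Merge the per-partition results into one answer.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Placeholder (title, author) rows, invented for illustration.
books = [
    ("T1", "A. Author"),
    ("T2", "B. Writer"),
    ("T3", "A. Author"),
    ("T4", "C. Scribe"),
]
counts = parallel_count_by_author(books)
print(counts)
```

Because each partition is processed without reference to any other, the operator needs no shared state, which is the property that lets a shared-nothing design add processors and disks together.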