25-08-2012, 09:41 AM
Interquery Parallelism
Queries/transactions execute in parallel with one another.
Increases transaction throughput; used primarily to scale up a transaction processing system to support a larger number of transactions per second.
Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even sequential database systems support concurrent processing.
More complicated to implement on shared-disk or shared-nothing architectures:
Locking and logging must be coordinated by passing messages between processors.
Data in a local buffer may have been updated at another processor.
Cache-coherency has to be maintained — reads and writes of data in buffer must find latest version of data.
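The idea above can be sketched with a minimal example: several independent read-only queries run concurrently over the same shared-memory data, increasing throughput. The table and queries here are illustrative, not taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# A shared, read-only "relation" (illustrative data).
accounts = [{"branch": b % 4, "balance": 100 * b} for b in range(10_000)]

def total_balance(branch):
    # One independent query; read-only access needs no lock coordination.
    return sum(r["balance"] for r in accounts if r["branch"] == branch)

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each query is a separate unit of work; the queries run in parallel
    # with one another, which is exactly interquery parallelism.
    results = list(pool.map(total_balance, range(4)))

print(results)
```

In a real shared-disk or shared-nothing system, each worker would additionally coordinate locking, logging, and buffer cache coherency via messages, as noted above.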
Intraquery Parallelism
Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.
Two complementary forms of intraquery parallelism:
Intraoperation Parallelism – parallelize the execution of each individual operation in the query.
Interoperation Parallelism – execute the different operations in a query expression in parallel.
The first form scales better with increasing parallelism, because the number of tuples processed by each operation is typically much larger than the number of operations in a query.
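The two forms can be contrasted with a small sketch of intraoperation parallelism: a single aggregation is parallelized by partitioning its input tuples across workers, each of which computes a partial result that a cheap final step merges. The partitioning scheme and data are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

values = list(range(1, 1001))  # input tuples (illustrative)
n_workers = 4

def partial_sum(part):
    # Each worker runs the SAME operation on its own partition of tuples.
    return sum(part)

# Partition the input tuples across the workers.
parts = [values[i::n_workers] for i in range(n_workers)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    partials = list(pool.map(partial_sum, parts))

# A final merge step combines the partial aggregates.
total = sum(partials)
print(total)  # 500500
```

Because the degree of parallelism here is bounded by the number of input tuples rather than by the number of operations in the query, this form keeps scaling as more processors are added.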
Parallel Processing of Relational Operations
Our discussion of parallel algorithms assumes:
read-only queries
shared-nothing architecture
n processors, P0, ..., Pn-1, and n disks D0, ..., Dn-1, where disk Di is associated with processor Pi.
If a processor has multiple disks, they can simply simulate a single disk Di.
Shared-nothing architectures can be efficiently simulated on shared-memory and shared-disk systems.
Algorithms for shared-nothing systems can thus be run on shared-memory and shared-disk systems.
However, some optimizations may be possible.
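The assumed layout can be made concrete with a small sketch: tuples are hash-partitioned on a key so that each tuple lives on disk Di of exactly one processor Pi. The value of n and the tuples are illustrative.

```python
n = 4  # processors P0..P3, each with its local disk D0..D3

disks = [[] for _ in range(n)]  # disks[i] models Di, local to Pi

for key in range(20):
    i = hash(key) % n        # the partitioning function picks Pi/Di
    disks[i].append(key)     # tuple is stored only at that processor

# A selection on the partitioning key now needs to touch one disk only;
# a full scan can run on all n partitions in parallel.
print([len(d) for d in disks])
```

Running the same code on a shared-memory or shared-disk machine simulates the shared-nothing layout, which is why shared-nothing algorithms carry over to those architectures.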
Query Optimization
Query optimization in parallel databases is significantly more complex than query optimization in sequential databases.
Cost models are more complicated, since we must take into account partitioning costs and issues such as skew and resource contention.
When scheduling an execution tree in a parallel system, we must decide:
How to parallelize each operation and how many processors to use for it.
What operations to pipeline, what operations to execute independently in parallel, and what operations to execute sequentially, one after the other.
Determining the amount of resources to allocate to each operation is itself a hard problem.
E.g., allocating more processors than optimal can result in high communication overhead.
Long pipelines should be avoided, as the final operation may wait a long time for inputs while holding precious resources.
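The processor-allocation trade-off can be illustrated with a toy cost model: per-processor work shrinks as work/p, but communication and startup overhead grows with p, so beyond some point adding processors increases total cost. All constants here are made up for illustration.

```python
def cost(work, p, comm_per_proc=50.0):
    # Toy model: computation cost divides across p processors, while
    # communication/startup overhead grows linearly with p.
    return work / p + comm_per_proc * p

work = 100_000.0
costs = {p: cost(work, p) for p in (1, 2, 4, 8, 16, 32, 64, 128)}
best = min(costs, key=costs.get)
print(best, costs[best])
```

Under this model the cheapest allocation is an intermediate number of processors; pushing p to the maximum makes the communication term dominate, which is the overhead the point above warns about.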
Design of Parallel Systems
Some issues in the design of parallel systems:
Parallel loading of data from external sources is needed in order to handle large volumes of incoming data.
Resilience to failure of some processors or disks.
Probability of some disk or processor failing is higher in a parallel system.
Operation (perhaps with degraded performance) should be possible in spite of failure.
Redundancy can be achieved by storing an extra copy of every data item at another processor.
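One simple redundancy scheme along these lines can be sketched as follows: each processor Pi stores its own partition plus a copy of its neighbour's, so the failure of any single processor loses no data (the failed processor's partition is served, with degraded performance, by its successor). The placement rule and data are illustrative, not prescribed by the slides.

```python
n = 4
# partitions[i] is the data whose primary copy lives at processor Pi.
partitions = [[f"t{i}{j}" for j in range(3)] for i in range(n)]

# Each Pi holds its primary partition and a backup of P(i-1)'s partition,
# so every partition exists at exactly two processors.
storage = [{"primary": partitions[i], "backup": partitions[(i - 1) % n]}
           for i in range(n)]

def survives(failed):
    # Check that the crashed processor's data is still reachable somewhere.
    alive = [s for i, s in enumerate(storage) if i != failed]
    held = {t for s in alive for t in s["primary"] + s["backup"]}
    return all(t in held for t in partitions[failed])

print(all(survives(i) for i in range(n)))  # True
```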