30-04-2014, 12:15 PM
Load Balancing Parallel Applications on Heterogeneous Platforms
Heterogeneous Platforms
So far we’ve only talked about platforms in which all processors/nodes are identical
representative of most supercomputers
Clusters often “end up” heterogeneous
Built as homogeneous
New nodes are added and have faster CPUs
A cluster that stays up 3 years will likely have several generations of processors in it
Networks of workstations are heterogeneous
couple together all the machines in your lab to run a parallel application (an approach that became popular with PVM)
“Grids”
couple together different clusters, supercomputers, and workstations
It is important to develop parallel applications that can leverage heterogeneous platforms
Heterogeneous Load Balancing
There is an impressively large literature on load-balancing for heterogeneous platforms
In this lecture we’re looking only at “static” load balancing
before execution we know how much load is given to each processor
e.g., as opposed to some dynamic algorithm that assigns work to processors when they’re “ready”
We will look at:
2 simple load-balancing algorithms
application to our 1-D distributed image processing application
application to the 1-D distributed LU factorization
discussion of load-balancing for 2-D data distributions (which turns out to be a very difficult problem)
We assume homogeneous network and heterogeneous compute nodes
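The simplest static scheme is to give each processor a share of the work proportional to its speed. A minimal sketch (the function name and rounding scheme are my own; the slides do not prescribe one) using largest-remainder rounding so the shares sum exactly to the total:

```python
def static_allocation(speeds, n_units):
    """Assign n_units work units to processors proportionally to their
    speeds, using largest-remainder rounding so the counts sum to n_units."""
    total = sum(speeds)
    exact = [n_units * s / total for s in speeds]   # ideal fractional shares
    alloc = [int(x) for x in exact]                 # round down first
    leftover = n_units - sum(alloc)
    # hand the leftover units to the processors with the largest fractional parts
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc
```

For example, with relative speeds (3, 2, 1) and 10 units of work, the allocation is 5, 3, and 2 units: the slowest processor picks up the leftover unit because it has the largest fractional share.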
Load-balancing
Of course, this should be done for blocks of columns, and not individual columns
Also, it should be done in a “motif” that spans some number of columns (B = 10 in our example) and is repeated cyclically throughout the matrix
provided that B is large enough, we get a good approximation of the optimal load-balance
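One common way to build such a motif (a greedy heuristic; the slides do not spell out the exact algorithm, so this is a sketch under that assumption) is to assign the B columns one at a time, each to the processor that would finish its enlarged share soonest:

```python
def build_motif(speeds, B):
    """Build a periodic allocation pattern ("motif") of width B.
    Columns are assigned greedily: each goes to the processor whose
    time to process one more column, (count + 1) / speed, is smallest."""
    counts = [0] * len(speeds)
    motif = []
    for _ in range(B):
        i = min(range(len(speeds)),
                key=lambda i: (counts[i] + 1) / speeds[i])
        counts[i] += 1
        motif.append(i)
    return motif

def owner(col, motif):
    """Owner of global column index col under cyclic repetition of the motif."""
    return motif[col % len(motif)]
```

With speeds (3, 2, 1) and B = 6, the motif contains 3, 2, and 1 columns for the three processors respectively, matching their relative speeds, and the pattern repeats every 6 columns across the whole matrix.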
2-D Data Distributions
What we’ve seen so far works well for 1-D data distributions
use the simple algorithm
repeat the allocation pattern over blocks of columns in a cyclic fashion
We have seen that a 2-D distribution is what’s most appropriate, for instance for matrix multiplication
We use matrix multiplication as our driving example
Free 2-D distribution
Each rectangle must have an area proportional to its processor’s speed
One can “play” with the width and the height of the rectangles to try to minimize communication costs
The amount of data a processor sends/receives is proportional to the half-perimeter of its rectangle
One must minimize the sum of the half-perimeters if communications happen sequentially
One must minimize the maximum of the half-perimeters if communications happen in parallel
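These two objectives are easy to evaluate for a candidate partition. A small sketch (layouts and helper name are illustrative, not from the slides) comparing two partitions of the unit square among four equal-speed processors:

```python
def half_perimeters(rects):
    """rects: list of (width, height) rectangles tiling the unit square.
    Returns (sum, max) of the half-perimeters w + h, the costs to
    minimize for sequential vs. parallel communications."""
    hp = [w + h for (w, h) in rects]
    return sum(hp), max(hp)

# Four vertical strips vs. a 2x2 grid: both give each processor area 1/4,
# but the squarer rectangles communicate less.
strips = [(0.25, 1.0)] * 4   # half-perimeter 1.25 each, sum 5.0
grid   = [(0.5, 0.5)] * 4    # half-perimeter 1.00 each, sum 4.0
```

Both layouts satisfy the area constraint, yet the 2×2 grid beats the strips on both objectives; with heterogeneous speeds the rectangles have unequal areas and finding the best shapes becomes the hard optimization problem mentioned above.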