17-05-2014, 10:17 AM
Scheduling in Heterogeneous Computing Environments for Proximity Queries
Scheduling in Heterogeneous.pdf (Size: 1.66 MB / Downloads: 16)
Abstract
We present a novel, linear programming (LP)-based scheduling algorithm that exploits heterogeneous multicore
architectures such as CPUs and GPUs to accelerate a wide variety of proximity queries. To represent complicated performance
relationships between heterogeneous architectures and different computations of proximity queries, we propose a simple, yet accurate
model that measures the expected running time of these computations. Based on this model, we formulate an optimization problem
that minimizes the largest time spent on computing resources, and propose a novel, iterative LP-based scheduling algorithm. Since our
method is general, we are able to apply our method into various proximity queries used in five different applications that have different
characteristics. Our method achieves an order of magnitude performance improvement by using four different GPUs and two hexa-
core CPUs over using a hexa-core CPU only. Unlike prior scheduling methods, our method continually improves the performance, as
we add more computing resources. Also, our method achieves much higher performance improvement compared with prior methods
as heterogeneity of computing resources is increased. Moreover, for one of tested applications, our method achieves even higher
performance than a prior parallel method optimized manually for the application. We also show that our method provides results that
are close (e.g., 75 percent) to the performance provided by a conservative upper bound of the ideal throughput. These results
demonstrate the efficiency and robustness of our algorithm that have not been achieved by prior methods. In addition, we integrate one
of our contributions with a work stealing method. Our version of the work stealing method achieves 18 percent performance
improvement on average over the original work stealing method. This result shows wide applicability of our approach.
INTRODUCTION
P ROXIMITY
computation is one of the most fundamental
geometric operations, and has been studied in the last
two decades for various applications including games,
physically based simulations, ray tracing-based rendering,
motion planning in robotics, and so on [1].
There have been numerous attempts to accelerate the
queries. One of the most general approaches is adopting an
acceleration hierarchy such as kd-trees [2] or bounding
volume hierarchies (BVHs) [1], [3]. Even though this
method is general and improves the performance of various
proximity queries by several orders of magnitude, there are
ever growing demands for further improving the perfor-
mance of proximity queries, since the model complexities
are also ever growing. For example, proximity queries
employed in interactive applications such as games should
provide real-time performance. However, it may not meet
such requirement, especially for large-scale models that
consist of hundreds of thousands of triangles.
Scheduling for Unrelated Parallel Systems
Scheduling concerns allocating jobs to resources for achiev-
ing a particular goal, and has been extensively studied [11].
We are interested in finding the optimal job distribution
that maximizes the throughput of entire parallel system.
This is known as minimizing the makespan. At high level, a
parallel system can be classified as identical, related, or
unrelated. Unrelated parallel system consists of heteroge-
neous computing resources that have different character-
istics and thus performance varies. Related or identical
systems, on the other hand, are composed of resources that
are similar or the exactly same in terms of characteristics
and performance, respectively. Our scheduling method
aims for most general parallel systems, such as unrelated
parallel machines (i.e., heterogeneous computing systems).
Parallel Proximity Queries and Scheduling
Many scheduling methods have been proposed for various
proximity computations on parallel systems. Tang et al. [28]
group jobs into blocks and assign a block to each idle CPU
thread in a round robin fashion. Lauterbach et al. [29] check
the workload balance among cores on a GPU and perform
workload balancing (i.e., distributing jobs evenly) when the
level of workload balance is low.
Only a few works have proposed utilizing heterogeneous
multicore architectures for proximity queries. Budge et al.
[30] designed an out-of-core data management method for
ray tracing on hybrid resources including CPUs and GPUs.
They prioritize different jobs in ray tracing and assign them
into either a CPU or a GPU, by checking which processor
the jobs prefer and where the required data is stored. Kim
et al. [20] decompose continuous collision detection into
two different task types and manually dedicate all the tasks
of each job type into one of the CPU or GPU
CONCLUSION
We have presented a novel, LP-based scheduling method,
to maximally utilize more widely available heterogeneous
multi-core architectures. To achieve wide applicability, we
factored out common jobs of various proximity queries and
formulate an optimization problem that minimizes the
largest time spent on computing resources. We have
designed a novel, iterative LP solver that has a minor
computational overhead and computes a job distribution
that achieves near-optimal expected running time. We then
have further improved the efficiency of our scheduling
method with hierarchical scheduling to handle a larger
number of resources.