20-06-2013, 04:53 PM
Fast Sparse Level Sets on Graphics Hardware SEMINAR REPORT
ABSTRACT
The level-set method is one of the most widely used methods for capturing and tracking deformable interfaces. Even though level sets have great potential in visualization and computer graphics applications, such as surface editing and physically based modeling, their use in interactive simulations has been limited by the high computational demands involved. This paper addresses that computational challenge by exploiting the increased computing power of graphics processors to achieve fast simulations based on level sets. Our efficient, sparse GPU level-set method is significantly faster than other serial and parallel methods on both CPU and GPU hardware. Furthermore, we investigate its performance through a method for surface reconstruction based on GPU level sets. Our novel multi-resolution method for surface reconstruction from unorganized point clouds compares favorably with current, prevailing techniques and other parallel implementations. Finally, we conclude that level-set computations and rendering of level-set surfaces can be achieved at interactive rates, even on large volumetric grids. As a result, many applications centered on level sets can benefit from our sparse level-set method.
INTRODUCTION
Since its introduction by Osher and Sethian, the level-set method has become the preferred technique for capturing and tracking moving interfaces, and has found a host of applications in a wide variety of research fields, from chemistry and physics to computer vision and graphics. The basic idea is to represent the dynamic interface (e.g., a contour or surface) implicitly, by embedding it as the zero level set of a time-dependent, higher-dimensional function. Evolving the interface with a given velocity in the normal direction then becomes equivalent to solving a time-dependent PDE for the embedding level-set function. The main advantages of the level-set method are that it allows the interface to undergo arbitrary topological changes and conveniently provides intrinsic geometric properties such as normal and curvature information. Unfortunately, although the level-set method is well suited for tracking highly deformable models, such as fluids in accurate, physically based simulations, its use in interactive systems has been limited due to the high computational demands. The price to be paid for the flexibility offered by the level-set method is twofold: first, computationally, one has to solve the level-set PDE in a space of higher dimension than that of the embedded interface, and second, the memory requirements exceed the size of the interface, as one needs to explicitly store a uniform Cartesian grid in order to solve the level-set PDE. To address these issues, a number of techniques have been proposed; see Section 2. These methods rely on the fact that it suffices to solve the PDE only in the vicinity of the interface in order to preserve the embedding.
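The embedding idea can be made concrete with a minimal Python sketch (not the paper's GPU implementation; the grid size, speed, and simple first-order scheme are our own illustrative choices). A circle is embedded as the zero level set of a signed-distance function and moved outward with unit normal speed by integrating the level-set PDE φ_t = -F |∇φ|:

```python
import numpy as np

def signed_distance_circle(n, r0):
    """Signed-distance embedding of a circle of radius r0 on [-1, 1]^2."""
    xs = np.linspace(-1.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs)
    return np.sqrt(X**2 + Y**2) - r0, xs

def evolve(phi, speed, dt, steps, h):
    """Forward-Euler integration of phi_t = -speed * |grad(phi)|."""
    for _ in range(steps):
        gx, gy = np.gradient(phi, h)
        phi = phi - dt * speed * np.sqrt(gx**2 + gy**2)
    return phi
```

After evolving for total time t, the zero level set sits at radius r0 + t; the topology-handling and geometric properties mentioned above come for free from the embedding, which is what the full 3D method exploits.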
PREVIOUS AND RELATED WORK
As the level-set method is a tremendously popular approach for tracking moving interfaces, there has been continuous interest in developing improved algorithms that address its computational issues. Here, we focus on closely related methods and on state-of-the-art results concerning general issues in designing GPU-based algorithms.
EFFICIENT LEVEL-SET METHODS ON THE CPU
The computational issue was first addressed with the introduction of narrow-band schemes, which restrict computations to a small vicinity around the zero level set representing the deforming interface. However, narrow-band methods still need to explicitly store a full grid and additional data structures; thus, their storage complexity scales with the size of the grid. Quadtree- and octree-based methods do achieve smaller memory footprints, although they usually require uniform refinement along the interface. An alternative approach to reducing memory requirements, called the "Sparse Block Grid" method, was presented by Bridson. In 3D, this method divides a volume of size n³ into small blocks of m³ voxels each. A coarse grid of size (n/m)³ stores at each location a single pointer to a full-resolution block that intersects the interface. Although this method has non-optimal storage complexity, it maintains constant access time, similar to the full-grid method.
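The storage scheme can be sketched in a few lines of Python (the class name and API are ours, not Bridson's; a real implementation stores raw pointers and pools block memory). Blocks are allocated lazily, so memory is spent only on coarse cells that the interface actually touches, while access remains O(1):

```python
import numpy as np

class SparseBlockGrid:
    """Sketch of a sparse block grid: a (n/m)^3 coarse grid of pointers
    to lazily allocated m^3 blocks of voxel data."""
    def __init__(self, n, m, background=np.inf):
        assert n % m == 0
        self.m, self.k = m, n // m
        self.background = background
        self.blocks = [None] * (self.k ** 3)  # coarse grid of block pointers

    def _coarse(self, i, j, l):
        m, k = self.m, self.k
        return (i // m) * k * k + (j // m) * k + (l // m)

    def set(self, i, j, l, value):
        idx = self._coarse(i, j, l)
        if self.blocks[idx] is None:  # allocate the block on first write
            self.blocks[idx] = np.full((self.m,) * 3, self.background)
        self.blocks[idx][i % self.m, j % self.m, l % self.m] = value

    def get(self, i, j, l):
        b = self.blocks[self._coarse(i, j, l)]
        return self.background if b is None else b[i % self.m, j % self.m, l % self.m]
```

A `get` or `set` is one coarse lookup plus one in-block index computation, which is the constant-time access property noted above.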
LEVEL-SET GPU METHODS
The first GPU implementation of level sets is due to Rumpf and Strzodka. More recently, parallel implementations of particle and marker level sets were also proposed. These methods achieve 15 and 24 fps, respectively, on full grids of 128³ voxels. Both methods are more computationally involved than the pure level-set method, so a direct comparison with regard to efficiency is unfair. While not sparse, the recent methods of Roberts and Klar represent approaches to implementing narrow-band schemes on the GPU. Unfortunately, both methods have rather high memory requirements. However, Roberts shows that his method is both work and step efficient. To the best of our knowledge, the only memory-adaptive model for the level-set representation on the GPU is due to Lefohn. In this method, the domain is decomposed into small 2D tiles, of which only the tiles with nonzero derivatives are stored on the GPU. A lookup table spanning the entire domain stores a pointer to the data of every tile. Memory management is performed by transferring a bit-vector image of about 64 kB from the GPU in every iteration, after which the CPU loads and unloads tiles based on whether they are needed for the computation during the next iteration.
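The CPU-side bookkeeping in such a scheme can be modeled as follows (a simplified host-side sketch; the function name and callbacks are hypothetical, and the actual GPU uploads/downloads are replaced by callables). Each iteration, the CPU diffs the bit vector read back from the GPU against the set of currently resident tiles:

```python
import numpy as np

def update_resident_tiles(resident, active_bits, load, unload):
    """Lefohn-style host-side tile management sketch.
    resident:    set of tile ids currently stored on the GPU
    active_bits: per-tile flags read back from the GPU, marking tiles
                 needed for the next iteration
    load/unload: callbacks standing in for GPU transfers"""
    needed = set(int(t) for t in np.flatnonzero(active_bits))
    for t in sorted(needed - resident):
        load(t)    # tile became active: upload its data to the GPU
    for t in sorted(resident - needed):
        unload(t)  # tile left the narrow band: free it on the GPU
    return needed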
SPARSE CPU METHODS
Recently, Nielsen and Museth introduced the "Dynamic Tubular Grid" (DT-Grid), a recursive, compressed level-set representation inspired by the compressed-row-storage technique used to represent sparse matrices. They showed that the memory requirement of DT-Grid is optimal, i.e., proportional to the size of the interface. Moreover, they found the 3D DT-Grid to be faster and more memory efficient than state-of-the-art octree-based approaches. Houston et al. use hierarchical run-length encoding (RLE) in a dimension-recursive fashion to encode the domain as a series of runs, each associated with a specific run code. Regions away from the narrow band are encoded by just their sign, while the narrow band is stored at full precision. Although this method is more flexible than DT-Grid, the price paid is a slight increase in computation time and memory usage. Similar to the methods of Lefohn and Bridson, the Sorted Tile List method divides the domain into fixed-size tiles, such that each tile represents a part of the domain of the level-set function. Tiles outside the narrow band are discarded; the remaining narrow-band tiles form the "active list."
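The run-based encoding can be illustrated with a small Python sketch (a one-dimensional simplification; the run codes and the band test are our own, not the exact hierarchical RLE format). Cells inside the narrow band keep full-precision values, while runs of cells outside it collapse to their sign and length:

```python
def rle_encode_row(values, band_width):
    """Encode one row: ('data', [...]) runs inside the narrow band,
    ('sign', s, length) runs outside it."""
    runs, i, n = [], 0, len(values)
    while i < n:
        if abs(values[i]) <= band_width:  # inside the band: store values
            j = i
            while j < n and abs(values[j]) <= band_width:
                j += 1
            runs.append(("data", list(values[i:j])))
        else:                             # outside: store only sign + length
            sign = 1 if values[i] > 0 else -1
            j = i
            while (j < n and abs(values[j]) > band_width
                   and (values[j] > 0) == (sign > 0)):
                j += 1
            runs.append(("sign", sign, j - i))
        i = j
    return runs
```

Applying this recursively per dimension yields the hierarchical encoding; the sign-only runs are what make far-field storage nearly free.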
SURFACE RECONSTRUCTION
Surface reconstruction from unorganized point clouds has been studied intensively. Despite the increased availability of commodity parallel platforms, there has been very little work on parallel algorithms for surface reconstruction. Only Zhou and Bolitho implemented the Poisson method in parallel, on the GPU and on shared- and distributed-memory platforms, respectively. In the GPU implementation, significant speedups were obtained at small grid resolutions. As pointed out by Bolitho, a limitation is the need to maintain the entire octree and additional data structures in GPU memory, which drastically limits the maximum resolution. Moreover, the method is more susceptible to noise than the original one, due to some computational simplifications that were introduced. Finally, the results of Bolitho indicate that the Poisson method scales well on distributed-memory parallel computers but poorly on shared-memory architectures.
Traversal of the Active List
An efficient CUDA algorithm divides the work between CUDA blocks and threads in a way that incurs minimal overhead. For this, we introduce the concept of units, i.e., groups of T threads working together on one tile at a time. With T = 64 threads per unit, each of the first N = 26 threads of a unit first tracks a pointer to a neighboring tile, see Fig. 1. The remaining T − N = 38 threads wait until the other (active) threads have tracked the pointers to the neighbors of the currently processed tile. After tracking, each thread of the unit computes one level-set value within the current tile. Finally, the unit advances to the next tile, so that in total each unit processes a number of consecutive tiles (a chunk) of the active list. A CUDA block can consist of a multiple of these units, which act independently of each other, see Fig. 2. The total number of units is chosen so as to saturate all GPU multiprocessors.
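The scheduling described above can be modeled sequentially in Python (a CPU sketch of the work division only; the names and the per-voxel update are placeholders, not the paper's CUDA kernel). Each unit takes one contiguous chunk of the active list; per tile, a first phase fetches the 26 neighbor pointers and a second phase has all 64 "threads" update one voxel each:

```python
def traverse_active_list(active_list, neighbor_ptrs, units,
                         threads_per_unit=64, n_neighbors=26):
    """Model of unit-based traversal of the active list."""
    processed = []
    chunk = (len(active_list) + units - 1) // units  # ceil: tiles per unit
    for u in range(units):
        for tile in active_list[u * chunk:(u + 1) * chunk]:
            # phase 1: first n_neighbors threads track neighbor-tile pointers
            ptrs = [neighbor_ptrs[tile][k] for k in range(n_neighbors)]
            # phase 2: all threads_per_unit threads update one voxel each
            voxels = list(range(threads_per_unit))
            processed.append((tile, len(ptrs), len(voxels)))
    return processed
```

Because chunks are contiguous, consecutive tiles processed by a unit tend to be memory-adjacent, which is what keeps the per-tile pointer-tracking overhead small on the GPU.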