We propose a Parallel Banding Algorithm (PBA) on the GPU to compute the exact Euclidean Distance Transform (EDT) for a binary image in 2D and higher dimensions. By partitioning the image into small bands that are processed and then merged concurrently, PBA computes the exact EDT with optimal linear total work, a high level of parallelism, and a good memory access pattern. This work is the first attempt to exploit the enormous power of the GPU to compute the exact EDT; prior GPU works compute only approximations. Even so, in our experiments our exact algorithm is still a few times faster than these approximate algorithms in 2D and 3D for most input sizes. We illustrate the use of our algorithm in applications such as computing the Euclidean skeleton using the integer medial axis transform, performing morphological operations on 3D volumetric data, and constructing 2D weighted centroidal Voronoi diagrams.
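To make the banding idea concrete, below is a minimal sequential C++ sketch (not the authors' CUDA code) of a one-dimensional sweep in the banded style: each band of a row is scanned independently, and pixels whose nearest site lies outside their band are left for a later merge step. All function and variable names here are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Illustrative sketch of the banding idea on one image row: each band
// is swept independently (on the GPU, one thread per band), recording
// the nearest site column seen so far. Pixels whose nearest site lies
// in another band keep -1 and would be fixed by the merge step, which
// is omitted here, as is the symmetric right-to-left sweep.
void sweepBands(const std::vector<int>& row, std::vector<int>& nearest,
                int bandWidth) {
    int w = static_cast<int>(row.size());
    for (int start = 0; start < w; start += bandWidth) {  // one "thread" per band
        int last = -1;
        for (int x = start; x < std::min(start + bandWidth, w); ++x) {
            if (row[x]) last = x;  // pixel x is a site
            nearest[x] = last;
        }
    }
}

int main() {
    std::vector<int> row = {0, 1, 0, 0, 0, 0, 1, 0};  // sites at columns 1 and 6
    std::vector<int> nearest(row.size(), -1);
    sweepBands(row, nearest, 4);
    for (int v : nearest) std::printf("%d ", v);      // prints: -1 1 1 1 -1 -1 6 6
    std::printf("\n");
}
```

Because each band touches only its own segment of the row, the sweeps are embarrassingly parallel; the cost of the approach is concentrated in the concurrent merge across band boundaries.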
This paper presents a novel approach to compute, for a given point set S in R^2, its Delaunay triangulation T(S). Though prior work mentions the possibility of using the graphics processing unit (GPU) to compute Delaunay triangulations, no implementation or performance results have been reported. Our work uncovers various challenges in using the GPU for this purpose. In practice, our approach exploits the GPU to compute a triangulation T' of S that is a good approximation of T(S), and then employs the CPU to transform T' into T(S). Since the GPU, with its parallel computing capability, performs the major part of the total work, the approach is fast and practical, particularly for large inputs (millions of points with the current state-of-the-art GPU). For such cases, our current implementation runs up to 53% faster on a Core2 Duo machine than Triangle, widely regarded as the fastest Delaunay triangulation implementation.
This paper presents a novel approach, termed GPU-CDT, to compute the constrained Delaunay triangulation (CDT) of a planar straight line graph (PSLG), consisting of points and edges, using the graphics processing unit (GPU). Although there are many algorithms for constructing the 2D CDT on the CPU, no prior approach is known to use the parallel computing power of the GPU efficiently. For the special case where the PSLG consists of points only, which is the standard Delaunay triangulation problem, a hybrid approach has recently been proposed that uses the GPU together with the CPU to partially speed up the computation. Our GPU-CDT handles this special case as well, but the whole computation is fully accelerated by the GPU. Our implementation, using the CUDA programming model on NVIDIA GPUs, is numerically robust and runs several times faster than existing CPU algorithms as well as the prior GPU-CPU hybrid approach. This result is reflected in our experiments with both randomly generated PSLGs and real-world GIS data containing millions of points and edges.
We propose the first algorithm to compute the 3D Delaunay triangulation (DT) on the GPU. Our algorithm uses massively parallel point insertion followed by bilateral flipping, a powerful local operation in computational geometry. Although a flipping algorithm is very amenable to parallel processing and has been employed to construct the 2D DT and the 3D convex hull on the GPU, to our knowledge there has been no successful attempt to construct the 3D DT this way. This is because, in 3D, when many points are inserted in parallel, flipping gets stuck long before reaching the DT, and any further correction to obtain the DT is costly. In contrast, we show that by alternating between parallel point insertion and flipping, together with picking an appropriate point insertion order, one can still obtain a triangulation very close to Delaunay. We further propose an adaptive star splaying approach to subsequently transform this result into the 3D DT efficiently. In addition, we introduce several GPU speedup techniques for our implementation, which are also useful for general computational geometry algorithms. On the whole, our hybrid approach, with the GPU accelerating the main work of constructing a near-Delaunay structure and the CPU transforming that into the 3D DT, outperforms all existing sequential CPU algorithms by up to an order of magnitude, on both synthetic and real-world inputs. We also adapt our approach to the 2D DT problem and obtain a similar speedup over the best sequential CPU algorithms, and up to a 2 times speedup over previous GPU algorithms.
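The flip decision in 3D hinges on the insphere predicate: a facet shared by two tetrahedra is locally non-Delaunay when the apex of one tetrahedron lies inside the circumsphere of the other. Below is a minimal C++ sketch of this predicate via the standard lifted 4x4 determinant; it is illustrative only and uses plain floating point, whereas a robust implementation would require exact or adaptive arithmetic.

```cpp
#include <cstdio>

// Illustrative insphere test that drives bilateral flipping in 3D:
// for a positively oriented tetrahedron (a,b,c,d), point e lies
// inside its circumsphere iff the lifted 4x4 determinant is positive.
struct P3 { double x, y, z; };

static double det4(double m[4][4]) {
    // Laplace expansion along the first row.
    double d = 0.0;
    for (int c = 0; c < 4; ++c) {
        double s[3][3];
        for (int i = 1; i < 4; ++i)
            for (int j = 0, k = 0; j < 4; ++j)
                if (j != c) s[i - 1][k++] = m[i][j];
        double minor = s[0][0] * (s[1][1] * s[2][2] - s[1][2] * s[2][1])
                     - s[0][1] * (s[1][0] * s[2][2] - s[1][2] * s[2][0])
                     + s[0][2] * (s[1][0] * s[2][1] - s[1][1] * s[2][0]);
        d += ((c % 2) ? -1.0 : 1.0) * m[0][c] * minor;
    }
    return d;
}

double insphere(P3 a, P3 b, P3 c, P3 d, P3 e) {
    P3 p[4] = {a, b, c, d};
    double m[4][4];
    for (int i = 0; i < 4; ++i) {
        double dx = p[i].x - e.x, dy = p[i].y - e.y, dz = p[i].z - e.z;
        m[i][0] = dx; m[i][1] = dy; m[i][2] = dz;
        m[i][3] = dx * dx + dy * dy + dz * dz;  // lifting to the paraboloid
    }
    return det4(m);  // > 0: e inside the circumsphere, so the facet should flip
}

int main() {
    // Positively oriented unit tetrahedron.
    P3 a{0, 0, 0}, b{1, 0, 0}, c{0, 0, 1}, d{0, 1, 0};
    std::printf("%g\n", insphere(a, b, c, d, {0.25, 0.25, 0.25}));  // positive: inside
    std::printf("%g\n", insphere(a, b, c, d, {2, 2, 2}));           // negative: outside
}
```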
We propose the first graphics processing unit (GPU) solution to compute the 2D constrained Delaunay triangulation (CDT) of a planar straight line graph (PSLG) consisting of points and edges. There are many existing CPU algorithms that solve the CDT problem in computational geometry, yet no prior approach solves this problem efficiently using the parallel computing power of the GPU. For the special case where the PSLG consists of points only, which is simply the standard Delaunay triangulation (DT) problem, a hybrid approach that uses the GPU together with the CPU to partially speed up the computation has already been presented in the literature. Our work, in contrast, accelerates the entire computation on the GPU. Our implementation, using the CUDA programming model on NVIDIA GPUs, is numerically robust and runs up to an order of magnitude faster than the best sequential implementations on the CPU. This result is reflected in our experiments with both randomly generated PSLGs and real-world GIS data having millions of points and edges.
A flip is a simple, local operation that transforms one triangulation into another. It changes only a few neighboring simplices, without considering any attribute or configuration of the triangulation that is global in nature. Thanks to this characteristic, several flips can be applied independently to different small, non-overlapping regions of a triangulation. Such an operation is favored when designing algorithms for data-parallel, massively multithreaded hardware such as the GPU. However, most existing flip algorithms are designed to be executed sequentially and usually impose restrictions on the order in which flips are performed, making them hard to adapt to parallel computation. In this paper, we present an in-depth study of flip algorithms in low dimensions, with an emphasis on the flexibility of their execution order. In particular, we propose a series of provably correct flip algorithms for regular triangulations and convex hulls in 2D and 3D, with implementations for both CPUs and GPUs. Our experiments show that our GPU implementation for constructing these structures from a given point set achieves up to two orders of magnitude of speedup over popular single-threaded CPU implementations of existing algorithms.
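As a concrete illustration of this locality, the following C++ sketch shows the 2D case: the incircle predicate decides whether the shared edge of two triangles should be flipped, and the flip rewires only those two triangles, which is why flips in disjoint regions can be applied concurrently in any order. The point coordinates and names are made up for the example, and plain doubles stand in for the exact arithmetic a robust implementation needs.

```cpp
#include <cstdio>

// Two triangles (a,b,c) and (a,d,b) share edge ab. If d lies inside
// the circumcircle of CCW triangle (a,b,c), edge ab is locally
// non-Delaunay and a 2-2 flip replaces it with edge cd, producing
// triangles (a,d,c) and (d,b,c). Only these two triangles change.
struct P2 { double x, y; };

// incircle > 0: d is inside the circumcircle of CCW triangle (a,b,c).
double incircle(P2 a, P2 b, P2 c, P2 d) {
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;
    return (ax * ax + ay * ay) * (bx * cy - by * cx)
         - (bx * bx + by * by) * (ax * cy - ay * cx)
         + (cx * cx + cy * cy) * (ax * by - ay * bx);
}

int main() {
    // Quad a,d,b,c in CCW order; diagonal ab splits it into two triangles.
    P2 a{0, 0}, d{1, -0.2}, b{2, 0}, c{1, 0.25};
    if (incircle(a, b, c, d) > 0)
        std::printf("edge ab is non-Delaunay: flip to diagonal cd\n");
}
```

Because the test and the rewiring read and write only the two triangles involved, any set of flips whose triangle pairs do not overlap can safely run in the same parallel round, which is exactly the flexibility of execution order the paper studies.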