We present parallel algorithms and implementations of a bzip2-like lossless data compression scheme for GPU architectures. Our approach parallelizes three main stages in the bzip2 compression pipeline: Burrows-Wheeler transform (BWT), move-to-front transform (MTF), and Huffman coding. In particular, we utilize a two-level hierarchical sort for BWT, design a novel scan-based parallel MTF algorithm, and implement a parallel reduction scheme to build the Huffman tree. For each algorithm, we perform detailed performance analysis, discuss its strengths and weaknesses, and suggest future directions for improvements. Overall, our GPU implementation is dominated by BWT performance and is 2.78× slower than bzip2, with BWT and MTF+Huffman respectively 2.89× and 1.34× slower on average.
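For readers unfamiliar with the first two pipeline stages named above, the following is a minimal serial reference sketch in Python. It is not the paper's GPU method: the BWT here sorts all rotations naively rather than using a two-level hierarchical sort, and the MTF is the textbook sequential version rather than a scan-based parallel one. The function names `bwt` and `mtf` are illustrative choices, not names from the paper.

```python
from typing import List, Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler transform (illustrative, O(n^2 log n)).

    Sorts every rotation of `data` and returns the last column of the
    sorted rotation matrix plus the row index of the original string.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Each rotation is identified by its starting offset in `data`.
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last character of the rotation starting at i is data[i - 1].
    last_column = bytes(data[(i - 1) % n] for i in order)
    return last_column, order.index(0)


def mtf(data: bytes) -> List[int]:
    """Sequential move-to-front transform over the 256 byte values."""
    table = list(range(256))
    out = []
    for b in data:
        idx = table.index(b)   # current rank of this byte
        out.append(idx)
        table.pop(idx)         # move the byte to the front of the table
        table.insert(0, b)
    return out


if __name__ == "__main__":
    transformed, primary = bwt(b"banana")
    print(transformed, primary)
    print(mtf(transformed))
```

The BWT groups equal symbols into runs, so the MTF output that follows tends to contain many small values and zeros, which is what makes the final Huffman stage effective.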
ROMS is software that models and simulates an ocean region using a finite difference grid and time stepping. ROMS simulations can take from hours to days to complete due to the compute-intensive nature of the software. As a result, the size and resolution of simulations are constrained by the performance limitations of modern computing hardware. To address these issues, the existing ROMS code can be run in parallel with either OpenMP or MPI. In this work, we implement a new parallelization of ROMS on a graphics processing unit (GPU) using CUDA Fortran. We exploit the massive parallelism offered by modern GPUs to gain a performance benefit at a lower cost and with less power. To test our implementation, we benchmark with idealized marine conditions as well as real data collected from coastal waters near central California. Our implementation yields a speedup of up to 8x over a serial implementation and 2.5x over an OpenMP implementation, while demonstrating comparable performance to an MPI implementation.
This paper presents a framework for GPU-accelerated N-view triangulation in multi-view reconstruction that improves processing time and final reprojection error with respect to methods in the literature. The framework uses an algorithm based on optimizing an angular error-based L1 cost function, and it is shown how adaptive gradient descent can be applied for convergence. The triangulation algorithm is mapped onto the GPU and two approaches for parallelization are compared: one thread per track and one thread block per track. The better-performing approach depends on the number of tracks and the lengths of the tracks in the dataset. Furthermore, the algorithm uses statistical sampling based on confidence levels to reduce the number of feature track positions needed to triangulate an entire track. Sampling aids in load balancing for the GPU's SIMD architecture and in exploiting the GPU's memory hierarchy. When compared to a serial implementation, a typical performance increase of 3-4x can be achieved on a 4-core CPU. On a GPU, large track numbers are favorable and an increase of up to 40x can be achieved. Results on real and synthetic data prove that reprojection errors are similar to those of the best-performing current triangulation methods at only a fraction of the computation time, allowing for efficient and accurate triangulation of large scenes.
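To make the idea of an angular error-based L1 cost concrete, here is a small CPU-only sketch in Python. It is an illustration under stated assumptions, not the paper's algorithm: the cost is taken to be the plain sum of angles between each observed ray and the ray from the camera center to the candidate point, the gradient is estimated by forward differences, and the "adaptive" rule (grow the step after an improvement, halve it otherwise) is a generic stand-in for whatever adaptation the authors use. The names `angular_l1_cost` and `refine_point` are hypothetical.

```python
import math


def angular_l1_cost(point, centers, rays):
    """Sum of angles (radians) between each observed ray and the ray
    from the corresponding camera center to `point`.

    Assumes `point` does not coincide with any camera center.
    """
    total = 0.0
    for c, r in zip(centers, rays):
        v = [p - ci for p, ci in zip(point, c)]
        norm_v = math.sqrt(sum(x * x for x in v))
        norm_r = math.sqrt(sum(x * x for x in r))
        cos_angle = sum(a * b for a, b in zip(v, r)) / (norm_v * norm_r)
        # Clamp to guard against rounding slightly outside [-1, 1].
        total += math.acos(max(-1.0, min(1.0, cos_angle)))
    return total


def refine_point(point, centers, rays, step=0.1, iters=200, h=1e-6):
    """Gradient descent with a simple adaptive step (assumed rule):
    accept a move only if it lowers the cost, then enlarge the step;
    otherwise shrink the step and try again.
    """
    p = list(point)
    cost = angular_l1_cost(p, centers, rays)
    for _ in range(iters):
        grad = []
        for k in range(3):
            q = p.copy()
            q[k] += h
            grad.append((angular_l1_cost(q, centers, rays) - cost) / h)
        candidate = [pk - step * gk for pk, gk in zip(p, grad)]
        candidate_cost = angular_l1_cost(candidate, centers, rays)
        if candidate_cost < cost:
            p, cost = candidate, candidate_cost
            step *= 1.2
        else:
            step *= 0.5
    return p, cost


if __name__ == "__main__":
    # Two cameras whose rays both pass through the point (0, 0, 5).
    centers = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    rays = [(1.0, 0.0, 5.0), (-1.0, 0.0, 5.0)]
    start = (0.3, 0.2, 4.0)
    print(angular_l1_cost(start, centers, rays))
    print(refine_point(start, centers, rays))
```

Each feature track is refined independently of the others, which is why both mappings mentioned above (one thread per track, or one thread block per track) are possible on a GPU.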
The angular error-based triangulation method and the parallax path method are both high-performance methods for large-scale multi-view sequential reconstruction that can be parallelized on the GPU. We map parallax paths to the GPU and test its performance and accuracy as a triangulation method for the first time. To this end, we compare it with the angular method on the GPU for both performance and accuracy. Furthermore, we improve the recovery of path scales and perform more extensive analysis and testing compared with the original parallax paths method. Although parallax paths requires sequential and piecewise-planar camera positions, in such scenarios, we can achieve a speedup of up to 14x over angular triangulation, while maintaining comparable accuracy.
We introduce a method for creating very dense reconstructions of datasets, particularly turntable varieties. The method takes in initial reconstructions (of any origin) and makes them denser by interpolating depth values in two-dimensional image space within a superpixel region and then optimizing the interpolated value via image consistency analysis across neighboring images in the dataset. One of the core assumptions in this method is that depth values per pixel will vary gradually along a gradient for a given object. As such, turntable datasets, such as the dinosaur dataset, are particularly easy for our method. Our method modernizes some existing techniques and parallelizes them on a GPU, which produces results faster than other densification methods.