State-of-the-art cache-oblivious parallel algorithms for dynamic programming (DP) problems usually guarantee asymptotically optimal cache performance without any tuning of cache parameters, but they often fail to exploit the theoretically best parallelism at the same time. While these algorithms achieve cache-optimality through the use of a recursive divide-and-conquer (DAC) strategy, scheduling tasks at the granularity of task dependency introduces artificial dependencies in addition to those arising from the defining recurrence equations. We removed the artificial dependency by scheduling tasks ready for execution as soon as all its real dependency constraints are satisfied, while preserving the cache-optimality by inheriting the DAC strategy. We applied our approach to a set of widely known dynamic programming problems, such as Floyd-Warshall's All-Pairs Shortest Paths, Stencil, and LCS. Theoretical analyses show that our techniques improve the span of 2-way DAC-based Floyd Warshall's algorithm on an n node graph from Θ n log 2 n to Θ (n), stencil computations on a d-dimensional hypercubic grid of width w for h time steps from Θ (d 2 h)w log(d+2)−1 to Θ (h), and LCS on two sequences of length n each from Θ n log 2 3 to Θ (n). In each case, the total work and cache complexity remain asymptotically optimal. Experimental measurements exhibit a 3 -5 times improvement in absolute running time, 10 -20 times improvement in burdened span by Cilkview, and approximately the same L1/L2 cache misses by PAPI.
Stencil computations are widely used to simulate the change of state of physical systems across a multidimensional grid over multiple timesteps. The state-of-the-art techniques in this area fall into three groups: cache-aware tiled looping algorithms, cache-oblivious divide-and-conquer trapezoidal algorithms, and Krylov subspace methods.In this paper, we present two efficient parallel algorithms for performing linear stencil computations. Current direct solvers in this domain are computationally inefficient, and Krylov methods require manual labor and mathematical training. We solve these problems for linear stencils by using DFT preconditioning on a Krylov method to achieve a direct solver which is both fast and general. Indeed, while all currently available algorithms for solving general linear stencils perform Θ (𝑁𝑇 ) work, where 𝑁 is the size of the spatial grid and 𝑇 is the number of timesteps, our algorithms perform 𝑜 (𝑁𝑇 ) work.To the best of our knowledge, we give the first algorithms that use fast Fourier transforms to compute final grid data by evolving the initial data for many timesteps at once. Our algorithms handle both periodic and aperiodic boundary conditions, and achieve polynomially better performance bounds (i.e., computational complexity and parallel runtime) than all other existing solutions.Initial experimental results show that implementations of our algorithms that evolve grids of roughly 10 7 cells for around 10 5 timesteps run orders of magnitude faster than state-of-the-art implementations for periodic stencil problems, and 1.3× to 8.5× faster for aperiodic stencil problems.
We present efficient parallel recursive divide-and-conquer algorithms for bubble sort, selection sort, and insertion sort. Our algorithms have excellent data locality and are highly parallel. The computational complexity of our insertion sort is ${{\mathcal{O}}}\left ({n^{\log _2 3}}\right )$ in contrast to ${{\mathcal{O}}}\left ({n^2}\right )$ of standard insertion sort.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.