In this article, we propose new parallel algorithms for the construction and 2:1 balance refinement of large linear octrees on distributed memory machines. Such octrees are used in many problems in computational science and engineering, e.g., object representation, image analysis, unstructured meshing, finite elements, adaptive mesh refinement, and N-body simulations. Fixed-size scalability and isogranular analysis of the algorithms using an MPI-based parallel implementation was performed on a variety of input data and demonstrated good scalability for different processor counts (1 to 1024 processors) on the Pittsburgh Supercomputing Center's TCS-1 AlphaServer. The results are consistent for different data distributions. Octrees with over a billion octants were constructed and balanced in less than a minute on 1024 processors. Like other existing algorithms for constructing and balancing octrees, our algorithms have O(N log N) work and O(N) storage complexity. Under reasonable assumptions on the distribution of octants and the work per octant, the parallel time complexity is O((N/np) log(N/np) + np log np), where N is the size of the final linear octree and np is the number of processors.
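The core idea of a linear octree is that leaves can be stored as a flat, Morton-ordered list rather than a pointer-based tree. The following sketch is our own illustration of that representation, not the paper's implementation; the names `morton` and `linearize` and the depth `MAX_LEVEL` are assumptions for this toy example.

```python
# Minimal sketch of a linear octree: leaves are (level, x, y, z) tuples with
# anchor coordinates in units of the finest cell, sorted by Morton key.
# The 2:1 balance constraint (enforced by the paper's algorithms) would then
# require that leaves sharing a face/edge/corner differ by at most one level.

MAX_LEVEL = 5  # assumed maximum refinement depth (domain is 2**MAX_LEVEL cells wide)

def morton(x, y, z):
    """Interleave the bits of the anchor coordinates into a Morton key."""
    key = 0
    for b in range(MAX_LEVEL):
        key |= ((x >> b) & 1) << (3 * b)
        key |= ((y >> b) & 1) << (3 * b + 1)
        key |= ((z >> b) & 1) << (3 * b + 2)
    return key

def linearize(octants):
    """A linear octree is just the leaf list sorted by Morton key."""
    return sorted(octants, key=lambda o: morton(o[1], o[2], o[3]))

# Three non-overlapping leaves at levels 2, 3, and 1:
leaves = [(1, 16, 0, 0), (3, 8, 0, 0), (2, 0, 0, 0)]
ordered = linearize(leaves)
```

Sorting by Morton key is what makes distributed construction natural: each processor owns a contiguous range of keys, so a parallel sample sort of the keys partitions the octree.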
We present a parallel multigrid method for solving variable-coefficient elliptic partial differential equations on arbitrary geometries using highly adapted meshes. Our method is designed for meshes that are built from an unstructured hexahedral macro mesh, in which each macro element is adaptively refined as an octree. This forest-of-octrees approach enables us to generate meshes for complex geometries with arbitrary levels of local refinement. We use geometric multigrid (GMG) for each of the octrees and algebraic multigrid (AMG) as the coarse grid solver. We designed our GMG sweeps to entirely avoid collectives, thus minimizing communication cost. We present weak and strong scaling results for the 3D variable-coefficient Poisson problem that demonstrate high parallel scalability. As a highlight, the largest problem we solve is on a non-uniform mesh with 100 billion unknowns on 262,144 cores of NCCS's Cray XK6 "Jaguar"; in this solve we sustain 272 TFlops.
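For readers unfamiliar with the GMG component, the V-cycle structure can be shown with a textbook 1D sketch. This is only schematic (the paper's solver is 3D, matrix-free, and operates on a forest of octrees with AMG on the coarsest level); here we assume a 3-point discretization of -u'' = f with zero Dirichlet conditions, weighted Jacobi smoothing, full-weighting restriction, and linear interpolation.

```python
import numpy as np

def residual(u, f, h):
    """Residual of the 3-point discretization of -u'' = f (zero Dirichlet BCs)."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def jacobi(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing; the operator diagonal is 2/h**2."""
    for _ in range(sweeps):
        u[1:-1] += omega * 0.5 * h**2 * residual(u, f, h)[1:-1]
    return u

def v_cycle(u, f, h):
    n = u.size - 1
    if n <= 2:                                    # coarsest grid: solve directly
        u[1:-1] = 0.5 * h**2 * f[1:-1]
        return u
    u = jacobi(u, f, h)                           # pre-smooth
    r = residual(u, f, h)
    rc = r[::2].copy()                            # full-weighting restriction
    rc[1:-1] = 0.25 * r[1:-3:2] + 0.5 * r[2:-2:2] + 0.25 * r[3:-1:2]
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h)  # coarse-grid correction
    e = np.zeros_like(u)
    e[::2] = ec                                   # linear interpolation back
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u += e
    return jacobi(u, f, h)                        # post-smooth
```

Each V-cycle reduces the algebraic error by a mesh-independent factor, which is why the method scales to the problem sizes reported above.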
From molecular dynamics and quantum chemistry to plasma physics and computational astrophysics, Poisson solvers in the unit cube are used in many applications in computational science and engineering. In this work, we benchmark and discuss the performance of scalable methods for the Poisson problem that are widely used in practice: the Fast Fourier Transform (FFT), the Fast Multipole Method (FMM), geometric multigrid (GMG), and algebraic multigrid (AMG). Our focus is on solvers that support high-order, highly non-uniform discretizations, but for reference we compare with solvers specialized for problems on regular grids. For this reason, we include FFT, since it is a very popular algorithm for several practical applications, and the finite element variant of HPGMG, a high-performance geometric multigrid benchmark. In total we compare five different codes, three of which were developed in our group. Our FFT, GMG, and FMM codes are parallel solvers that use high-order approximation schemes for Poisson problems with continuous forcing functions (the source or right-hand side). Our FFT code is based on FFTW for single-node parallelism. The AMG code is from the Trilinos library from Sandia National Laboratories. Our geometric multigrid and our FMM support octree-based mesh refinement and variable coefficients, and enable highly non-uniform discretizations. The GMG code also supports complex (non-cubic) geometries using a forest of octrees. We examine and report results for weak scaling, strong scaling, and time to solution for uniform and highly refined grids. We present results on the Stampede system at the Texas Advanced Computing Center and on the Titan system at the Oak Ridge National Laboratory. In our largest test case, we solved a problem with 600 billion unknowns on 229,379 cores of Titan. Overall, all methods scale quite well to these problem sizes. We have tested all of the methods with different source functions (the right-hand side in the Poisson problem).
Our results indicate that FFT is the method of choice for smooth source functions that require uniform resolution. However, FFT loses its performance advantage when the source function has highly localized features like internal sharp layers. FMM and GMG considerably outperform FFT for those cases. The distinction between FMM and GMG is less pronounced and is sensitive to the quality (from a performance point of view) of the underlying implementations. In most cases, high-order accurate versions of GMG and FMM significantly outperform their low-order accurate counterparts.
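Why the FFT excels for smooth, uniformly resolved sources is easy to see from its mechanism: the Laplacian is diagonal in Fourier space, so a periodic solve is two transforms and a pointwise division. The following single-node NumPy sketch illustrates this; it is not any of the benchmarked codes, and the function name `fft_poisson` is our own.

```python
import numpy as np

def fft_poisson(f):
    """Solve -Laplacian(u) = f with periodic BCs on the unit cube by
    diagonalizing in Fourier space; f is an n x n x n uniform-grid sample."""
    n = f.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)   # angular wavenumbers
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    fh = np.fft.fftn(f)
    fh[0, 0, 0] = 0.0                # project out the mean (nullspace of -Laplacian)
    k2[0, 0, 0] = 1.0                # avoid division by zero for the k = 0 mode
    return np.real(np.fft.ifftn(fh / k2))
```

For band-limited sources this is exact to machine precision, but the uniform grid is exactly the limitation discussed above: a sharp internal layer forces global uniform refinement, which adaptive FMM and GMG avoid.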
We present a comparison of different multigrid approaches for the solution of systems arising from high-order continuous finite element discretizations of elliptic partial differential equations on complex geometries. We consider the pointwise Jacobi, the Chebyshev-accelerated Jacobi, and the symmetric successive over-relaxation (SSOR) smoothers, as well as elementwise block Jacobi smoothing. Three approaches for the multigrid hierarchy are compared: (1) high-order h-multigrid, which uses high-order interpolation and restriction between geometrically coarsened meshes; (2) p-multigrid, in which the polynomial order is reduced while the mesh remains unchanged, and the interpolation and restriction incorporate the different-order basis functions; and (3) a first-order approximation multigrid preconditioner constructed using the nodes of the high-order discretization. This latter approach is often combined with algebraic multigrid for the low-order operator and is attractive for high-order discretizations on unstructured meshes, where geometric coarsening is difficult. Based on a simple performance model, we compare the computational cost of the different approaches. Using scalar test problems in two and three dimensions with constant and varying coefficients, we compare the performance of the different multigrid approaches for polynomial orders up to 16. Overall, both h- and p-multigrid work well; the first-order approximation is less efficient. For constant coefficients, all smoothers work well. For variable coefficients, Chebyshev and SSOR smoothing outperform Jacobi smoothing. While all of the tested methods converge in a mesh-independent number of iterations, none of them behaves completely independently of the polynomial order. When multigrid is used as a preconditioner in a Krylov method, the iteration count decreases significantly compared to using multigrid as a solver.
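Chebyshev-accelerated Jacobi replaces stationary sweeps with a Chebyshev polynomial in the Jacobi-preconditioned operator D⁻¹A, which is what gives it the edge for variable coefficients noted above. The following dense-matrix sketch uses the standard three-term recurrence; it is an illustration under our own assumptions, and the eigenvalue bounds `lmin`/`lmax` are inputs that a real smoother would estimate (e.g., from a few Lanczos steps).

```python
import numpy as np

def chebyshev_smooth(A, b, x, lmin, lmax, sweeps):
    """Chebyshev iteration on the Jacobi-preconditioned system, assuming the
    eigenvalues of D^{-1} A lie in [lmin, lmax] (D = diagonal of A)."""
    d = np.diag(A)
    theta = 0.5 * (lmax + lmin)          # center of the eigenvalue interval
    delta = 0.5 * (lmax - lmin)          # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    z = (b - A @ x) / d                  # Jacobi-preconditioned residual
    p = z / theta
    x = x + p
    for _ in range(sweeps - 1):
        rho_new = 1.0 / (2.0 * sigma - rho)
        z = (b - A @ x) / d
        p = rho_new * rho * p + (2.0 * rho_new / delta) * z
        x = x + p
        rho = rho_new
    return x
```

As a smoother one targets only the upper part of the spectrum (e.g., lmin = lmax/10) with a handful of sweeps; with bounds covering the whole spectrum the same recurrence is a full Chebyshev solver.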
In this article we propose parallel algorithms for the construction of conforming finite-element discretizations on linear octrees. Existing octree-based discretizations scale to billions of elements, but the complexity constants can be high. In our approach we use several techniques to minimize overhead: a novel bottom-up tree construction and 2:1 balance constraint enforcement; Golomb-Rice encoding for compression, representing the octree and element connectivity as a Uniquely Decodable Code (UDC); overlapping communication and computation; and byte alignment for cache efficiency. The cost of applying the Laplacian is comparable to that of applying it using a direct-indexing regular-grid discretization with the same number of elements. Our algorithm has scaled up to four billion octants on 4096 processors on a Cray XT3 at the Pittsburgh Supercomputing Center. The overall tree construction time is under a minute, in contrast to previous implementations that required several minutes; the evaluation of the discretization of a variable-coefficient Laplacian takes only a few seconds.
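Golomb-Rice coding suits this setting because the quantities being stored (level increments, small connectivity offsets) are mostly small nonnegative integers. A sketch of the power-of-two variant, with our own function names and bit-list representation (a real implementation would pack bits into bytes):

```python
# Golomb-Rice coding with parameter M = 2**k: each nonnegative integer n is
# written as unary(n >> k) terminated by a 0, followed by the k low-order bits.
# The result is a prefix-free, hence uniquely decodable, code (UDC).

def rice_encode(values, k):
    bits = []
    for n in values:
        q, r = n >> k, n & ((1 << k) - 1)
        bits.extend([1] * q + [0])                  # unary quotient, 0 terminator
        bits.extend((r >> b) & 1 for b in reversed(range(k)))  # k remainder bits
    return bits

def rice_decode(bits, k, count):
    out, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == 1:                         # read the unary quotient
            q, i = q + 1, i + 1
        i += 1                                      # skip the 0 terminator
        r = 0
        for _ in range(k):                          # read k remainder bits
            r = (r << 1) | bits[i]
            i += 1
        out.append((q << k) | r)
    return out
```

Because the code is prefix-free, the element stream can be decoded without any per-element length fields, which is where the memory savings come from.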
We present a highly-scalable framework that targets problems of interest to the numerical relativity and broader astrophysics communities. This framework combines a parallel octree-refined adaptive mesh with a wavelet adaptive multiresolution and a physics module to solve the Einstein equations of general relativity in the BSSNOK formulation. The goal of this work is to perform advanced, massively parallel numerical simulations of Intermediate Mass Ratio Inspirals (IMRIs) of binary black holes with mass ratios on the order of 100:1. These studies will be used to generate waveforms for use in LIGO data analysis and to calibrate semi-analytical approximate methods. Our framework consists of a distributed memory octree-based adaptive meshing framework in conjunction with a node-local code generator. The code generator makes our code portable across different architectures. The equations corresponding to the target application are written in symbolic notation, and generators for different architectures can be added independently of the application. Additionally, this symbolic interface also makes our code extensible, and as such it has been designed to easily accommodate many existing algorithms in astrophysics for plasma dynamics and radiation hydrodynamics. Our adaptive meshing algorithms and data structures have been optimized for modern architectures with deep memory hierarchies. This enables our framework to achieve excellent performance and scalability on modern leadership architectures. We demonstrate excellent weak scalability up to 131K cores on ORNL's Titan for binary mergers with mass ratios up to 100.

Fig. 1: This figure illustrates the calculation of a single Runge-Kutta time step, computing the solution at the advanced time, u^{n+1}, from data at the previous time step, u^n.
For computational efficiency, spatial and time derivatives are evaluated on equispaced blocks (unzipped); a sparse grid constructed from wavelet coefficients is used for communication and to store the final solution (zipped). For each RK stage s, the unzip operation produces a sequence of blocks; the solution is computed on each block interior using the padding values at the block boundary, followed by a zip operation between RK stages, while the final update (i.e., the next time step) is performed using the zipped version of the variables. Note that re-meshing is performed as needed based on the wavelet expansion of the current solution (see §3.5).
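The zip/unzip pattern in the caption is essentially a scatter into a padded dense block, a stencil evaluation, and a gather back to sparse storage. The following 1D, single-block sketch is our own schematic of that data flow, not the framework's code; the names `unzip`, `zip_back`, and the toy advection RHS are assumptions.

```python
import numpy as np

PAD = 1  # one ghost layer per side, enough for a 3-point stencil

def unzip(zipped, idx, block_len):
    """Scatter sparse (zipped) values into a padded equispaced block.
    A real code would also fill the padding from neighboring blocks."""
    block = np.zeros(block_len + 2 * PAD)
    block[PAD:-PAD][idx] = zipped
    return block

def stage_rhs(block, h):
    """RHS of a toy advection equation du/dt = -du/dx on the block interior,
    using central differences that consume the padding values."""
    return -(block[2:] - block[:-2]) / (2.0 * h)

def zip_back(interior, idx):
    """Gather block values back into the sparse (zipped) storage."""
    return interior[idx]
```

Keeping the RK stage computation on dense equispaced blocks is what allows vectorized, architecture-specific stencil kernels from the code generator, while communication and storage stay on the compressed wavelet grid.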