SUMMARY
We implement and evaluate a massively parallel and scalable algorithm based on a multigrid-preconditioned defect correction method for the simulation of fully nonlinear free surface flows. The simulations are based on a potential flow model that describes wave propagation over uneven bottoms in three space dimensions and is useful for fast analysis and prediction in coastal and offshore engineering. A dedicated numerical model based on the proposed algorithm is executed in parallel on an affordable, modern, special-purpose graphics processing unit (GPU). The model is based on a low-storage, flexible-order accurate finite difference method that is known to be efficient and scalable on a single CPU core (single thread). To achieve parallel performance for this relatively complex numerical model, we investigate a new trend in high-performance computing in which many-core GPUs are used as high-throughput co-processors to the CPU. We describe and demonstrate how this approach makes fast desktop computations possible for large nonlinear wave problems in numerical wave tanks (NWTs) with close to 50/100 million total grid points in double/single precision, given 4 GB of global device memory. A new code base has been developed in C++ and Compute Unified Device Architecture (CUDA) C. When executed on a single modern GPU, it improves the runtime by more than an order of magnitude in double precision arithmetic, for the same accuracy, over an existing single-threaded CPU Fortran 90 code. These significant improvements are achieved by carefully implementing the algorithm to minimize data transfer and to take advantage of the massive multi-threading capability of the GPU.
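To make the defect correction iteration concrete, here is a minimal serial sketch on an assumed 1D model problem, -u'' = f on (0, 1) with u(0) = u(1) = 0, rather than the 3D potential flow problem. A fourth-order stencil plays the role of the high-order operator A, and the second-order three-point stencil plays the role of the low-order preconditioner M. As a simplification, M is inverted exactly with the Thomas algorithm instead of by multigrid. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def apply_high_order(u: List[float], h: float) -> List[float]:
    """Apply a fourth-order central stencil for -d^2/dx^2 with u(0) = u(1) = 0.

    Rows adjacent to the boundary fall back to the second-order stencil,
    which is a simplifying assumption of this sketch.
    """
    n = len(u)
    ext = [0.0] + u + [0.0]  # pad with the homogeneous Dirichlet values
    out = []
    for i in range(n):
        j = i + 1  # index of u[i] in the padded array
        if i == 0 or i == n - 1:
            out.append((-ext[j - 1] + 2 * ext[j] - ext[j + 1]) / h**2)
        else:
            out.append((ext[j - 2] - 16 * ext[j - 1] + 30 * ext[j]
                        - 16 * ext[j + 1] + ext[j + 2]) / (12 * h**2))
    return out


def solve_low_order(r: List[float], h: float) -> List[float]:
    """Solve (1/h^2) tridiag(-1, 2, -1) x = r exactly (Thomas algorithm).

    This plays the role of applying the preconditioner M^{-1}.
    """
    n = len(r)
    a = -1.0 / h**2  # sub- and super-diagonal entries
    b = 2.0 / h**2   # diagonal entry
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = a / b, r[0] / b
    for i in range(1, n):
        denom = b - a * c[i - 1]
        c[i], d[i] = a / denom, (r[i] - a * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def defect_correction(f: List[float], h: float, tol: float = 1e-10,
                      max_iter: int = 100) -> Tuple[List[float], int]:
    """Iterate u <- u + M^{-1}(f - A u) until the high-order defect is small."""
    u = [0.0] * len(f)
    f_max = max(abs(v) for v in f)
    for k in range(max_iter):
        defect = [fi - ai for fi, ai in zip(f, apply_high_order(u, h))]
        if max(abs(d) for d in defect) <= tol * f_max:
            return u, k
        correction = solve_low_order(defect, h)
        u = [ui + ci for ui, ci in zip(u, correction)]
    return u, max_iter


if __name__ == "__main__":
    n = 63
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    f = [math.pi**2 * math.sin(math.pi * x) for x in xs]  # exact u = sin(pi x)
    u, iters = defect_correction(f, h)
    err = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))
    print(f"defect correction iterations: {iters}, max error: {err:.2e}")
```

For interior Fourier modes, M^{-1}A has eigenvalues between 1 and 4/3 for this pair of stencils. Each sweep therefore needs only the cheap low-order solve, while the converged result satisfies the high-order discretization.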
Objective: We demonstrate and evaluate the first markerless motion tracker compatible with PET, MRI, and simultaneous PET/MRI systems for motion correction (MC) of brain imaging. Methods: PET and MRI compatibility is achieved by careful positioning of in-bore vision extenders and by placing all electronic components outside the bore. The motion tracker is demonstrated in a clinical setup during a pediatric PET/MRI study that included 94 pediatric patient scans. PET MC is presented for two of these scans using a customized version of the Multiple Acquisition Frame method. Prospective MC of MRI acquisition is demonstrated in two healthy subjects using a motion-aware MRI sequence. Real-time motion estimates are accompanied by a tracking validity parameter to improve tracking reliability. Results: For both modalities, MC noticeably reduces motion-induced artifacts, and the motion estimates are sufficiently accurate to capture motion ranging from small respiratory motion to large intentional motion. In the PET/MRI study, a time-activity curve analysis shows image improvements for a patient performing head movements corresponding to a tumor motion of ±5-10 mm, with a maximal difference of 19% in standardized uptake value before and after MC. Conclusion: The first markerless motion tracker is successfully demonstrated for prospective MC in MRI and for MC in PET, with good tracking validity. Significance: As simultaneous PET/MRI systems have become available for clinical use, an increasing demand for accurate motion tracking and MC in PET/MRI scans has emerged. The presented markerless motion tracker addresses this demand.
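In its commonly described form, the Multiple Acquisition Frame approach groups PET data into frames of near-constant head pose. Each frame is then reconstructed and realigned using the tracked motion. The sketch below illustrates only the framing step. It is a hypothetical illustration, not the customized variant used in the study: it uses translations only, a fixed drift threshold `threshold_mm`, and invented function names.

```python
import math
from typing import List, Sequence, Tuple

# Tracked head translation in mm; rotations are ignored in this sketch.
Position = Tuple[float, float, float]


def split_into_frames(positions: Sequence[Position],
                      threshold_mm: float) -> List[Tuple[int, int]]:
    """Group consecutive motion samples into frames of near-constant pose.

    A new frame starts whenever the position drifts more than `threshold_mm`
    from the position at the start of the current frame. Frames are returned
    as half-open index ranges [first, stop).
    """
    frames = []
    start = 0
    for i in range(1, len(positions)):
        if math.dist(positions[i], positions[start]) > threshold_mm:
            frames.append((start, i))
            start = i
    frames.append((start, len(positions)))
    return frames


def mean_position(positions: Sequence[Position],
                  frame: Tuple[int, int]) -> Tuple[float, ...]:
    """Average position within a frame, e.g. for realigning that frame."""
    first, stop = frame
    count = stop - first
    return tuple(sum(p[axis] for p in positions[first:stop]) / count
                 for axis in range(3))


if __name__ == "__main__":
    positions = [(0, 0, 0), (0.5, 0, 0), (1.0, 0, 0),
                 (3.0, 0, 0), (3.2, 0, 0), (7.0, 0, 0)]
    frames = split_into_frames(positions, threshold_mm=2.0)
    print(frames)  # [(0, 3), (3, 5), (5, 6)]
```

A smaller threshold yields more, shorter frames with less intra-frame motion but fewer counts per frame. Choosing the threshold therefore trades residual blur against noise.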
Quantifying the financial savings of motion correction in brain MRI: A model-based estimate of the costs arising from patient head motion and potential savings from implementation of motion correction. This is the author manuscript accepted for publication. It has undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record.
We present performance results of a mixed-precision strategy designed to improve a recently developed, massively parallel, GPU-accelerated tool for fast and scalable simulation of unsteady, fully nonlinear free surface water waves over uneven depths (Engsig-Karup et al., 2011). The underlying wave model is based on a potential flow formulation, which requires the efficient solution of a Laplace problem at large scales. We report recent results on a new mixed-precision strategy for the efficient, iterative, high-order accurate and scalable solution of the Laplace problem using a multigrid-preconditioned defect correction method. The new strategy improves performance by exploiting architectural features of modern GPUs for mixed-precision computations, and it is tested in a recently developed generic library for fast prototyping of PDE solvers. The new wave tool can be used to solve and analyze large-scale wave problems in coastal and offshore engineering.
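A mixed-precision defect correction can follow the pattern of classic iterative refinement. Residuals and solution updates stay in double precision, while the correction equation is solved in cheaper single precision. The sketch below illustrates this pattern on an assumed 1D tridiagonal system. It emulates a float32 solver in pure Python by rounding every intermediate value through `struct`. It is not the GPU implementation described above, and the function names are hypothetical.

```python
import struct
from typing import List, Tuple


def f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def apply_a(x: List[float]) -> List[float]:
    """Apply tridiag(-1, 2, -1) in double precision (zero Dirichlet ends)."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


def solve_single(r: List[float]) -> List[float]:
    """Thomas algorithm for tridiag(-1, 2, -1) d = r, rounding every
    intermediate result to single precision to emulate a float32 solver."""
    n = len(r)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = f32(-0.5)
    d[0] = f32(f32(r[0]) / 2.0)
    for i in range(1, n):
        denom = f32(2.0 + c[i - 1])
        c[i] = f32(-1.0 / denom)
        d[i] = f32(f32(f32(r[i]) + d[i - 1]) / denom)
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = f32(d[i] - f32(c[i] * x[i + 1]))
    return x


def mixed_precision_solve(b: List[float], tol: float = 1e-10,
                          max_iter: int = 30) -> Tuple[List[float], int]:
    """Residuals and updates in double, corrections in (emulated) single."""
    x = [0.0] * len(b)
    b_max = max(abs(v) for v in b)
    for k in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, apply_a(x))]  # double-precision residual
        if max(abs(v) for v in r) <= tol * b_max:
            return x, k
        x = [xi + di for xi, di in zip(x, solve_single(r))]  # double update
    return x, max_iter


if __name__ == "__main__":
    b = [1.0] * 50
    x, iters = mixed_precision_solve(b)
    single_only = solve_single(b)
    res_single = max(abs(v) for v in (bi - ai for bi, ai in zip(b, apply_a(single_only))))
    res_mixed = max(abs(v) for v in (bi - ai for bi, ai in zip(b, apply_a(x))))
    print(f"single only: {res_single:.1e}, mixed ({iters} steps): {res_mixed:.1e}")
```

Each correction is only accurate to single precision. The final residual still reaches the double-precision tolerance, which is why the costly inner solve can run at the faster precision.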
Purpose: To compare prospective motion correction (PMC) and retrospective motion correction (RMC) in Cartesian 3D-encoded MPRAGE scans and to investigate the effects of correction frequency and parallel imaging on the performance of RMC. Methods: Head motion was estimated using a markerless tracking system and sent to a modified MPRAGE sequence, which can continuously update the imaging FOV to perform PMC. The prospective correction was applied either before each echo train (before-ET) or at every sixth readout within the ET (within-ET). RMC was applied during image reconstruction by adjusting k-space trajectories according to the measured motion. The motion correction frequency was retrospectively increased with RMC or decreased with reverse RMC. Phantom and in vivo experiments were used to compare PMC with RMC, and within-ET with before-ET correction frequency, during continuous motion. The correction quality was quantitatively evaluated using the structural similarity index measure (SSIM) against a reference image acquired without motion correction and without intentional motion. Results: PMC resulted in superior image quality compared to RMC, both visually and quantitatively. Increasing the correction frequency from before-ET to within-ET reduced the motion artifacts in RMC. A hybrid PMC and RMC correction, that is, retrospectively increasing the correction frequency of before-ET PMC to within-ET, also reduced motion artifacts. Inferior performance of RMC …
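Retrospective correction of a rigid translation follows from the Fourier shift theorem. Shifting the object by s pixels multiplies each k-space sample by a linear phase, so a sample acquired at a known, tracked shift can be corrected by applying the opposite phase. The 1D sketch below assumes integer per-sample shifts, a naive DFT, and hypothetical helper names. It does not handle rotations, which rotate the sampled k-space locations and generally call for non-Cartesian reconstruction.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) forward DFT: X[k] = sum_n x[n] exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def signed_freq(k: int, N: int) -> int:
    """Map DFT index k to a signed spatial frequency."""
    return k if k < N // 2 else k - N


def acquire_with_motion(obj: Sequence[float], shifts: Sequence[int]) -> List[complex]:
    """Simulate acquisition: k-space sample k is measured while the object is
    circularly shifted by shifts[k] pixels."""
    N = len(obj)
    samples = []
    for k, s in enumerate(shifts):
        moved = [obj[(n - s) % N] for n in range(N)]
        samples.append(dft(moved)[k])
    return samples


def correct_translation(kspace: Sequence[complex],
                        shifts: Sequence[float]) -> List[complex]:
    """Undo each sample's measured shift with the opposite linear phase."""
    N = len(kspace)
    return [kspace[k] * cmath.exp(2j * cmath.pi * signed_freq(k, N) * shifts[k] / N)
            for k in range(N)]


if __name__ == "__main__":
    obj = [0, 1, 3, 2, 0, 0, 0, 0]
    shifts = [0, 0, 1, 1, 2, 2, 1, 0]  # tracked shift at each sample's acquisition
    acquired = acquire_with_motion(obj, shifts)
    corrected = idft(correct_translation(acquired, shifts))
    print([round(v.real, 6) for v in corrected])
```

Supplying one shift per sample corresponds to a high correction frequency. Reusing one shift for a whole block of samples would mimic correcting only once per echo train.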
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
The focus of this article is on the parallel scalability of a distributed multigrid framework, known as the DTU Compute GPUlab Library, for execution on graphics processing unit (GPU)-accelerated supercomputers. We demonstrate near-ideal weak scalability for a high-order fully nonlinear potential flow (FNPF) time-domain model on the Oak Ridge Titan supercomputer, which is equipped with a large number of many-core CPU-GPU nodes. The high-order finite difference scheme for the solver is implemented to expose data locality and scalability, and the linear Laplace solver is based on an iterative multilevel preconditioned defect correction method designed for high-throughput processing and massive parallelism. In this work, the FNPF model uses a multi-block discretization that allows for large-scale simulations. In this setup, each grid block is based on a logically structured mesh with support for a curvilinear representation of horizontal block boundaries. This allows an accurate representation of geometric features such as surface-piercing bottom-mounted structures, for example the mono-pile foundations demonstrated here. Unprecedented performance and scalability results are presented for a system of equations that has historically been considered too expensive to solve in practical applications. A novel feature of the potential flow model is demonstrated: a modest number of multigrid restrictions is sufficient for fast convergence, which improves overall parallel scalability as the coarse-grid problem diminishes. In the numerical benchmarks presented, we demonstrate the use of 8192 modern Nvidia GPUs, enabling large-scale and high-resolution nonlinear marine hydrodynamics applications.
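The mechanism of capping the number of multigrid restrictions can be explored with a small serial V-cycle. The sketch below assumes a 1D Poisson model problem with weighted Jacobi smoothing, full-weighting restriction, linear interpolation and an exact coarsest-grid solve. Its `max_levels` parameter (hypothetical, like all names here) limits how many restrictions are performed. It does not model the FNPF Laplace problem, the multi-block setup or GPU distribution, so it illustrates the mechanism rather than reproducing the convergence result reported above.

```python
import math
from typing import List, Tuple


def residual(u: List[float], f: List[float], h: float) -> List[float]:
    """r = f - A u for A = (1/h^2) tridiag(-1, 2, -1), zero Dirichlet ends."""
    n = len(u)
    return [f[i] - (2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
                    - (u[i + 1] if i < n - 1 else 0.0)) / h**2 for i in range(n)]


def jacobi(u: List[float], f: List[float], h: float, sweeps: int,
           omega: float = 2.0 / 3.0) -> List[float]:
    """Weighted Jacobi smoothing (each sweep uses only the previous iterate)."""
    n = len(u)
    for _ in range(sweeps):
        u = [(1.0 - omega) * u[i] + omega * 0.5 * (h * h * f[i]
             + (u[i - 1] if i > 0 else 0.0) + (u[i + 1] if i < n - 1 else 0.0))
             for i in range(n)]
    return u


def restrict(r: List[float]) -> List[float]:
    """Full weighting from n = 2m + 1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * (r[2 * j] + 2.0 * r[2 * j + 1] + r[2 * j + 2]) for j in range(m)]


def prolong(e: List[float]) -> List[float]:
    """Linear interpolation from m coarse points to 2m + 1 fine points."""
    m = len(e)
    fine = [0.0] * (2 * m + 1)
    for j in range(m):
        fine[2 * j + 1] = e[j]  # coincident points
    for j in range(m + 1):
        left = e[j - 1] if j > 0 else 0.0
        right = e[j] if j < m else 0.0
        fine[2 * j] = 0.5 * (left + right)  # midpoints
    return fine


def solve_direct(f: List[float], h: float) -> List[float]:
    """Exact coarsest-grid solve with the Thomas algorithm."""
    n = len(f)
    a, b = -1.0 / h**2, 2.0 / h**2  # off-diagonal and diagonal entries
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = a / b, f[0] / b
    for i in range(1, n):
        denom = b - a * c[i - 1]
        c[i], d[i] = a / denom, (f[i] - a * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def v_cycle(u, f, h, levels, pre=2, post=2):
    """One V-cycle; `levels` counts the grids still allowed (1 = coarsest)."""
    if levels <= 1 or len(u) < 3:
        return solve_direct(f, h)
    u = jacobi(u, f, h, pre)
    r_coarse = restrict(residual(u, f, h))
    e_coarse = v_cycle([0.0] * len(r_coarse), r_coarse, 2.0 * h, levels - 1, pre, post)
    u = [ui + ei for ui, ei in zip(u, prolong(e_coarse))]
    return jacobi(u, f, h, post)


def mg_solve(f: List[float], h: float, max_levels: int, tol: float = 1e-8,
             max_cycles: int = 50) -> Tuple[List[float], int]:
    """Repeat V-cycles until the relative residual drops below `tol`."""
    u = [0.0] * len(f)
    f_max = max(abs(v) for v in f)
    for cycle in range(max_cycles):
        if max(abs(v) for v in residual(u, f, h)) <= tol * f_max:
            return u, cycle
        u = v_cycle(u, f, h, max_levels)
    return u, max_cycles


if __name__ == "__main__":
    n = 127  # 2**7 - 1 interior points
    h = 1.0 / (n + 1)
    f = [math.pi**2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
    for levels in (1, 2, 3, 7):
        _, cycles = mg_solve(f, h, levels)
        print(f"max_levels={levels}: {cycles} V-cycles")
```

Fewer levels leave a larger coarsest problem. In a distributed setting that problem is cheap to parallelize, whereas very coarse grids leave most processors idle.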