Abstract:Nonuniform fast Fourier transforms dominate the computational cost in many applications including image reconstruction and signal processing. We thus present a generalpurpose GPU-based CUDA library for type 1 (nonuniform to uniform) and type 2 (uniform to nonuniform) transforms in dimensions 2 and 3, in single or double precision. It achieves high performance for a given user-requested accuracy, regardless of the distribution of nonuniform points, via cache-aware point reordering, and load-balanced blocked spr… Show more
“…The Radon transform can similarly be computed by performing the above steps in reverse order. In this work we have used the cuFINUFFT (Shih et al, 2021) library to compute NUFFTs. For a detailed discussion on the topic, we refer the reader to Dutt & Rokhlin (1993), Fessler & Sutton (2003), Greengard & Lee (2004), Barnett et al (2019) and Barnett (2021).…”
X-ray-based computed tomography is a well established technique for determining the three-dimensional structure of an object from its two-dimensional projections. In the past few decades, there have been significant advancements in the brightness and detector technology of tomography instruments at synchrotron sources. These advancements have led to the emergence of new observations and discoveries, with improved capabilities such as faster frame rates, larger fields of view, higher resolution and higher dimensionality. These advancements have enabled the material science community to expand the scope of tomographic measurements towards increasingly in situ and in operando measurements. In these new experiments, samples can be rapidly evolving, have complex geometries and restrictions on the field of view, limiting the number of projections that can be collected. In such cases, standard filtered back-projection often results in poor quality reconstructions. Iterative reconstruction algorithms, such as model-based iterative reconstructions (MBIR), have demonstrated considerable success in producing high-quality reconstructions under such restrictions, but typically require high-performance computing resources with hundreds of compute nodes to solve the problem in a reasonable time. Here, tomoCAM, is introduced, a new GPU-accelerated implementation of model-based iterative reconstruction that leverages non-uniform fast Fourier transforms to efficiently compute Radon and back-projection operators and asynchronous memory transfers to maximize the throughput to the GPU memory. The resulting code is significantly faster than traditional MBIR codes and delivers the reconstructive improvement offered by MBIR with affordable computing time and resources. tomoCAM has a Python front-end, allowing access from Jupyter-based frameworks, providing straightforward integration into existing workflows at synchrotron facilities.
“…The Radon transform can similarly be computed by performing the above steps in reverse order. In this work we have used the cuFINUFFT (Shih et al, 2021) library to compute NUFFTs. For a detailed discussion on the topic, we refer the reader to Dutt & Rokhlin (1993), Fessler & Sutton (2003), Greengard & Lee (2004), Barnett et al (2019) and Barnett (2021).…”
X-ray-based computed tomography is a well established technique for determining the three-dimensional structure of an object from its two-dimensional projections. In the past few decades, there have been significant advancements in the brightness and detector technology of tomography instruments at synchrotron sources. These advancements have led to the emergence of new observations and discoveries, with improved capabilities such as faster frame rates, larger fields of view, higher resolution and higher dimensionality. These advancements have enabled the material science community to expand the scope of tomographic measurements towards increasingly in situ and in operando measurements. In these new experiments, samples can be rapidly evolving, have complex geometries and restrictions on the field of view, limiting the number of projections that can be collected. In such cases, standard filtered back-projection often results in poor quality reconstructions. Iterative reconstruction algorithms, such as model-based iterative reconstructions (MBIR), have demonstrated considerable success in producing high-quality reconstructions under such restrictions, but typically require high-performance computing resources with hundreds of compute nodes to solve the problem in a reasonable time. Here, tomoCAM, is introduced, a new GPU-accelerated implementation of model-based iterative reconstruction that leverages non-uniform fast Fourier transforms to efficiently compute Radon and back-projection operators and asynchronous memory transfers to maximize the throughput to the GPU memory. The resulting code is significantly faster than traditional MBIR codes and delivers the reconstructive improvement offered by MBIR with affordable computing time and resources. tomoCAM has a Python front-end, allowing access from Jupyter-based frameworks, providing straightforward integration into existing workflows at synchrotron facilities.
“…Most of the memory in 3D was occupied by the activations from the 3D convolutional neural networks used in the image denoising step in NC-PDNet. Memory efficient implementations of NUFFT was carried out by using tensorflow-nufft [34], which is based on tensorflow implementations of cuFINUFFT [35].…”
Compressed sensing in magnetic resonance imaging essentially involves the optimization of (1) the sampling pattern in k-space under MR hardware constraints and (2) image reconstruction from undersampled k-space data. Recently, deep learning methods have allowed the community to address both problems simultaneously, especially in the non-Cartesian acquisition setting. This work aims to contribute to this field by tackling some major concerns in existing approaches. Particularly, current state-of-the-art learning methods seek hardware compliant k-space sampling trajectories by enforcing the hardware constraints through additional penalty terms in the training loss. Through ablation studies, we rather show the benefit of using a projection step to enforce these constraints and demonstrate that the resulting k-space trajectories are more flexible under a projection-based scheme, which results in superior performance in reconstructed image quality. In 2D studies, our novel method trajectories present an improved image reconstruction quality at a 20-fold acceleration factor on the fastMRI data set with SSIM scores of nearly 0.92–0.95 in our retrospective studies as compared to the corresponding Cartesian reference and also see a 3–4 dB gain in PSNR as compared to earlier state-of-the-art methods. Finally, we extend the algorithm to 3D and by comparing optimization as learning-based projection schemes, we show that data-driven joint learning-based method trajectories outperform model-based methods such as SPARKLING through a 2 dB gain in PSNR and 0.02 gain in SSIM.
“…Merging and slicing operations are offloaded to GPUs using the cuFINUFFT [15] library, and data movement is handled using pyCUDA [16]. We compare performance of a multi-threaded instance using the FINUFFT [8] library with OpenMP to an equivalent CUDA implementation on a NVIDIA V100 and find that the forward function call runs approximately 1.5× faster, and the adjoint function call runs approximately 8× faster for our dataset.…”
The Linac Coherent Light Source (LCLS) is an Xray free electron laser (XFEL) facility enabling the study of the structure and dynamics of single macromolecules. A major upgrade will bring the repetition rate of the X-ray source from 120 to 1 million pulses per second. Exascale high performance computing (HPC) capabilities will be required to process the corresponding data rates. We present SpiniFEL, an application used for structure determination of proteins from single-particle imaging (SPI) experiments. An emerging technique for imaging individual proteins and other large molecular complexes by outrunning radiation damage, SPI breaks free from the need for crystallization (which is difficult for some proteins) and allows for imaging molecular dynamics at near ambient conditions. SpiniFEL is being developed to run on supercomputers in near real-time while an experiment is taking place, so that the feedback about the data can guide the data collection strategy. We describe here how we reformulated the mathematical framework for parallelizable implementation and accelerated the most compute intensive parts of the application. We also describe the use of Pygion, a Python interface for the Legion task-based programming model and compare to our existing MPI+GPU implementation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.