cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs

Shih, Yu-hsuan; Wright, Garrett; Andén, Joakim; Blaschke, Johannes; Barnett, Alex H.

doi:10.48550/arxiv.2102.08463

Cited by 3 publications

(3 citation statements)

References 23 publications


“…The Radon transform can similarly be computed by performing the above steps in reverse order. In this work we have used the cuFINUFFT (Shih et al, 2021) library to compute NUFFTs. For a detailed discussion on the topic, we refer the reader to Dutt & Rokhlin (1993), Fessler & Sutton (2003), Greengard & Lee (2004), Barnett et al (2019) and Barnett (2021).…”

Section: Figure (mentioning)

confidence: 99%

tomoCAM: fast model-based iterative reconstruction via GPU acceleration and non-uniform fast Fourier transforms

Kumar, Parkinson, Donatelli

2024

J Synchrotron Radiat


X-ray-based computed tomography is a well established technique for determining the three-dimensional structure of an object from its two-dimensional projections. In the past few decades, there have been significant advancements in the brightness and detector technology of tomography instruments at synchrotron sources. These advancements have led to the emergence of new observations and discoveries, with improved capabilities such as faster frame rates, larger fields of view, higher resolution and higher dimensionality. These advancements have enabled the material science community to expand the scope of tomographic measurements towards increasingly in situ and in operando measurements. In these new experiments, samples can be rapidly evolving, have complex geometries and restrictions on the field of view, limiting the number of projections that can be collected. In such cases, standard filtered back-projection often results in poor quality reconstructions. Iterative reconstruction algorithms, such as model-based iterative reconstructions (MBIR), have demonstrated considerable success in producing high-quality reconstructions under such restrictions, but typically require high-performance computing resources with hundreds of compute nodes to solve the problem in a reasonable time. Here tomoCAM is introduced, a new GPU-accelerated implementation of model-based iterative reconstruction that leverages non-uniform fast Fourier transforms to efficiently compute Radon and back-projection operators and asynchronous memory transfers to maximize the throughput to the GPU memory. The resulting code is significantly faster than traditional MBIR codes and delivers the reconstructive improvement offered by MBIR with affordable computing time and resources. tomoCAM has a Python front-end, allowing access from Jupyter-based frameworks, providing straightforward integration into existing workflows at synchrotron facilities.
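
The link between Radon projections and Fourier transforms that the abstract relies on can be illustrated with the Fourier slice theorem. The sketch below is not tomoCAM's or cuFINUFFT's code; it is a plain NumPy check, on a hypothetical random test image, that the 1D FFT of the angle-0 projection equals the zero-frequency row of the 2D FFT. The function names are illustrative only.

```python
import numpy as np

def projection_spectrum(image):
    """1D FFT of the angle-0 projection (sum over rows) of a 2D image."""
    projection = image.sum(axis=0)
    return np.fft.fft(projection)

def central_slice(image):
    """The k0 = 0 row of the 2D FFT of the same image."""
    return np.fft.fft2(image)[0, :]

# Hypothetical test image; any real 2D array works.
rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))

# Fourier slice theorem at angle 0: both spectra agree to rounding error.
slice_matches = np.allclose(projection_spectrum(image), central_slice(image))
```

At angle 0 the slice lies on the FFT grid. At a general angle the slice passes through off-grid frequencies, which is where a nonuniform FFT becomes useful.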


“…Most of the memory in 3D was occupied by the activations from the 3D convolutional neural networks used in the image denoising step in NC-PDNet. Memory efficient implementations of NUFFT was carried out by using tensorflow-nufft [34], which is based on tensorflow implementations of cuFINUFFT [35].…”

Section: Practical Implementations (mentioning)

confidence: 99%

Jointly Learning Non-Cartesian k-Space Trajectories and Reconstruction Networks for 2D and 3D MR Imaging through Projection

Chaithya, Ciuciu

2023

Bioengineering


Compressed sensing in magnetic resonance imaging essentially involves the optimization of (1) the sampling pattern in k-space under MR hardware constraints and (2) image reconstruction from undersampled k-space data. Recently, deep learning methods have allowed the community to address both problems simultaneously, especially in the non-Cartesian acquisition setting. This work aims to contribute to this field by tackling some major concerns in existing approaches. Particularly, current state-of-the-art learning methods seek hardware compliant k-space sampling trajectories by enforcing the hardware constraints through additional penalty terms in the training loss. Through ablation studies, we rather show the benefit of using a projection step to enforce these constraints and demonstrate that the resulting k-space trajectories are more flexible under a projection-based scheme, which results in superior performance in reconstructed image quality. In 2D studies, our novel method trajectories present an improved image reconstruction quality at a 20-fold acceleration factor on the fastMRI data set with SSIM scores of nearly 0.92–0.95 in our retrospective studies as compared to the corresponding Cartesian reference and also see a 3–4 dB gain in PSNR as compared to earlier state-of-the-art methods. Finally, we extend the algorithm to 3D and by comparing optimization as learning-based projection schemes, we show that data-driven joint learning-based method trajectories outperform model-based methods such as SPARKLING through a 2 dB gain in PSNR and 0.02 gain in SSIM.


“…Merging and slicing operations are offloaded to GPUs using the cuFINUFFT [15] library, and data movement is handled using pyCUDA [16]. We compare performance of a multi-threaded instance using the FINUFFT [8] library with OpenMP to an equivalent CUDA implementation on a NVIDIA V100 and find that the forward function call runs approximately 1.5× faster, and the adjoint function call runs approximately 8× faster for our dataset.…”

Section: Acceleration - GPU Offloading (mentioning)

confidence: 99%

Scaling and Acceleration of Three-dimensional Structure Determination for Single-Particle Imaging Experiments with SpiniFEL

Chang, Slaughter, Mirchandaney et al.

2021

Preprint


The Linac Coherent Light Source (LCLS) is an X-ray free-electron laser (XFEL) facility enabling the study of the structure and dynamics of single macromolecules. A major upgrade will bring the repetition rate of the X-ray source from 120 to 1 million pulses per second. Exascale high performance computing (HPC) capabilities will be required to process the corresponding data rates. We present SpiniFEL, an application used for structure determination of proteins from single-particle imaging (SPI) experiments. An emerging technique for imaging individual proteins and other large molecular complexes by outrunning radiation damage, SPI breaks free from the need for crystallization (which is difficult for some proteins) and allows for imaging molecular dynamics at near ambient conditions. SpiniFEL is being developed to run on supercomputers in near real-time while an experiment is taking place, so that the feedback about the data can guide the data collection strategy. We describe here how we reformulated the mathematical framework for parallelizable implementation and accelerated the most compute intensive parts of the application. We also describe the use of Pygion, a Python interface for the Legion task-based programming model and compare to our existing MPI+GPU implementation.
