2013
DOI: 10.12785/amis/072l28
Accelerating GOR Algorithm Using CUDA

Abstract: Protein secondary structure prediction is important for understanding a protein's molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from a protein sequence. However, its running time becomes prohibitive as protein databases grow rapidly. Fortunately, CUDA (Compute Unified Device Architecture) provides a promising approach to accelerating secondary structure prediction. Therefore, we propose a fine-grained parall…

Cited by 4 publications (3 citation statements). References 10 publications (15 reference statements).
“…The new graphics processing unit (GPU) has fast shared memory and slow memory. Reusing the data in shared memory is a key point to improve the performance of GPU applications [105][106][107][108][109]. A very useful optimization is presented for fractional derivative [69,110].…”
Section: Memory Access Optimization (Fractional Precomputing Operator) (citation type: mentioning)
confidence: 99%
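The data-reuse pattern this citation describes can be made concrete with a minimal sketch (not taken from the paper itself; kernel name, block size, and the 3-point smoothing operation are illustrative assumptions): each thread block stages a tile of the input in fast shared memory once, then every element is reused by several threads instead of being re-fetched from slow global memory.

```cuda
// Illustrative sketch: a 3-point moving average. Each input element is
// loaded from global memory once into the shared-memory tile, then
// reused up to three times by neighboring threads.
__global__ void smooth3(const float *in, float *out, int n)
{
    __shared__ float tile[256 + 2];          // blockDim.x == 256, plus 1 halo cell per side
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;               // shift by 1 for the left halo

    if (gid < n)
        tile[lid] = in[gid];
    if (threadIdx.x == 0)                    // left halo cell
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)       // right halo cell
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
    __syncthreads();                         // tile fully staged; reuse begins

    if (gid < n)
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
}
```

Without the shared tile, each output would issue three global loads; with it, the block issues roughly one global load per element, which is the reuse effect the cited works exploit.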
“…However, how to make best use of accelerator resources is still a challenge. Most research work is focused on maximizing utilization of accelerator cores by exploiting millions of concurrent threads, 7,[10][11][12][13] while few works are interested in data transfer into accelerators and fewer works have referred to non-contiguous data transfer with special respect to strided data transfer. However, non-contiguous chunks of data, especially for strided data, are widely applied to real-life scenarios, such as regions-of-interest (ROI) coding and critical component of dataset duplication for reliability.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
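The strided, region-of-interest transfer this citation highlights can be sketched with the CUDA runtime's `cudaMemcpy2D`, which moves all strided rows in one call rather than one `cudaMemcpy` per row (a minimal host-side sketch; the image dimensions and buffer names are assumptions, not from the cited work):

```cuda
// Illustrative sketch: copy a W x H region of interest out of a larger
// row-major host image whose rows are W_total floats wide. One
// cudaMemcpy2D replaces H separate small host-to-device copies.
int W = 64, H = 64, W_total = 1024;                    // hypothetical sizes
float *h_img = (float *)malloc(W_total * W_total * sizeof(float));
float *d_roi;
cudaMalloc(&d_roi, W * H * sizeof(float));

cudaMemcpy2D(d_roi, W * sizeof(float),                 // dest: contiguous rows on device
             h_img, W_total * sizeof(float),           // source pitch = full row stride
             W * sizeof(float), H,                     // row width in bytes, row count
             cudaMemcpyHostToDevice);
```

Batching the strided rows into a single pitched copy avoids the per-call launch overhead that makes many small transfers slow, which is the gap in prior work the citing paper points at.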
“…However, it is more expensive to transfer data into CUDA memory. [10][11][12][13][14][15][16][17] Therefore, minimizing data transfer is proposed for GPU clusters, 18 and asynchronous transfer is also resorted to in porting FFT into CUDA. 11,[19][20][21] But, the above techniques are proposed for contiguous chunks of data transfer and difficult to extend into strided data in practice.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
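The asynchronous-transfer technique mentioned here is typically implemented with pinned host memory, `cudaMemcpyAsync`, and multiple streams so that copying one chunk overlaps computing on another. A minimal sketch follows (the `process` kernel, chunk sizes, and launch configuration are hypothetical placeholders, not from the cited works):

```cuda
// Illustrative sketch: ping-pong two streams so the host-to-device copy
// of one chunk overlaps the kernel working on the other chunk.
const int CHUNK = 1 << 20, NUM_CHUNKS = 8;             // hypothetical sizes
cudaStream_t s[2];
float *h_buf, *d_buf;
cudaMallocHost(&h_buf, 2 * CHUNK * sizeof(float));     // pinned memory: required
cudaMalloc(&d_buf, 2 * CHUNK * sizeof(float));         // for truly async copies
for (int i = 0; i < 2; ++i)
    cudaStreamCreate(&s[i]);

for (int c = 0; c < NUM_CHUNKS; ++c) {
    int i = c % 2;                                     // alternate between streams
    cudaMemcpyAsync(d_buf + i * CHUNK, h_buf + i * CHUNK,
                    CHUNK * sizeof(float), cudaMemcpyHostToDevice, s[i]);
    // process<<<grid, block, 0, s[i]>>> is a hypothetical kernel; ops
    // queued in the same stream run in order, so it sees the copied data.
    process<<<grid, block, 0, s[i]>>>(d_buf + i * CHUNK, CHUNK);
}
cudaDeviceSynchronize();                               // drain both streams
```

This is the contiguous-chunk pipelining the citation describes; as it notes, extending it to strided data is harder because each strided piece would otherwise need its own small async copy.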