IEEE International Symposium on Workload Characterization (IISWC'10) 2010
DOI: 10.1109/iiswc.2010.5648828
Data handling inefficiencies between CUDA, 3D rendering, and system memory

Cited by 9 publications (8 citation statements)
References 16 publications
“…Other researchers have investigated techniques for accelerating specific IQA algorithms. For example, in [364], Gordon et al investigated the acceleration of PSNR by using GPGPU implementations in both CUDA and OpenGL. Via a performance analysis, they specifically investigated how the application and system performance is affected by utilizing GPGPU acceleration of PSNR in a model-based coding application (the primary bottleneck in model-based coding stems from the optimization procedure used to determine the model parameters from the input image).…”
Section: Acceleration of Specific IQA Algorithms
Confidence: 99%
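The statement above concerns GPGPU acceleration of PSNR. For reference, the underlying metric is simple to state: PSNR is ten times the base-10 logarithm of the squared peak value over the mean squared error. A minimal pure-Python sketch of the scalar computation (illustrative only; not the CUDA/OpenGL implementations evaluated in the cited work):

```python
import math

def psnr(ref, test, max_val=255):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    if len(ref) != len(test):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```

The GPU versions discussed in the citation parallelize the per-pixel squared-difference sum; the final logarithm is a cheap scalar step.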
“…For the single-GPU evaluations, in order to minimize latency introduced by PCIe memory traffic, we performed this transfer from the host memory to the GPU memory only once during the beginning of the program. As described previously, and as demonstrated in similar studies [20,22], transferring data to and from the GPU memory creates a performance bottleneck which can result in a latency that can obviate the performance gains. For the multi-GPU evaluation, the arrays were also transferred to each GPU only once at the beginning; however, a considerable number of inter-GPU transfers were required, as discussed later in Section 5.3.…”
Section: CPU Tasks: Loading Images; Computing Overall Quality
Confidence: 70%
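The transfer-once strategy described above can be illustrated with a toy cost model (the numbers below are illustrative assumptions, not measurements from the cited study): when the PCIe copy cost per frame is comparable to the kernel cost, paying it on every frame roughly doubles the runtime, while a single upfront copy amortizes to nearly nothing.

```python
def total_time_ms(frames, kernel_ms, transfer_ms, transfer_once):
    """Toy cost model: total runtime is the per-frame kernel time
    plus PCIe transfer time, paid either once or once per frame."""
    transfers = 1 if transfer_once else frames
    return frames * kernel_ms + transfers * transfer_ms

# Per-frame copies: 1000*2 + 1000*2 = 4000 ms of work.
per_frame = total_time_ms(1000, kernel_ms=2.0, transfer_ms=2.0, transfer_once=False)
# One upfront copy: 1000*2 + 2 = 2002 ms.
upfront = total_time_ms(1000, kernel_ms=2.0, transfer_ms=2.0, transfer_once=True)
```

This is the bottleneck the quoted studies report: transfer latency can obviate kernel speedups unless copies are hoisted out of the per-frame loop.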
“…Furthermore, many perceptual models employ multiple independent stages, which are candidates for task-level parallelization. However, as demonstrated in previous studies [19][20][21][22], the need to transfer data to and from the GPU's dynamic random-access memory (DRAM) gives rise to performance degradation due to memory bandwidth bottlenecks. This memory bandwidth bottleneck is commonly the single largest limiting factor in the use of GPUs for image and video processing applications.…”
Section: Introduction
Confidence: 99%
“…Gordon et al. [6] have explained how CUDA's limitation of routing data through the CPU's storage units when transferring between the GPU and system memory results in lower performance than going through the CPU. They have also described a shortcoming in CUDA for transferring data between GPU-based APIs such as OpenGL and Direct3D.…”
Section: Background and Motivation
Confidence: 99%