IEEE International Symposium on Workload Characterization (IISWC'10) 2010
DOI: 10.1109/iiswc.2010.5648828
Data handling inefficiencies between CUDA, 3D rendering, and system memory

Cited by 9 publications (8 citation statements)
References 16 publications
“…Other researchers have investigated techniques for accelerating specific IQA algorithms. For example, in [364], Gordon et al investigated the acceleration of PSNR by using GPGPU implementations in both CUDA and OpenGL. Via a performance analysis, they specifically investigated how the application and system performance is affected by utilizing GPGPU acceleration of PSNR in a model-based coding application (the primary bottleneck in model-based coding stems from the optimization procedure used to determine the model parameters from the input image).…”
Section: Acceleration of Specific IQA Algorithms
Confidence: 99%
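The statement above concerns GPGPU acceleration of PSNR. For reference, the underlying metric is simple to state: PSNR is ten times the base-10 logarithm of the squared peak value over the mean squared error. A minimal pure-Python sketch of the scalar computation (illustrative only; not the CUDA/OpenGL implementations evaluated in the cited work):

```python
import math

def psnr(ref, test, max_val=255):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    if len(ref) != len(test):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```

The GPU versions discussed in the citation parallelize the per-pixel squared-difference sum; the final logarithm is a cheap scalar step.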
“…For the single-GPU evaluations, in order to minimize latency introduced by PCIe memory traffic, we performed this transfer from the host memory to the GPU memory only once during the beginning of the program. As described previously, and as demonstrated in similar studies [20,22], transferring data to and from the GPU memory creates a performance bottleneck which can result in a latency that can obviate the performance gains. For the multi-GPU evaluation, the arrays were also transferred to each GPU only once at the beginning; however, a considerable number of inter-GPU transfers were required, as discussed later in Section 5.3.…”
Section: CPU Tasks: Loading Images; Computing Overall Quality
Confidence: 70%
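The transfer-once strategy described above can be illustrated with a toy cost model (the numbers below are illustrative assumptions, not measurements from the cited study): when the PCIe copy cost per frame is comparable to the kernel cost, paying it on every frame roughly doubles the runtime, while a single upfront copy amortizes to nearly nothing.

```python
def total_time_ms(frames, kernel_ms, transfer_ms, transfer_once):
    """Toy cost model: total runtime is the per-frame kernel time
    plus PCIe transfer time, paid either once or once per frame."""
    transfers = 1 if transfer_once else frames
    return frames * kernel_ms + transfers * transfer_ms

# Per-frame copies: 1000*2 + 1000*2 = 4000 ms of work.
per_frame = total_time_ms(1000, kernel_ms=2.0, transfer_ms=2.0, transfer_once=False)
# One upfront copy: 1000*2 + 2 = 2002 ms.
upfront = total_time_ms(1000, kernel_ms=2.0, transfer_ms=2.0, transfer_once=True)
```

This is the bottleneck the quoted studies report: transfer latency can obviate kernel speedups unless copies are hoisted out of the per-frame loop.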
“…Furthermore, many perceptual models employ multiple independent stages, which are candidates for task-level parallelization. However, as demonstrated in previous studies [19][20][21][22], the need to transfer data to and from the GPU's dynamic random-access memory (DRAM) gives rise to performance degradation due to memory bandwidth bottlenecks. This memory bandwidth bottleneck is commonly the single largest limiting factor in the use of GPUs for image and video processing applications.…”
Section: Introduction
Confidence: 99%
“…Gordon et al. [6] have explained how CUDA's limitation of routing data through the CPU's storage units when transferring between the GPU and system memory results in lower performance than going through the CPU. They have also described a shortcoming in CUDA for transferring data between GPU-based APIs such as OpenGL and Direct3D.…”
Section: Background and Motivation
Confidence: 99%