GPGPU Benchmark Suites: How Well Do They Sample the Performance Spectrum?

Ryoo, Jee Ho; Quirem, Saddam; LeBeane, Michael; Panda, Rutuparna; Song, Shuang; John, Lizy K.

doi:10.1109/icpp.2015.41

Cited by 9 publications

(4 citation statements)

References 32 publications

“…We operated this initial filtering by following the naming convention explained in Section III-B; specifically, we focused on three macro categories: a first category that refers to GPU memory accesses (L2 and DRAM, starts with lts); a second category that refers to metrics related to compute instructions (starts with gr, gpu, smsp inst or cycles) and a final category that summarizes branch divergence (starts with smsp warp). The reason for choosing these categories is because it has been shown to fit into what is known as principal components for GPGPU performance prediction and/or optimization [30], [31].…”

Section: A. Ncu and Kernels' Performance Metrics (mentioning)

confidence: 99%
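The prefix-based filtering described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code. The prefix strings are assumptions: the quote gives them as "lts", "gr", "gpu", "smsp inst or cycles" and "smsp warp", and the sketch assumes the double-underscore separators used in Nsight Compute metric names were lost in extraction. The function name `categorize` and the sample metric names are illustrative and do not come from the source.

```python
# Hypothetical prefix-to-category map read off the quoted statement.
# Assumption: "smsp inst", "cycles" and "smsp warp" in the quote stand for
# the prefixes "smsp__inst", "smsp__cycles" and "smsp__warp".
CATEGORY_PREFIXES = {
    "memory": ("lts",),
    "compute": ("gr", "gpu", "smsp__inst", "smsp__cycles"),
    "branch_divergence": ("smsp__warp",),
}


def categorize(metric_name):
    """Return the macro category of a metric name, or None if it is filtered out."""
    for category, prefixes in CATEGORY_PREFIXES.items():
        # str.startswith accepts a tuple and matches any of its entries.
        if metric_name.startswith(prefixes):
            return category
    return None


# Illustrative metric names, chosen only to exercise each branch.
sample_metrics = [
    "lts__t_sectors_op_read.sum",
    "smsp__inst_executed.sum",
    "smsp__cycles_active.avg",
    "gpu__time_duration.sum",
    "smsp__warps_active.avg.per_cycle_active",
    "sm__throughput.avg.pct_of_peak_sustained_elapsed",
]

kept = {name: categorize(name) for name in sample_metrics if categorize(name)}
```

Metrics that match none of the prefixes (the last sample name here) are dropped, which mirrors the "initial filtering" step the statement describes.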

“…all the possible slowdown values, Y), we plot both how the slowdown changes as we vary the two parameters, keeping fixed at 7 the number of CPU interferents (Figure 4), and the total slowdown distribution among all the generated kernels on all the possible number of interferents (Figure 5). We set p_0 ∈ [1, 10, 20, 30, 40, 50, 75, 100, 125, 150, 175, 200, 250, 300, 350, 400, 500, 1000] and p_1 ∈ [1, 3, 7, 9, 10, 12, 14, 17, 23, 30]. Considering these steps and ranges and the number of interferents, we generate 180 different versions of PARKERNEL and 1440 individual experiments calculated by varying the number of interferents.…”

Section: B. A Parameterized Kernel: PARKERNEL (mentioning)

confidence: 99%
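The counts quoted in the statement above can be checked by enumerating the parameter grid. The two value lists are copied from the quote. The set of interferent counts is an assumption: the quote only implies eight levels (1440 / 180 = 8), and the sketch takes them to be 0 through 7. The variable names are illustrative.

```python
from itertools import product

# Parameter values as listed in the quoted statement.
P0_VALUES = [1, 10, 20, 30, 40, 50, 75, 100, 125, 150,
             175, 200, 250, 300, 350, 400, 500, 1000]
P1_VALUES = [1, 3, 7, 9, 10, 12, 14, 17, 23, 30]

# Assumption: eight interferent levels, taken here as 0..7 CPU interferents.
INTERFERENT_COUNTS = range(8)

# One kernel version per (p0, p1) pair: 18 * 10 combinations.
kernel_versions = list(product(P0_VALUES, P1_VALUES))

# One experiment per kernel version and interferent count.
experiments = [
    (p0, p1, n)
    for (p0, p1) in kernel_versions
    for n in INTERFERENT_COUNTS
]

print(len(kernel_versions), len(experiments))  # prints "180 1440"
```

Under that assumption the enumeration reproduces both figures in the quote: 18 × 10 = 180 kernel versions and 180 × 8 = 1440 experiments.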

Machine Learning Techniques for Understanding and Predicting Memory Interference in CPU-GPU Embedded Systems

Masola, Capodieci, Rouxel, et al. 2023

2023 IEEE 29th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)

Nowadays, heterogeneous embedded platforms are extensively used in various low-latency applications, including the automotive industry, real-time IoT systems, and automated factories. These platforms utilize specific components, such as CPUs, GPUs, and neural network accelerators, for efficient task processing and to solve specific problems with lower power consumption than more traditional systems. However, since these accelerators share resources such as the global memory, it is crucial to understand how workloads behave under high computational loads to determine how parallel computational engines on modern platforms can interfere and adversely affect the system's predictability and performance. One area that remains unclear is the interference effect on shared memory resources between the CPU and GPU: more specifically, the latency degradation experienced by GPU kernels when memory-intensive CPU applications run concurrently. In this work, we first analyze the metrics that characterize the behavior of different kernels under various board conditions caused by CPU memory-intensive workloads on an Nvidia Jetson Xavier. Then, we exploit various machine learning methodologies aiming to estimate the latency degradation of kernels based on their metrics. As a result, we are able to identify the metrics that could potentially have the most significant impact when predicting the kernels' completion latency degradation.

“…Therefore, we used the Rodinia benchmark [29]. It correctly represents a variety of workload [45] and is a proven workload to test computing and profiling techniques.…”

Section: Overhead Analysis (mentioning)

confidence: 99%

Visualization of profiling and tracing in CPU‐GPU programs

Fiorini, Dagenais 2022

Concurrency and Computation

Summary: As the complexity of the toolchain increases for heterogeneous CPU‐GPU systems, the need for comprehensive tracing and debugging tools also grows. Heterogeneous platforms bring new possibilities but also new performance issues that are hard to detect. Some techniques that were used on CPU programs are now adapted to GPUs. However, there are some concepts specific to GPUs, like SIMD processing, and the effects of the close interactions between the CPUs and the GPUs, with shared virtual memory and user‐level queues. Multiple sources of data need to be extracted and correlated to obtain a more global view of the performance. In this article, we introduce a novel approach for measuring and visualizing performance defects inside CPU‐GPU programs by combining kernel events, compute kernel events, user API calls and memory transfers. We created two new views that combine this information, to help provide a global view. This framework uses the open source user queue system described in the HSA standard. It can easily be adapted to any user queue system for heterogeneous computing devices. We compare this framework with existing tools and test it against the Rodinia benchmark. We look at how the execution behavior affects the tracing and profiling overhead and we use Trace Compass to visualize the resulting trace.

“…With regards to GPU simulation acceleration, there exist some research that either choose a portion [51] or perform a pre-characterization [52] of target workloads and then derive the execution time from the simulation results. There are also studies that focus on the generation of GPU benchmarks [53] to reveal GPU's performance spectrum, and modeling of GPU memory systems [54]. These studies are supplementary for GPU performance estimation techniques.…”

Section: Related Work (mentioning)

confidence: 99%

A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation

Wang, Huang, Knoll, et al. 2019

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)

This paper proposes a hybrid framework for fast and accurate performance estimation of OpenCL kernels running on GPUs. The kernel execution flow is statically analyzed and thereupon the execution trace is generated via a loop-based bidirectional branch search. Then the trace is dynamically simulated to perform a dummy execution of the kernel to obtain the estimated time. The framework does not rely on profiling or measurement results, which are used in conventional performance estimation techniques. Moreover, the lightweight trace-based simulation consumes much less time than a fine-grained GPU simulator. Our framework can accurately grasp the variation trend of the execution time in the design space and robustly predict the performance of the kernels across two generations of recent Nvidia GPU architectures. Experiments on four Commercial Off-The-Shelf (COTS) GPUs show that our framework can predict the runtime performance with average Mean Absolute Percentage Error (MAPE) of 17.04% and time consumption of a few seconds. We also demonstrate the practicability of our framework with a real-world application.
