PORPLE: An Extensible Optimizer for Portable Data Placement on GPU

Chen, Guoyang; Wu, Bo; Liu, Dong; Shen, Xipeng

doi:10.1109/micro.2014.20

Cited by 68 publications

(17 citation statements)

References 26 publications


“…Fauzia [17] uses instrumentation to find noncoalesced memory references and offers a PTX-level optimization technique. Porple [12] uses a small configuration language to specify memory placement of objects and combines it with an auto-tuner to achieve high performance.…”

Section: Related Work (mentioning)

confidence: 99%

gpucc: an open-source GPGPU compiler

Belevich

Bendersky

et al. 2016

Proceedings of the 2016 International Symposium on Code Generation and Optimization


Graphics Processing Units have emerged as powerful accelerators for massively parallel, numerically intensive workloads. The two dominant software models for these devices are NVIDIA's CUDA and the cross-platform OpenCL standard. Until now, there has not been a fully open-source compiler targeting the CUDA environment, hampering general compiler and architecture research and making deployment difficult in datacenter or supercomputer environments. In this paper, we present gpucc, an LLVM-based, fully open-source, CUDA-compatible compiler for high-performance computing. It performs various general and CUDA-specific optimizations to generate high-performance code. The Clang-based frontend supports modern language features such as those in C++11 and C++14. Compile time is 8% faster than NVIDIA's toolchain (nvcc) and it reduces compile time by up to 2.4x for pathological compilations (>100 secs), which tend to dominate build times in parallel build environments. Compared to nvcc, gpucc's runtime performance is on par for several open-source benchmarks, such as Rodinia (0.8% faster), SHOC (0.5% slower), or Tensor (3.7% faster). It outperforms nvcc on internal large-scale end-to-end benchmarks by up to 51.0%, with a geometric mean of 22.9%.


“…Swapping some data of an evicted kernel to the host memory could be an option to alleviate the problem. Extensions of portable data placement optimizers (e.g., PORPLE [16][17][18]) to both host and device memory could facilitate the process. It is left to study in the future.…”

Section: Discussion (mentioning)

confidence: 99%

EffiSha

et al. 2017

Self Cite


Modern GPUs are broadly adopted in many multitasking environments, including data centers and smartphones. However, the current support for scheduling multiple GPU kernels (from different applications) is limited, forming a major barrier to GPUs meeting many practical needs. This work demonstrates for the first time that, on existing GPUs, efficient preemptive scheduling of GPU kernels is possible even without special hardware support. Specifically, it presents EffiSha, a pure software framework that enables preemptive scheduling of GPU kernels with very low overhead. The resulting preemptive scheduler offers flexible support for kernels of different priorities, and demonstrates significant potential for reducing the average turnaround time and improving the overall system throughput of programs that time-share a modern GPU.


“…
• VFP, which is the analysis described in Section 3.3.1
• HOTL, which computes the miss ratio as described in Section 2 for a single cache of the combined size including LLC and all private caches (i.e., 12MB, 14MB, 16MB in two-, three-, and four-benchmark co-runs)
• Even, which assumes each co-run program uses an equal partition of the combined-size cache and then uses HOTL as described in Section 2 (Chen et al. [6] developed this heuristic for GPU caches.)
• Proportional(MissRatio) and Proportional(MissRate) are similar to Even, but the cache occupancy is proportional to its solo-run miss ratio (misses per hundred accesses) and miss rate (misses per second), respectively.
…”

Section: Theory Versus Heuristics (mentioning)

confidence: 99%
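
The Even and Proportional heuristics in the statement above differ only in how the combined cache is divided among co-running programs. The following is a minimal sketch of that partitioning step, not the citing paper's implementation: the function names, program labels, and miss-ratio values are hypothetical, and the subsequent HOTL miss-ratio computation on each partition is not modeled.

```python
def even_partition(total_cache_mb, programs):
    """Even heuristic: every co-run program gets an equal share of the cache."""
    if not programs:
        raise ValueError("at least one program is required")
    share = total_cache_mb / len(programs)
    return {name: share for name in programs}


def proportional_partition(total_cache_mb, weights):
    """Proportional heuristic: each share is proportional to a solo-run weight.

    `weights` maps a program name to its solo-run miss ratio or miss rate
    (hypothetical numbers here; the statement does not give any).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: total_cache_mb * w / total for name, w in weights.items()}


if __name__ == "__main__":
    # 12 MB is the combined size quoted for a two-benchmark co-run.
    print(even_partition(12, ["A", "B"]))
    # Hypothetical solo-run miss ratios (misses per hundred accesses).
    print(proportional_partition(12, {"A": 2.0, "B": 6.0}))
```

Under either heuristic, the per-program partition size would then be fed to the single-cache miss-ratio model (HOTL in the statement) to estimate co-run behavior.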

“…Bubble-up predicted the co-run performance (not just the miss ratio), but its measurement was intentionally machine dependent (and probe dependent) [34]. PORPLE was developed for GPUs for data "symbiosis" (rather than task) [6] and assumed even partitioning of shared cache. Section 4.3 shows that the PORPLE heuristic is highly accurate for exclusive CPU caches.…”

Section: Related Work (mentioning)

confidence: 99%

Cache Exclusivity and Sharing

Ding

Luo

et al. 2017

ACM Trans. Archit. Code Optim.


A problem on multicore systems is cache sharing, where the cache occupancy of a program depends on the cache usage of peer programs. Exclusive cache hierarchy as used on AMD processors is an effective solution to allow processor cores to have a large private cache while still benefitting from shared cache. The shared cache stores the "victims" (i.e., data evicted from private caches). The performance depends on how victims of co-run programs interact in shared cache. This article presents a new metric called the victim footprint (VFP). It is measured once per program in its solo execution and can then be combined to compute the performance of any exclusive cache hierarchy, replacing parallel testing with theoretical analysis. The work evaluates the VFP by using it to analyze cache sharing by parallel mixes of sequential programs, comparing the accuracy of the theory to hardware counter results, and measuring the benefit of exclusivity-aware analysis and optimization.
