2014 47th Annual IEEE/ACM International Symposium on Microarchitecture 2014
DOI: 10.1109/micro.2014.20

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU

Cited by 68 publications (17 citation statements)
References 26 publications
“…Fauzia [17] uses instrumentation to find noncoalesced memory references and offers a PTX-level optimization technique. Porple [12] uses a small configuration language to specify memory placement of objects and combines it with an auto-tuner to achieve high performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…Swapping some data of an evicted kernel to the host memory could be an option to alleviate the problem. Extensions of portable data placement optimizers (e.g., PORPLE [16][17][18]) to both host and device memory could facilitate the process. It is left to study in the future.…”
Section: Discussion (mentioning)
confidence: 99%
“…• VFP, which is the analysis described in Section 3.3.1 • HOTL, which computes the miss ratio as described in Section 2 for a single cache of the combined size including LLC and all private caches (i.e., 12MB, 14MB, 16MB in two-, three-, and four-benchmark co-runs) • Even, which assumes each co-run program uses an equal partition of the combined-size cache and then uses HOTL as described in Section 2 (Chen et al. [6] developed this heuristic for GPU caches.) • Proportional(MissRatio) and Proportional(MissRate) are similar to Even, but the cache occupancy is proportional to its solo-run miss ratio (misses per hundred accesses) and miss rate (misses per second), respectively.…”
Section: Theory Versus Heuristics (mentioning)
confidence: 99%
“…Bubble-up predicted the co-run performance (not just the miss ratio), but its measurement was intentionally machine dependent (and probe dependent) [34]. PORPLE was developed for GPUs for data "symbiosis" (rather than task) [6] and assumed even partitioning of shared cache. Section 4.3 shows that the PORPLE heuristic is highly accurate for exclusive CPU caches.…”
Section: Related Work (mentioning)
confidence: 99%