Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2015
DOI: 10.1145/2807591.2807606
Adaptive and transparent cache bypassing for GPUs

Cited by 62 publications (29 citation statements)
References 28 publications
“…To summarize, we argue that GPUs are the promising platform for the ALS workload when taking both performance and power consumption into account. In the future, we will further investigate the performance gap between platforms and push the factorizing performance to the hardware limit (in particular on newer Intel Xeon Phi processors with on-package high bandwidth memory [35,36], newer GPUs on warp-level [37,38], CTA-level [39] and cache-level [40], and other emergent accelerators such as Matrix-2000 [41]).…”
Section: Applying Optimizations
confidence: 99%
“…In the worst case, due to the lack of L2 cache capacity, it is sometimes necessary to load the evicted data from the off-chip memory. 6,31,33–41 Shared memory is an alternative to the L1 cache for storing preloaded data. There are several reasons to support this.…”
Section: Preloading In the Shared Memory
confidence: 99%
“…As many previous research studies have shown, effectively hiding cache resource contention is a crucial step to achieving high performance on GPUs. 6,31,33–41,43 Previous studies of resolving the resource contention problems are based on dynamic analysis methods that require hardware modification. In addition to preloading in shared memory efficiently, it is necessary to combine static analysis to avoid the L1 cache from the resource contentions effectively.…”
Section: Impact Of Various Preload Factors
confidence: 99%
“…If the attacker detects the protection and then changes to use L1 data cache, Tangram will eliminate the covert channel formed through L1 data cache using cache bypassing. Previous studies show that the GPU L1-D cache miss rate is so high so that the performance is not harmed when the GPU L1-D cache is bypassed [11,23,34,35,61,68]. Therefore, Tangram selectively bypasses the L1-D cache requests if the attacks are detected on the L1D cache instead.…”
Section: Tangram: Attack Mitigation
confidence: 99%
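The excerpts above repeatedly invoke the same observation: when the GPU L1-D miss rate is very high (e.g. under streaming access with no reuse), bypassing the L1-D costs little. A minimal Python simulation of that decision rule is sketched below; the cache model, class names, and threshold are illustrative assumptions, not taken from the cited papers or from the hardware.

```python
class L1DCache:
    """Tiny direct-mapped cache model that tracks hits and misses.

    Hypothetical model for illustration only; real GPU L1-D caches are
    set-associative with sector lines and per-warp coalescing.
    """

    def __init__(self, num_lines=32, line_bytes=128):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # tag stored per line slot
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one load: hit if the line's tag matches, else fill it."""
        line = addr // self.line_bytes
        idx = line % self.num_lines
        if self.tags[idx] == line:
            self.hits += 1
        else:
            self.tags[idx] = line
            self.misses += 1

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def should_bypass(cache, threshold=0.9):
    """Bypass policy: route loads around L1-D once the observed miss
    rate exceeds the threshold (threshold value is an assumption)."""
    return cache.miss_rate() > threshold


# A pure streaming pattern (every access touches a new line) drives the
# miss rate to 1.0, so the policy recommends bypassing.
cache = L1DCache()
for addr in range(0, 128 * 1024, 128):
    cache.access(addr)
print(should_bypass(cache))  # prints True
```

With a reuse-heavy pattern (repeated accesses to the same address) the miss rate stays near zero and the same policy keeps the cache in the path, which matches the intuition in the excerpts: bypassing only pays off when the cache is not serving hits anyway.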