2015 International Conference on Parallel Architecture and Compilation (PACT)
DOI: 10.1109/pact.2015.38

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance

Cited by 73 publications (53 citation statements)
References 60 publications
“…In a block group, the metadata block stores the sequence ID (SID), which is the unique number in the memory log area to represent a block group, and the metadata (BLK-1. Note that memory controllers are becoming increasingly more intelligent and complex to deal with various scheduling and performance management issues in multi-core and heterogeneous systems (e.g., [5], [6], [7], [8], [11], [12], [13], [14], [21], [25], [26], [27], [32], [33], [34], [35], [38], [39], [42], [45], [46], [49], [50], [51], [52], [53], [54], [61], [62], [64], [65], [66], [67], [68], [81], [84], [85], [86], [87], [88], [89], [97], [98], [108], [110], [112], [113],…”
Section: Eager Commitmentioning
confidence: 99%
“…In the worst case, due to the lack of L2 cache capacity, it is sometimes necessary to load the evicted data from the off-chip memory. 6,31,[33][34][35][36][37][38][39][40][41] Shared memory is an alternative to the L1 cache for storing preloaded data. There are several reasons to support this.…”
Section: Preloading In the Shared Memorymentioning
confidence: 99%
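The preloading approach described in the statement above — staging data in shared memory rather than relying on the capacity-limited, contended L1 cache — can be sketched as a short CUDA kernel. This is a minimal illustration, not code from the cited work; the tile size, array names, and placeholder computation are assumptions.

```cuda
#include <cuda_runtime.h>

#define TILE 128  // illustrative tile size, not taken from the cited work

// Each thread block cooperatively preloads one tile of `in` into shared
// memory, then all threads read from the on-chip tile instead of
// competing for the small per-SM L1 cache.
__global__ void preload_then_use(const float* __restrict__ in,
                                 float* __restrict__ out, int n) {
    __shared__ float tile[TILE];

    int base = blockIdx.x * TILE;
    int idx  = base + threadIdx.x;

    // Preload phase: one coalesced global load per thread.
    if (idx < n)
        tile[threadIdx.x] = in[idx];
    __syncthreads();  // ensure the tile is fully populated before reuse

    // Reuse phase: subsequent reads hit shared memory, sidestepping
    // L1 resource contention and potential eviction to off-chip memory.
    if (idx < n)
        out[idx] = tile[threadIdx.x] * 2.0f;  // placeholder computation
}
```

Because shared memory is explicitly managed, the preload decision can be made statically at compile time, which is the kind of static analysis the second statement argues should complement dynamic, hardware-based schemes.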
“…As many previous research studies have shown, effectively hiding cache resource contention is a crucial step to achieving high performance on GPUs. 6,31,[33][34][35][36][37][38][39][40][41]43 Previous studies of resolving the resource contention problems are based on dynamic analysis methods that require hardware modification. In addition to preloading in shared memory efficiently, it is necessary to combine static analysis to avoid the L1 cache from the resource contentions effectively.…”
Section: Impact Of Various Preload Factorsmentioning
confidence: 99%
“…In previous studies, many researchers proposed various ways to improve the performance of the parallel algorithm. The work in [40] mainly studies the effect of warp sizing and scheduling on performance, and the work in [41] also analyzes the impact of warp-level sizing and thread block-level resource management. Both these studies adjust the number of active warps to improve performance.…”
Section: Two-level Parallelism Optimization Modelmentioning
confidence: 99%
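Adjusting the number of active warps, as both cited studies do, is commonly exposed in CUDA through launch bounds, which constrain a kernel's register budget and thereby its resident-warp count per SM. This is a generic sketch of that mechanism under assumed bounds, not the specific scheme of either paper.

```cuda
#include <cuda_runtime.h>

// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)
// tells the compiler to allocate registers so that blocks of up to 256
// threads fit with at least 2 blocks resident per SM — a static knob
// over the active-warp count. The values 256 and 2 are illustrative.
__global__ void __launch_bounds__(256, 2)
capped_occupancy_kernel(float* data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        data[idx] += 1.0f;  // placeholder per-thread work
}
```

Fewer resident warps can reduce cache and memory-bandwidth contention, while more warps improve latency hiding; the cited studies tune this trade-off per workload.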