2011 International Conference on Parallel Architectures and Compilation Techniques
DOI: 10.1109/pact.2011.37

An Alternative Memory Access Scheduling in Manycore Accelerators

Cited by 6 publications (6 citation statements)
References 2 publications
“…We compare our new instructions to two related works: cache-conscious wavefront scheduling (CCWS) (Rogers et al 2012) and alternative memory access scheduling which batch requests which map to the same DRAM row (BATCH) (Kim et al 2011;Yuan et al 2009). Though CCWS and BATCH do not affect the dynamic instruction stream of the application, both techniques can reduce memory request interference, similar to our new instructions.…”
Section: Comparison To Related Work (mentioning)
confidence: 92%
“…Other related DRAM memory scheduling research propose methods to better schedule and prioritize requests from the SMs in order to avoid the effects of memory request interference and better exploit DRAM row buffer locality. One prior work suggests batching an SM's L1 cache miss requests by DRAM row into network packets (Kim et al 2011). Another work focuses on exposing and prioritizing row buffer hits in the on-chip network (Yuan et al 2009).…”
Section: Related Work (mentioning)
confidence: 99%
“…Another issue is the head-of-line problem in the memory request queue that unavoidably arises when memory requests to the same rows are grouped together [10,12]. Modern DRAM chips are usually organized into banks, and memory requests to different banks can be serviced concurrently.…”
Section: Design Issues (mentioning)
confidence: 99%
“…Thus, they proposed a NoC arbitration scheme called Hold Grant to preserve the row buffer access locality of memory request streams. In [10], the idea of superpackets is proposed for the shader core to maintain row buffer locality for the memory requests out of the core. While these works focus on maintaining the row buffer locality from a single shader core, our work exploit the coalescing opportunity across the cores inside the NoC.…”
Section: Related Work (mentioning)
confidence: 99%
“…• stream-specific or locality-aware arbitration within GPU, as suggested in [15] [10], -this provides marginal benefit since there are multiple arbitration points for different streams and processing elements in the internal interconnection network. Maintaining locality when requests get merged at various locations before they reach the memory is challenging with internal-to-GPU arbitration mechanisms.…”
Section: Introduction (mentioning)
confidence: 99%