2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca.2016.46
Accelerating Dependent Cache Misses with an Enhanced Memory Controller

Cited by 56 publications (11 citation statements)
References: 49 publications
“…Concretely, we avoid high-frequency interactions with internal CPU resources, avoid tracking dependencies among non-load instructions and avoid the increased verification costs associated with complicating the CPU design. The cost of exact criticality and dependency tracking has motivated researchers to develop heuristic approaches [64,65,66]. Such heuristics are similar to the heuristics used in architecture-centric accounting which we have shown to be less accurate than dataflow accounting.…”
Section: Related Work
confidence: 97%
“…A hardware mechanism called cache-conscious wavefront scheduling, which uses an intra-wavefront locality detector to capture locality, was proposed in [14]. To minimize dependent cache miss latency, Hashemi and others [16] proposed adding enough functionality to dynamically identify instructions at the core and migrate them to the memory controller for execution.…”
Section: Previous Work
confidence: 99%
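The citation above summarizes the core idea of the paper: when a load misses, the chain of instructions that depend on it is identified at the core and shipped to an enhanced memory controller for execution close to memory. The sketch below illustrates only that identification step, assuming a simplified micro-op window; the field names, window representation, and chain-length cap are illustrative assumptions, not the authors' hardware design.

```python
# Hypothetical sketch (not the authors' hardware implementation) of identifying
# the dependence chain of a missed load so that it could be shipped to an
# enhanced memory controller, in the spirit of Hashemi et al. [16].

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MicroOp:
    opcode: str
    dst: Optional[str]                  # destination register, if any
    srcs: list = field(default_factory=list)

def dependence_chain(window, miss_idx, max_len=16):
    """Collect ops after the missed load at window[miss_idx] that transitively
    consume its result, up to max_len ops (illustrative cap)."""
    tainted = {window[miss_idx].dst}    # registers derived from the missed load
    chain = []
    for op in window[miss_idx + 1:]:
        if tainted & set(op.srcs):      # op consumes a value derived from the miss
            chain.append(op)
            if op.dst is not None:
                tainted.add(op.dst)
            if len(chain) == max_len:
                break
        elif op.dst in tainted:
            tainted.discard(op.dst)     # register overwritten independently: taint ends
    return chain

# Pointer-chasing example: the second load depends on the first (missed) load.
window = [
    MicroOp("load",  "r1", ["r0"]),     # misses in the cache
    MicroOp("add",   "r2", ["r1", "r3"]),
    MicroOp("load",  "r4", ["r2"]),     # dependent cache miss
    MicroOp("store", None, ["r5", "r6"]),
]
print([op.opcode for op in dependence_chain(window, 0)])   # ['add', 'load']
```

Running the example prints ['add', 'load']: the address computation and the dependent pointer-chasing load are exactly the operations that would benefit from executing at the memory controller instead of waiting for the first miss to return to the core.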
“…A new memory-buffer chip called Centaur, which provides up to 128 MB of embedded DRAM buffer cache per processor along with an improved DRAM scheduler, was proposed in [15]. To minimize dependent cache miss latency, Hashemi and others [16] proposed adding enough functionality to dynamically identify instructions at the core and migrate them to the memory controller for execution. In [17], a dynamic scheduling algorithm was proposed for a set of sporadic real-time tasks that efficiently co-schedules processor and DMA execution to hide memory transfer latency.…”
Section: Previous Work
confidence: 99%
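The latency-hiding argument behind co-scheduling a processor with DMA transfers, mentioned in the [17] snippet above, can be shown with a small back-of-the-envelope model. This is a generic double-buffering calculation under assumed per-chunk times, not the scheduling algorithm proposed in [17].

```python
# Back-of-the-envelope model (not the scheduler from [17]) of how overlapping
# DMA transfers with computation hides memory transfer latency. The per-chunk
# times below are illustrative assumptions.

def serial_time(n_chunks, t_dma, t_compute):
    """Fetch each chunk, then process it, strictly one after the other."""
    return n_chunks * (t_dma + t_compute)

def overlapped_time(n_chunks, t_dma, t_compute):
    """Double buffering: while chunk i is processed, chunk i+1 is fetched,
    so the steady-state cost per chunk is max(t_dma, t_compute)."""
    return t_dma + n_chunks * max(t_dma, t_compute)

n, t_dma, t_compute = 100, 4.0, 5.0   # e.g. microseconds per chunk (assumed)
print(serial_time(n, t_dma, t_compute))      # 900.0
print(overlapped_time(n, t_dma, t_compute))  # 504.0 -> transfer latency mostly hidden
```

With these assumed numbers, overlapping shrinks total time from 900 to 504 time units, because each chunk's transfer is hidden behind the computation on the previous chunk.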
“…Various prior works [1,2,3,5,7,8,25,31,33,34,35,36,38,40,42,46,47,56,62,69,92,109,111,112,114,125,126,128,129,134,139,149] examine processing in memory to reduce DRAM latency. Other prior works propose memory scheduling techniques [4,37,49,66,67,74,99,100,103,104,135,136,137,138,141], which generally reduce the latency to access DRAM.…”
Section: Other Latency Reduction Mechanisms
confidence: 99%
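As an illustration of how memory scheduling can reduce DRAM access latency, the sketch below implements the widely known row-hit-first (FR-FCFS-style) policy for a single bank: the oldest request that hits the currently open row is served first, avoiding costly precharge/activate cycles. The request fields and single-bank model are simplifying assumptions and do not correspond to any specific proposal cited above.

```python
# Simplified sketch of a row-hit-first (FR-FCFS-style) DRAM request scheduler,
# one well-known way memory schedulers reduce average access latency.

from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    row: int        # DRAM row targeted by the request
    arrival: int    # arrival order / timestamp

def pick_next(queue: deque, open_row: Optional[int]):
    """Prefer the oldest request hitting the open row (cheap column access);
    otherwise fall back to the oldest request overall (FCFS)."""
    for req in queue:            # queue is kept in arrival order
        if req.row == open_row:
            return req
    return queue[0]

# Example: with row 7 open, the row-hit request is served before an older miss.
q = deque([Request(row=3, arrival=0), Request(row=7, arrival=1), Request(row=3, arrival=2)])
nxt = pick_next(q, open_row=7)
print(nxt)                        # Request(row=7, arrival=1)
q.remove(nxt)                     # request leaves the queue once scheduled
```

The design choice is a latency/fairness trade-off: serving row hits first keeps the row buffer productive and lowers average latency, at the risk of starving row-miss requests unless an age cap is added.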