The International Symposium on Memory Systems 2020
DOI: 10.1145/3422575.3422803

Improving Memory Reliability by Bounding DRAM Faults

Cited by 15 publications
(14 citation statements)
“…To combat DRAM-related failures, system designers typically incorporate reliability, availability and serviceability (RAS) features [153][154][155] that collectively improve system reliability beyond what commodity DRAM chips can provide alone. In general, memory RAS is a broad research area with solutions spanning the hardware-software stack, ranging from hardware-based mechanisms within the DRAM chip (e.g., on-die ECC scrubbing [11,101,156], post-package repair [10,11,[157][158][159], target row refresh [100,160]), memory controller (e.g., rank-level ECC [48-55, 57-60, 81], rank-level ECC scrubbing [56, 61, 62, 62-65, 82, 156, 161], repair techniques [22,79,[162][163][164][165][166][167][168][169]) to software-only solutions (e.g., page retirement [76,[120][121][122][123][124], failure prediction [170][171][172][173][174][175]).…”
Section: Benefits for DRAM Consumers (mentioning)
“…In Step 2, we propose extending DRAM standards with explicit DRAM reliability standards that provide industry-standard guarantees, tools, and/or information helpful to consumers. We envision different possibilities for these reliability standards, including (1) reliability guarantees for how a chip is expected to behave under certain operating conditions (e.g., predictable behavior of faults [101]); (2) disclosure of industry-validated DRAM reliability models and testing strategies suitable for commodity DRAM chips (e.g., similar to how JEDEC JEP122 [102], JESD218 [103], and JESD219 [104] address Flash-memory-specific error mechanisms [105][106][107] such as floating-gate data retention [108][109][110][111] and models for physical phenomena such as threshold voltage distributions [112][113][114][115]); and (3) requirements for manufacturers to directly provide relevant information about their DRAM chips (e.g., the information requested in Step 1). As the DRAM industry continues to evolve, we anticipate closer collaboration between DRAM and system designers to efficiently overcome the technology scaling challenges that DRAM is already facing [26,28,116,117].…”
Section: Introduction (mentioning)
“…On-die ECC addresses uncorrelated single-bit errors that limit a manufacturer's factory yield [21,43,60,74,121,145,146,162] and is already prevalent among commodity DRAM chips today. Therefore, it is imperative that system-level error-mitigation mechanisms take on-die ECC into account, as clearly motivated by several prior works [21,32,43,69,137,145,162]. … [18-20, 113, 114], main memory is generally designed separately from the memory controller [130].…”
Section: Addressing Scaling-related Errors (mentioning)
“…Unfortunately, this separation discourages building a unified error-mitigation mechanism across the memory and its controller. This is exemplified by the widespread use of proprietary DRAM on-die ECC, which introduces new reliability challenges for designing error-mitigation mechanisms within the DRAM controller [21,32,43,137,145,162]. In general, the standardized interface between the memory and the controller (e.g., JEDEC DRAM standards [64,67,68]) must be modified to develop a joint solution, which impacts all manufacturers and consumers involved, and thus is a laborious and long (and often politically charged) process.…”
Section: Addressing Scaling-related Errors (mentioning)
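The snippets above note that on-die ECC targets uncorrelated single-bit errors and that its proprietary design complicates error mitigation in the controller. As an illustration only, the sketch below uses the textbook Hamming(7,4) single-error-correcting (SEC) code; this is an assumption chosen for clarity, not the code used by any actual DRAM chip, and the function names `encode` and `decode` are hypothetical. The code corrects any single flipped bit, but when two bits flip, the nonzero syndrome points at a third, unrelated bit, so the decoder silently miscorrects.

```python
# Illustrative textbook Hamming(7,4) SEC code; NOT any vendor's on-die ECC.
# Codeword positions 1..7: parity bits at 1, 2, 4; data bits at 3, 5, 6, 7.


def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(code: list[int]) -> tuple[list[int], int]:
    """Flip the bit the syndrome points at (if any).

    Returns (data bits, syndrome). A syndrome of 0 means no error was
    detected; otherwise it is the position that was flipped. With two
    or more raw errors the syndrome can point at a correct bit.
    """
    c = [0] + list(code)  # 1-indexed working copy
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome] ^= 1  # "correct" the indicated position
    return [c[3], c[5], c[6], c[7]], syndrome
```

For example, flipping the first two bits of the codeword for data `[1, 0, 1, 1]` yields syndrome 3, so `decode` flips the data bit at position 3 and returns `([0, 0, 1, 1], 3)`: three bits are now wrong and nothing signals the failure. This is one way an opaque on-die code can reshape the error patterns that a memory controller ultimately observes.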