2021
DOI: 10.1109/lca.2021.3117150

HBM3 RAS: Enhancing Resilience at Scale

Cited by 14 publications (8 citation statements)
References 5 publications
“…In contrast, modern DRAM chips exhibit much higher error rates because technology scaling exacerbates the underlying circuit-level error mechanisms that cause errors [86,142,143,166,167,224,225]. To combat these errors, DRAM producers use stronger error-mitigation mechanisms in modern DRAM chips (e.g., on-die ECC [34,38,40,41,87,115,119,167,224,226-230], post-package repair [33,34,216,222,223], target row refresh [23,32,45,46], refresh management [34,47]), which are more expensive and incur higher performance and energy overheads.…”
Section: Breakdown Of the Separation Of Concerns
mentioning
confidence: 99%
“…Unfortunately, worsening memory reliability remains a serious problem for DRAM consumers, especially high-volume consumers for whom even modest chip error rates are significant at scale [142,221]. Although stronger in-DRAM error mitigations are effective against growing error rates [142,224], they introduce new overheads and challenges for consumers. For example, neither on-die ECC nor target row refresh correct all errors, and the remaining errors (e.g., uncorrectable errors) are difficult for consumers to predict and mitigate because their manifestation depends on the particular on-die ECC and/or TRR mechanism used by a given chip [23,32,38,40,41,46,115,152,229,231].…”
Section: Breakdown Of the Separation Of Concerns
mentioning
confidence: 99%
“…In addition, available theoretical estimations of device performance are given. These values are roughly of the same order of magnitude as the ones in some commercial memory products, [159-163] and therefore can be instructive in guiding future applications. From the table, we can observe that 2D-material-based memory devices have advantages in some aspects, including a lower write/read voltage and power consumption.…”
Section: D Stacking Architecture
mentioning
confidence: 74%
“…To exacerbate the problem of identifying a definitive error model, DRAM manufacturers are starting to incorporate two on-die error-mitigation mechanisms that correct a limited number of errors from within the DRAM chip itself: (1) on-die ECC [28,54,95,254-258] for improving reliability and yield and (2) target row refresh [100,160,222,239] for partially mitigating the RowHammer vulnerability. Prior works on ECC [27,30,54,95,101,258,259,296,320-324] and RowHammer [92,100,160,226] show that both on-die ECC and TRR change how errors appear outside of the DRAM chip, thereby changing the DRAM error model seen by the memory controller (and therefore, to the rest of the system). Unfortunately, both mechanisms are opaque to the memory controller and are considered trade secrets that DRAM manufacturers will not officially disclose [22,23,92,93,95,226,258,298].…”
Section: Lack Of Transparency In Commodity DRAM
mentioning
confidence: 99%
“…Even under confidentiality, DRAM manufacturers may be unwilling to reveal certain proprietary aspects of their designs (e.g., on-die error correction [258,296], target row refresh [92]) or provide specifically requested numbers.…”
mentioning
confidence: 99%