HARP: Practically and Effectively Identifying Uncorrectable Errors in Memory Chips That Use On-Die Error-Correcting Codes

Patel, Minesh; Oliveira, Geraldo F.; Mutlu, Onur

doi:10.1145/3466752.3480061

Cited by 11 publications

(12 citation statements)

References 159 publications

“…Fourth, we test DRAM modules that do not implement error correction codes (ECC) [12,21,36,41,67,124]. Doing so ensures that neither on-die [46,104,113-115] nor rank-level [21,67] ECC can alter the RowHammer bit flips we observe and analyze. Fifth, we prevent known on-DRAM-die RowHammer defenses (i.e., TRR [52,55,84,93]) from working by not issuing refresh commands throughout our tests [27,71].…”

Section: Testing Methodology (mentioning)

confidence: 99%

“…15 and 16 on spatial variation of HC_first across subarrays can be leveraged to reduce the time required to profile a given DRAM module's RowHammer vulnerability characteristics. This is an important challenge because profiling a DRAM module's RowHammer characteristics requires analyzing several environmental conditions and attack properties (e.g., data pattern, access pattern, and temperature), requiring time-consuming tests that lead to long profiling times [20,27,71,72,78,110,111,113,166]. According to our Obsvs.…”

Section: Potential Defense Improvements (mentioning)

confidence: 99%

A Deeper Look into RowHammer’s Sensitivities: Experimental Analysis of Real DRAM Chips and Implications on Future Attacks and Defenses

Orosa, Yağlıkçı, Luo, et al. 2021

MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Self Cite

RowHammer is a circuit-level DRAM vulnerability where repeatedly accessing (i.e., hammering) a DRAM row can cause bit flips in physically nearby rows. The RowHammer vulnerability worsens as DRAM cell size and cell-to-cell spacing shrink. Recent studies demonstrate that modern DRAM chips, including chips previously marketed as RowHammer-safe, are even more vulnerable to RowHammer than older chips such that the required hammer count to cause a bit flip has reduced by more than 10X in the last decade. Therefore, it is essential to develop a better understanding and in-depth insights into the RowHammer vulnerability of modern DRAM chips to more effectively secure current and future systems. Our goal in this paper is to provide insights into fundamental properties of the RowHammer vulnerability that are not yet rigorously studied by prior works, but can potentially be (i) exploited to develop more effective RowHammer attacks or (ii) leveraged to design more effective and efficient defense mechanisms. To this end, we present an experimental characterization using 248 DDR4 and 24 DDR3 modern DRAM chips from four major DRAM manufacturers demonstrating how the RowHammer effects vary with three fundamental properties: 1) DRAM chip temperature, 2) aggressor row active time, and 3) victim DRAM cell's physical location. Among our 16 new observations, we highlight that a RowHammer bit flip 1) is very likely to occur in a bounded range, specific to each DRAM cell (e.g., 5.4% of the vulnerable DRAM cells exhibit errors in the range 70 °C to 90 °C), 2) is more likely to occur if the aggressor row is active for longer time (e.g., RowHammer vulnerability increases by 36% if we keep a DRAM row active for 15 column accesses), and 3) is more likely to occur in certain physical regions of the DRAM module under attack (e.g., 5% of the rows are 2x more vulnerable than the remaining 95% of the rows). Our study has important practical implications on future RowHammer attacks and defenses. We describe and analyze the implications of our new findings by proposing three future RowHammer attack and six future RowHammer defense improvements.

“…To exacerbate the problem of identifying a definitive error model, DRAM manufacturers are starting to incorporate two on-die error-mitigation mechanisms that correct a limited number of errors from within the DRAM chip itself: (1) on-die ECC [28,54,95,254-258] for improving reliability and yield and (2) target row refresh [100,160,222,239] for partially mitigating the RowHammer vulnerability. Prior works on ECC [27,30,54,95,101,258,259,296,320-324] and RowHammer [92,100,160,226] show that both on-die ECC and TRR change how errors appear outside of the DRAM chip, thereby changing the DRAM error model seen by the memory controller (and therefore, to the rest of the system). Unfortunately, both mechanisms are opaque to the memory controller and are considered trade secrets that DRAM manufacturers will not officially disclose [22,23,92,93,95,226,258,298].…”

Section: Lack Of Transparency In Commodity DRAM (mentioning)

confidence: 99%

“…Prior works propose two practical ways of identifying retention-weak cells: (1) active profiling, which uses comprehensive tests to search for error-prone cells offline [77-79, 127, 129, 135], and (2) reactive profiling, which constantly monitors memory to identify errors as they manifest during runtime, e.g., ECC scrubbing [56,61,82]. Both approaches require the profiler to understand the worst-case behavior of data-retention errors for a given DRAM chip [79,127]: an active profiler must use the worst-case conditions to maximize the proportion of retention-weak cells it identifies during profiling [78] and a reactive profiler must be provisioned to identify (and possibly also mitigate) the worst-case error pattern(s) that might be observed at runtime, e.g., to choose an appropriate ECC detection and correction capability [127,226,324].…”

Section: Lack Of Transparency In Commodity DRAM (mentioning)

confidence: 99%

A Case for Transparent Reliability in DRAM Systems

Patel, Shahroodi, Manglik, et al. 2022

Preprint

Self Cite

Mass-produced commodity DRAM is the preferred choice of main memory for a broad range of computing systems due to its favorable cost-per-bit. However, today's systems have diverse system-specific needs (e.g., performance, energy, reliability) that are difficult to address using one-size-fits-all general-purpose DRAM. Unfortunately, although system designers can theoretically adapt commodity DRAM chips to meet their particular design goals (e.g., by exploiting slack in access timings to improve performance, or implementing system-level RowHammer mitigations), we observe that designers today lack the necessary insight into commodity DRAM chips' reliability characteristics to implement these techniques in practice. In this work, we make a case for DRAM manufacturers to provide increased transparency into simple device characteristics (e.g., internal row address mapping, cell array organization) that affect consumer-visible reliability. Doing so has negligible impact on manufacturers given that these characteristics can be reverse-engineered using known techniques; however, it has significant benefit for system designers, who can then make informed decisions to better adapt commodity DRAM to meet modern systems' needs while preserving its cost advantages. To support our argument, we study four ways that system designers can adapt commodity DRAM chips to system-specific design goals: (1) improving DRAM reliability; (2) reducing DRAM refresh overheads; (3) reducing DRAM access latency; and (4) defending against RowHammer attacks. We observe that adopting solutions for any of the four goals requires system designers to make assumptions about a DRAM chip's reliability characteristics. These assumptions discourage system designers from using such solutions in practice due to the difficulty of both making and relying upon the assumption. We identify DRAM standards as the root of the problem: current standards rigidly enforce a fixed operating point with no specifications for how a system designer might explore alternative operating points. To overcome this problem, we introduce a two-step approach that reevaluates DRAM standards with a focus on transparency of reliability characteristics so that system designers are encouraged to make the most of commodity DRAM technology for both current and future DRAM chips.

“…Further, we find that 1) over 99.9% of the DRAM rows are vulnerable (i.e., have at least one bit flip) to the new access patterns and 2) the new access patterns cause up to 9.4 million bit flips per DRAM bank. The large number of RowHammer bit flips caused by our specialized access patterns has significant implications for systems protected by Error Correction Codes (ECC) [47,92,93,95]. Our analysis shows that the U-TRR-discovered access patterns can cause up to 7 bit flips at arbitrary locations in one 8-byte dataword, suggesting that typical ECC schemes capable of correcting one error/symbol and detecting two errors/symbols (e.g., SECDED ECC [10,37,43,60,61,79,87,118] and Chipkill [2,20,86]) cannot provide sufficient protection against RowHammer even in the presence of TRR mechanisms.…”

Section: Introduction (mentioning)

confidence: 99%

Uncovering In-DRAM RowHammer Protection Mechanisms: A New Methodology, Custom RowHammer Patterns, and Implications

Hassan, Can, Kim, et al. 2021

MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Self Cite

The RowHammer vulnerability in DRAM is a critical threat to system security. To protect against RowHammer, vendors commit to security-through-obscurity: modern DRAM chips rely on undocumented, proprietary, on-die mitigations, commonly known as Target Row Refresh (TRR). At a high level, TRR detects and refreshes potential RowHammer-victim rows, but its exact implementations are not openly disclosed. Security guarantees of TRR mechanisms cannot be easily studied due to their proprietary nature. To assess the security guarantees of recent DRAM chips, we present Uncovering TRR (U-TRR), an experimental methodology to analyze in-DRAM TRR implementations. U-TRR is based on the new observation that data retention failures in DRAM enable a side channel that leaks information on how TRR refreshes potential victim rows. U-TRR allows us to (i) understand how logical DRAM rows are laid out physically in silicon; (ii) study undocumented on-die TRR mechanisms; and (iii) combine (i) and (ii) to evaluate the RowHammer security guarantees of modern DRAM chips. We show how U-TRR allows us to craft RowHammer access patterns that successfully circumvent the TRR mechanisms employed in 45 DRAM modules of the three major DRAM vendors. We find that the DRAM modules we analyze are vulnerable to RowHammer, having bit flips in up to 99.9% of all DRAM rows. CCS Concepts: Hardware → Dynamic memory; Security and privacy → Hardware reverse engineering.
