“…To combat DRAM-related failures, system designers typically incorporate reliability, availability, and serviceability (RAS) features [153-155] that collectively improve system reliability beyond what commodity DRAM chips can provide alone. In general, memory RAS is a broad research area with solutions spanning the hardware-software stack, ranging from hardware-based mechanisms within the DRAM chip (e.g., on-die ECC scrubbing [11, 101, 156], post-package repair [10, 11, 157-159], target row refresh [100, 160]) and the memory controller (e.g., rank-level ECC [48-55, 57-60, 81], rank-level ECC scrubbing [56, 61-65, 82, 156, 161], repair techniques [22, 79, 162-169]) to software-only solutions (e.g., page retirement [76, 120-124], failure prediction [170-175]).…”