2012 39th Annual International Symposium on Computer Architecture (ISCA) 2012
DOI: 10.1109/isca.2012.6237032

A case for exploiting subarray-level parallelism (SALP) in DRAM

Abstract: Modern DRAMs have multiple banks to serve multiple memory requests in parallel. However, when two requests go to the same bank, they have to be served serially, exacerbating the high latency of off-chip memory. Adding more banks to the system to mitigate this problem incurs high system cost. Our goal in this work is to achieve the benefits of increasing the number of banks with a low cost approach. To this end, we propose three new mechanisms that overlap the latencies of different requests that go to the same…
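
The serialization the abstract describes can be illustrated with a toy model. This is a minimal sketch, not the paper's mechanism or its simulator: it assumes a hypothetical fixed latency per request and ignores bus contention, row-buffer state, and DRAM timing constraints. The function name and parameters are illustrative only.

```python
from collections import defaultdict

def total_service_time(request_banks, latency=1):
    """Toy model: requests to the same bank are served one after another,
    while different banks work fully in parallel.

    request_banks: list of bank ids, one per request (hypothetical input).
    latency: assumed fixed service time per request (arbitrary units).
    """
    per_bank = defaultdict(int)
    for bank in request_banks:
        per_bank[bank] += latency  # same-bank requests queue up serially
    # Banks run concurrently, so the busiest bank sets the finish time.
    return max(per_bank.values(), default=0)

# Four requests spread over four banks versus four requests to one bank.
spread = total_service_time([0, 1, 2, 3])
conflict = total_service_time([0, 0, 0, 0])
```

Under these assumptions the spread case finishes in one latency unit and the conflicting case in four, which is the gap that adding banks, or overlapping same-bank requests, would aim to close.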

citations
Cited by 195 publications
(111 citation statements)
references
References 42 publications
“…This allows the row's data (in the form of charge) to be transferred into the row-buffer shown in Figure 1a. Better known as sense-amplifiers, the row-buffer reads out the charge from the cells, a process that destroys the data in the cells [38,41,43]. Subsequently, all accesses to the row are served by the row-buffer on behalf of the row.…”
Section: Low-level Organization (mentioning)
confidence: 99%
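
The row-buffer behavior in the statement above can be sketched as a small model. This is an assumed, simplified illustration and not code from the citing paper: the `Bank` class and its fields are hypothetical, and timing, write-back of charge, and precharge are left out.

```python
class Bank:
    """Toy bank with a single row-buffer (assumed model, for illustration)."""

    def __init__(self):
        self.open_row = None  # no row is held in the row-buffer initially
        self.hits = 0
        self.misses = 0

    def access(self, row):
        """Serve an access; return True on a row-buffer hit."""
        if row == self.open_row:
            self.hits += 1       # served directly from the row-buffer
            return True
        self.misses += 1         # the requested row must be brought in first
        self.open_row = row      # the row-buffer now holds this row
        return False

bank = Bank()
pattern = [bank.access(r) for r in [5, 5, 5, 9, 5]]
```

Repeated accesses to row 5 hit in the row-buffer until row 9 displaces it, after which row 5 has to be brought in again.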
“…Using a cycle-accurate DRAM simulator, we evaluate PARA's performance impact on 29 single-threaded workloads from SPEC CPU2006, TPC, and memory-intensive microbenchmarks. (We assume a reasonable system setup [41] with a 4GHz out-of-order core and dual-channel DDR3-1600.) Due to re-mapping, we conservatively assume that a row can have up to ten different rows as neighbors, not just two.…”
Section: Seventh Solution: PARA (mentioning)
confidence: 99%
“…
Commodity: DDR3 (2007) [14]; DDR4 (2012) [18]
Low-Power: LPDDR3 (2012) [17]; LPDDR4 (2014) [20]
Graphics: GDDR5 (2009) [15]
Performance: eDRAM [28], [32]; RLDRAM3 (2011) [29]
3D-Stacked: WIO (2011) [16]; WIO2 (2014) [21]; MCDRAM (2015) [13]; HBM (2013) [19]; HMC1.0 (2013) [10]; HMC1.1 (2014) [11]
Academic: SBA/SSA (2010) [38]; Staged Reads (2012) [8]; RAIDR (2012) [27]; SALP (2012) [24]; TL-DRAM (2013) [26]; RowClone (2013) [37]; Half-DRAM (2014) [39]; Row-Buffer Decoupling (2014) [33]; SARP (2014) [6]; AL-DRAM (2015) [25]
At the forefront of such innovations should be DRAM simulators, the software tool with which to evaluate the strengths and weaknesses of each new proposal. However, DRAM simulators have been lagging behind the rapid-fire changes to DRAM.…”
Section: Segment DRAM Standards and Architectures (mentioning)
confidence: 99%
“…As listed in Table 1, some were evolutionary upgrades to existing standards (e.g., DDR4, LPDDR4), while some were pioneering implementations of die-stacking (e.g., WIO, HMC, HBM), and still others were academic research projects in experimental stages (e.g., Udipi et al [38], Kim et al [24]). …”
Section: Introduction (mentioning)
confidence: 99%
“…DRAM represents each bit of memory using a single transistor and capacitor, organizing these memory cells in two-dimensional arrays (banks) to amortize control overheads. Each bank is sub-divided into 512 × 512 cell subarrays and all data within neighboring subarrays are connected to one or more neighboring data pins for efficiency [13,14,15,16].…”
Section: DRAM (mentioning)
confidence: 99%
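
The 512 × 512 subarray organization quoted above can be made concrete with a small address-mapping sketch. It assumes a plain contiguous layout in which consecutive rows and columns fall into the same subarray; real devices may remap rows internally, and the function names here are hypothetical.

```python
SUBARRAY_DIM = 512  # subarray size quoted in the statement above

def locate_cell(row, col, dim=SUBARRAY_DIM):
    """Map a bank-level (row, col) cell coordinate to its subarray.

    Assumes a contiguous layout (hypothetical); real devices may remap rows.
    Returns ((subarray_row, subarray_col), (local_row, local_col)).
    """
    sub_r, local_r = divmod(row, dim)
    sub_c, local_c = divmod(col, dim)
    return (sub_r, sub_c), (local_r, local_c)

def same_subarray(cell_a, cell_b):
    """True when both (row, col) cells fall in the same subarray."""
    return locate_cell(*cell_a)[0] == locate_cell(*cell_b)[0]
```

For example, under this layout `same_subarray((0, 0), (511, 511))` holds, while row 512 already belongs to the next subarray down.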