2015
DOI: 10.1145/2764905
Architecting the Last-Level Cache for GPUs using STT-RAM Technology

Abstract: Future GPUs should have larger L2 caches, based on current trends in VLSI technology and the shift of GPU architectures toward higher processing core counts. Larger L2 caches inevitably consume proportionally more power. In this article, having investigated the behavior of GPGPU applications, we present an efficient L2 cache architecture for GPUs based on STT-RAM technology. Due to its high-density and low-power characteristics, STT-RAM technology can be utilized in GPUs where numerous cores leave a limit…

Authors

Journals

Cited by 12 publications (8 citation statements)
References 47 publications (44 reference statements)
“…Our ILP model tries to place existing data blocks during different time frames in the proper positions in the hybrid cache. At different time frames, our scheme decides how the data blocks locate in the hybrid cache. At each time frame, for each memory block, the number of read and write operations is considered as the problem inputs.…” [Table residue from the citing paper: hybrid-cache schemes [7,11,17,19,21,22,24,25,27,28]; reducing data migration [12,18,29]; using compiler [12,18,24,29]; using prediction [11,17,25,27,28]]
Section: Block Placement Model
confidence: 99%
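The placement decision this statement describes — assigning each block to SRAM or STT-RAM ways based on its per-frame read/write counts — can be sketched with a simple greedy heuristic. The cited work solves an ILP; this greedy version, with invented per-access costs, is only an approximation for intuition:

```python
# Greedy sketch of hybrid-cache block placement (illustrative only;
# the cited scheme uses an ILP formulation, and these cost numbers
# are invented, not taken from the paper).

# Per-access cost (arbitrary units, assumed for illustration):
# STT-RAM has cheap reads but expensive writes; SRAM is balanced.
COST = {
    "sram":   {"read": 2.0, "write": 2.0},
    "sttram": {"read": 1.0, "write": 6.0},
}

def place_blocks(access_counts, sram_ways):
    """access_counts: {block: (reads, writes)} for one time frame.
    Blocks whose traffic benefits most from SRAM (i.e. the most
    write-dominated ones) get the few SRAM ways; the rest go to
    STT-RAM, where reads are cheap."""
    def sram_benefit(item):
        _, (r, w) = item
        stt = r * COST["sttram"]["read"] + w * COST["sttram"]["write"]
        sram = r * COST["sram"]["read"] + w * COST["sram"]["write"]
        return stt - sram  # cost saved by moving the block to SRAM
    ranked = sorted(access_counts.items(), key=sram_benefit, reverse=True)
    return {block: ("sram" if i < sram_ways else "sttram")
            for i, (block, _) in enumerate(ranked)}

frame = {"A": (100, 80), "B": (200, 5), "C": (50, 60)}
print(place_blocks(frame, sram_ways=1))
```

Here block A is the most write-heavy per unit of traffic, so it wins the single SRAM way; the read-dominated block B stays in STT-RAM.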
“…EDP cost of normal read/write operations and EDP cost of migration operations. The following equation represents the EDP cost of normal read/write operations during the entire time frames: (see (21)) . The cost of EDP during the entire migration operations is computed as the following equation:…”
Section: Minimising EDP
confidence: 99%
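The total EDP objective this statement describes — the EDP of normal read/write operations over all time frames plus the EDP of migrations — can be sketched as a simple sum. A minimal sketch, with invented per-operation energy and latency values (the cited paper's actual equation (21) is not reproduced here):

```python
# Sketch of the EDP objective described above: total cost = EDP of
# normal read/write operations across all time frames + EDP of
# migration operations. Energy/latency values are invented.

E = {"read": 0.5, "write": 2.0, "migrate": 3.0}   # energy per op (assumed units)
T = {"read": 1.0, "write": 5.0, "migrate": 8.0}   # latency per op (assumed units)

def edp(op, count):
    # EDP contribution of `count` operations of one type.
    return count * E[op] * T[op]

def total_edp(frames):
    """frames: list of per-frame op counts, e.g.
    {'read': n, 'write': n, 'migrate': n}."""
    normal = sum(edp("read", f["read"]) + edp("write", f["write"])
                 for f in frames)
    migration = sum(edp("migrate", f["migrate"]) for f in frames)
    return normal + migration

frames = [{"read": 100, "write": 20, "migrate": 2},
          {"read": 80,  "write": 40, "migrate": 1}]
print(total_edp(frames))  # 762.0
```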
“…It is important that improving the read/write latency does not come at the cost of increasing write energy. Moreover, write energy relates to cell wearout; that is, increasing the write energy leads to decreasing PCM lifetime [45]. Figure 16 shows the write energy of RWR, WT, RWR+FPC, and WT+FPC methods normalized to 2-bit MLC PCM baseline.…”
Section: Write Energy
confidence: 99%
“…Reducing write energy corresponds to enhancing PCM lifetime [4,45,60]. We evaluated the effect of our proposed scheme and other implemented methods on memory lifetime.…”
Section: Wearout
confidence: 99%
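The relation cited here — lower write energy corresponds to longer PCM lifetime — is often approximated by lifetime scaling inversely with write stress. A rough sketch under that inverse-proportionality assumption (a simplification; real PCM endurance also depends on programming current, pulse shape, and wear leveling):

```python
# Rough sketch of the "lower write energy -> longer lifetime" relation
# cited above, assuming lifetime scales inversely with write energy.
# This model and the example numbers are assumptions for illustration.

def normalized_lifetime(write_energy, baseline_energy):
    """Lifetime of a scheme relative to the baseline, under the
    inverse-proportionality assumption."""
    return baseline_energy / write_energy

# Invented example: a scheme that cuts write energy by 20%
print(normalized_lifetime(write_energy=0.8, baseline_energy=1.0))  # 1.25
```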
“…For NVIDIA Pascal [57], more than 60% of the on-chip storage area, amounting to 14.3 MB is dedicated to the register file. GPU register files face the difficult challenge of optimizing latency, bandwidth, and power consumption, while having maximal capacity [2,19,20,23,25,27,28,39,43,45,46,48,65,66,78,79,80]. Larger register files are slower, take up more silicon area and consume more power.…”
Section: Register File Scalability
confidence: 99%