2015
DOI: 10.1109/lca.2015.2430853
Toggle-Aware Compression for GPUs

Abstract: Memory bandwidth compression can be an effective way to achieve higher system performance and energy efficiency in modern data-intensive applications by exploiting redundancy in data. Prior works studied various data compression techniques to improve both capacity (e.g., of caches and main memory) and bandwidth utilization (e.g., of the on-chip and off-chip interconnects). These works addressed two common shortcomings of compression: (i) compression/decompression overhead in terms of latency, energy, …

Cited by 13 publications (8 citation statements). References 29 publications.
“…Such schemes would need to decide when to compress data depending upon the potential increase in latency compared to the reduction in toggle rate and crosstalk. This is similar to the work by Pekhimenko et al [29], but taking into account the effects of crosstalk and using a model derived from realistic data.…”
Section: Discussion (supporting)
confidence: 72%
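The trade-off described in this citation — compress only when the reduction in toggle rate justifies the added latency — can be sketched as follows. This is an illustrative model, not the mechanism from Pekhimenko et al.; the function names, bus width, and `max_toggle_growth` threshold are all assumptions for the sketch.

```python
def toggle_count(words, width=32):
    # Count bit flips between consecutive words sent over a bus of
    # `width` bits -- a simplified proxy for interconnect switching
    # energy and crosstalk activity.
    mask = (1 << width) - 1
    flips = 0
    for prev, cur in zip(words, words[1:]):
        flips += bin((prev ^ cur) & mask).count("1")
    return flips

def should_compress(raw_words, compressed_words, max_toggle_growth=1.1):
    # Toggle-aware decision (hypothetical policy): use the compressed
    # stream only if its toggle count does not exceed the raw stream's
    # by more than the allowed growth factor.
    raw_t = toggle_count(raw_words)
    comp_t = toggle_count(compressed_words)
    return comp_t <= max_toggle_growth * max(raw_t, 1)
```

For example, a raw stream of identical words toggles nothing, so a compressed stream that alternates dense bit patterns would be rejected even though it is smaller.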
“…To enable LCP, ETC employs an additional 512-entry metadata cache inside the memory controller to accelerate compression metadata lookup and thus reduce the performance overhead of the LCP framework. Once the application classification logic determines that the executing application is (1) a regular application with data sharing or (2) an irregular application, ETC begins the capacity compression process by storing all data written to the GPU memory using the base-delta-immediate compression algorithm [73], which is simple to implement and effective [70][71][72][73]98]. Figure 9 shows the design overview of ETC, which consists of Application Classification, Proactive Eviction, Memory-aware Throttling, and memory Capacity Compression.…”
Section: Capacity Compression (mentioning)
confidence: 99%
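The base-delta-immediate (BDI) algorithm mentioned in this citation encodes a cache block as one full-width base value plus narrow per-word deltas when all words lie close to the base. A minimal sketch, assuming 8-byte words and a single base (the real algorithm also tries multiple base/delta widths and an immediate zero base):

```python
def bdi_compress(words, word_bytes=8, delta_bytes=1):
    # Simplified Base-Delta-Immediate sketch: take the first word as the
    # base; the block is compressible only if every word's signed delta
    # from the base fits in `delta_bytes` bytes.
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        # Compressed layout: one full-width base + one narrow delta per word.
        return word_bytes + delta_bytes * len(words), (base, deltas)
    # Incompressible: fall back to the raw block size.
    return word_bytes * len(words), None

# Eight 8-byte pointers into the same region compress from 64 to 16 bytes:
block = [0x7FFF0000 + i * 4 for i in range(8)]
size, packed = bdi_compress(block)  # size == 16
```

Because compression is a single subtraction per word (and decompression a single addition), BDI fits the low-latency hardware budget the citation alludes to.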
“…Memory Compression in GPUs. Several works study memory and cache compression in GPUs [49,70,71,79,87,99]. These works show benefits due to on-chip and off-chip memory bandwidth savings.…”
Section: Related Work (mentioning)
confidence: 99%
“…Data compression is a technique that exploits the redundancy in the applications' data to reduce capacity and bandwidth requirements for many modern systems by saving and transmitting data in a more compact form. Hardware-based data compression has been explored in the context of on-chip caches [4,11,25,33,49,87,89,99,118], interconnect [30], and main memory [2,37,88,90,91,104,114] as a means to save storage capacity as well as memory bandwidth. In modern GPUs, memory bandwidth is a key limiter to system performance in many workloads (Section 3).…”
Section: A Case for CABA: Data Compression (mentioning)
confidence: 99%
“…Compression. Several prior works [6,11,88,89,90,91,100,104,114] study memory and cache compression with several different compression algorithms [4,11,25,49,87,118], in the context of CPUs or GPUs.…”
Section: Related Work (mentioning)
confidence: 99%