2008
DOI: 10.1109/dcc.2008.90
Design and Implementation of a High-Performance Microprocessor Cache Compression Algorithm

Abstract: Researchers have proposed using hardware data compression units within the memory hierarchies of microprocessors in order to improve performance, energy efficiency, and functionality. However, most past work, and in particular work on cache compression, has made unsubstantiated assumptions about the performance, power consumption, and area overheads of the required compression hardware. We present a lossless compression algorithm that has been designed for on-line memory hierarchy compression, and cache compre…

Cited by 7 publications (7 citation statements) | References 12 publications
“…Chen et al [49] present C-PACK, which is a pattern-based partial dictionary match compression algorithm. C-PACK compresses data by both statically and dynamically detecting frequently appearing data words.…”
Section: Compression Algorithms
confidence: 99%
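The pattern-based partial dictionary match idea quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact encoding: the pattern tags, the 4-entry FIFO dictionary, and the upper-halfword partial-match rule are simplifications chosen for clarity.

```python
# Illustrative sketch in the spirit of C-PACK: each 32-bit word is first
# tried against static patterns (all-zero, zero-extended byte), then against
# a small dictionary of recently seen words, allowing full or partial
# (upper-halfword) matches. Tags and dictionary size are hypothetical.

DICT_SIZE = 4  # assumed dictionary size for illustration

def compress_word(word, dictionary):
    """Return a (tag, payload) pair; payload holds the bits still stored."""
    if word == 0:
        return ("zzzz", b"")                       # all-zero word
    if word <= 0xFF:
        return ("zzzx", word.to_bytes(1, "big"))   # zero-extended byte
    for i, entry in enumerate(dictionary):
        if word == entry:
            return ("mmmm", bytes([i]))            # full dictionary match
        if word >> 16 == entry >> 16:
            # upper halfword matches: store dictionary index + low 2 bytes
            return ("mmxx", bytes([i]) + (word & 0xFFFF).to_bytes(2, "big"))
    # no match: store the word uncompressed and remember it (FIFO eviction)
    dictionary.append(word)
    if len(dictionary) > DICT_SIZE:
        dictionary.pop(0)
    return ("xxxx", word.to_bytes(4, "big"))

def compress_line(words):
    """Compress a cache line (list of 32-bit words) word by word."""
    dictionary = []
    return [compress_word(w, dictionary) for w in words]
```

Because the dictionary is rebuilt from scratch for every line, each compressed line stays independently decompressible, which matters for random access into a compressed cache.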
“…Due to their diverse data patterns, different workloads benefit from different compression algorithms; hence, their technique allows implementing different algorithms (e.g. [35, 40, 49]) using different assist warps. Compared to hardware-only implementations of compression, their approach uses existing underutilized resources and does not require additional dedicated resources.…”
Section: Compression Techniques for GPUs
confidence: 99%
“…Compression and decompression take about 1–2 cycles and 1 cycle, respectively. BΔI is implemented as a simple two-stage scheme (arithmetic and encoding selection) and does not need a dictionary, which would incur storage and maintenance penalties, as shown in [62]. The major drawback of BΔI is its high area overhead due to the use of adders in parallel.…”
Section: BΔI Lossless Compression
confidence: 99%
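The base-plus-delta arithmetic stage that the BΔI citation contrasts with dictionary schemes can be sketched briefly. This is a simplified illustration of the base+delta idea only (BΔI additionally tries multiple bases and delta widths); the single-base, single-width form below is an assumption made for brevity.

```python
# Simplified base+delta compression check, in the spirit of BDI:
# take the first word of the line as the base and test whether every
# word's delta from the base fits in a narrow signed field.

def bdi_compress(words, delta_bytes=1):
    """Return (base, deltas) if all deltas fit in `delta_bytes` signed
    bytes, or None if the line is incompressible at this delta width."""
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)   # e.g. 128 for 1-byte deltas
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas              # compressed: one base + narrow deltas
    return None                          # fall back to storing uncompressed
```

Note the hardware implication quoted above: checking every delta in parallel needs one subtractor/adder per word, which is where BΔI's area overhead comes from.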
“…There already exists a substantial body of knowledge on compression at the cache, link, and main memory levels. Works on cache compression seek to increase effective cache capacity, reducing the cache miss rate and thereby the number of main memory accesses [9, 43, 61-63]. Cache compression and decompression require extremely low latency to prevent the caches themselves from becoming a performance bottleneck.…”
Section: Introduction
confidence: 99%
“…Thus, previous research does not consider word frequency across multiple cache lines, and the algorithms also do not fit ordinary data such as multimedia data. On the other hand, C-PACK [26] achieves a better compression ratio than BDI. It combines the approach of FPC with a lookup-table mechanism.…”
Section: Related Work
confidence: 99%