2008
DOI: 10.1109/dcc.2008.90
Design and Implementation of a High-Performance Microprocessor Cache Compression Algorithm

Abstract: Researchers have proposed using hardware data compression units within the memory hierarchies of microprocessors in order to improve performance, energy efficiency, and functionality. However, most past work, and in particular work on cache compression, has made unsubstantiated assumptions about the performance, power consumption, and area overheads of the required compression hardware. We present a lossless compression algorithm that has been designed for on-line memory hierarchy compression, and cache compre…

Cited by 7 publications (7 citation statements) | References 12 publications
“…Chen et al [49] present C-PACK, which is a pattern-based partial dictionary match compression algorithm. C-PACK compresses data by both statically and dynamically detecting frequently appearing data words.…”
Section: Compression Algorithms
confidence: 99%
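The pattern-based partial dictionary match idea quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact encoding: the pattern tags, the 4-entry FIFO dictionary, and the upper-halfword partial-match rule are simplifications chosen for clarity.

```python
# Illustrative sketch in the spirit of C-PACK: each 32-bit word is first
# tried against static patterns (all-zero, zero-extended byte), then against
# a small dictionary of recently seen words, allowing full or partial
# (upper-halfword) matches. Tags and dictionary size are hypothetical.

DICT_SIZE = 4  # assumed dictionary size for illustration

def compress_word(word, dictionary):
    """Return a (tag, payload) pair; payload holds the bits still stored."""
    if word == 0:
        return ("zzzz", b"")                       # all-zero word
    if word <= 0xFF:
        return ("zzzx", word.to_bytes(1, "big"))   # zero-extended byte
    for i, entry in enumerate(dictionary):
        if word == entry:
            return ("mmmm", bytes([i]))            # full dictionary match
        if word >> 16 == entry >> 16:
            # upper halfword matches: store dictionary index + low 2 bytes
            return ("mmxx", bytes([i]) + (word & 0xFFFF).to_bytes(2, "big"))
    # no match: store the word uncompressed and remember it (FIFO eviction)
    dictionary.append(word)
    if len(dictionary) > DICT_SIZE:
        dictionary.pop(0)
    return ("xxxx", word.to_bytes(4, "big"))

def compress_line(words):
    """Compress a cache line (list of 32-bit words) word by word."""
    dictionary = []
    return [compress_word(w, dictionary) for w in words]
```

Because the dictionary is rebuilt from scratch for every line, each compressed line stays independently decompressible, which matters for random access into a compressed cache.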
“…Due to their diverse data patterns, different workloads benefit from different compression algorithms; hence, their technique allows implementing different algorithms (e.g. [35, 40, 49]) using different assist warps. Compared to hardware-only implementations of compression, their approach uses existing underutilized resources and does not require additional dedicated resources.…”
Section: Compression Techniques for GPUs
confidence: 99%
“…Compression and decompression take about 1–2 cycles and 1 cycle, respectively. BΔI is implemented as a simple two-stage scheme (arithmetic and encoding selection) and does not need a dictionary, which would incur storage and maintenance penalties, as shown in [62]. The major drawback of BΔI is its high area overhead due to the use of adders in parallel.…”
Section: BΔI Lossless Compression
confidence: 99%
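The base-plus-delta arithmetic stage that the BΔI citation contrasts with dictionary schemes can be sketched briefly. This is a simplified illustration of the base+delta idea only (BΔI additionally tries multiple bases and delta widths); the single-base, single-width form below is an assumption made for brevity.

```python
# Simplified base+delta compression check, in the spirit of BDI:
# take the first word of the line as the base and test whether every
# word's delta from the base fits in a narrow signed field.

def bdi_compress(words, delta_bytes=1):
    """Return (base, deltas) if all deltas fit in `delta_bytes` signed
    bytes, or None if the line is incompressible at this delta width."""
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)   # e.g. 128 for 1-byte deltas
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas              # compressed: one base + narrow deltas
    return None                          # fall back to storing uncompressed
```

Note the hardware implication quoted above: checking every delta in parallel needs one subtractor/adder per word, which is where BΔI's area overhead comes from.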
“…There already exists a substantial body of knowledge on compression at the cache, link, and main memory levels. Works on cache compression seek to increase effective cache capacity, reducing the cache miss rate and thereby the number of main memory accesses [9, 43, 61-63]. Cache compression and decompression require extremely low latency to prevent the caches themselves from becoming a performance bottleneck.…”
Section: Introduction
confidence: 99%
“…Thus, previous research does not consider word frequency across multiple cache lines, and the algorithms also do not fit ordinary data such as multimedia data. On the other hand, C-PACK [26] achieves a better compression ratio than BDI. It combines the approach of FPC with a lookup-table mechanism.…”
Section: Related Work
confidence: 99%