Increasing demands for efficient storage management and better utilization of network bandwidth through reduced data transfer have led the computing community to consider data compression as a solution. However, compression introduces extra overhead, and performance can suffer; the key factors in deciding whether to use compression are execution time and compression ratio. Because of this negative performance impact, compression is often neglected. General-purpose computing on graphics processing units (GPUs) introduces new opportunities wherever parallelism is available. Our work targets GPU-based systems by exploiting the parallelism in compression algorithms. In this paper we present an implementation of the Lempel-Ziv-Storer-Szymanski (LZSS) lossless data compression algorithm using NVIDIA's Compute Unified Device Architecture (CUDA) framework. Our GPU implementation of LZSS significantly improves the performance of the compression process compared to a CPU-based implementation, with no loss in compression ratio, and can thus help GPU-based clusters mitigate bandwidth problems. Our system outperforms the serial CPU LZSS implementation by up to 18x, the parallel multithreaded version by up to 3x, and the BZIP2 program by up to 6x in compression time, showing the promise of CUDA systems for lossless data compression. To give programmers an easy-to-use tool, our work also provides an API for in-memory compression that avoids reading from and writing to files, in addition to the version involving I/O.