Increasing demands for efficient storage management and better utilization of network bandwidth through reduced data transfer have led the computing community to consider data compression as a solution. However, compression introduces extra overhead, and performance can suffer; the key factors in deciding whether to use compression are execution time and compression ratio. Because of this negative performance impact, compression is often neglected. General-purpose computing on graphics processing units (GPUs) introduces new opportunities wherever parallelism is available. Our work targets GPU-based systems by exploiting the parallelism in compression algorithms. In this paper we present an implementation of the Lempel-Ziv-Storer-Szymanski (LZSS) lossless data compression algorithm using NVIDIA's Compute Unified Device Architecture (CUDA) framework. Our GPU implementation of LZSS significantly improves the performance of the compression process compared to a CPU-based implementation, with no loss in compression ratio, and can thus help GPU-based clusters mitigate bandwidth problems. Our system outperforms the serial CPU LZSS implementation by up to 18x, the parallel multithreaded version by up to 3x, and the BZIP2 program by up to 6x in compression time, showing the promise of CUDA systems for lossless data compression. To give programmers an easy-to-use tool, our work also provides an API for in-memory compression that avoids reading from and writing to files, in addition to the version involving I/O.