Sandino Vargas Perez scite author profile

2 Publications

4 Citation Statements Received

43 Citation Statements Given

Affiliations

Western Michigan University

Publications

A Parallel Algorithm for Compression of Big Next-Generation Sequencing Datasets

Perez, Saeed

2015

With the advent of high-throughput next-generation sequencing (NGS) techniques, the volume of data being generated presents challenges for the storage, analysis, and transport of huge datasets. One solution for storing and transmitting these data is compression with specialized algorithms. However, these specialized algorithms scale poorly as datasets grow, and the best available solutions can take hours to compress gigabytes of data. In this paper we introduce paraDSRC, a parallel implementation of DSRC based on a message-passing model that reduces the compression time complexity by a factor of O(1/p). Our experimental results show that paraDSRC achieves compression times 43% to 99% faster than DSRC and compression throughputs of up to 8.4 GB/s on a moderate-size cluster. For many of the datasets used in our experiments, super-linear speedups were recorded, making the implementation strongly scalable. We also show that paraDSRC is more than 25.6x faster than comparable parallel compression algorithms. The code will be available on the author's website if the paper is accepted.

Scalable data structure to compress next-generation sequencing files and its application to compressive genomics

Perez, Saeed

2017

It is now possible to compress and decompress large-scale next-generation sequencing (NGS) files by taking advantage of high-performance computing techniques. To this end, we recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression and decompression of big FASTQ datasets using distributed- and shared-memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure that lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state through fine-grained decompression, a technique referred to as in compresso data processing. Using our data structure, compression and decompression throughputs of up to 8.71 GB/s and 10.12 GB/s, respectively, were observed. Our proposed structure and methodology bring us one step closer to compressive genomics and sublinear analysis of big NGS datasets. The code for this implementation is

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.