S.C. Goswami scite author profile

Background: Long-read sequencing has shown promise in overcoming the read-length limitations of second-generation sequencing by enabling more complete assemblies. However, processing long reads is challenged by their higher error rates (e.g., 13% vs. 1%) and higher cost ($0.3 vs. $0.03 per Mbp) compared with short reads.

Methods: In this paper, we present a new hybrid error correction tool, ParLECH (Parallel Long-read Error Correction using Hybrid methodology). ParLECH's error correction algorithm is distributed and efficiently uses the k-mer coverage information of high-throughput Illumina short reads to rectify PacBio long reads. ParLECH first constructs a de Bruijn graph from the short reads and then replaces the indel error regions of the long reads with their corresponding widest path (or maximum min-coverage path) in the short-read de Bruijn graph. It then uses the k-mer coverage information of the short reads to divide each long read into a sequence of low- and high-coverage regions, followed by majority voting to rectify each substitution error.

Results: ParLECH outperforms the latest state-of-the-art hybrid error correction methods on real PacBio datasets. Our experimental evaluation demonstrates that ParLECH can correct large-scale real-world datasets accurately and scalably. ParLECH can correct the indel errors of human-genome PacBio long reads (312 GB) using Illumina short reads (452 GB) in less than 29 h on 128 compute nodes. ParLECH can align more than 92% of the bases of an E. coli PacBio dataset to the reference genome, demonstrating its accuracy.

Conclusion: ParLECH can scale to terabytes of sequencing data using hundreds of compute nodes. The proposed hybrid error correction methodology is novel and rectifies both indel and substitution errors, whether present in the original long reads or newly introduced by the short reads.
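The indel-correction step in this abstract hinges on a widest-path (maximum min-coverage) search. A minimal, single-process sketch follows, assuming an in-memory de Bruijn graph whose vertices are (k−1)-mers and whose edges are k-mers weighted by their short-read counts; it uses a max-bottleneck variant of Dijkstra's algorithm. The function names, the toy reads, k = 3, and the two anchor vertices (standing in for the solid k-mers that flank an indel region) are hypothetical. ParLECH itself runs distributed, so this is not its code.

```python
import heapq
from collections import Counter, defaultdict


def build_debruijn(reads, k):
    """Count k-mers in the short reads; each k-mer is an edge (k-1)-prefix -> (k-1)-suffix."""
    coverage = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            coverage[read[i:i + k]] += 1
    graph = defaultdict(list)  # (k-1)-mer -> list of (successor (k-1)-mer, k-mer count)
    for kmer, cov in coverage.items():
        graph[kmer[:-1]].append((kmer[1:], cov))
    return graph


def widest_path(graph, source, target):
    """Return (bottleneck coverage, vertex list) of the path maximizing its minimum edge count."""
    best = {source: float("inf")}
    parent = {source: None}
    heap = [(-best[source], source)]  # negate widths so heapq acts as a max-heap
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0):
            continue  # stale heap entry
        if node == target:
            break
        for nxt, cov in graph.get(node, []):
            cand = min(width, cov)  # bottleneck of the path extended by this edge
            if cand > best.get(nxt, 0):
                best[nxt] = cand
                parent[nxt] = node
                heapq.heappush(heap, (-cand, nxt))
    if target not in parent:
        return 0, []
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return best[target], path


def spell(path):
    """Turn a path of overlapping (k-1)-mers back into a DNA string."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Usage with toy short reads (hypothetical data).
reads = ["ATGCA"] * 3 + ["ATTCA"]
graph = build_debruijn(reads, k=3)
width, path = widest_path(graph, "AT", "CA")
print(width, spell(path))  # 3 ATGCA
```

The three copies of ATGCA outweigh the single ATTCA read, so the chosen path has a bottleneck of 3 and spells ATGCA. Maximizing the minimum count, rather than the sum, keeps a single weakly supported edge from being masked by strong neighbours.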

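The substitution-correction step can be sketched in the same spirit. The sketch below assumes a fixed coverage threshold to split a long read into high- and low-coverage runs of k-mers; inside each low-coverage run, every k-mer window covering a base votes for the base that gives that window the highest short-read count, and the majority wins. The threshold value, the per-window voting rule, and all names are illustrative assumptions rather than ParLECH's published criteria.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer in the short reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def segment(long_read, counts, k, threshold):
    """Split the long read's k-mer start positions into runs of high/low coverage.

    Returns half-open (start, end, is_high) ranges over k-mer start positions.
    """
    labels = [counts[long_read[i:i + k]] >= threshold
              for i in range(len(long_read) - k + 1)]
    regions, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            regions.append((start, i, labels[start]))
            start = i
    return regions


def vote_base(read, pos, counts, k):
    """Each k-mer window covering pos votes for the base that maximizes that window's count."""
    votes = Counter()
    for start in range(max(0, pos - k + 1), min(pos, len(read) - k) + 1):
        best_base, best_cov = read[pos], counts[read[start:start + k]]
        for base in "ACGT":
            window = read[start:pos] + base + read[pos + 1:start + k]
            if counts[window] > best_cov:
                best_base, best_cov = base, counts[window]
        votes[best_base] += 1
    return votes.most_common(1)[0][0] if votes else read[pos]


def correct_substitutions(long_read, counts, k, threshold):
    """Re-vote every base touched by a low-coverage k-mer run (assumed rule)."""
    bases = list(long_read)
    for start, end, is_high in segment(long_read, counts, k, threshold):
        if is_high:
            continue
        # k-mers starting in [start, end) cover bases start .. end + k - 2
        for pos in range(start, end + k - 1):
            bases[pos] = vote_base("".join(bases), pos, counts, k)
    return "".join(bases)


# Usage with toy data (hypothetical): the long read has T -> A at position 3.
counts = kmer_counts(["ACGTTGCA"] * 3, k=3)
print(segment("ACGATGCA", counts, 3, 2))  # [(0, 1, True), (1, 4, False), (4, 6, True)]
print(correct_substitutions("ACGATGCA", counts, 3, 2))  # ACGTTGCA
```

With k = 3 and a threshold of 2, the erroneous A produces a run of three zero-count k-mers; all three windows covering it vote for T, which restores ACGTTGCA, while bases that are already correct keep their votes.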
Parthenogenetic reproduction of Diaphanosoma celebensis (Crustacea: Cladocera). Effect of algae and algal density on survival, growth, life span and neonate production

Shrivastava

Mahambre

Achuthankutty

et al. 1999

Marine Biology

Towards Distributed Cyberinfrastructure for Smart Cities Using Big Data and Deep Learning Technologies

Shams

Goswami

Lee

et al. 2018

Preliminary studies on prawn, Penaeus merguiensis, for selection of broodstock in genetic improvement programs

1986

Large-scale parallel genome assembler over cloud computing environment

Das

Koppa

Goswami

et al. 2017

J. Bioinform. Comput. Biol.

High-throughput DNA sequencing data have already reached the terabyte scale. To manage this volume, many downstream sequencing applications have started to use locality-based computing on various cloud infrastructures to take advantage of elastic (pay-as-you-go) resources at lower cost. However, the locality-based programming model (e.g., MapReduce) is relatively new. Consequently, both developing scalable data-intensive bioinformatics applications with this model and understanding the hardware environment these applications need for good performance require further research. In this paper, we present GiGA, a de Bruijn graph-oriented Parallel Giraph-based Genome Assembler, together with the hardware platform required for its optimal performance. GiGA uses Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating computation and data. On a traditional HPC cluster, GiGA achieves significantly higher scalability than contemporary parallel assemblers (e.g., ABySS and Contrail) with competitive assembly quality. Moreover, we show that GiGA's performance improves significantly on an SSD-based private cloud infrastructure compared with a traditional HPC cluster. We observe that GiGA's performance on 256 cores of this SSD-based cloud infrastructure closely matches its performance on 512 cores of a traditional HPC cluster.
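GiGA's first stage, building a de Bruijn graph from reads, follows the usual map, shuffle, reduce pattern. The pure-Python sketch below only imitates that data flow in one process: a mapper emits (k-mer, 1) pairs, a shuffle groups them by key, and a reducer sums the counts and drops k-mers below a hypothetical `min_count` threshold (a common way to discard likely sequencing errors). The surviving k-mers then become (k−1)-mer vertices with successor lists, the kind of vertex records a Giraph-style graph job could consume. All names and parameters are hypothetical; GiGA itself runs on Hadoop and Giraph, and this is not its implementation.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k):
    """Mapper: emit (k-mer, 1) for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups, min_count):
    """Reducer: sum each k-mer's counts; drop k-mers seen fewer than min_count times."""
    totals = {kmer: sum(values) for kmer, values in groups.items()}
    return {kmer: n for kmer, n in totals.items() if n >= min_count}


def to_vertices(kmer_counts):
    """Turn solid k-mers into vertex records: (k-1)-mer -> sorted successor (k-1)-mers."""
    vertices = defaultdict(set)
    for kmer in kmer_counts:
        vertices[kmer[:-1]].add(kmer[1:])
        vertices.setdefault(kmer[1:], set())  # keep sink vertices too
    return {vertex: sorted(succ) for vertex, succ in vertices.items()}


# Usage with toy reads (hypothetical data).
reads = ["ACGTA", "CGTAC", "ACGTT"]
pairs = chain.from_iterable(map_read(r, 3) for r in reads)
counts = reduce_counts(shuffle(pairs), min_count=2)
graph = to_vertices(counts)
print(counts)  # {'ACG': 2, 'CGT': 3, 'GTA': 2}
print(graph)   # {'AC': ['CG'], 'CG': ['GT'], 'GT': ['TA'], 'TA': []}
```

In a real MapReduce job the mapper and reducer run on the nodes holding the data and the framework performs the shuffle; keeping each stage a pure function over key/value pairs is what makes that distribution possible.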

Formulation of cheaper artificial feeds for shrimp culture: Preliminary biochemical, physical and biological evaluation

Goswami

1979

Aquaculture

Karyotypic studies in Garra lamta and Mystus cavassius (Pisces)

1980

Parthenogenetic reproduction of Diaphanosoma celebensis (Crustacea: Cladocera): influence of salinity on feeding, survival, growth and neonate production

A hybrid and scalable error correction algorithm for indel and substitution errors of long reads
