Survey of MapReduce frame operation in bioinformatics

Zou, Quan; Li, Xubin; Jiang, Wenrui; Lin, Zi-Yu; Li, Guilin; Chen, Ke

doi:10.1093/bib/bbs088

Cited by 160 publications (90 citation statements); references 40 publications

“…In previous technique distribution is used because of that the data of the topics varies which determine the drawback of different entities than proposed work which include Compressive Sensing search and distribution algorithm [5].…”

Section: Previous Technique Such As Blast and Other (mentioning, confidence: 99%)

A Hybrid Approach for Sequence Alignment over Genome Data using Compressive Sensing and HBLAST

Gupta, Pandey, Deen, et al. (2018), IJCA

Medical data is growing exponentially across all healthcare service areas. Genomic data is a special type of data that deals with the small units of medical cells. Various matching operations over genomic data are required because of medical issues that arise in various cases; DNA matching, sequence matching, and pattern analysis and matching are the main requirements in this area. Past researchers have applied techniques such as BLAST, HBLAST and RMAP, which perform pre-processing and other filtration before sequence finding. These past approaches are limited in that large-scale data processing, sequence detection and combined score generation over the whole data set are not performed. This paper proposes an approach that extends previous approaches by using compressive sensing to prefetch and filter the data. Compressive sensing is used to carry out noise removal and filtering, producing refined data for Hadoop-based mapping. The proposed technique was executed on different sequence data sets containing millions of records and gives effective results compared with existing approaches. Further work on security is planned.

“…MapReduce [70,71], developed by Google, is an easy-to-use and general-purpose parallel programming model that is suitable for large data set analysis on a commodity hardware cluster. MapReduce is a software framework, written in Java, designed to run over a cluster of machines in a distributed way.…”

Section: Most Bioinformatics Tools Are Not Cloud-aware (mentioning, confidence: 99%)
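The programming model mentioned in this quotation splits a job into a map phase that emits intermediate key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that aggregates each group. K-mer counting over sequencing reads is a typical illustration. The sketch below is a minimal single-process Python simulation under stated assumptions, not Hadoop's Java API: the function names (`map_kmers`, `shuffle`, `reduce_counts`, `run_mapreduce`) and the k-mer length `K` are hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

K = 3  # hypothetical k-mer length, chosen only for illustration


def map_kmers(read_id: str, sequence: str, k: int = K) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every k-mer in one read.

    read_id is the input key; it is unused here but kept to mirror the
    (key, value) signature of a map function.
    """
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort phase: group all emitted values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts for one k-mer."""
    return key, sum(values)


def run_mapreduce(reads: Dict[str, str], k: int = K) -> Dict[str, int]:
    # On a cluster each map task would handle one input split on some node;
    # in this sketch every read is mapped sequentially in a single process.
    emitted = (pair for rid, seq in reads.items() for pair in map_kmers(rid, seq, k))
    grouped = shuffle(emitted)
    return dict(reduce_counts(key, vals) for key, vals in grouped.items())


if __name__ == "__main__":
    reads = {"r1": "ACGTAC", "r2": "GTACG"}
    print(run_mapreduce(reads))  # {'ACG': 2, 'CGT': 1, 'GTA': 2, 'TAC': 2}
```

Because the reduce step only needs all values for one key, the same three functions could in principle be distributed across nodes without changing their logic; that independence is what lets the framework parallelise and retry tasks.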

“…Hadoop allows for the distributed processing of large datasets across multiple computer nodes, supports big data scaling, and enables fault-tolerant parallel analysis. The Hadoop framework has been recently deemed as the most suitable method for handling bioinformatics data [70]. Unfortunately, many traditional bioinformatics tools and algorithms have to be redesigned and implemented in order to support and benefit from Hadoop MapReduce infrastructure.…”

Section: Most Bioinformatics Tools Are Not Cloud-aware (mentioning, confidence: 99%)

Cloud Computing for Next-Generation Sequencing Data Analysis

Zhao, Watrous, Zhang, et al. (2017), Cloud Computing - Architecture and Applications

High-throughput next-generation sequencing (NGS) technologies have evolved rapidly and are reshaping the scope of genomics research. The substantial decrease in the cost of NGS techniques in the past decade has led to its rapid adoption in biological research and drug development. Genomics studies of large populations are producing a huge amount of data, giving rise to computational issues around the storage, transfer, and analysis of the data. Fortunately, cloud computing has recently emerged as a viable option to quickly and easily acquire the computational resources for large-scale NGS data analyses. Some cloud-based applications and resources have been developed specifically to address the computational challenges of working with very large volumes of data generated by NGS technology. In this chapter, we will review some cloud-based systems and solutions for NGS data analysis, discuss the practical hurdles and limitations in cloud computing, including data transfer and security, and share the lessons we learned from the implementation of Rainbow, a cloud-based tool for large-scale genome sequencing data analysis.

“…True parallelisation/distribution frameworks can also be achieved by means of MapReduce [19] and its most widely distributed implementation, Hadoop [20]. A promising, new resource is YARN [21], which introduces a generic scheduling abstraction that allows multiple parallelisation/distribution frameworks (for example, Hadoop and MPI) to coexist on the same physical cluster.…”

Section: Related Work (mentioning, confidence: 99%)

SCBI_MapReduce, a New Ruby Task-Farm Skeleton for Automated Parallelisation and Distribution in Chunks of Sequences: The Implementation of a Boosted Blast+

Guerrero-Fernández, Falgueras, Claros (2013), Computational Biology Journal

Current genomic analyses often require managing and comparing big data using desktop bioinformatics software that was not developed with multicore distribution in mind. The task-farm SCBI MapReduce is intended to simplify the trivial parallelisation and distribution of new and legacy software and scripts for biologists who are interested in using computers but are not skilled programmers. For legacy applications, there is no need to modify or rewrite the source code. It can be used on anything from multicore workstations to heterogeneous grids. Tests have demonstrated that speed-up scales almost linearly and that distribution in small chunks increases it. It is also shown that SCBI MapReduce takes advantage of shared storage when necessary, is fault-tolerant, allows aborted jobs to be resumed, does not need special hardware or virtual machine support, and provides the same results as the parallelised legacy software. The same is true for interrupted and relaunched jobs. As a proof of concept, distribution of a compiled version of Blast+ in the SCBI Distributed Blast gem is given, indicating that other Blast binaries can be used while maintaining the same SCBI Distributed Blast code. Therefore, SCBI MapReduce suits most parallelisation and distribution needs in, for example, gene and genome studies.
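The task-farm idea in this abstract (split the input into chunks of sequences, hand each chunk to an unmodified worker, and merge the results so they match a sequential run) can be sketched as follows. This is a Python illustration of the general pattern under stated assumptions, not SCBI_MapReduce's Ruby API: `make_chunks`, `process_chunk`, `task_farm` and the GC-content computation are hypothetical stand-ins, and a real farm would typically launch the legacy binary (for example Blast+) as a subprocess inside the worker instead of computing anything in-process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (sequence id, sequence)


def make_chunks(records: Sequence[Record], chunk_size: int) -> List[List[Record]]:
    """Split the input into fixed-size chunks of sequences (the farm's work units)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [list(records[i:i + chunk_size]) for i in range(0, len(records), chunk_size)]


def process_chunk(chunk: List[Record]) -> List[Tuple[str, float]]:
    """Hypothetical worker task: GC fraction of each sequence in the chunk.

    Stands in for running an unmodified legacy tool on the chunk; no external
    tool is called in this sketch.
    """
    results = []
    for seq_id, seq in chunk:
        seq = seq.upper()
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        results.append((seq_id, gc))
    return results


def task_farm(records: Sequence[Record], chunk_size: int = 2,
              workers: int = 4) -> List[Tuple[str, float]]:
    """Farm chunks out to a worker pool and merge the results in input order."""
    chunks = make_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so the merged output
        # matches what a sequential run over the same input would produce.
        per_chunk = pool.map(process_chunk, chunks)
        return [row for chunk_result in per_chunk for row in chunk_result]


if __name__ == "__main__":
    seqs = [("s1", "GGCC"), ("s2", "ATAT"), ("s3", "ACGT"), ("s4", "GATC"), ("s5", "")]
    print(task_farm(seqs, chunk_size=2))
    # [('s1', 1.0), ('s2', 0.0), ('s3', 0.5), ('s4', 0.5), ('s5', 0.0)]
```

Threads are used here because a worker that spends its time waiting on an external subprocess does not need a separate Python process; `chunk_size` is the knob corresponding to the "distribution in small chunks" the abstract mentions, and its effect on speed-up is not measured by this sketch.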
