2013
DOI: 10.1093/bib/bbs088
|View full text |Cite
|
Sign up to set email alerts
|

Survey of MapReduce frame operation in bioinformatics

Abstract: Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be em… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
90
0

Year Published

2013
2013
2020
2020

Publication Types

Select...
7
2

Relationship

0
9

Authors

Journals

citations
Cited by 160 publications
(90 citation statements)
references
References 40 publications
0
90
0
Order By: Relevance
“…In previous technique distribution is used because of that the data of the topics varies which determine the drawback of different entities than proposed work which include Compressive Sensing search and distribution algorithm [5].…”
Section: Previous Technique Such As Blast and Othermentioning
confidence: 99%
“…In previous technique distribution is used because of that the data of the topics varies which determine the drawback of different entities than proposed work which include Compressive Sensing search and distribution algorithm [5].…”
Section: Previous Technique Such As Blast and Othermentioning
confidence: 99%
“…MapReduce [70,71], developed by Google, is an easy-to-use and general-purpose parallel programming model that is suitable for large data set analysis on a commodity hardware cluster. MapReduce is a software framework, written in Java, designed to run over a cluster of machines in a distributed way.…”
Section: Most Bioinformatics Tools Are Not Cloud-awarementioning
confidence: 99%
“…Hadoop allows for the distributed processing of large datasets across multiple computer nodes, supports big data scaling, and enables fault-tolerant parallel analysis. The Hadoop framework has been recently deemed as the most suitable method for handling bioinformatics data [70]. Unfortunately, many traditional bioinformatics tools and algorithms have to be redesigned and implemented in order to support and benefit from Hadoop MapReduce infrastructure.…”
Section: Most Bioinformatics Tools Are Not Cloud-awarementioning
confidence: 99%
“…True parallelisation/distribution frameworks can also be achieved by means of MapReduce [19] and its most widely distributed implementation, Hadoop [20]. A promising, new resource is YARN [21], which introduces a generic scheduling abstraction that allows multiple parallelisation/distribution frameworks (for example, Hadoop and MPI) to coexist on the same physical cluster.…”
Section: Related Workmentioning
confidence: 99%