2012
DOI: 10.1093/bioinformatics/bts054

Hadoop-BAM: directly manipulating next generation sequencing data in the cloud

Abstract: Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API…

Cited by 128 publications (71 citation statements)
References 8 publications
“…have implemented RESTful API web services or SOAP to share or integrate data in the form of FTP, HTML, XML, JSON, plain text, or AWK commands [29]. Moreover, cloud computing services were offered to handle, analyze, or interpret big datasets through various remote applications/servers. There are many cloud servers such as Cloud BLAST [30], Myrna [31], Cloud Burst [32], Hadoop-BAM [33], GPU-BLAST [34], Hydra [35], Peak Ranger [36], Crossbow [37], etc. were available over cloud for analyzing different types of big datasets [38][39][40][41].…”
Section: Comprehensive Data Integration Methods
Citation type: mentioning
confidence: 99%
“…Cloudflow provides a variety of already implemented utilities, which facilitate the creation of pipelines in the field of Bioinformatics (especially for NGS data in Genetics). For that purpose, we implemented, based on HadoopBAM [10], several record types and loader classes in order to process FASTQ, BAM and VCF files. Moreover, we created several operations and filters for the analysis of biological datasets.…”
Section: Pipeline Execution On Spark
Citation type: mentioning
confidence: 99%
“…For that purpose, we implemented, based on HadoopBAM [7], several record types and loader classes in order to process FASTQ, BAM and VCF files. Moreover, we created several operations and filters for the analysis of biological datasets (see Table I for an overview of all currently implemented operations and filters).…”
Section: Cloudflow For Bioinformatics
Citation type: mentioning
confidence: 99%