2012
DOI: 10.1186/1471-2105-13-324

Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

Abstract: Background: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. Results: We present a sequence database search engine that …

Cited by 50 publications (21 citation statements)
References: 16 publications
“…have implemented RESTful API web services or SOAP to share or integrate data in the form of FTP, HTML, XML, JSON, plain text, or AWK commands [29]. Moreover, cloud computing services were offered to handle, analyze, or interpret big datasets through various remote applications/servers. There are many cloud servers such as Cloud BLAST [30], Myrna [31], Cloud Burst [32], Hadoop-BAM [33], GPU-BLAST [34], Hydra [35], Peak Ranger [36], Crossbow [37], etc. were available over cloud for analyzing different types of big datasets [38][39][40][41].…”
Section: Comprehensive Data Integration Methods (mentioning)
confidence: 99%
“…Big data challenge was observed and solved in various works devoted to intelligent transport and smart cities [11,19,42,43,74,75,84], water monitoring [12,22,90], social networks analysis [13,14,77], multimedia processing [72,82], internet of things (IoT) [9], social media monitoring [50], Life sciences [3,31,32,44,58,69] and disease data analysis [6,45,81], telecommunication [27], and finance [2], to mention just a few. Many hot issues in various sub-fields of bioinformatics were also solved with the use of Big Data ecosystems and Cloud computing, e.g., mapping next-generation sequence data to the human genome and other reference genomes, for use in a variety of biological analyzes including SNP discovery, genotyping and personal genomics [65], sequence analysis and assembly [17,30,34,35,47,62], multiple alignments of DNA and RNA sequences [86,91], codon analysis with local MapReduce aggregations [63], NGS data analysis [8], phylogeny [24,48], proteomics [37], analysis of protein-ligand binding sites…”
Section: Related Work (mentioning)
confidence: 99%
“…It is a MapReduce framework based on K-score algorithm, used to handle large amount of peptide sequences, their modifications and the spectra generated by mass spectroscopy. The software was scalable in handling datasets, as clusters associated with numerous processors (Lewis et al, 2012).…”
Section: Technologies (mentioning)
confidence: 99%