2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis 2008
DOI: 10.1109/sc.2008.5214891

An efficient parallel approach for identifying protein families in large-scale metagenomic data sets

Abstract: Metagenomics is the study of environmental microbial communities using state-of-the-art genomic tools. Recent advancements in high-throughput technologies have enabled the accumulation of large volumes of metagenomic data that, until a couple of years ago, were deemed impractical to generate. A primary bottleneck, however, is the lack of scalable algorithms and open source software for large-scale data processing. In this paper, we present the design and implementation of a novel parallel appro…

Cited by 13 publications (28 citation statements)
References 29 publications
“…The experimental results on multiple platforms demonstrate that our integrated scheduling approach allows large-scale sequence search to efficiently scale on general parallel computers. In the future, we plan to generalize our study to other scientific applications with irregular computation and I/O patterns such as parallel HMMER [53] and large-scale protein family identification [54].…”
Section: Results (mentioning)
confidence: 99%
“…For instance: it can be used to reduce redundancy within sequence repositories; identify complexes within metabolic networks [2]; identify core groups of proteins that constitute a protein family [26,28,29] and in the process also help assign family memberships for newly found peptide candidates [29]; help in the construction of mass spectral libraries for peptides [17]; and can be used to condense the space of plausible computer-generated phylogenetic trees [23].…”
Section: Introduction (mentioning)
confidence: 99%
“…In our earlier work, we implemented a serial version of this heuristic, and applied it in the context of metagenomic protein family detection [28]. Put briefly, this approach [28], called pClust, transforms the problem into one of bipartite graph clustering so that the approach developed by Gibson et al originally for web community detection can be used.…”
Section: Introduction (mentioning)
confidence: 99%