Local homology recognition and distance measures in linear time using compressed amino acid alphabets

Edgar, Robert C.

doi:10.1093/nar/gkh180

Cited by 134 publications

(103 citation statements)

References 17 publications

Supporting

Mentioning

101

Contrasting


“…Since large-scale microbial community profiling becomes more accessible to scientists, scalable yet accurate tools like CRiSPy are crucial for research in this area. Although CRiSPy is designed for microbial studies targeting DNA sequence analysis, the individual k-mer distance and genetic distance modules on GPUs can easily be extended to support protein sequence analysis and be used in general sequence analysis studies such as the usage of k-mer distance for fast, approximate phylogenetic tree construction by Edgar [17] or the utilization of pairwise genetic distance matrix in multiple sequence alignment programs such as ClustalW [18]. Availability: CRiSPy is available from the authors upon request.…”

Section: Results (mentioning)

confidence: 99%

CRiSPy-CUDA: Computing Species Richness in 16S rRNA Pyrosequencing Datasets with CUDA

Zheng

Nguyen

Schmidt

2011

Pattern Recognition in Bioinformatics


Abstract. Pyrosequencing technologies are frequently used for sequencing the 16S rRNA marker gene for metagenomic studies of microbial communities. Computing a pairwise genetic distance matrix from the produced reads is an important but highly time consuming task. In this paper, we present a parallelized tool (called CRiSPy) for scalable pairwise genetic distance matrix computation and clustering that is based on the processing pipeline of the popular ESPRIT software package. To achieve high computational efficiency, we have designed massively parallel CUDA algorithms for pairwise k-mer distance and pairwise genetic distance computation. We have also implemented a memory-efficient sparse matrix clustering program to process the distance matrix. On a single-GPU, CRiSPy achieves speedups of around two orders of magnitude compared to the sequential ESPRIT program for both the time-consuming pairwise genetic distance module and the whole processing pipeline, thus making CRiSPy particularly suitable for high-throughput microbial studies.


“…(2) K-mer distance [10,12], which is linear time complexity for pairwise distance estimation. (3) Random generation by computer pseudo number, which is just to provide a fast method to generate data.…”

Section: Experiments Results (mentioning)

confidence: 99%

Fine Grain Parallel Construction of Neighbour-joining Phylogenetic Trees with Reduced Redundancy Using Multithreading

Sahoo

Behura

Padhy

2010

IJDPS
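
The statement above describes k-mer distance as a linear-time pairwise estimate. A minimal sketch of one such measure follows, assuming the commonly used "fraction of shared k-mers" form; the function names (`kmer_counts`, `kmer_similarity`, `kmer_distance`) and the default `k=3` are illustrative choices, not taken from the cited papers. For a fixed k, each sequence is scanned once, so no alignment is needed.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq with a single pass."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(x, y, k=3):
    """Fraction of k-mers shared by x and y (assumed form, for illustration).

    Shared k-mers are counted with multiplicity and normalised by the
    number of k-mers in the shorter sequence.
    """
    shorter = min(len(x), len(y))
    if shorter < k:
        raise ValueError("both sequences must contain at least k residues")
    counts_x = kmer_counts(x, k)
    counts_y = kmer_counts(y, k)
    # A Counter returns 0 for k-mers that are absent from the other sequence.
    shared = sum(min(n, counts_y[kmer]) for kmer, n in counts_x.items())
    return shared / (shorter - k + 1)


def kmer_distance(x, y, k=3):
    """Turn the similarity into a distance in [0, 1]."""
    return 1.0 - kmer_similarity(x, y, k)
```

Identical sequences give a distance of 0 and sequences with no k-mer in common give 1, so a full pairwise matrix can be filled without aligning any pair.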


“…On the other hand, the enhancement of efficiency is widely observed in sequence-related bioinformatics studies. For example, a check for the accuracy and efficiency in the local homolog recognition with simplified alphabet is carried out [146]. It is found that the compressed alphabets could greatly improve the performance in local similarity discovery with a comparable coverage.…”

Section: Implications Of Simplified Amino Acid Alphabets (mentioning)

confidence: 99%

Simplification of complexity in protein molecular systems by grouping amino acids: a view from physics

Wang

2016

Advances in Physics: X
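
The idea of a compressed (simplified) alphabet mentioned above can be sketched as a simple letter mapping. The example assumes the widely used six-group Dayhoff classification purely for illustration; the alphabets actually evaluated in the cited work may differ, and the names `DAYHOFF6_GROUPS`, `COMPRESS_TABLE` and `compress` are hypothetical.

```python
# Six-group Dayhoff classification (an assumption chosen for illustration).
DAYHOFF6_GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]

# Map every amino acid to the first letter of its group.
COMPRESS_TABLE = {aa: group[0] for group in DAYHOFF6_GROUPS for aa in group}


def compress(seq, unknown="X"):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids map to `unknown`.
    """
    return "".join(COMPRESS_TABLE.get(aa, unknown) for aa in seq.upper())


if __name__ == "__main__":
    # Two peptides that differ only by conservative substitutions
    # become identical after compression.
    print(compress("MKV"), compress("LRI"))
```

Because conservative substitutions collapse to the same letter, word matches between related proteins survive more mutations in the reduced alphabet than in the full 20-letter one.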
