Recent metagenomics studies of environmental samples suggested that microbial communities are much more diverse than previously reported, and deep sequencing will significantly increase the estimate of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, computational methods for analyzing large collections of 16S ribosomal sequences are limited. We proposed a new algorithm, referred to as ESPRIT, which addresses several computational issues with prior methods. We developed two versions of ESPRIT, one for personal computers (PCs) and one for computer clusters (CCs). The PC version is used for small- and medium-scale data sets and can process several tens of thousands of sequences within a few minutes, while the CC version is for large-scale problems and is able to analyze several hundreds of thousands of reads within one day. Large-scale experiments are presented that clearly demonstrate the effectiveness of the newly proposed algorithm. The source code and user guide are freely available at http://www.biotech.ufl.edu/people/sun/esprit.html.
Recent advances in massively parallel sequencing technology have created new opportunities to probe the hidden world of microbes. Taxonomy-independent clustering of the 16S rRNA gene is usually the first step in analyzing microbial communities. Dozens of algorithms have been developed in the last decade, but a comprehensive benchmark study is lacking. Here, we survey algorithms currently used by microbiologists, and compare seven representative methods in a large-scale benchmark study that addresses several issues of concern. A new experimental protocol was developed that allows different algorithms to be compared using the same platform, and several criteria were introduced to facilitate a quantitative evaluation of the clustering performance of each algorithm. We found that existing methods vary widely in their outputs, and that inappropriate use of distance levels for taxonomic assignments likely resulted in substantial overestimates of biodiversity in many studies. The benchmark study identified our recently developed ESPRIT-Tree, a fast implementation of the average linkage-based hierarchical clustering algorithm, as one of the best algorithms available in terms of computational efficiency and clustering accuracy.
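As a minimal sketch of the average linkage-based hierarchical clustering named above, the following naive version merges clusters until the closest pair exceeds a chosen distance level. It is not ESPRIT-Tree or any of the benchmarked tools. It assumes equal-length, pre-aligned sequences and a simple mismatch-fraction distance, and the function names `p_distance` and `average_linkage_otus` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatched positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage_otus(seqs, threshold):
    """Naive average-linkage clustering; stops when the closest pair exceeds threshold.

    Returns clusters as sorted lists of indices into seqs.
    """
    # All pairwise distances are stored up front: this is the quadratic
    # memory cost that limits the naive approach to small data sets.
    dist = {
        (i, j): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        (a, b), d = min(
            (
                ((x, y), linkage(clusters[x], clusters[y]))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT", "TTTTTTTTTA"]
    print(average_linkage_otus(reads, threshold=0.2))
```

The `threshold` argument plays the role of the distance level discussed in the abstract: changing it changes the number of clusters reported, which is why the choice of level affects diversity estimates.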
Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
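The idea of representing a cluster as a probabilistic sequence can be illustrated with a small sketch. The abstract does not give the paper's alignment operations or distance definition, so this example makes its own assumptions: sequences are already aligned to equal length, each cluster is summarized by per-position symbol frequencies, and the distance is the expected mismatch fraction between one random member of each cluster. The names `profile` and `profile_distance` are hypothetical.

```python
from collections import Counter


def profile(seqs):
    """Per-position symbol frequencies for equal-length, pre-aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    n = len(seqs)
    return [
        {symbol: count / n for symbol, count in Counter(column).items()}
        for column in zip(*seqs)
    ]


def profile_distance(p, q):
    """Expected mismatch fraction between a random member of each cluster."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    mismatch = 0.0
    for pcol, qcol in zip(p, q):
        # Probability that two independent draws at this position agree.
        agree = sum(prob * qcol.get(symbol, 0.0) for symbol, prob in pcol.items())
        mismatch += 1.0 - agree
    return mismatch / len(p)


if __name__ == "__main__":
    cluster_a = ["ACGT", "ACGA"]
    cluster_b = ["ACGT", "TCGT"]
    print(profile_distance(profile(cluster_a), profile(cluster_b)))
```

Under these assumptions the profile distance equals the mean mismatch fraction over all cross-cluster sequence pairs, so the cluster-to-cluster distance is obtained without enumerating those pairs.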
Atrial fibrillation (AF) is one of the most common sustained chronic cardiac arrhythmias in the elderly population and is associated with high mortality and morbidity from stroke, heart failure, coronary artery disease, systemic thromboembolism, and other conditions. Early detection of AF is necessary to avert disability or death. However, AF detection remains problematic because of its episodic pattern. In this paper, a multiscaled fusion of deep convolutional neural networks (MS-CNN) is proposed to screen out AF recordings from single-lead short electrocardiogram (ECG) recordings. The MS-CNN employs a two-stream convolutional network architecture with different filter sizes to capture features at different scales. The experimental results show that the proposed MS-CNN achieves a classification accuracy of 96.99% on ECG recordings cropped/padded to 5 s. The best classification accuracy, 98.13%, is obtained on ECG recordings of 20 s. Compared with an artificial neural network, a shallow single-stream CNN, and the Visual Geometry Group (VGG) network, the MS-CNN achieves better classification performance. Meanwhile, visualization of the features learned by the MS-CNN demonstrates its superiority in extracting linearly separable ECG features without hand-crafted feature engineering. The AF screening performance of the MS-CNN makes it suitable for daily monitoring of most elderly users with wearable devices.
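The two preprocessing and feature ideas mentioned in this abstract, fixing the recording length by cropping or padding and filtering the same signal with two different filter sizes, can be sketched in plain Python. This is only an illustration under stated assumptions, not the MS-CNN itself: the abstract does not specify the padding scheme, sampling rate, kernel sizes, or pooling, so trailing zero-padding, hand-set kernels, and ReLU followed by global max pooling are assumptions, and all function names are hypothetical.

```python
def crop_or_pad(signal, target_len):
    """Truncate a recording to target_len samples, or zero-pad it at the end (assumed scheme)."""
    if len(signal) >= target_len:
        return list(signal[:target_len])
    return list(signal) + [0.0] * (target_len - len(signal))


def conv1d_valid(signal, kernel):
    """Sliding dot product without padding (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def two_stream_features(signal, small_kernel, large_kernel):
    """Filter one signal at two scales, then apply ReLU and global max pooling per stream."""
    streams = (conv1d_valid(signal, small_kernel), conv1d_valid(signal, large_kernel))
    # max(max(s), 0.0) is ReLU followed by a global maximum over the stream.
    return [max(max(s), 0.0) for s in streams]


if __name__ == "__main__":
    recording = crop_or_pad([1, 2, 3, 4], target_len=6)
    print(two_stream_features(recording, [1, 1], [1, 1, 1]))
```

In a trained network the kernels would be learned and each stream would contain many stacked layers; the sketch only shows how a short and a long filter respond to the same fixed-length input before their outputs are combined.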
Reticulitermes flavipes (Isoptera: Rhinotermitidae) is a highly eusocial insect that thrives on recalcitrant lignocellulosic diets through nutritional symbioses with gut-dwelling prokaryotes and eukaryotes. In the R. flavipes hindgut, there are up to 12 eukaryotic protozoan symbionts; the number of prokaryotic symbionts has been estimated in the hundreds. Despite its biological relevance, this diverse community, to date, has been investigated only by culture- and cloning-dependent methods. Moreover, it is unclear how termite gut microbiomes respond to diet changes and what roles they play in lignocellulose digestion. This study utilized high-throughput 454 pyrosequencing of 16S V5-V6 amplicons to sample the hindgut lumen prokaryotic microbiota of R. flavipes and to examine compositional changes in response to lignin-rich and lignin-poor cellulose diets after a 7-day feeding period. Of the ~475,000 high-quality reads that were obtained, 99.9% were annotated as bacteria and 0.11% as archaea. Major bacterial phyla included Spirochaetes (24.9%), Elusimicrobia (19.8%), Firmicutes (17.8%), Bacteroidetes (14.1%), Proteobacteria (11.4%), Fibrobacteres (5.8%), Verrucomicrobia (2.0%), Actinobacteria (1.4%) and Tenericutes (1.3%). The R. flavipes hindgut lumen prokaryotic microbiota was found to contain over 4761 species-level phylotypes. However, diet-dependent shifts were not statistically significant or uniform across colonies, suggesting significant environmental and/or host genetic impacts on colony-level microbiome composition. These results provide insights into termite gut microbiome diversity and suggest that (i) the prokaryotic gut microbiota is much more complex than previously estimated, and (ii) environment, founding reproductive pair effects and/or host genetics influence microbiome composition.