Michael Nute scite author profile

Over the past decade several studies have reported that the gut microbiomes of mammals with similar dietary niches exhibit similar compositional and functional traits. However, these studies rely heavily on samples from captive individuals and often confound host phylogeny, gut morphology, and diet. To more explicitly test the influence of host dietary niche on the mammalian gut microbiome, we use 16S rRNA gene amplicon sequencing and shotgun metagenomics to compare the gut microbiota of 18 species of wild non-human primates classified as either folivores or closely related non-folivores, evenly distributed throughout the primate order and representing a range of gut morphological specializations. While folivory results in some convergent microbial traits, collectively we show that the influence of host phylogeny on both gut microbial composition and function is much stronger than that of host dietary niche. This pattern does not result from differences in host geographic location or actual dietary intake at the time of sampling, but instead appears to result from differences in host physiology. These findings indicate that mammalian gut microbiome plasticity in response to dietary shifts over both the lifespan of an individual host and the evolutionary history of a given host species is constrained by host physiological evolution. Therefore, the gut microbiome cannot be considered separately from host physiology when describing host nutritional strategies and the emergence of host dietary niches.

Emu: species-level microbial community profiling of full-length 16S rRNA Oxford Nanopore sequencing data

Curry, Wang, Nute, et al. 2022, Nat Methods

Current progress and open challenges for applying deep learning across the biosciences

et al. 2022

Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.

Long-Branch Attraction in Species Tree Estimation: Inconsistency of Partitioned Likelihood and Topology-Based Summary Methods

Roch, Nute, Warnow 2018

With advances in sequencing technologies, there are now massive amounts of genomic data from across all life, leading to the possibility that a robust Tree of Life can be constructed. However, "gene tree heterogeneity", in which different genomic regions evolve differently, is a common phenomenon in multi-locus datasets, and reduces the accuracy of standard methods for species tree estimation that do not take this heterogeneity into account. New methods have been developed for species tree estimation that specifically address gene tree heterogeneity, and that have been proven to converge to the true species tree when the number of loci and number of sites per locus both increase (i.e., the methods are said to be "statistically consistent"). Yet, little is known about the biologically realistic condition where the number of sites per locus is bounded. We show that when the sequence length of each locus is bounded (by any arbitrarily chosen value), the most common approaches to species tree estimation that take heterogeneity into account (i.e., traditional fully partitioned concatenated maximum likelihood and newer approaches, called summary methods, that estimate the species tree by combining estimated gene trees) are not statistically consistent, even when the heterogeneity is extremely constrained. The main challenge is the presence of conditions such as long branch attraction that create biased tree estimation when the number of sites is restricted. Hence, our study uncovers a fundamental challenge to species tree estimation using both traditional and new methods.

The performance of coalescent-based species tree estimation methods under models of missing data

et al. 2018

Background: Estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer, that result in gene trees that differ from each other and from the species phylogeny. Methods to estimate species trees in the presence of gene tree discord due to incomplete lineage sorting have been developed and proved to be statistically consistent when gene tree discord is due only to incomplete lineage sorting and every gene tree includes the full set of species. Results: We establish statistical consistency of certain coalescent-based species tree estimation methods under some models of taxon deletion from genes. We also evaluate the impact of missing data on four species tree estimation methods (ASTRAL-II, ASTRID, MP-EST, and SVDquartets) using simulated datasets with varying levels of incomplete lineage sorting, gene tree estimation error, and degrees/patterns of missing data. Conclusions: All the species tree estimation methods improved in accuracy as the number of genes increased and often produced highly accurate species trees even when the amount of missing data was large. These results together indicate that accurate species tree estimation is possible under a variety of conditions, even when there are substantial amounts of missing data. Electronic supplementary material: The online version of this article (10.1186/s12864-018-4619-8) contains supplementary material, which is available to authorized users.

Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows

Shah, Nute, Warnow, et al. 2018

Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets

Nute, Saleh, Warnow 2018

The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical coestimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical coestimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy has better precision and recall (with respect to the true alignments) than the other alignment methods on the simulated data sets but has consistently lower recall on the biological benchmarks (with respect to the reference alignments) than many of the other methods. In other words, we find that BAli-Phy systematically underaligns when operating on biological sequence data but shows no sign of this on simulated data. There are several potential causes for this change in performance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments, and future research is needed to determine the most likely explanation. We conclude with a discussion of the potential ramifications for each of these possibilities. [BAli-Phy; homology; multiple sequence alignment; protein sequences; structural alignment.]
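The precision and recall figures mentioned in the abstract above are commonly computed over pairs of residues placed in the same alignment column. The following is a minimal sketch of that pairwise scoring, assuming alignments are given as equal-length gapped strings with '-' as the gap character; the function names and the toy sequences are illustrative assumptions, not the evaluation code used in the study.

```python
from itertools import combinations


def homology_pairs(alignment):
    """Return the set of residue pairs that share a column.

    alignment: list of equal-length gapped strings ('-' marks a gap).
    Each residue is identified as (sequence index, residue index).
    """
    pairs = set()
    counters = [0] * len(alignment)  # next residue index per sequence
    for col in range(len(alignment[0])):
        present = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != "-":
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # every pair of residues in this column is an asserted homology
        pairs.update(combinations(present, 2))
    return pairs


def precision_recall(estimated, reference):
    """Pairwise precision and recall of an estimated alignment."""
    est = homology_pairs(estimated)
    ref = homology_pairs(reference)
    shared = len(est & ref)
    precision = shared / len(est) if est else 0.0
    recall = shared / len(ref) if ref else 0.0
    return precision, recall


# Toy example (hypothetical sequences): the estimate leaves one
# truly homologous pair unaligned, i.e. it "underaligns".
reference = ["AC-G", "ACTG"]
estimated = ["AC-G-", "ACT-G"]
precision, recall = precision_recall(estimated, reference)
```

In this toy case every pair the estimate asserts is correct, but one reference pair is missed, which is the high-precision, lower-recall pattern the abstract calls underalignment.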

Emu: Species-Level Microbial Community Profiling for Full-Length Nanopore 16S Reads

Curry, Wang, Nute, et al. 2021, Preprint

16S rRNA-based analysis is the established standard for elucidating microbial community composition. While short-read 16S analyses are largely confined to genus-level resolution at best, since only a portion of the gene is sequenced, full-length 16S sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate of long-read data. Here we present Emu, a novel approach that employs an expectation-maximization (EM) algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from one simulated data set and two mock communities prove Emu capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of our new software by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow to those returned by full-length 16S sequences processed with Emu.
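To illustrate the general idea of estimating abundances with expectation-maximization, here is a minimal sketch of a generic mixture-proportion EM. It is not Emu's implementation: it assumes a precomputed table of per-read likelihoods P(read | taxon), and the function name, parameters, and toy numbers are all hypothetical.

```python
def em_abundance(likelihoods, n_iter=100, tol=1e-9):
    """Estimate relative taxon abundances with a simple EM loop.

    likelihoods[r][t] is an assumed, precomputed P(read r | taxon t).
    Returns a list of abundances that sums to 1.
    """
    n_taxa = len(likelihoods[0])
    abundance = [1.0 / n_taxa] * n_taxa  # start from a uniform profile
    for _ in range(n_iter):
        # E-step: split each read fractionally across taxa in proportion
        # to (current abundance) * (likelihood of the read under the taxon)
        counts = [0.0] * n_taxa
        for row in likelihoods:
            weighted = [a * p for a, p in zip(abundance, row)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read is incompatible with every taxon; skip it
            for t, w in enumerate(weighted):
                counts[t] += w / total
        assigned = sum(counts)
        if assigned == 0.0:
            return abundance  # no read could be assigned to any taxon
        # M-step: new abundances are the normalized fractional counts
        updated = [c / assigned for c in counts]
        delta = max(abs(u - a) for u, a in zip(updated, abundance))
        abundance = updated
        if delta < tol:
            break
    return abundance


# Toy input (hypothetical): three reads, two candidate taxa.
toy_likelihoods = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
profile = em_abundance(toy_likelihoods)
```

Reads that match several taxa almost equally well are shared between them rather than forced onto a single best hit, which is why this style of estimate can be more robust to sequencing error than per-read classification.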

Evolutionary trends in host physiology outweigh dietary niche in structuring primate gut microbiomes
