Summary: Within each bacterial species, different strains may vary in the set of genes they encode or in the copy number of these genes. Yet, taxonomic characterization of the human microbiota is often limited to the species level or to previously sequenced strains, and accordingly, the prevalence of intra-species variation, its functional role, and its relation to host health remain unclear. Here we present a first comprehensive large-scale analysis of intra-species copy number variation in the gut microbiome, introducing a rigorous computational pipeline for detecting such variation directly from shotgun metagenomic data. We uncover a large set of variable genes in numerous species and demonstrate that this variation has significant functional and clinically relevant implications. We additionally infer intra-species compositional profiles, identifying population structure shifts and the presence of yet uncharacterized variants. Our results highlight the complex relationship between microbiome composition and functional capacity, linking metagenome-level compositional shifts to strain-level variation.
Cystic fibrosis (CF) results in inflammation, malabsorption of fats and other nutrients, and obstruction in the gastrointestinal (GI) tract, yet the mechanisms linking these disease manifestations to microbiome composition remain largely unexplored. Here we used metagenomic analysis to systematically characterize fecal microbiomes of children with and without CF, demonstrating marked CF-associated taxonomic dysbiosis and functional imbalance. We further showed that these taxonomic and functional shifts were especially pronounced in young children with CF and diminished with age. Importantly, the resulting dysbiotic microbiomes had significantly altered capacities for lipid metabolism, including decreased capacity for overall fatty acid biosynthesis and increased capacity for degrading anti-inflammatory short-chain fatty acids. Notably, these functional differences correlated with fecal measures of fat malabsorption and inflammation. Combined, these results suggest that enteric fat abundance selects for pro-inflammatory GI microbiota in young children with CF, offering novel strategies for improving the health of children with CF-associated fat malabsorption.
Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold. Although this measure is in principle computationally hard to optimize, we show how it can in fact be computed with great accuracy for related proteins by integer linear programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds to the optimal alignment value. We also illustrate effective heuristics, such as local search and genetic algorithms. We were able to obtain for the first time optimal alignments for large similar proteins (about 1,000 residues and 2,000 contacts) and used the CMO measure to cluster proteins in families. The clusters obtained were compared to SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments which are off by at most 10% from the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact and how to choose this threshold in a sensible way.
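The contact map definition above, and the overlap score of one given alignment, can be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the use of one 3D coordinate per residue, and the dictionary representation of an alignment are all assumptions made here. The code only scores an alignment supplied by the caller; it does not search for the optimal alignment, which is the computationally hard part addressed by the integer programming and heuristic methods described in the abstract.

```python
import math
from itertools import combinations


def contact_map(coords, threshold):
    """Return the set of residue index pairs (i, j), i < j, whose
    coordinates lie within `threshold` of each other.

    `coords` holds one (x, y, z) point per residue (an assumption;
    the choice of representative atom is left to the caller).
    """
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            contacts.add((i, j))
    return contacts


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that map onto contacts of protein B.

    `alignment` is a hypothetical representation: a dict mapping a
    residue index in A to its aligned residue index in B.
    """
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            u, v = alignment[i], alignment[j]
            if (min(u, v), max(u, v)) in contacts_b:
                shared += 1
    return shared


# Toy example: four residues on a line, the last one far away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
cmap = contact_map(coords, threshold=1.5)
identity = {i: i for i in range(len(coords))}
score = overlap(cmap, cmap, identity)
```

Raising or lowering `threshold` changes which pairs count as contacts, which is why the abstract treats the choice of threshold as a separate experimental question.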
We present a new technique for the design of approximation algorithms that can be viewed as a generalization of randomized rounding. We derive new or improved approximation guarantees for a class of generalized congestion problems such as multicast congestion, multiple TSP, etc. Our main mathematical tool is a structural decomposition theorem related to the integrality gap of a relaxation. Introduction. Randomized rounding has become a standard technique in the design of approximation algorithms for NP-hard optimization problems, especially for packing and covering problems. ... and set it to zero with probability 1 - x*_i. The resulting integral setting is not necessarily a solution to the IP, but it has several nice properties. In particular, the expected cost is cx* and it approximately satisfies the constraints with high probability.
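Standard randomized rounding, the technique this abstract generalizes, can be sketched as below. This is a hedged illustration of the textbook procedure only, not of the paper's new technique: the fractional vector `x_star` is assumed to come from an already solved linear programming relaxation, and the function names are hypothetical.

```python
import random


def randomized_round(x_star, rng):
    """Round each fractional value p in [0, 1] to 1 with probability p
    and to 0 otherwise, independently for each variable."""
    return [1 if rng.random() < p else 0 for p in x_star]


def cost(c, x):
    """Linear objective value: the dot product of c and x."""
    return sum(ci * xi for ci, xi in zip(c, x))


# Assumed inputs: a cost vector and a fractional solution of the relaxation.
c = [2.0, 5.0]
x_star = [0.3, 0.7]
rng = random.Random(0)

# Averaging the rounded cost over many trials should approach cost(c, x_star).
trials = 20000
average = sum(cost(c, randomized_round(x_star, rng)) for _ in range(trials)) / trials
```

Each rounded vector is integral, but nothing in the procedure forces it to satisfy the original constraints, which is why the guarantee is stated in expectation and with high probability.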
We present a new algorithm to canonize molecular graphs using the signature molecular descriptor introduced in the previous papers of this series. While developed specifically for molecular structures, the algorithm can be used for any graph and is not limited to acyclic graphs, planar graphs, bounded valence, or bounded genus graphs, for which polynomial time algorithms exist. The algorithm is tested with benzenoid hydrocarbons and a database of 126,705 organic compounds. The algorithm's performance is compared against Brendan McKay's Nauty algorithm, which is believed to be the fastest graph canonization algorithm for general graphs, with five series of graphs each comprising up to 30,000 vertices: 2D meshes (pericondensed benzenoids), 3D cages (fullerenes and nanotubes), 3D meshes (crystal lattices), 4D cages, and power law graphs (protein and gene networks). The algorithm can be downloaded as open-source code at http://www.cs.sandia.gov/~jfaulon/QSAR.
Background: Host-microbe and microbe-microbe interactions are often governed by the complex exchange of metabolites. Such interactions play a key role in determining the way pathogenic and commensal species impact their host and in the assembly of complex microbial communities. Recently, several studies have demonstrated how such interactions are reflected in the organization of the metabolic networks of the interacting species, and introduced various graph theory-based methods to predict host-microbe and microbe-microbe interactions directly from network topology. Using these methods, such studies have revealed evolutionary and ecological processes that shape species interactions and community assembly, highlighting the potential of this reverse-ecology research paradigm. Results: NetCooperate is a web-based tool and a software package for determining host-microbe and microbe-microbe cooperative potential. It specifically calculates two previously developed and validated metrics for species interaction: the Biosynthetic Support Score, which quantifies the ability of a host species to supply the nutritional requirements of a parasitic or a commensal species, and the Metabolic Complementarity Index, which quantifies the complementarity of a pair of microbial organisms’ niches. NetCooperate takes as input a pair of metabolic networks, and returns the pairwise metrics as well as a list of potential syntrophic metabolic compounds. Conclusions: The Biosynthetic Support Score and Metabolic Complementarity Index provide insight into host-microbe and microbe-microbe metabolic interactions. NetCooperate determines these interaction indices from metabolic network topology, and can be used for small- or large-scale analyses. NetCooperate is provided as both a web-based tool and an open-source Python module; both are freely available online at http://elbo.gs.washington.edu/software_netcooperate.html.
To assess the functional capacities of microbial communities, including those inhabiting the human body, shotgun metagenomic reads are often aligned to a database of known genes. Such homology-based annotation practices critically rely on the assumption that short reads can map to orthologous genes of similar function. This assumption, however, and the various factors that impact short read annotation, have not been systematically evaluated. To address this challenge, we generated an extremely large database of simulated reads (totaling 15.9 Gb), spanning over 500,000 microbial genes and 170 curated genomes and including, for many genomes, every possible read of a given length. We annotated each read using common metagenomic protocols, fully characterizing the effect of read length, sequencing error, phylogeny, database coverage, and mapping parameters. We additionally rigorously quantified gene-, genome-, and protocol-specific annotation biases. Overall, our findings provide a first comprehensive evaluation of the capabilities and limitations of functional metagenomic annotation, providing crucial goal-specific best-practice guidelines to inform future metagenomic research.
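The idea of generating "every possible read of a given length" from a genome can be illustrated with a sliding window. This is only a sketch under stated assumptions, not the authors' simulation pipeline: the function name is hypothetical, the sequence is treated as linear, only the forward strand is used, and no sequencing error is introduced.

```python
def all_reads(sequence, read_length):
    """Return every contiguous substring of `sequence` with length
    `read_length`, in order of start position (forward strand only)."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    last_start = len(sequence) - read_length
    return [sequence[i:i + read_length] for i in range(last_start + 1)]


# A toy 5-base sequence yields three reads of length 3.
reads = all_reads("ACGTA", 3)
```

A sequence of length n yields n - read_length + 1 reads, which gives a sense of why exhaustive read sets over many genomes grow to the scale reported in the abstract.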
The human microbiome represents a vastly complex ecosystem that is tightly linked to our development, physiology, and health. Our increased capacity to generate multiple channels of omic data from this system, brought about by recent advances in high throughput molecular technologies, calls for the development of systems-level methods and models that take into account not only the composition of genes and species in a microbiome but also the interactions between these components. Such models should aim to study the microbiome as a community of species whose metabolisms are tightly intertwined with each other and with that of the host, and should be developed with a view towards an integrated, comprehensive, and predictive modeling framework. Here, we review recent work specifically in metabolic modeling of the human microbiome, highlighting both novel methodologies and pressing challenges. We discuss various modeling approaches that lay the foundation for a full-scale predictive model, focusing on models of interactions between microbial species, metagenome-scale models of community-level metabolism, and models of the interaction between the microbiome and the host. Continued development of such models and of their integration into a multi-scale model of the microbiome will lead to a deeper mechanistic understanding of how variation in the microbiome impacts the host, and will promote the discovery of clinically- and ecologically-relevant insights from the rich trove of data now available.