Background Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results This paper introduces a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. The software features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It can also inform the user of the positional uncertainty for query sequences by calculating the expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent the number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions The software enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service.
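The two uncertainty measures named in the abstract above can be illustrated with a minimal sketch. It assumes that each candidate edge already has a log marginal likelihood for the query, that the prior over edges is uniform unless given, and that pairwise distances between placement locations are supplied as a matrix. The function names and inputs are hypothetical and are not taken from the software described; the package may define or normalize these quantities differently.

```python
import math


def placement_posteriors(log_marginal_likelihoods, log_priors=None):
    """Turn per-edge log marginal likelihoods into posterior probabilities.

    Assumption: a uniform prior over edges when log_priors is None.
    """
    n = len(log_marginal_likelihoods)
    if log_priors is None:
        log_priors = [0.0] * n
    log_weights = [ll + lp for ll, lp in zip(log_marginal_likelihoods, log_priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def expected_distance(probs, dist):
    """Expected distance between two placements drawn independently
    from the posterior: sum over i, j of p_i * p_j * d(i, j)."""
    n = len(probs)
    return sum(probs[i] * probs[j] * dist[i][j]
               for i in range(n) for j in range(n))


# Hypothetical query with three candidate edges.
posteriors = placement_posteriors([-1000.0, -1000.0, -1010.0])
# Hypothetical tree-path distances between the three placement locations.
distances = [[0.0, 0.2, 0.9],
             [0.2, 0.0, 0.8],
             [0.9, 0.8, 0.0]]
spread = expected_distance(posteriors, distances)
```

A query whose posterior mass is split between neighboring edges gives a small expected distance, while mass split between distant edges gives a large one, which is why the second measure is informative when the reference tree is densely sampled.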
Background Bacterial vaginosis (BV) is a common condition that is associated with numerous adverse health outcomes and is characterized by poorly understood changes in the vaginal microbiota. We sought to describe the composition and diversity of the vaginal bacterial biota in women with BV using deep sequencing of the 16S rRNA gene coupled with species-level taxonomic identification. We investigated the associations between the presence of individual bacterial species and clinical diagnostic characteristics of BV. Methodology/Principal Findings Broad-range 16S rRNA gene PCR and pyrosequencing were performed on vaginal swabs from 220 women with and without BV. BV was assessed by Amsel’s clinical criteria and confirmed by Gram stain. Taxonomic classification was performed using phylogenetic placement tools that assigned 99% of query sequence reads to the species level. Women with BV had heterogeneous vaginal bacterial communities that were usually not dominated by a single taxon. In the absence of BV, vaginal bacterial communities were dominated by either Lactobacillus crispatus or Lactobacillus iners. Leptotrichia amnionii and Eggerthella sp. were the only two BV-associated bacteria (BVABs) significantly associated with each of the four Amsel’s criteria. Co-occurrence analysis revealed the presence of several sub-groups of BVABs suggesting metabolic co-dependencies. Greater abundance of several BVABs was observed in Black women without BV. Conclusions/Significance The human vaginal bacterial biota is heterogeneous and marked by greater species richness and diversity in women with BV; no species is universally present. Different bacterial species have different associations with the four clinical criteria, which may account for discrepancies often observed between Amsel and Nugent (Gram stain) diagnostic criteria. Several BVABs exhibited race-dependent prevalence when analyzed in separate groups by BV status, which may contribute to increased incidence of BV in Black women. Tools developed in this project can be used to study microbial ecology in diverse settings at high resolution.
Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated with sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).
The human SAMHD1 protein potently restricts lentiviral infection in dendritic cells and monocyte/macrophages, but is antagonized by the primate lentiviral protein Vpx, which targets SAMHD1 for degradation. However, only two of eight primate lentivirus lineages encode Vpx, whereas its paralog, Vpr, is conserved across all extant primate lentiviruses. We find that not only multiple Vpx but also some Vpr proteins are able to degrade SAMHD1, and that such antagonism led to dramatic positive selection of SAMHD1 in the primate subfamily Cercopithecinae. Residues that have evolved under positive selection precisely determine sensitivity to Vpx/Vpr degradation and alter binding specificity. By overlaying these functional analyses on a phylogenetic framework of Vpr and Vpx evolution, we can decipher the chronology of acquisition of SAMHD1-degrading abilities in lentiviruses. We conclude that vpr neofunctionalized to degrade SAMHD1 even prior to the birth of a separate vpx gene, thereby initiating an evolutionary arms race with SAMHD1.
VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput; analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly-neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific one of the V, D, or J genes, or from an N-addition (a.k.a. non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences, suggesting that a non-parametric approach to modeling the recombination process could be useful. In our paper, we find that large modern data sets do indeed suggest a model using parameter-rich per-allele categorical distributions for HMM transition probabilities and per-allele-per-position mutation probabilities, and that using such a model for inference leads to significantly improved results. We present an accurate and efficient BCR sequence annotation software package using a novel HMM “factorization” strategy. This package, called partis (https://github.com/psathyrella/partis/), is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.
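The idea of annotating each base with the HMM state that most plausibly produced it can be sketched with a generic Viterbi decoder over categorical transition and emission tables. This is a textbook algorithm on a two-state toy model, not the package's factorization strategy or its HMM compiler. The state names ("V" for a templated germline region, "N" for a non-templated insertion), the probabilities, and the function name are all hypothetical.

```python
import math


def log_prob(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, init, trans, emit):
    """Return the most probable state path for obs and its log probability.

    init[s], trans[r][s], and emit[s][symbol] are plain categorical
    probability tables, one entry per state (or state pair, or symbol).
    """
    best = {s: log_prob(init[s]) + log_prob(emit[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # Best predecessor state for arriving in s at this position.
            r = max(states, key=lambda q: prev[q] + log_prob(trans[q][s]))
            best[s] = prev[r] + log_prob(trans[r][s]) + log_prob(emit[s][symbol])
            pointers[s] = r
        backpointers.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Toy model (all numbers hypothetical): the templated state expects "A",
# the insertion state emits every base with equal probability.
states = ["V", "N"]
init = {"V": 1.0, "N": 0.0}
trans = {"V": {"V": 0.8, "N": 0.2}, "N": {"V": 0.0, "N": 1.0}}
emit = {"V": {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
        "N": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
path, score = viterbi("AAACG", states, init, trans, emit)
print("".join(path))  # prints VVVNN
```

In a per-allele, per-position model of the kind the abstract describes, each germline position would get its own state with its own emission table, so the tables grow large but the decoding step keeps this shape.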