Understanding which phenotypic traits are consistently correlated throughout evolution is a highly pertinent problem in modern evolutionary biology. Here, we propose a multivariate phylogenetic latent liability model for assessing the correlation between multiple types of data, while simultaneously controlling for their unknown shared evolutionary history informed through molecular sequences. The latent formulation enables us to consider in a single model combinations of continuous traits, discrete binary traits, and discrete traits with multiple ordered and unordered states. Previous approaches have entertained a single data type generally along a fixed history, precluding estimation of correlation between traits and ignoring uncertainty in the history. We implement our model in a Bayesian phylogenetic framework, and discuss inference techniques for hypothesis testing. Finally, we showcase the method through applications to columbine flower morphology, antibiotic resistance in Salmonella, and epitope evolution in influenza.
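As a reading aid, one common way to write a latent liability (threshold) model is sketched below. The notation ($x_i$, $\Sigma$, $t_i$, $\gamma_k$, $\mathrm{pa}(i)$) is an assumption made here for illustration and is not taken from the abstract; the full model may differ in its details, for example in how unordered states are handled.

```latex
% Latent liabilities x_i in R^D diffuse along the tree by Brownian motion:
% node i inherits from its parent pa(i) over a branch of length t_i.
x_i \mid x_{\mathrm{pa}(i)} \sim \mathcal{N}\!\left(x_{\mathrm{pa}(i)},\; t_i\,\Sigma\right)

% Observed trait j at tip i is a deterministic function of its liability:
y_{ij} =
\begin{cases}
x_{ij} & \text{continuous trait} \\
\mathbb{1}\!\left[x_{ij} > 0\right] & \text{binary trait} \\
k \ \text{ if } \ \gamma_{k-1} < x_{ij} \le \gamma_{k} & \text{ordered trait with thresholds } \gamma_k
\end{cases}

% Correlation between traits j and k is read off the diffusion covariance:
\rho_{jk} = \frac{\Sigma_{jk}}{\sqrt{\Sigma_{jj}\,\Sigma_{kk}}}
```

Under this formulation, correlation among mixed data types reduces to inference on the off-diagonal entries of $\Sigma$.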
Background: Quaternary climatic changes led to variations in sea level and these variations played a significant role in the generation of marine terrace deposits in the South Atlantic Coastal Plain. The main consequence of the increase in sea level was local extinction or population displacement, such that coastal species would be found around the new coastline. Our main goal was to investigate the effects of sea level changes on the geographical structure and variability of genetic lineages from a Petunia species endemic to the South Atlantic Coastal Plain. We employed a phylogeographic approach based on plastid sequences obtained from individuals collected from the complete geographic distribution of Petunia integrifolia ssp. depauperata and its sister group. We used population genetics tests to evaluate the degree of genetic variation and structure among and within populations, and we used haplotype network analysis and Bayesian phylogenetic methods to estimate divergence times and population growth. Results: We observed three major genetic lineages whose geographical distribution may be related to different transgression/regression events that occurred in this region during the Pleistocene. The divergence time between the monophyletic group P. integrifolia ssp. depauperata and its sister group (P. integrifolia ssp. integrifolia) was compatible with geological estimates of the availability of the coastal plain. Similarly, the origin of each genetic lineage is congruent with geological estimates of habitat availability. Conclusions: Diversification of P. integrifolia ssp. depauperata possibly occurred as a consequence of the marine transgression/regression cycles during the Pleistocene. In periods of high sea level, plants were most likely restricted to a refuge area corresponding to fossil dunes and granitic hills, from which they colonized the coast once the sea level came down. 
The modern pattern of lineage geographical distribution and population variation was established by a range expansion with serial founder effects conditioned on soil availability. Electronic supplementary material: The online version of this article (doi:10.1186/s12862-015-0363-8) contains supplementary material, which is available to authorized users.
The COVID-19 pandemic has already reached approximately 110 million people and it is associated with 2.5 million deaths worldwide. Brazil is the third worst-hit country, with approximately 10.2 million cases and 250 thousand deaths. Unprecedented international efforts have been established in order to share information about epidemiology, viral evolution and transmission dynamics. However, sequencing facilities and research investments are very heterogeneous across regions and countries. Understanding SARS-CoV-2 biology is vital to the development of effective strategies for public health care and disease management. This work aims to analyze the available genomes sequenced in Brazil between February 2020 and February 2021, in order to identify mutation hotspots and the geographical and temporal distribution of SARS-CoV-2 lineages in the Brazilian territory, using phylogenetic and phylodynamic analyses of high-quality genomes. We describe heterogeneous and episodic sequencing efforts and the progression of the different lineages over time, and we evaluate mutational spectra and frequency oscillations derived from the prevalence of novel and specific lineages across different Brazilian regions. We found at least seven major (1-7) and two minor clades (4.2 and 5.3) related to the six most prevalent Brazilian lineages and described their distribution across the Brazilian territory. The emergence and recent frequency shift of lineages (P.1 and P.2) containing mutations of concern in the spike protein (e.g., E484K, N501Y) draw attention because of their association with immune evasion and enhanced receptor binding affinity. Improvements in genomic surveillance are of paramount importance and should be extended in Brazil to better inform policy makers and enable precise evidence-based decisions to fight the COVID-19 pandemic.
Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical problems. Here we consider a highly versatile U-statistics-based approach built on dissimilarities between pairs of data points for nonparametric clustering. In this work we propose statistical tests to assess group homogeneity that take multiple testing issues into account, and a clustering algorithm based on dissimilarities within and between groups that greatly speeds up the homogeneity test. We also propose a test to verify the significance of classifying a sample into one of two groups. A Monte Carlo simulation study is presented to evaluate the power of the classification test, considering different group sizes and degrees of separation. Size and power of the homogeneity test are also analyzed through simulations that compare it to competing methods. Finally, the methodology is applied to three different genetic datasets: global human genetic diversity, breast tumor gene expression and Dengue virus serotypes. These applications showcase this statistical framework's ability to answer diverse biological questions while adapting to the specificities of the different data types.
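The idea of comparing dissimilarities within and between groups can be illustrated with a minimal sketch. This is not the test proposed in the abstract: it swaps in a plain permutation test on a simple separation statistic, and every name in it (`hamming_matrix`, `separation_statistic`, `permutation_pvalue`) is hypothetical. It assumes NumPy is available, that labels are coded 0 and 1, and that each group has at least two members.

```python
import numpy as np

def hamming_matrix(data):
    """Pairwise fraction of mismatching categorical sites (rows are samples)."""
    data = np.asarray(data)
    return (data[:, None, :] != data[None, :, :]).mean(axis=2)

def separation_statistic(dist, labels):
    """Mean between-group dissimilarity minus the average within-group mean."""
    labels = np.asarray(labels)
    g0 = np.flatnonzero(labels == 0)
    g1 = np.flatnonzero(labels == 1)

    def within(idx):
        # Average over unordered pairs inside one group (upper triangle only).
        sub = dist[np.ix_(idx, idx)]
        return sub[np.triu_indices(len(idx), k=1)].mean()

    between = dist[np.ix_(g0, g1)].mean()
    return between - 0.5 * (within(g0) + within(g1))

def permutation_pvalue(dist, labels, n_perm=999, seed=0):
    """Illustrative permutation p-value for the null of one homogeneous group."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = separation_statistic(dist, labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        if separation_statistic(dist, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

A large positive statistic with a small p-value suggests the two groups are not homogeneous. Repeated label shuffling is the expensive step in a sketch like this one.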
Background: Brazil is the third country most affected by Coronavirus disease-2019 (COVID-19), but viral evolution at municipality resolution is still poorly understood in Brazil and it is crucial to understand the epidemiology of viral spread. We aimed to track molecular evolution and spread of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Esteio (Southern Brazil) using phylogenetic and phylodynamic inferences from 21 new genomes in global and regional context. Importantly, the case fatality rate (CFR) in Esteio (3.26%) is slightly higher than in Rio Grande do Sul (RS) state (2.56%) and in Brazil as a whole (2.74%). Results: We provided a comprehensive view of mutations from a representative sampling from May to October 2020, highlighting two frequent mutations in spike glycoprotein (D614G and V1176F), an emergent mutation (E484K) in spike Receptor Binding Domain (RBD) characteristic of the B.1.351 and P.1 lineages, and the adjacent replacement of 2 amino acids in Nucleocapsid phosphoprotein (R203K and G204R). E484K was found in two genomes from mid-October, which is the earliest description of this mutation in Southern Brazil. Lineages containing this substitution must be the subject of intense surveillance because of its association with immune evasion. We also found two epidemiologically related clusters, including one from patients of the same neighborhood. Phylogenetic and phylodynamic analyses demonstrate multiple introductions of the most prevalent Brazilian lineages (B.1.1.33 and B.1.1.248) and the establishment of Brazilian lineages that spread from the Southeast to other Brazilian regions. Conclusions: Our data show the value of correlating clinical, epidemiological and genomic information for the understanding of viral evolution and its spatial distribution over time. This is of paramount importance to better inform policy making strategies to fight COVID-19.
Brazil is the third country most affected by the COVID-19 pandemic. In spite of this, viral evolution at municipality resolution is poorly understood in Brazil and it is crucial to understand the epidemiology of viral spread. We identified four main circulating lineages in Esteio (Southern Brazil) and their relationship with global, national and regional lineages using phylogenetic and phylodynamic inferences from 21 SARS-CoV-2 genome sequences. We provided a comprehensive view of viral mutations from a time- and age-representative sampling from May to October 2020, in Esteio (RS, Brazil), highlighting two frequent mutations in Spike glycoprotein (D614G and V1176F), an emergent mutation (E484K) in Spike Receptor Binding Domain (RBD) characteristic of the South African lineage B.1.351, and the adjacent replacement of 2 amino acids in Nucleocapsid phosphoprotein (R203K and G204R). Significant viral diversity was evidenced by the identification of 80 different SNPs. The E484K replacement was found in two genomes (9.5%) from samples obtained in mid-October, which is to the best of our knowledge the earliest description of E484K-harboring SARS-CoV-2 in Southern Brazil. The identification of this mutation in a small municipality of RS state suggests that it was probably widely distributed in the Brazilian territory, but has gone unnoticed so far because of the lack of genomic surveillance in Brazil. The introduction of E484K mutants shows temporal correlation with later increases in new cases in our state. Importantly, since it has been associated with immune evasion and enhanced interaction with hACE-2, lineages containing this substitution must be the subject of intense surveillance. Our data demonstrate multiple introductions of the most prevalent lineages (B.1.1.33 and B.1.1.248) and the major role of community transmission in viral spreading and the establishment of Brazilian lineages. This represents an important contribution to the epidemiology of SARS-CoV-2.
One contribution of 18 to a Discussion Meeting Issue 'Next-generation molecular and evolutionary epidemiology of infectious disease'. Bayesian phylogeographic methods simultaneously integrate geographical and evolutionary modelling, and have demonstrated value in assessing spatial spread patterns of measurably evolving organisms. We improve on existing phylogeographic methods by combining information from multiple phylogeographic datasets in a hierarchical setting. Consider N exchangeable datasets or strata consisting of viral sequences and locations, each evolving along its own phylogenetic tree and according to a conditionally independent geographical process. At the hierarchical level, a random graph summarizes the overall dispersion process by informing which migration rates between sampling locations are likely to be relevant in the strata. This approach provides an efficient and improved framework for analysing inherently hierarchical datasets. We first examine the evolutionary history of multiple serotypes of dengue virus in the Americas to showcase our method. Additionally, we explore an application to intrahost HIV evolution across multiple patients.