Background: Genomic GC content varies both within and, substantially, between microbial genomes. While some of this variation can be explained by evolutionary divergence and environmental factors, a notable portion is not understood. To investigate further, we explore a non-linear mathematical model (gcMOD) of singlenucleotide polymorphism (SNP) GC content (sbGC, the GC content of substituted bases) as a function of core genome GC content (cgGC). We estimate the model's parameters using Bayesian inference on empirical genetic data from the microbial core genomes of 35 bacterial species, each of which contains at least 10 representative strains. We utilize 716 bacterial genomes in total. We also explore some possible implications that result from the mathematical properties of gcMOD. Results: We find that the median GC → AT substitution rates (β) are almost always considerably higher than the corresponding AT → GC substitution rates (α) for all 35 core genomes. The distribution of β is also noticeably more concentrated (i.e. thinner) than the corresponding distribution of α for almost all species, excepting the bacteria with the most GC-rich genomes. We also demonstrate that at the singularity point of gcMOD (where α = β), the model is reduced to a linear equation. By analyzing the linear model, we show that due to the constraints on gcMOD, the mutation rates can have profound influence on both cgGC as well as sbGC. Moreover, by examining the mathematical properties of gcMOD's inverse function, we find that change in cgGC, and hence in genomic GC content, can potentially occur quite rapidly. Conclusions: Examining the distributions of the GC → AT and AT → GC substitution rates for 35 bacterial species, we demonstrate that the former (β) are remarkably similar for all species examined. In addition, GC → AT substitution rate distributions were considerably more concentrated for all species, with the mode consistently peaking at higher rates than for AT → GC substitution rates.
Earlier research has suggested that approximate Bayesian computation (ABC) makes it possible to fit simulator-based intractable birth-death models to investigate communicable disease outbreak dynamics with accuracy comparable to that of exact Bayesian methods. However, recent findings have indicated that key parameters such as the reproductive number R may remain poorly identifiable with these models. Here we show that the identifiability issue can be resolved by taking into account disease-specific characteristics of the transmission process in closer detail. Using tuberculosis (TB) in the San Francisco Bay area as a case-study, we consider a model that generates genotype data from a mixture of three stochastic processes, each with their distinct dynamics and clear epidemiological interpretation. We show that our model allows for accurate posterior inferences about outbreak dynamics from aggregated annual case data with genotype information. As a by-product of the inference, the model provides an estimate of the infectious population size at the time the data was collected. The acquired estimate is approximately two orders of magnitude smaller compared to the assumptions made in the earlier related studies, and much better aligned with epidemiological knowledge about active TB prevalence. Similarly, the reproductive number R related to the primary underlying transmission process is estimated to be nearly three-fold compared with the previous estimates, which has a substantial impact on the interpretation of the fitted outbreak model.
Earlier research has suggested that approximate Bayesian computation (ABC) makes it possible to fit simulator-based intractable birth-death models to investigate communicable disease outbreak dynamics with accuracy comparable to that of exact Bayesian methods. However, recent findings have indicated that key parameters, such as the reproductive number , may remain poorly identifiable with these models. Here we show R that this identifiability issue can be resolved by taking into account disease-specific characteristics of the transmission process in closer detail. Using tuberculosis (TB) in the San Francisco Bay area as a case study, we consider a model that generates genotype data from a mixture of three stochastic processes, each with its own distinct dynamics and clear epidemiological interpretation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.