Accurate estimates of virus mutation rates are important to understand the evolution of the viruses and to combat them. However, methods of estimation are varied and often complex. Here, we critically review over 40 original studies and establish criteria to facilitate comparative analyses. The mutation rates of 23 viruses are presented as substitutions per nucleotide per cell infection (s/n/c) and corrected for selection bias where necessary, using a new statistical method. The resulting rates range from 10⁻⁸ to 10⁻⁶ s/n/c for DNA viruses and from 10⁻⁶ to 10⁻⁴ s/n/c for RNA viruses. Similar to what has been shown previously for DNA viruses, there appears to be a negative correlation between mutation rate and genome size among RNA viruses, but this result requires further experimental testing. Contrary to some suggestions, the mutation rate of retroviruses is not lower than that of other RNA viruses. We also show that nucleotide substitutions are on average four times more common than insertions/deletions (indels). Finally, we provide estimates of the mutation rate per nucleotide per strand copying, which tends to be lower than that per cell infection because some viruses undergo several rounds of copying per cell, particularly double-stranded DNA viruses. A regularly updated virus mutation rate data set will be available at www.uv.es/rsanjuan/virmut.
The mutation rate is a critical parameter for understanding viral evolution and has important practical implications. For instance, the estimate of the mutation rate of HIV-1 demonstrated that any single mutation conferring drug resistance should occur within a single day and that simultaneous treatment with multiple drugs was therefore necessary (72). Also, in theory, viruses with high mutation rates could be combated by the administration of mutagens (1, 5, 21, 44, 53, 83).
This strategy, called lethal mutagenesis, has proved effective in cell cultures or animal models against several RNA viruses, including enteroviruses (11, 39, 44), aphthoviruses (83), vesiculoviruses (44), hantaviruses (10), arenaviruses (40), and lentiviruses (15, 53), and appears to at least partly contribute to the effectiveness of the combined ribavirin-interferon treatment against hepatitis C virus (HCV) (13). The viral mutation rate also plays a role in the assessment of possible vaccination strategies (16), and it has been shown to influence the stability of live attenuated polio vaccines (91). Finally, at both the epidemiological and evolutionary levels, the mutation rate is one of the factors that can determine the risk of emergent infectious disease, i.e., pathogens crossing the species barrier (46). Slight changes of the mutation rate can also determine whether or not some virus infections are cleared by the host immune system and can produce dramatic differences in viral fitness and virulence (75, 90), clearly stressing the need to have accurate estimates. However, our knowledge of viral mutation rates is somewhat incomplete, partly due to the inherent difficulty of measuring a rare and r...
QSARINS (QSAR-INSUBRIA) is new software for the development and validation of multiple linear regression (MLR) Quantitative Structure-Activity Relationship (QSAR) models by the Ordinary Least Squares (OLS) method, with a Genetic Algorithm (GA) for variable selection. The program is mainly focused on the external validation of QSAR models. It implements tools for explorative analysis of datasets by Principal Component Analysis, pre-reduction of input molecular descriptors, splitting of datasets into training and prediction sets, detection of outliers and of interpolated or extrapolated predictions, internal and external validation by different parameters, consensus modeling, and various plots for visualization. QSARINS is a user-friendly platform for QSAR modeling in agreement with the OECD Principles and for the analysis of the reliability of the predicted data obtained. The Insubria PBT Index model for the prediction of the cumulative behaviour of new chemicals as Persistent, Bioaccumulative and Toxic (PBT) substances is implemented. Additionally, QSARINS allows the user to validate single models, including models pre-developed with other software.
The main utility of QSAR models is their ability to predict activities/properties for new chemicals, and this external prediction ability is evaluated by means of various validation criteria. As a measure for such evaluation the OECD guidelines have proposed the predictive squared correlation coefficient Q²F1 (Shi et al.). However, other validation criteria have been proposed by other authors: the Golbraikh-Tropsha method, r²m (Roy), Q²F2 (Schüürmann et al.), Q²F3 (Consonni et al.). In QSAR studies these measures are usually in accordance, though this is not always the case, thus doubts can arise when contradictory results are obtained. It is likely that none of the aforementioned criteria is the best in every situation, so a comparative study using simulated data sets is proposed here, using threshold values suggested by the proponents or those widely used in QSAR modeling. In addition, a different and simple external validation measure, the concordance correlation coefficient (CCC), is proposed and compared with other criteria. Huge data sets were used to study the general behavior of validation measures, and the concordance correlation coefficient was shown to be the most restrictive. On using simulated data sets of a more realistic size, it was found that CCC was broadly in agreement, about 96% of the time, with other validation measures in accepting models as predictive, and in almost all the examples it was the most precautionary. The proposed concordance correlation coefficient also works well on real data sets, where it seems to be more stable, and helps in making decisions when the validation measures are in conflict. Since it is conceptually simple, and given its stability and restrictiveness, we propose the concordance correlation coefficient as a complementary, or alternative, more prudent measure of whether a QSAR model is externally predictive.
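As an illustration of the concordance correlation coefficient discussed above, the following is a minimal sketch assuming Lin's standard definition, CCC = 2·Sxy / (Sxx + Syy + n·(mean_x − mean_y)²), where the S terms are sums of squared or cross deviations. The function name and argument names are hypothetical and are not taken from the paper or from any QSAR software.

```python
from statistics import fmean


def concordance_correlation_coefficient(observed, predicted):
    """Lin's CCC between experimental values and external predictions.

    Assumes the standard sum-of-squares form of the definition; the names
    here are illustrative only.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally sized sequences with >= 2 values")

    n = len(observed)
    mean_obs = fmean(observed)
    mean_pred = fmean(predicted)

    # Cross deviations and squared deviations around each mean.
    s_xy = sum((o - mean_obs) * (p - mean_pred)
               for o, p in zip(observed, predicted))
    s_xx = sum((o - mean_obs) ** 2 for o in observed)
    s_yy = sum((p - mean_pred) ** 2 for p in predicted)

    # The last term penalises a systematic offset between the two means.
    denominator = s_xx + s_yy + n * (mean_obs - mean_pred) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both sequences are constant and equal")
    return 2 * s_xy / denominator
```

Under this definition, predictions that are perfectly correlated with the experimental data but offset by a constant are still penalised: for observed values 1, 2, 3, 4 and predictions 2, 3, 4, 5, the Pearson correlation is 1, whereas the CCC is 10/14, about 0.71.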
The evaluation of regression QSAR model performance, in fitting, robustness, and external prediction, is of pivotal importance. Over the past decade, different external validation parameters have been proposed: Q²F1, Q²F2, Q²F3, r²m, and the Golbraikh-Tropsha method. Recently, the concordance correlation coefficient (CCC, Lin), which simply verifies how small the differences are between experimental data and external data set predictions, independently of their range, was proposed by our group as an external validation parameter for use in QSAR studies. In our preliminary work, we demonstrated with thousands of simulated models that CCC is in good agreement with the compared validation criteria (except r²m) using the cutoff values normally applied for the acceptance of QSAR models as externally predictive. In this new work, we have studied and compared the general trends of the various criteria relative to different possible biases (scale and location shifts) in external data distributions, using a wide range of different simulated scenarios. This study, further supported by visual inspection of experimental vs predicted data scatter plots, has highlighted problems related to some criteria. Indeed, if based on the cutoff suggested by the proponent, r²m could also accept non-predictive models under two of the possible biases (location, location plus scale), while under scale shift bias it appears to be the most restrictive. Moreover, Q²F1 and Q²F2 showed some problems under one of the possible biases (scale shift). This analysis also allowed us to propose new, recalibrated thresholds for each criterion, intercomparable for the same data scatter, for defining a QSAR model as really externally predictive in a more precautionary approach.
An analysis of the results revealed that the scatter plot of experimental vs predicted external data must always be evaluated to support the statistical criteria values: in some cases high statistical parameter values could hide models with unacceptable predictions.
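The three Q² criteria named above differ only in what the prediction error is compared against. The sketch below assumes the commonly cited definitions (the abstracts do not spell out the formulas): Q²F1 scales the external prediction error by the spread of the external data around the training mean, Q²F2 by the spread around the external mean, and Q²F3 by the training-set variance. Function and argument names are hypothetical.

```python
from statistics import fmean


def _press(y_ext, y_pred):
    """Sum of squared prediction errors on the external set."""
    if len(y_ext) != len(y_pred) or not y_ext:
        raise ValueError("external data and predictions must match in size")
    return sum((y - p) ** 2 for y, p in zip(y_ext, y_pred))


def q2_f1(y_train, y_ext, y_pred):
    """Q2F1: external deviations are taken from the TRAINING mean."""
    mean_train = fmean(y_train)
    total = sum((y - mean_train) ** 2 for y in y_ext)
    return 1 - _press(y_ext, y_pred) / total


def q2_f2(y_ext, y_pred):
    """Q2F2: external deviations are taken from the EXTERNAL mean."""
    mean_ext = fmean(y_ext)
    total = sum((y - mean_ext) ** 2 for y in y_ext)
    return 1 - _press(y_ext, y_pred) / total


def q2_f3(y_train, y_ext, y_pred):
    """Q2F3: mean squared external error over the training-set variance."""
    mean_train = fmean(y_train)
    train_variance = sum((y - mean_train) ** 2 for y in y_train) / len(y_train)
    return 1 - (_press(y_ext, y_pred) / len(y_ext)) / train_variance
```

Because each criterion uses a different reference spread, the same set of predictions can score differently under each, which is one reason the scatter plot of experimental vs predicted values is worth inspecting alongside the numbers.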
A database of environmentally hazardous chemicals, collected and modeled by QSAR by the Insubria group, is included in the updated version of QSARINS, software recently proposed for the development and validation of QSAR models by the genetic algorithm-ordinary least squares method. In this version, a module named QSARINS-Chem includes several datasets of chemical structures and their corresponding endpoints (physicochemical properties and biological activities). The chemicals can be accessed in different ways (CAS, SMILES, names and so forth) and their three-dimensional structures can be visualized. Some of the QSAR models previously published by our group have been redeveloped using the free online software for molecular descriptor calculation, PaDEL-Descriptor. The new models can easily be applied to predict values for chemicals without experimental data, while also verifying the applicability domain for the new chemicals. The QSAR model reporting format (QMRF) of these models can also be downloaded here. Additional chemometric analyses can be done by principal component analysis and multicriteria decision making for screening and ranking chemicals to prioritize the most dangerous ones.
The genomes of most virus species have overlapping genes—two or more proteins coded for by the same nucleotide sequence. Several explanations have been proposed for the evolution of this phenomenon, and we test these by comparing the amount of gene overlap in all known virus species. We conclude that gene overlap is unlikely to have evolved as a way of compressing the genome in response to the harmful effect of mutation because RNA viruses, despite having generally higher mutation rates, have less gene overlap on average than DNA viruses of comparable genome length. However, we do find a negative relationship between overlap proportion and genome length among viruses with icosahedral capsids, but not among those with other capsid types that we consider easier to enlarge in size. Our interpretation is that a physical constraint on genome length by the capsid has led to gene overlap evolving as a mechanism for producing more proteins from the same genome length. We consider that these patterns cannot be explained by other factors, namely the possible roles of overlap in transcription regulation, generating more divergent proteins and the relationship between gene length and genome length.
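To make the notion of overlap proportion concrete, here is a minimal sketch under an assumed definition: the fraction of genome positions covered by two or more annotated genes. The paper may define the quantity differently, and the function name and the 1-based inclusive coordinate convention are hypothetical choices for illustration.

```python
def overlap_proportion(genes, genome_length):
    """Fraction of genome positions covered by at least two genes.

    `genes` is an iterable of (start, end) pairs using 1-based inclusive
    coordinates. This definition is an assumption made for illustration.
    """
    if genome_length < 1:
        raise ValueError("genome_length must be positive")

    coverage = [0] * genome_length
    for start, end in genes:
        if not 1 <= start <= end <= genome_length:
            raise ValueError(f"gene ({start}, {end}) lies outside the genome")
        # Count every position this gene covers.
        for position in range(start - 1, end):
            coverage[position] += 1

    overlapping_positions = sum(1 for depth in coverage if depth >= 2)
    return overlapping_positions / genome_length
```

For example, in a hypothetical 400-nucleotide genome with genes at 1-100, 81-150 and 200-300, only positions 81-100 are shared, so the overlap proportion is 20/400 = 0.05.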
Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence. They originate by a mechanism of overprinting, in which point mutations in an existing frame allow the expression (the "birth") of a completely new protein from a second frame. In viruses, in which overlapping genes are abundant, these new proteins often play a critical role in infection, yet they are frequently overlooked during genome annotation. This results in erroneous interpretation of mutational studies and in a significant waste of resources. Therefore, overlapping genes need to be correctly detected, especially since they are now thought to be abundant in eukaryotes as well. Developing better detection methods and conducting systematic evolutionary studies require a large, reliable benchmark dataset of known cases. We thus assembled a high-quality dataset of 80 viral overlapping genes whose expression is experimentally proven. Many of them were not present in databases. We found that overall, overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition. In particular, the proteins they encode are enriched in high-degeneracy amino acids and depleted in low-degeneracy ones, which may alleviate the evolutionary constraints acting on overlapping genes. Principal component analysis revealed that the vast majority of overlapping genes follow a similar composition bias, despite their heterogeneity in length and function. Six proven mammalian overlapping genes also followed this bias. We propose that this apparently near-universal composition bias may favour the birth of overlapping genes and/or result from selection pressure acting on them.
Cyanobacteria blooms are a worldwide concern for water bodies and may be promoted by eutrophication and climate change. The prediction of cyanobacterial blooms and identification of the main triggering factors are of paramount importance for water management. In this study, we analyzed a comprehensive dataset comprising ten years of measurements collected at Lake Varese, a eutrophic lake in Northern Italy. Microscopic analysis of the water samples was performed to characterize the community distribution and dynamics over the years. We observed that cyanobacteria represented a significant fraction of the phytoplankton community, up to 60% as biovolume, and we detected a shift in the phytoplankton community distribution towards cyanobacteria dominance from 2010 onwards. The relationships between cyanobacteria biovolume, nutrients, and environmental parameters were investigated through simple and multiple linear regressions. We found that the 14-day average air temperature together with total phosphorus may only partly explain the cyanobacteria biovolume variance at Lake Varese. However, weather forecasts can be used to predict an algal outbreak two weeks in advance and, possibly, to adopt management actions. The prediction of cyanobacteria algal blooms remains challenging, and more frequent sampling, combined with microscopy analysis and metagenomics, would allow a more conclusive analysis.
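The kind of analysis described above can be sketched in two steps: compute a trailing 14-day mean of daily air temperature, then regress biovolume on that predictor. The sketch below is a simplified illustration, not the study's method: it covers only the simple (one-predictor) regression, not the multiple regression with total phosphorus, and the function names and the choice of a trailing window are assumptions.

```python
def trailing_mean(values, window=14):
    """Trailing mean over the last `window` values.

    Returns None for positions where fewer than `window` values are available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    means = []
    running_sum = 0.0
    for index, value in enumerate(values):
        running_sum += value
        if index >= window:
            # Drop the value that just left the window.
            running_sum -= values[index - window]
        means.append(running_sum / window if index >= window - 1 else None)
    return means


def simple_linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept, with R squared."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally sized sequences with >= 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0:
        raise ValueError("predictor has no variance")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R squared is the fraction of the response variance explained by the fit.
    r_squared = 1.0 if s_yy == 0 else (s_xy * s_xy) / (s_xx * s_yy)
    return slope, intercept, r_squared
```

An R squared well below 1 from such a fit would correspond to the abstract's finding that temperature and phosphorus only partly explain the biovolume variance.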