2023
DOI: 10.1093/molbev/msad150

Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses

Abstract: Inference and interpretation of evolutionary processes, in particular of the types and targets of natural selection affecting coding sequences, are critically influenced by the assumptions built into statistical models and tests. If certain aspects of the substitution process (even when they are not of direct interest) are presumed absent or are modeled with too crude of a simplification, estimates of key model parameters can become biased, often systematically, and lead to poor statistical performance. Previo…

Citations: cited by 16 publications (7 citation statements)
References: 69 publications
“…The ABSREL and BUSTED models are variants of the branch-sites random effect likelihood (BSREL) model of coding sequence evolution (Pond et al, 2011a; Smith et al, 2015), which allows for variation in dN/dS (ω) rates across lineages and sites; however, while ABSREL selects the optimal number of rate categories for each gene, BUSTED imposes three rate categories such that ω1 ≤ ω2 ≤ ω3 (Figure 1B). Both models also accommodate synonymous rate variation across sites [S] (Pond and Muse, 2005; Wisotsky et al, 2020), double and triple multi-nucleotide mutations per codon [MH] (Lucaci et al, 2021), and both synonymous rate variation and multi-nucleotide mutations [SMH]; failure to account for synonymous rate variation (Dunn et al, 2019; Rahman et al, 2021; Wisotsky et al, 2020) and multi-nucleotide mutations can lead to widespread false positive inferences of positive selection (Dunn et al, 2019; Lucaci et al, 2023a; Nozawa et al, 2009; Suzuki, 2008; Venkat et al, 2018; Yang and Reis, 2011). We used small sample corrected Akaike Information Criterion values (ΔAICc ≥ 10) to select for the best [S], [MH], and [SMH] model and inferred positive selection for a gene when the best fitting model included a class of sites with ω > 1 and a likelihood ratio test P ≤ 0.05.…”
Section: Results (mentioning)
confidence: 99%
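The model-selection step described in the citation statement above can be illustrated with a minimal sketch. It is not the citing authors' code. The function names, the `"base"` label, the log-likelihoods, the parameter counts, and the sample size are all hypothetical. The reading of the ΔAICc ≥ 10 threshold (a richer model is accepted only if it improves on the baseline by at least 10 AICc units) is an assumption, since the statement does not spell out the rule.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1)."""
    k = n_params
    if n_obs - k - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n_obs - k - 1)


def select_model(fits, n_obs, baseline="base", min_delta=10.0):
    """Return (best_model, scores).

    fits maps a model label to (log_likelihood, n_params).
    ASSUMPTION: the baseline is kept unless the lowest-AICc model
    beats it by at least min_delta AICc units.
    """
    scores = {name: aicc(lnl, k, n_obs) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    if best != baseline and scores[baseline] - scores[best] < min_delta:
        best = baseline
    return best, scores


def infers_positive_selection(has_omega_gt_1_class, lrt_p_value, alpha=0.05):
    """Call selection only if the chosen model has an omega > 1 class
    and the likelihood ratio test P-value is at most alpha."""
    return has_omega_gt_1_class and lrt_p_value <= alpha


# Hypothetical fits for one gene: label -> (log-likelihood, parameter count)
example_fits = {
    "base": (-5000.0, 20),
    "S": (-4990.0, 22),
    "MH": (-4985.0, 22),
    "SMH": (-4980.0, 24),
}
best_model, aicc_scores = select_model(example_fits, n_obs=400)
print(best_model, round(aicc_scores[best_model], 3))
```

With these made-up inputs the [SMH] variant wins by a wide margin; shrinking the likelihood gap so the improvement falls below 10 AICc units makes the sketch fall back to the baseline model.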
“…Goodman et al (2009), for example, used the free ratio model in PAML (Yang, 2007) to identify positively selected genes in the elephant lineage using a dataset of 11 species and 5,714 protein-coding genes; they identified 67 genes with dN/dS ≥ 1.01, which were enriched in mitochondrial functions but none of the GO terms we identified (Supplementary dataset 1). There are several important differences between Goodman et al (2009) and our study: 1) The free ratio model does not test if the dN/dS ratio in a particular lineage is significantly different than one, rather it tests if dN/dS rate variation is different across lineages (Yang, 1998); 2) Unlike ABSREL and BUSTED, the free ratio model does not allow for rate variation across sites, dN/dS > 1 is only inferred if the average dN/dS across all sites is greater than 1; and 3) The method implemented in PAML does not account for synonymous rate variation across sites or multiple simultaneous codon substitutions, which can bias inferences of both dS (Dunn et al, 2019; Rahman et al, 2021; Wisotsky et al, 2020) and dN (Dunn et al, 2019; Lucaci et al, 2023a; Nozawa et al, 2009; Suzuki, 2008; Venkat et al, 2018; Yang and Reis, 2011). While we tested 81.6% (4,663) of the genes tested by Goodman et al (2009), only one (CD3G) was inferred to have been positively selected in our analyses; therefore, it is unsurprising that there was little overlap in our GO enrichment results.…”
Section: Discussion (mentioning)
confidence: 99%
“…This indicates that this change possibly had an adaptive character in the ancestral background. It is important to note, however, that models used for the analysis do not account for variation of synonymous substitution rate and multinucleotide substitution events, which in some cases might lead to false positive results (Lucaci et al, 2023). Because of that, the alternative hypothesis that substitutions Q66H and G109D occurred in the common ancestor of Erwiniaceae due to genetic drift, enabling the subsequent loss of ibpB gene, cannot be fully discounted.…”
Section: Discussion (mentioning)
confidence: 99%
“…Genomic instability is a general characteristic of viruses, particularly in single-strand RNA viruses [8,9]. Having short replication cycles, recombination, error-prone replication, and strong selection drive genome instability in RNA viruses [10].…”
Section: Introduction (mentioning)
confidence: 99%