2006
DOI: 10.1093/molbev/msl144

Evolutionary Model Selection with a Genetic Algorithm: A Case Study Using Stem RNA

Abstract: The choice of a probabilistic model to describe sequence evolution can and should be justified. Underfitting the data through the use of overly simplistic models may miss out on interesting phenomena and lead to incorrect inferences. Overfitting the data with models that are too complex may ascribe biological meaning to statistical artifacts and result in falsely significant findings. We describe a likelihood-based approach for evolutionary model selection. The procedure employs a genetic algorithm (GA) to qui…

Cited by 20 publications
(17 citation statements)
References 41 publications
“…To formally characterize the similarities in the substitution process across the eight matrices, we computed a neighbor-joining tree on the Markov processes defined by each matrix using the total variation metric (TVM) [31]. Briefly, given a specific evolutionary time scale, TVM computes the distance between the expected distributions of characters generated under the two evolutionary models.…”
Section: Results (mentioning)
confidence: 99%
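The total variation metric quoted above compares the character distributions expected under two models at a fixed time scale. The excerpt does not say how the citing paper derives those distributions from its substitution matrices, so the sketch below starts from two distributions that are already given and only shows the standard total variation distance between them (half the sum of absolute differences). The function name and the example frequencies are illustrative assumptions, not values from the paper.

```python
from typing import Sequence


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions.

    p and q are probabilities over the same ordered set of characters.
    The result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same characters")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Hypothetical expected nucleotide distributions (A, C, G, T) under
    # two models at the same evolutionary time; not taken from the paper.
    model_1 = [0.25, 0.25, 0.25, 0.25]
    model_2 = [0.40, 0.10, 0.10, 0.40]
    print(total_variation(model_1, model_2))
```

A matrix of such pairwise distances between models is the kind of input a neighbor-joining procedure would take.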
“…The c-AIC score of a model is defined as c-AIC = 2(−L + ps/(s − p − 1)), where L is the log-likelihood score of the model, p is the number of estimated model parameters, and s is the number of independent samples. There are a number of possible ways to estimate the number of independent samples in the alignment [32] and we chose to use the number of alignment columns as an estimate of s. c-AIC performed well in selecting appropriate evolutionary models on biological and simulated alignments of paired RNA sequences [33]. In 2/47 cases, the HIV-W m model with the frequencies from the training set was the best and in 1/47 cases, the HIV-B m model was the best.…”
Section: Results (mentioning)
confidence: 99%
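The c-AIC definition quoted above is simple enough to check numerically. The following is a minimal sketch of that formula as stated, with s taken to be the number of alignment columns as the citing authors describe. The function names and the example numbers are assumptions made for illustration. The second function is the plain AIC, included only to show that the quoted form equals AIC plus the usual small-sample term 2p(p + 1)/(s − p − 1).

```python
def c_aic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Small-sample corrected AIC: 2 * (-L + p*s / (s - p - 1)).

    log_likelihood: L, the maximized log-likelihood of the model.
    n_params:       p, the number of estimated parameters.
    n_samples:      s, the number of independent samples
                    (for example, alignment columns).
    """
    denom = n_samples - n_params - 1
    if denom <= 0:
        raise ValueError("c-AIC requires s > p + 1")
    return 2.0 * (-log_likelihood + n_params * n_samples / denom)


def aic(log_likelihood: float, n_params: int) -> float:
    """Uncorrected AIC: 2 * (-L + p)."""
    return 2.0 * (-log_likelihood + n_params)


if __name__ == "__main__":
    # Hypothetical values: L = -1000, p = 10, s = 111 columns.
    L, p, s = -1000.0, 10, 111
    corrected = c_aic(L, p, s)
    # Same quantity written as AIC plus the small-sample correction.
    alternative = aic(L, p) + 2.0 * p * (p + 1) / (s - p - 1)
    print(corrected, alternative)
```

Lower scores indicate a better trade-off between fit and parameter count, and the correction vanishes as s grows relative to p.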
“…Average p-distances (pairwise exclusion of gaps) were calculated using MEGA version 4 (Tamura et al 2007), which was also used to calculate neighbour-joining (NJ) trees (Saitou & Nei 1987). For the NJ tree of the nc sequences, we used mid-point rooting.…”
Section: Methods (mentioning)
confidence: 99%
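The p-distance with pairwise exclusion of gaps mentioned above is the fraction of differing sites among the sites where neither sequence has a gap. The sketch below is a plain-Python illustration of that definition, not MEGA's implementation. Treating only "-" as a gap, and the toy sequences, are assumptions made here.

```python
from itertools import combinations
from typing import Sequence

GAP = "-"  # assumption: only "-" marks a gap; ambiguity codes are not handled


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, skipping sites gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == GAP or y == GAP:
            continue  # pairwise exclusion: drop this site for this pair only
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    return differences / compared


def average_p_distance(sequences: Sequence[str]) -> float:
    """Mean p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment, not data from the cited study.
    print(p_distance("ACGT-A", "ACCTTA"))
    print(average_p_distance(["ACGT", "ACGA", "TCGA"]))
```

With pairwise exclusion, a gapped column is ignored only for the pairs it affects, so different pairs may be compared over different numbers of sites.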
“…The sequences were aligned in the BioEdit sequence alignment editor version 7.0.9.0 [10] by using the Clustal W Multiple alignment [11]. Phylogenetic trees for HCV which were based on Core/E1 and NS5B sequences and genetic distances were calculated with MEGA software version 4 [12] using the Maximum Likelihood model. The sequences of Core/E1 and NS5B of HCV strains in Sri Lanka were deposited in NCBI GenBank under the accession numbers given in Table 1.…”
Section: Methods (mentioning)
confidence: 99%