Evolutionary Model Selection with a Genetic Algorithm: A Case Study Using Stem RNA

Pond, Sergei L. Kosakovsky; Mannino, Frank; Gravenor, Michael B.; Muse, Spencer V.; Frost, Simon D. W.

doi:10.1093/molbev/msl144

Cited by 20 publications

(17 citation statements)

References 41 publications


“…To formally characterize the similarities in the substitution process across the eight matrices, we computed a neighbor-joining tree on the Markov processes defined by each matrix using the total variation metric (TVM) [31]. Briefly, given a specific evolutionary time scale, TVM computes the distance between the expected distributions of characters generated under the two evolutionary models.…”

Section: Results (mentioning)

confidence: 99%
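The statement above describes TVM only briefly, so the following is a minimal sketch under stated assumptions rather than the cited authors' implementation. It assumes each model is given as a rate matrix Q with stationary frequencies pi, and that the "expected distribution of characters" is the joint distribution of (starting character, ending character) after time t; both the function names and that choice of distribution are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(Q * t) by scaling and squaring with a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes (Q * scale)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def total_variation(q1, pi1, q2, pi2, t):
    """Total variation distance between the joint (start, end) character
    distributions of two Markov models at evolutionary time t.
    ASSUMPTION: the joint distribution pi[i] * P[i][j](t) is the quantity compared.
    """
    p1, p2 = mat_exp(q1, t), mat_exp(q2, t)
    n = len(pi1)
    return 0.5 * sum(abs(pi1[i] * p1[i][j] - pi2[i] * p2[i][j])
                     for i in range(n) for j in range(n))


# Toy two-state models that differ only in their substitution rate (made-up values).
slow = [[-1.0, 1.0], [1.0, -1.0]]
fast = [[-2.0, 2.0], [2.0, -2.0]]
uniform = [0.5, 0.5]
distance = total_variation(slow, uniform, fast, uniform, t=0.5)
```

A pairwise matrix of such distances over all models could then be passed to any neighbor-joining implementation to obtain a tree of the kind the statement mentions.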

“…The c-AIC score of a model is defined as c-AIC = 2(−L + ps/(s − p − 1)), where L is the log-likelihood score of the model, p is the number of estimated model parameters, and s is the number of independent samples. There are a number of possible ways to estimate the number of independent samples in the alignment [32] and we chose to use the number of alignment columns as an estimate of s. c-AIC performed well in selecting appropriate evolutionary models on biological and simulated alignments of paired RNA sequences [33]. In 2/47 cases, the HIV-W m model with the frequencies from the training set was the best and in 1/47 cases, the HIV-B m model was the best.…”

Section: Results (mentioning)

confidence: 99%
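The c-AIC definition quoted above can be checked numerically with a short sketch. The function below implements exactly the quoted expression, 2(−L + ps/(s − p − 1)); the function name, the guard on small sample sizes, and the candidate scores are illustrative assumptions, not values from the cited paper.

```python
def c_aic(log_likelihood, n_params, n_samples):
    """Small-sample corrected AIC: 2 * (-L + p*s / (s - p - 1)).

    log_likelihood: L, the log-likelihood of the fitted model
    n_params:       p, the number of estimated parameters
    n_samples:      s, the number of independent samples
                    (the quoted study uses the number of alignment columns)
    """
    denominator = n_samples - n_params - 1
    if denominator <= 0:
        raise ValueError("c-AIC requires s > p + 1")
    return 2.0 * (-log_likelihood + n_params * n_samples / denominator)


# Hypothetical candidates: (log-likelihood, parameter count). Values are made up.
candidates = {
    "model_A": (-1000.0, 10),
    "model_B": (-995.0, 25),
}
n_columns = 500  # assumed alignment length, used as s
scores = {name: c_aic(ll, p, n_columns) for name, (ll, p) in candidates.items()}
best = min(scores, key=scores.get)  # the lowest c-AIC is preferred
```

The expression is algebraically the same as the usual form −2L + 2p + 2p(p + 1)/(s − p − 1), so for large s it approaches the ordinary AIC.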


HIV-Specific Probabilistic Models of Protein Evolution

et al. 2007


Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. The most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood procedure for fitting models to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV and are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error. We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.


“…Average p-distances (pairwise exclusion of gaps) were calculated using MEGA version 4 (Tamura et al. 2007), which was also used to calculate neighbour-joining (NJ) trees (Saitou & Nei 1987). For the NJ tree of the nc sequences, we used mid-point rooting.…”

Section: Methods (mentioning)

confidence: 99%

Paraphyly and budding speciation in the hairy snail (Pulmonata, Hygromiidae)

et al. 2014


Delimitation of species is often complicated by discordance of morphological and genetic data. This may be caused by the existence of cryptic or polymorphic species. The latter case is particularly true for certain snail species showing an exceptionally high intraspecific genetic diversity. The present investigation deals with the Trochulus hispidus complex, which has a complicated taxonomy. Our analyses of the COI sequence revealed that individuals showing a T. hispidus phenotype are distributed in nine highly differentiated mitochondrial clades (showing p-distances up to 19%). The results of a parallel morphometric investigation did not reveal any differentiation between these clades, although the overall variability is quite high. The phylogenetic analyses based on 12S, 16S and COI sequences show that the T. hispidus complex is paraphyletic with respect to several other morphologically well-defined Trochulus species (T. clandestinus, T. villosus, T. villosulus and T. striolatus) which form well-supported monophyletic groups. The nc marker sequence (5.8S–ITS2–28S) shows only a clear separation of T. o. oreinos and T. o. scheerpeltzi, and a weakly supported separation of T. clandestinus, whereas all other species and the clades of the T. hispidus complex appear within one homogeneous group. The paraphyly of the T. hispidus complex reflects its complicated history, which was probably driven by geographic isolation in different glacial refugia and budding speciation. At our present state of knowledge, it cannot be excluded that several cryptic species are embedded within the T. hispidus complex. However, the lack of morphological differentiation of the T. hispidus mitochondrial clades does not provide any hints in this direction. Thus, we currently do not recommend any taxonomic changes. The results of the current investigation exemplify the limitations of barcoding attempts in highly diverse species such as T. hispidus.


“…The sequences were aligned in the BioEdit sequence alignment editor version 7.0.9.0 [10] using the Clustal W multiple alignment [11]. Phylogenetic trees for HCV, based on Core/E1 and NS5B sequences, and genetic distances were calculated with MEGA software version 4 [12] using the Maximum Likelihood model. The sequences of Core/E1 and NS5B of HCV strains in Sri Lanka were deposited in NCBI GenBank under the accession numbers given in Table 1.…”

Section: Methods (mentioning)

confidence: 99%

Hepatitis C virus in healthy blood donors in Sri Lanka

Senevirathna, Amuduwage, Weerasingam, et al. 2011

Asian J Transfus Sci


Introduction: Hepatitis C virus (HCV) is the etiological agent for the majority of cases of non-A, non-B hepatitis. As a blood-borne virus, HCV is widely recognized as a major causative agent of post-transfusion non-A, non-B hepatitis. The prevalence of HCV and the distribution of HCV genotypes in Sri Lanka in comparison with the rest of Asia are not well known.

Materials and Methods: The blood samples collected from healthy blood donors at the National Blood Transfusion Centre of Sri Lanka were screened to determine the prevalence and the genotypes of HCV among blood donors in Sri Lanka.

Results: HCV antibodies were found in 53 of 4980 blood donors. However, of the 53 only 8 positive results were confirmed by Reverse Transcription-PCR, which suggests frequent false-positive results or viral clearance. The PCR positive samples were genotyped by DNA sequencing of the Core/E1 regions of HCV genome, and all the HCV viruses belonged to genotype 3, of which 7 were 3a and 1 was 3b.

Conclusion: HCV is relatively rare among blood donors in Sri Lanka and only genotype 3 was detected in the studied group.
