2017
DOI: 10.1093/gbe/evx109

Further Simulations and Analyses Demonstrate Open Problems of Phylostratigraphy

Abstract: Phylostratigraphy, originally designed for gene age estimation by BLAST-based protein homology searches of sequenced genomes, has been widely used for studying patterns and inferring mechanisms of gene origination and evolution. We previously showed by computer simulation that phylostratigraphy underestimates gene age for a nonnegligible fraction of genes and that the underestimation is severer for genes with certain properties such as fast evolution and short protein sequences. Consequently, many previously r…

Cited by 48 publications (51 citation statements)
References 36 publications
“…There is growing interest in the topic of de novo gene birth, but identifying de novo genes is plagued with high rates of both false positives and false negatives (McLysaght and Hurst 2016), with phylostratigraphy tools being particularly controversial due to homology detection biases (Moyers and Zhang 2017). The overlapping viral genes that we study are unlikely either to be nongenes, and must have arisen via de novo gene birth, and so circumvent many of these difficulties.…”
Section: Discussion (mentioning)
confidence: 99%
“…[44] In a series of papers, Moyers and Zhang argued that Basic Local Alignment Search Tool (BLAST), which is used to detect homologs in phylostratigraphy, have complications in detecting remote homologs, and therefore, could be a source of inherent bias. [45][46][47] On the other hand, the developers of phylostratigraphy showed that the age of some genes may be underestimated; however, irrespective of whether these genes were excluded, the patterns of the emergence and evolution of genes remain the same. [48] The above-mentioned arguments reflect the major difficulty in gene-age estimation: it is extremely difficult to detect all the correct homologs for genes in multiple species.…”
Section: Evolutionary Patterns Of Drug Targets (mentioning)
confidence: 99%
“…Homology detection bias also cannot explain why trends in ISD or amino acid composition are lineage-specific, nor the absence of correlation with amino acids' evolutionary changeabilities. Additionally, old sequences are expected to be most affected by homology detection bias (Moyers & Zhang 2017), but it is more recent animal domains that drive the ISD result, which is a priori the trend most likely to be driven by homology detection bias. Overall, we find substantial evidence contradicting the suggestion that homology detection bias drives trends.…”
Section: Discussion (mentioning)
confidence: 99%
“…Gene ages are based on the date of the most basal node in the phylogeny of lineages containing homologs (Domazet-Lošo et al 2007). But when sequences are highly divergent, programs used to detect homologs, such as BLASTp (Altschul et al 1990) are prone to false negatives, and thus underestimate gene age (Elhaik et al 2006;McLysaght & Hurst 2016;Moyers & Zhang 2015;Moyers & Zhang 2017;Wolfe 2004). This is particularly problematic when studying protein properties such as length, evolutionary rate, and degree of conserved structure, because these properties themselves directly impact our ability to detect sequence similarity.…”
Section: Introduction (mentioning)
confidence: 99%
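
The age-assignment rule described in the last statement (a gene is dated to the most basal lineage in which a homolog is detected) can be illustrated with a minimal sketch. The stratum names, the `assign_phylostratum` helper, and the example hit sets are hypothetical and chosen only for illustration; they are not taken from the cited papers, and a real analysis would derive the hits from a homology search such as BLASTp.

```python
# Hypothetical phylostrata, ordered from oldest (index 0) to youngest.
PHYLOSTRATA = [
    "cellular organisms",
    "Eukaryota",
    "Metazoa",
    "Mammalia",
    "focal species",
]


def assign_phylostratum(hit_strata, strata=PHYLOSTRATA):
    """Return the oldest stratum in which a homolog was detected.

    hit_strata: set of stratum names where the homology search reported a hit.
    The youngest stratum (the focal species) always counts as detected,
    because the gene itself is present there.
    """
    detected = set(hit_strata) | {strata[-1]}
    for name in strata:  # iterate oldest first
        if name in detected:
            return name


# Assumed example: the gene truly has homologs back to Eukaryota,
# but the most distant homolog is too diverged for the search to find.
true_hits = {"Eukaryota", "Metazoa", "Mammalia"}
observed_hits = true_hits - {"Eukaryota"}  # one false negative

true_age = assign_phylostratum(true_hits)
observed_age = assign_phylostratum(observed_hits)
```

Under these assumptions a single missed remote homolog moves the gene from the Eukaryota stratum to the younger Metazoa stratum, which is the kind of age underestimation the citing papers attribute to false negatives in homology detection.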