Whole-genome duplication (WGD) occurs broadly and repeatedly across the history of eukaryotes and is recognized as a prominent evolutionary force, especially in plants. Immediately following WGD, most genes are present in two copies as paralogs. Due to this redundancy, one copy of a paralog pair commonly undergoes pseudogenization and is eventually lost. When speciation occurs shortly after WGD, however, differential loss of paralogs may lead to spurious phylogenetic inference resulting from the inclusion of pseudoorthologs: paralogous genes mistakenly identified as orthologs because they are present in single copies within each sampled species. The impact of including pseudoorthologs rather than true orthologs as a result of gene extinction (or incomplete laboratory sampling) has only recently begun to receive empirical attention in the phylogenomics community. Moreover, few studies have investigated this phenomenon in an explicit coalescent framework. Here, using mathematical models, numerous simulated data sets, and two newly assembled empirical data sets, we assess the effect of pseudoorthologs on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and differential gene loss scenarios following WGD. When gene loss occurs along the terminal branches of the species tree, alignment-based (BPP) and gene-tree-based (ASTRAL, MP-EST, and STAR) coalescent methods are adversely affected as the degree of ILS increases. Their accuracy can be greatly improved by sampling a sufficiently large number of genes. Under the same circumstances, however, concatenation methods consistently estimate incorrect species trees as the number of genes increases. Additionally, pseudoorthologs can greatly mislead species tree inference when gene loss occurs along the internal branches of the species tree. In this case, both coalescent and concatenation methods yield inconsistent results.
These results underscore the importance of understanding the influence of pseudoorthologs in the phylogenomics era. [Coalescent method; concatenation method; incomplete lineage sorting; pseudoorthologs; single-copy gene; whole-genome duplication.]
We analyzed the complete mitochondrial genome of the recently discovered Xinyuan honey bee, Apis mellifera sinisxinyuan, using single-molecule real-time sequencing. The mitochondrial genome of A. m. sinisxinyuan is a circular molecule of 16,886 bp, comprising 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and an A + T-rich control region. Phylogenetic analysis using the 13 protein-coding genes supports a close relationship to another M-lineage honey bee, A. m. mellifera.
A consensus species tree is reconstructed from 11 gene trees for human, bat, and pangolin beta coronaviruses from samples taken early in the pandemic (prior to April 1, 2020). Interpreted using coalescent theory, the shallow consensus species tree (short branches relative to the hosts) provides evidence of recent gene flow events between bat and pangolin beta coronaviruses predating the zoonotic transfer to humans. The consensus species tree was also used to reconstruct the ancestral sequence of human SARS-CoV-2, which differed from the Wuhan sequence by 2 nucleotides. The time to most recent common ancestor was estimated to be Dec 8, 2019, with a bat origin. Some human, bat, and pangolin coronavirus lineages found in China are phylogenetically distinct, a rare example of a class II phylogeography pattern (Avise et al. in Ann Rev Eco Syst 18:489–522, 1987). The consensus species tree is a product of evolutionary factors, providing evidence of repeated zoonotic transfers between bat and pangolin as a reservoir for future zoonotic transfers to humans.