The arthropods are the most speciose, and among the most morphologically diverse, of the animal phyla. Their evolution has been the subject of intense research for well over a century, yet the relationships among the four extant arthropod subphyla - chelicerates, crustaceans, hexapods, and myriapods - are still not fully resolved. Morphological taxonomies have often placed hexapods and myriapods together (the Atelocerata) [1, 2], but recent molecular studies have generally supported a hexapod/crustacean clade [2-9]. A cluster of regulatory genes, the Hox genes, controls segment identity in arthropods, and comparisons of the sequences and functions of Hox genes can reveal evolutionary relationships [10]. We used Hox gene sequences from a range of arthropod taxa, including new data from a basal hexapod and a myriapod, to estimate a phylogeny of the arthropods. Our data support the hypothesis that insects and crustaceans form a single clade within the arthropods to the exclusion of myriapods. They also suggest that myriapods are more closely allied to the chelicerates than to this insect/crustacean clade.
Identifying units of biological diversity is a major goal of organismal biology. A growing body of literature has focused on the importance of cryptic diversity, defined as the presence of deeply diverged lineages within a single species. While most discoveries of cryptic lineages proceed on a taxon-by-taxon basis, rapid assessments of biodiversity are needed to inform conservation policy and decision-making. Here, we introduce a predictive framework for phylogeography that allows rapid identification of cryptic diversity. Our approach proceeds by collecting environmental, taxonomic and genetic data from codistributed taxa with known phylogeographic histories. We define these taxa as a reference set, and categorize them as either harbouring or lacking cryptic diversity. We then build a random forest classifier that allows us to predict which other taxa endemic to the same biome are likely to contain cryptic diversity. We apply this framework to data from two sets of disjunct ecosystems known to harbour taxa with cryptic diversity: the mesic temperate forests of the Pacific Northwest of North America and the arid lands of Southwestern North America. The predictive approach presented here is accurate, with prediction accuracies ranging from 65% to 98.79% depending on the ecosystem. This suggests that our method can be successfully used to address ecosystem-level questions about cryptic diversity. Further, our application for the prediction of the cryptic/non-cryptic nature of unknown species is easily applicable and provides results that agree with recent discoveries from those systems. Our results demonstrate that the transition of phylogeography from a descriptive to a predictive discipline is possible and effective.
Most approaches to species delimitation to date have considered divergence‐only models. Although these models are appropriate for allopatric speciation, their failure to incorporate many of the population‐level processes that drive speciation, such as gene flow (e.g., in sympatric speciation), places an unnecessary limit on our collective understanding of the processes that produce biodiversity. To consider these processes while inferring species boundaries, we introduce the R‐package delimitR and apply it to identify species boundaries in the reticulate taildropper slug (Prophysaon andersoni). Results suggest that secondary contact is an important mechanism driving speciation in this system. By considering process, we both avoid erroneous inferences that can be made when population‐level processes such as secondary contact drive speciation but only divergence is considered, and gain insight into the process of speciation in terrestrial slugs. Further, we apply delimitR to three published empirical datasets and find results corroborating previous findings. Finally, we evaluate the performance of delimitR using simulation studies, and find that error rates are near zero when comparing models that include lineage divergence and gene flow for three populations with a modest number of Single Nucleotide Polymorphisms (SNPs; 1500) and moderate divergence times (<100,000 generations). When we apply delimitR to a complex model set (i.e., including divergence, gene flow, and population size changes), error rates are moderate (∼0.15; 10,000 SNPs), and, when present, misclassifications occur among highly similar models.
Phylogeographic data sets have grown from tens to thousands of loci in recent years, but extant statistical methods do not take full advantage of these large data sets. For example, approximate Bayesian computation (ABC) is a commonly used method for the explicit comparison of alternate demographic histories, but it is limited by the "curse of dimensionality" and issues related to the simulation and summarization of data when applied to next-generation sequencing (NGS) data sets. We implement here several improvements to overcome these difficulties. We use a Random Forest (RF) classifier for model selection to circumvent the curse of dimensionality and apply a binned representation of the multidimensional site frequency spectrum (mSFS) to address issues related to the simulation and summarization of large SNP data sets. We evaluate the performance of these improvements using simulation and find low overall error rates (~7%). We then apply the approach to data from Haplotrema vancouverense, a land snail endemic to the Pacific Northwest of North America. Fifteen demographic models were compared, and our results support a model of recent dispersal from coastal to inland rainforests. Our results demonstrate that binning is an effective strategy for the construction of an mSFS and imply that the statistical power of RF when applied to demographic model selection is at least comparable to that of traditional ABC algorithms. Importantly, by combining these strategies, large sets of models with differing numbers of populations can be evaluated.
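The binned mSFS mentioned above can be illustrated with a minimal sketch. The abstract does not give the binning scheme, so the equal-width bins, the function names (`bin_index`, `binned_msfs`), and the input layout (one tuple of per-population allele counts per SNP) are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def bin_index(count, n_chromosomes, n_bins):
    """Map an allele count in 0..n_chromosomes to one of n_bins equal-width bins.

    Equal-width binning is an assumption of this sketch.
    """
    if not 0 <= count <= n_chromosomes:
        raise ValueError("allele count out of range")
    # The n_chromosomes + 1 possible counts are split into n_bins classes.
    return count * n_bins // (n_chromosomes + 1)


def binned_msfs(snp_counts, sample_sizes, n_bins):
    """Return a flat vector counting SNPs in each combination of per-population bins.

    snp_counts:   iterable of tuples, one allele count per population for each SNP
    sample_sizes: number of sampled chromosomes in each population
    """
    tally = Counter(
        tuple(bin_index(c, n, n_bins) for c, n in zip(snp, sample_sizes))
        for snp in snp_counts
    )
    # Emit every bin combination in a fixed order so that vectors from
    # different data sets (observed or simulated) are directly comparable.
    cells = product(range(n_bins), repeat=len(sample_sizes))
    return [tally[cell] for cell in cells]


# Two populations, 9 sampled chromosomes each, 2 bins per population.
snps = [(0, 1), (1, 0), (9, 8), (2, 7)]
vector = binned_msfs(snps, sample_sizes=(9, 9), n_bins=2)
print(vector)
```

The vector has `n_bins ** n_populations` entries regardless of sample size, which is what keeps the summary small enough to feed to a classifier when many individuals or populations are sampled.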
Many recent phylogenetic methods have focused on accurately inferring species trees when there is gene tree discordance due to incomplete lineage sorting (ILS). For almost all of these methods, and for phylogenetic methods in general, the data for each locus is assumed to consist of orthologous, single-copy sequences. Loci that are present in more than a single copy in any of the studied genomes are excluded from the data. These steps greatly reduce the number of loci available for analysis. The question we seek to answer in this study is: What happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two large biological data sets, we show that running such methods on data with paralogs can still provide accurate results. We use multiple different methods, some of which are based directly on the multispecies coalescent (MSC) model, and some of which have been proven to be statistically consistent under it. We also treat the paralogous loci in multiple ways: from explicitly denoting them as paralogs, to randomly selecting one copy per species. In all cases the inferred species trees are as accurate as equivalent analyses using single-copy orthologs. Our results have significant implications for the use of ILS-aware phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci. This will greatly increase the amount of data that can be used for phylogenetic inference.
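One of the treatments described above, randomly selecting one copy per species, can be sketched as follows. The data layout (a dictionary from species name to a list of gene-copy identifiers) and the function name `sample_one_copy_per_species` are hypothetical; the abstract does not describe how the authors implemented this step.

```python
import random


def sample_one_copy_per_species(gene_family, rng):
    """Pick one gene copy at random for every species present in a gene family.

    gene_family: dict mapping species name -> list of gene-copy identifiers
    rng:         a random.Random instance, passed in so runs are reproducible
    Species with no copies (gene loss) are simply absent from the result.
    """
    return {
        species: rng.choice(copies)
        for species, copies in gene_family.items()
        if copies
    }


# Hypothetical gene family with a duplication in species A and a loss in species C.
family = {"A": ["A_copy1", "A_copy2"], "B": ["B_copy1"], "C": []}
picked = sample_one_copy_per_species(family, random.Random(0))
print(picked)
```

The reduced family then contains at most one sequence per species and can be passed to a species tree method that expects single-copy input.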
The multispecies coalescent (MSC) has emerged as a powerful and desirable framework for species tree inference in phylogenomic studies. Under this framework, the data for each locus is assumed to consist of orthologous, single-copy genes, and heterogeneity across loci is assumed to be due to incomplete lineage sorting (ILS). These assumptions have led biologists who use ILS-aware inference methods, whether based directly on the MSC or proven to be statistically consistent under it (collectively referred to here as MSC-based methods), to exclude all loci that are present in more than a single copy in any of the studied genomes. Furthermore, such analyses entail orthology assignment to avoid the potential of hidden paralogy in the data. The question we seek to answer in this study is: What happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two biological data sets, we show that running such methods on data with paralogs provides very accurate results, either by treating all gene copies within a family as alleles from multiple individuals or by randomly selecting one copy per species. Our results have significant implications for the use of MSC-based phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci, thus greatly increasing the amount of data that can be used. [Multispecies coalescent; incomplete lineage sorting; gene duplication and loss; orthology; paralogy.]
Ethiopia is a world biodiversity hotspot and harbours levels of biotic endemism unmatched in the Horn of Africa, largely due to topographic—and thus habitat—complexity, which results from a very active geological and climatic history. Among Ethiopian vertebrate fauna, amphibians harbour the highest levels of endemism, making amphibians a compelling system for the exploration of the impacts of Ethiopia's complex abiotic history on biotic diversification. Grass frogs of the genus Ptychadena are notably diverse in Ethiopia, where they have undergone an evolutionary radiation. We used molecular data and expanded taxon sampling to test for cryptic diversity and to explore diversification patterns in both the highland radiation and two widespread lowland Ptychadena. Species delimitation results support the presence of nine highland species and four lowland species in our dataset, and divergence dating suggests that both geologic events and climatic fluctuations played a complex and confounded role in the diversification of Ptychadena in Ethiopia. We rectify the taxonomy of the endemic P. neumanni species complex, elevating one formally synonymized name and describing three novel taxa. Finally, we describe two novel lowland Ptychadena species that occur in Ethiopia and may be more broadly distributed.