DNA barcoding as a method for species identification is rapidly increasing in popularity. However, there are still relatively few rigorous methodological tests of DNA barcoding. Current distance-based methods are frequently criticized for treating the nearest neighbor as the closest relative via a raw similarity score, for lacking an objective set of criteria to delineate taxa, or for being incongruent with classical character-based taxonomy. Here, we propose an artificial intelligence-based approach, inferring species membership via DNA barcoding with back-propagation neural networks (named BP-based species identification), as a new addition to the spectrum of available methods. We demonstrate the value of this approach with simulated data sets representing different levels of sequence variation under coalescent simulations with various evolutionary models, as well as with two empirical data sets of COI sequences from East Asian ground beetles (Carabidae) and Costa Rican skipper butterflies. With a 630- to 690-bp fragment of the COI gene, we assigned 97.50% of 80 unknown ground beetle sequences, and 95.63%, 96.10%, and 100% of 275, 205, and 9 unknown sequences of the neotropical skipper butterfly, respectively, to their correct species. Our simulation studies indicate that the success rates of species identification depend on the divergence of sequences, the length of sequences, and the number of reference sequences. Particularly in cases involving incomplete lineage sorting, this new BP-based method appears to be superior to commonly used methods for DNA-based species identification.
Reliable assignment of an unknown query sequence to its correct species remains a methodological problem for the growing field of DNA barcoding. While great advances have been achieved recently, species identification from barcodes can still be unreliable if the relevant biodiversity has been insufficiently sampled. We here propose a new notion of species membership for DNA barcoding (fuzzy membership, based on fuzzy set theory) and illustrate its successful application to four real data sets (bats, fishes, butterflies and flies) with more than 5000 random simulations. Two of the data sets comprise especially dense species/population-level samples. In comparison with current DNA barcoding methods, the newly proposed minimum distance (MD) plus fuzzy set approach, and another computationally simple method, 'best close match', outperform two computationally sophisticated Bayesian and BootstrapNJ methods. The new method proposed here has great power in reducing false-positive species identification compared with other methods when conspecifics of the query are absent from the reference database.
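As a rough illustration of the distance-based idea behind 'best close match', the following Python sketch assigns a query to the species of its nearest reference barcode, but only if that nearest distance falls under a cutoff. It is a minimal sketch under stated assumptions, not the method as implemented in the abstract above: the uncorrected p-distance, the 3% default threshold, and the names `p_distance` and `best_close_match` are all illustrative choices, and the fuzzy-membership step is not shown.

```python
from typing import List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def best_close_match(
    query: str,
    references: List[Tuple[str, str]],  # (species name, aligned sequence)
    threshold: float = 0.03,            # assumed cutoff, not from the source
) -> Tuple[Optional[str], float]:
    """Return (assignment, nearest distance).

    The assignment is a species name, "ambiguous" if several species tie
    for the nearest distance, or None if the nearest reference is farther
    away than the threshold.
    """
    distances = [(p_distance(query, seq), species) for species, seq in references]
    best = min(d for d, _ in distances)
    if best > threshold:
        return None, best
    species_at_best = {s for d, s in distances if d == best}
    if len(species_at_best) > 1:
        return "ambiguous", best
    return species_at_best.pop(), best
```

Returning `None` when the nearest reference is too distant is what lets a threshold-based method decline to identify a query whose conspecifics are missing from the reference database, rather than forcing it onto the nearest species.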
As part of the German Barcode of Life campaign, over 3500 arachnid specimens have been collected and analyzed: ca. 3300 Araneae and 200 Opiliones, belonging to almost 600 species (median: 4 individuals/species). This covers about 60% of the spider fauna and more than 70% of the harvestmen fauna recorded for Germany. The overwhelming majority of species could be readily identified through DNA barcoding: median distances between closest species lay around 9% in spiders and 13% in harvestmen, while in 95% of the cases, intraspecific distances were below 2.5% and 8% respectively, with intraspecific medians at 0.3% and 0.2%. However, almost 20 spider species, most notably in the family Lycosidae, could not be separated through DNA barcoding (although many of them present discrete morphological differences). Conspicuously high interspecific distances were found in even more cases, hinting at cryptic species in some instances. A new program is presented: DiStats calculates the statistics needed to meet DNA barcode release criteria. Furthermore, new generic COI primers useful for a wide range of taxa (also other than arachnids) are introduced.
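The kind of summary reported above (intraspecific versus interspecific distances) can be sketched in a few lines of Python. This is not the DiStats implementation, whose internals are not described here; it is a hypothetical example that assumes pre-aligned sequences of equal length, uses the uncorrected p-distance, and introduces the names `p_distance` and `distance_summary` purely for illustration.

```python
from itertools import combinations
from statistics import median
from typing import Dict, List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites (aligned, equal length)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_summary(
    records: List[Tuple[str, str]],  # (species name, aligned sequence)
) -> Dict[str, Optional[float]]:
    """Summarise pairwise distances within and between species."""
    intra: List[float] = []
    inter: List[float] = []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        # Same species name: intraspecific pair; otherwise interspecific.
        (intra if sp1 == sp2 else inter).append(d)
    return {
        "median_intra": median(intra) if intra else None,
        "max_intra": max(intra) if intra else None,
        "median_inter": median(inter) if inter else None,
        "min_inter": min(inter) if inter else None,
    }
```

A species pair whose minimum interspecific distance is no larger than the maximum intraspecific distance has no clear gap between the two distributions, which is the situation described above for the species that could not be separated.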
The climatic fluctuations during the Pleistocene as well as the Holocene warming caused numerous disjunctions of cold-adapted, arctic-alpine, and alpine biota. However, the depths of the genetic splits among the disjunct parts of the species distributions vary considerably. The arctic ranges are usually weakly differentiated, and great similarity with at least some areas in more southern regions is frequently found. Likewise, major mountain ranges in geographic proximity often share genetically similar populations. However, the genetic constitution of populations from more remote (predominantly southern) mountain systems is strongly different from all other populations. This suggests recent vicariance events in the two former groups, but long-lasting isolation in the latter group, which apparently is mostly composed of relics of a more distant cold past.