Summary
1. Species identification via DNA barcodes has recently become an important and routine task in many biodiversity projects using DNA sequence data.
2. Here, we present BarcodingR, an integrated software package that provides a comprehensive implementation of species identification methods, including artificial intelligence, fuzzy-set, Bayesian and kmer-based methods, that are not readily available in other packages.
3. BarcodingR additionally provides new functions for barcode evaluation, barcoding gap analysis, delimitation comparison analysis, species membership analysis and consensus identification.
4. Comparison with other barcoding methods using 11 empirical data sets indicates that, on average, FZKMER (implemented in BarcodingR) and one extant barcoding method, BRONX, outperform all other methods examined in this study. Two other methods, BP and FZ (both implemented in BarcodingR), perform similarly to SVM and BLOG, respectively, and all perform better than Jrip.
5. BarcodingR is open source under the GNU General Public License and freely available for all major operating systems.
Integrative taxonomy, which combines evidence such as behaviour, niche preference, distribution, morphological analysis and DNA barcoding, is central to modern taxonomy and systematic biology. However, decades of use demonstrate that these methods can face challenges when used in isolation: morphological methods can misidentify specimens because of phenotypic plasticity, and DNA barcoding can give incorrect identifications because of introgression, incomplete lineage sorting and horizontal gene transfer. Although researchers have advocated the use of integrative taxonomy, few detailed algorithms have been proposed. Here, we develop a convolutional neural network method, the morphology-molecule network (MMNet), that integrates morphological and molecular data for species identification. MMNet worked better than four currently available alternative methods when tested with 10 independent datasets representing varying genetic diversity from different taxa. High accuracies were achieved for all groups, including beetles (98.1% of 123 species), butterflies (98.8% of 24 species), fishes (96.3% of 214 species) and moths (96.4% of 150 total species). Further, MMNet demonstrated a high degree of accuracy (>98%) in four datasets including closely related species from the same genus. For two modest sub-genomic (single nucleotide polymorphism) datasets, each comprising eight putative subspecies, the average accuracy is 90%. Additional tests show that the success rate of species identification under this method depends most strongly on the amount of training data, and is robust to sequence length and image size. Analyses of the contribution of different data types (image versus gene) indicate that both morphological and genetic data are important to the model, and that genetic data contribute slightly more.
The approaches developed here serve as a foundation for the future integration of multi-modal information for integrative taxonomy, such as image, audio, video, 3D scanning and biosensor data, to characterize organisms more comprehensively as a basis for improved investigation, monitoring and conservation of biodiversity.
Over the past 16 years, more than half (59.68%) of research papers in China on DNA barcoding have been published in Chinese rather than English. Using the records in the BOLD (Barcode of Life Data) system, we found that Chinese scientists had contributed nearly 120,000 DNA barcodes for more than 16,000 species as of September 2019, with barcoded species distributed throughout China. Based on 2,624 articles and 494 dissertations published during the last 16 years, we reviewed the basic statistics of these studies, the types of articles contributed by Chinese scientists, the preferred taxonomic groups, the characteristics of barcoding studies in China, the current limitations, and potential future directions. We found that most barcode data pertain to plants and animals. Most work in China has focused on verifying the authenticity of species used in traditional Chinese medicine, while other applications have addressed food safety, inspection and quarantine, and the control of pests and invasive species. In methodology and technology, a number of new DNA barcoding methods have been developed by Chinese scientists. However, research into DNA barcoding in China has several significant limitations, such as the lack of leadership in pioneering international projects, the absence of an open bioinformatics infrastructure, and the fact that some Chinese journals do not clearly require data transparency and availability for DNA barcodes, all of which impede the further development of barcode libraries and research in China. In the future, Chinese scientists should build authoritative online libraries, while aiming for theoretical innovations in both the concepts and the methodology of DNA barcoding.
Understanding diversity patterns requires accounting for the roles of both historical and contemporary factors in the assembly of communities. Here, we compared diversity patterns of two moth assemblages sampled from the Taihang and Yanshan mountains in Northern China and performed ancestral range reconstructions using the Multi-State Speciation and Extinction model to track the origins of these patterns. Further, we estimated diversification rates of the two moth assemblages and explored the effects of contemporary ecological factors. From 7,788 specimens we identified 835 species belonging to 23 families, using both DNA barcode analysis and morphology. Moths in the Yanshan mountains showed higher species diversity than those in the Taihang mountains. Ancestral range analysis indicated Yanshan as the origin, with significant historical dispersals from Yanshan to Taihang. Asymmetrical diversification, population expansion, and frequent and considerable gene flow were detected between communities. Moreover, either dispersal limitation alone or the joint effect of environmental filtering and dispersal limitation was inferred as the main driving force shaping current diversity patterns. In summary, we demonstrate that a multiscale (community, population and species level) analysis incorporating both historical and contemporary factors can be useful in delineating factors contributing to community assembly and patterning in diversity.
Species identification through DNA barcoding or metabarcoding has become a key approach for biodiversity evaluation and ecological studies. However, the rapid accumulation of barcoding data has created some difficulties: for instance, global enquiries to a large reference library can take a very long time. Here, we devise a two-step searching strategy to speed up identification of such queries. It first uses a Hidden Markov Model (HMM) algorithm to narrow the searching scope to genus level and then determines the corresponding species using minimum genetic distance. Moreover, using a fuzzy membership function, our approach also estimates the credibility of assignment results for each query. To perform this task, we developed a new software pipeline, FuzzyID2, using Python and C++. Performance of the new method was assessed using eight empirical data sets ranging from 70 to 234,535 barcodes. Five data sets (four animal, one plant) deployed the conventional barcode approach, one used metabarcodes, and two were eDNA-based. The results showed mean accuracies of generic and species identification of 98.60% (with a minimum of 95.00% and a maximum of 100.00%) and 94.17% (with a range of 84.40%-100.00%), respectively. Tests with simulated NGS sequences based on realistic eDNA and metabarcode data demonstrated that FuzzyID2 achieved a significantly higher identification success rate than the commonly used Blast method, and that the TIPP method tends to find many fewer species than either FuzzyID2 or Blast. Furthermore, data sets with tens of thousands of barcodes need only a few seconds for each query assignment using FuzzyID2. Our approach provides an efficient and accurate species identification protocol for biodiversity-related projects with large DNA sequence data sets.
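The two-step strategy described above (narrow to genus, then pick the species by minimum genetic distance, then attach a credibility value) can be illustrated with a minimal Python sketch. This is not the FuzzyID2 code or its interface. Several parts are assumptions made only for illustration: the genus step uses a simple shared k-mer count as a stand-in for the HMM, the genetic distance is an uncorrected p-distance on pre-aligned sequences, and the `membership` formula is a hypothetical placeholder rather than the published fuzzy membership function. All function names and the toy reference library are likewise invented.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def genus_score(query, genus_refs, k=3):
    """Stand-in for the HMM step (assumption): number of query k-mers
    that also occur in the pooled references of one genus."""
    pooled = Counter()
    for ref in genus_refs:
        pooled.update(kmer_counts(ref, k))
    return sum(min(n, pooled[kmer]) for kmer, n in kmer_counts(query, k).items())


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query, library, k=3):
    """library maps genus -> species -> list of aligned reference sequences."""
    # Step 1: narrow the search to the best-scoring genus.
    genus = max(
        library,
        key=lambda g: genus_score(
            query, [s for refs in library[g].values() for s in refs], k
        ),
    )
    # Step 2: minimum genetic distance to each species within that genus.
    dists = sorted(
        (min(p_distance(query, ref) for ref in refs), species)
        for species, refs in library[genus].items()
    )
    best_d, best_species = dists[0]
    # Hypothetical credibility value in [0.5, 1]: how much closer the best
    # species is than the runner-up (not the published membership function).
    if len(dists) == 1:
        membership = 1.0
    else:
        second_d = dists[1][0]
        total = best_d + second_d
        membership = 0.5 if total == 0 else second_d / total
    return genus, best_species, best_d, membership


# Toy reference library with invented names and sequences.
library = {
    "Aus": {"Aus alba": ["ACGTACGTAC"], "Aus nigra": ["ACGTACGTTT"]},
    "Bus": {"Bus rubra": ["TTTTGGGGCC"]},
}
result = identify("ACGTACGTAC", library)
print(result)
```

The point of the design is that the expensive pairwise distance computation in step 2 only runs against references from one genus, which is why restricting the search scope first can shorten queries against large libraries. Note that the pooled k-mer score used here favours genera with more reference sequences, a bias a real genus-level model would need to address.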