Here we present evidence, based on 10 datasets comprising 5283 sequences for 200 genera, that the use of the Kimura‐2‐parameter (K2P) model in DNA‐barcoding studies is poorly justified. We demonstrate that K2P is neither expected nor confirmed to be an appropriate model for closely related COI sequences. In addition, we show that the use of uncorrected distances yields higher or similar identification success rates for neighbour‐joining trees and distance‐based identification techniques. K2P also does not widen the barcoding gap for closely related sequences. We conclude that the spread of K2P through the barcoding literature is difficult to explain, and urge the use of evidence‐based approaches to DNA barcoding. © The Willi Hennig Society 2011.
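The practical difference between the two distances can be made concrete. The sketch below computes both the uncorrected p-distance and the K2P distance for one pair of aligned sequences. It uses the standard K2P formula d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The function name `p_and_k2p` and the choice to skip gaps and ambiguity codes are illustrative assumptions, not the procedure used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def p_and_k2p(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (uncorrected p-distance, K2P distance) for two aligned sequences.

    Sites where either sequence has a gap or an ambiguity code are skipped
    (an illustrative assumption, not a rule taken from the study).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        # A transition stays within purines (A<->G) or pyrimidines (C<->T).
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    P = transitions / sites    # proportion of transitional differences
    Q = transversions / sites  # proportion of transversional differences
    p_distance = P + Q
    # math.log raises ValueError if an argument is <= 0 (saturated sequences).
    k2p = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
    return p_distance, k2p


ref = "ACGTACGTACGTACGTACGT"
query = "GCGTACGTACGTACGTACGT"  # one A->G transition in 20 sites
print(p_and_k2p(ref, query))
```

With one transition in 20 sites, p = 0.05 while K2P is about 0.0527. At the small distances typical of comparisons among closely related COI sequences, the correction changes the value only slightly.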
Several of the biggest challenges in taxonomy and systematics are related to a toxic mixture of small size, abundance, and rarity. There are too many species in groups with too few taxonomists and many of these species are very rare and hard to find because they are hidden in mass samples. To make matters worse, these species often have life‐history stages that are morphologically so different that it is difficult to identify them as semaphoronts of the same species. We demonstrate that these biodiversity challenges can be addressed with cost‐effective molecular markers. Here, we describe a next‐generation‐sequencing protocol that can yield barcodes at a chemical cost of < 0.40 USD per specimen. We use this protocol to generate molecular markers for 1015 specimens of tropical midges (Diptera: Chironomidae). The barcodes cluster into 52–61 molecular operational taxonomic units (OTUs) depending on whether Objective Clustering (OC), Generalized Mixed Yule Coalescent (GMYC), or Poisson Tree Process (PTP) is used. More than half of the putative species are rare (< 10 specimens) and we are able to match larvae and adults for 24 of these OTUs. We argue that the proposed protocol will help with processing specimen‐rich biodiversity samples at low cost.
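Threshold-based clustering, as used for the OTU estimates above, can be sketched as single-linkage grouping. Two specimens end up in the same cluster if a chain of pairwise p-distances at or below the threshold connects them. This is written in the spirit of Objective Clustering but is not the published implementation. The helper names, the union-find structure, and the 100-bp sequences are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance over sites where both sequences have A/C/G/T."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 1.0


def threshold_clusters(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage grouping: specimens share a cluster if a chain of
    pairwise p-distances <= threshold connects them."""
    ids = list(seqs)
    parent = {i: i for i in ids}  # union-find forest

    def find(i: str) -> str:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


# Hypothetical 100-bp barcodes for four specimens.
base = "ACGT" * 25
seqs = {
    "sp1": base,
    "sp2": "G" + base[1:],   # 1 difference from sp1
    "sp3": "GT" + base[2:],  # 1 difference from sp2, 2 from sp1
    "sp4": "T" * 100,        # very divergent
}
print(threshold_clusters(seqs, 0.015))
```

At a 1.5% threshold, sp1 and sp3 are 2% apart but still fall into one cluster because sp2 links them. This chaining is the defining property of single-linkage grouping.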
Biologists frequently sort specimen-rich samples to species. This process is daunting when based on morphology, and disadvantageous if performed using molecular methods that destroy vouchers (e.g., metabarcoding). An alternative is to barcode every specimen in a bulk sample and then presort the specimens using DNA barcodes, thus reducing downstream morphological work, which can then focus on presorted units. Such a "reverse workflow" is too expensive using Sanger sequencing, but we here demonstrate that it is feasible with a next-generation sequencing (NGS) barcoding pipeline that allows for cost-effective, high-throughput generation of short specimen-specific barcodes (313 bp of COI; laboratory cost <$0.50 per specimen) by sequencing tagged amplicons. We applied our approach to a large sample of tropical ants, obtaining barcodes for 3,290 of 4,032 specimens (82%). The NGS barcodes and their corresponding specimens were then sorted into molecular operational taxonomic units (mOTUs) based on objective clustering and Automated Barcode Gap Discovery (ABGD). A high diversity of 88–90 mOTUs (4% clustering) was found and morphologically validated based on preserved vouchers. The mOTUs were overwhelmingly in agreement with morphospecies (match ratio 0.95 at 4% clustering). Because existing barcode databases lack coverage, only 18 could be accurately identified to named species, but our study yielded new barcodes for 48 species, including 28 that are potentially new to science. With its low cost and technical simplicity, the NGS barcoding pipeline can be implemented by a wide range of laboratories. It accelerates invertebrate species discovery, facilitates downstream taxonomic work, helps with building comprehensive barcode databases, and yields precise abundance information.
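The match ratio used to compare mOTUs with morphospecies can be computed in a few lines. This sketch assumes the definition commonly used in this literature: match ratio = 2 × N_match / (N_a + N_b). Here N_match counts groups with exactly the same specimens in both schemes, and N_a and N_b are the numbers of groups in each scheme. The function name and the specimen labels are hypothetical.

```python
def match_ratio(scheme_a: list[set[str]], scheme_b: list[set[str]]) -> float:
    """2 * N_match / (N_a + N_b); N_match counts groups with identical membership."""
    groups_a = {frozenset(g) for g in scheme_a}
    groups_b = {frozenset(g) for g in scheme_b}
    n_match = len(groups_a & groups_b)
    return 2 * n_match / (len(scheme_a) + len(scheme_b))


# Hypothetical specimens sorted into mOTUs and into morphospecies.
motus = [{"s1", "s2", "s3"}, {"s4", "s5"}, {"s6"}]
morphospecies = [{"s1", "s2", "s3"}, {"s4"}, {"s5"}, {"s6"}]
print(match_ratio(motus, morphospecies))
```

Two groups agree exactly, so the ratio is 2 × 2 / (3 + 4) ≈ 0.57. A value of 1.0 means the two sortings are identical.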
Faecal samples are of great value as a non-invasive means to gather information on the genetics, distribution, demography, diet and parasite infestation of endangered species. Direct shotgun sequencing of faecal DNA could give information on all of these simultaneously, but this approach is largely untested. Here, we used two faecal samples to characterize the diet of two red-shanked douc langurs (Pygathrix nemaeus) that were fed known foliage, fruits, vegetables and cereals. Illumina HiSeq produced ~74 and ~67 million paired reads for these samples, of which ~10,000 (0.014%) and ~44,000 (0.066%), respectively, were of chloroplast origin. Sequences were matched against a database of available chloroplast 'barcodes' for angiosperms. The results were compared with 'metabarcoding' using PCR amplification of the P6 loop of trnL. Metagenomics identified seven and nine of the 16 likely diet plants, whereas metabarcoding identified six and five. Metabarcoding produced thousands of reads consistent with the known diet, but the barcodes were too short to identify several plant species to genus. Metagenomics utilized multiple, longer barcodes that in combination had greater identification power. However, rare diet items were not recovered. Read numbers for diet species in metagenomic and metabarcoding data were correlated, indicating that both are useful for determining relative sequence abundance. Metagenomic reads were uniformly distributed across the chloroplast genomes; thus, if chloroplast genomes were used as reference, the precision of identifications and species recovery would improve further. Metagenomics also recovered the host mitochondrial genome and numerous intestinal parasite sequences, in addition to generating data useful for characterizing the microbiome.
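The quoted percentages follow directly from the approximate read counts. The short sketch below reproduces them from the counts as given, and makes plain how small the diet signal is: roughly one chloroplast read per 7,400 reads in the first sample. The helper name `percent_of_reads` is hypothetical.

```python
def percent_of_reads(subset: int, total: int) -> float:
    """Share of all reads (in %) that belong to a subset such as chloroplast reads."""
    return 100 * subset / total


# Approximate counts quoted in the abstract.
samples = {"sample 1": (10_000, 74_000_000), "sample 2": (44_000, 67_000_000)}
for name, (chloroplast, total) in samples.items():
    print(f"{name}: {percent_of_reads(chloroplast, total):.3f}% chloroplast reads")
```

Rounded to three decimals, this gives 0.014% and 0.066%, matching the values reported above.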
Background: More than 80% of all animal species remain unknown to science. Most of these species live in the tropics and belong to animal taxa that combine small body size with high specimen abundance and large species richness. For such clades, using morphology for species discovery is slow because large numbers of specimens must be sorted based on detailed microscopic investigations. Fortunately, species discovery could be greatly accelerated if DNA sequences could be used for sorting specimens to species. Morphological verification of such "molecular operational taxonomic units" (mOTUs) could then be based on dissection of a small subset of specimens. However, this approach requires cost-effective and low-tech DNA barcoding techniques because well-equipped, well-funded molecular laboratories are not readily available in many biodiverse countries. Results: We here document how MinION sequencing can be used for large-scale species discovery in a specimen- and species-rich taxon like the hyperdiverse fly family Phoridae (Diptera). We sequenced 7059 specimens collected in a single Malaise trap in Kibale National Park, Uganda, over a period of only 8 weeks. We discovered > 650 species, which exceeds the number of phorid species currently described for the entire Afrotropical region. The barcodes were obtained using an improved low-cost MinION pipeline that increased the barcoding capacity sevenfold, from 500 to 3500 barcodes per flowcell. This was achieved by adopting 1D sequencing, resequencing weak amplicons on a used flowcell, and improving demultiplexing. Comparison with Illumina data revealed that the MinION barcodes were very accurate (99.99% accuracy, 0.46% Ns) and thus yielded very similar species units (match ratio 0.991). Morphological examination of 100 mOTUs also confirmed good congruence with morphology (93% of mOTUs; > 99% of specimens) and revealed that 90% of the putative species belong to the neglected, megadiverse genus Megaselia.
We demonstrate for one Megaselia species how the molecular data can guide the description of a new species (Megaselia sepsioides sp. nov.). Conclusions: We document that one field site in Africa can be home to an estimated 1000 species of phorids and speculate that the Afrotropical diversity could exceed 200,000 species. We furthermore conclude that low-cost MinION sequencers are very suitable for reliable, rapid, and large-scale species discovery in hyperdiverse taxa. MinION sequencing could quickly reveal the extent of the unknown diversity and is especially suitable for biodiverse countries with limited access to capital-intensive sequencing facilities.
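The abstract reports accuracy and the proportion of Ns as separate figures. The sketch below shows one plausible way to compute both. It assumes that the MinION barcode has already been aligned to the Illumina barcode of the same specimen, and that accuracy is measured only over positions not called as N. The abstract does not state the exact definition used, and the function name and test sequences are hypothetical.

```python
def barcode_accuracy(query: str, reference: str) -> tuple[float, float]:
    """Return (accuracy over non-N query bases, fraction of query bases that are N).

    Assumes query and reference are already aligned to equal length.
    """
    if not query or len(query) != len(reference):
        raise ValueError("barcodes must be non-empty and aligned to equal length")
    n_fraction = query.count("N") / len(query)
    called = [(q, r) for q, r in zip(query, reference) if q != "N"]
    if not called:
        return 0.0, n_fraction
    matches = sum(q == r for q, r in called)
    return matches / len(called), n_fraction


reference = "ACGT" * 50  # hypothetical 200-bp reference barcode
query = list(reference)
query[10] = "N"          # one ambiguous base
query[20] = "C"          # one mismatch (the reference has A here)
print(barcode_accuracy("".join(query), reference))
```

Keeping Ns out of the accuracy denominator stops ambiguous positions from being counted as errors. This is why the two numbers are reported separately.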
Background: DNA barcodes are a useful tool for discovering, understanding, and monitoring biodiversity, which are critical tasks at a time of rapid biodiversity loss. However, widespread adoption of barcodes requires cost-effective and simple barcoding methods. We here present a workflow that satisfies these conditions. It was developed via “innovation through subtraction” and thus requires minimal lab equipment, can be learned within days, reduces the barcode sequencing cost to < 10 cents, and allows fast turnaround from specimen to sequence by using the portable MinION sequencer. Results: We describe how tagged amplicons can be obtained and sequenced with the real-time MinION sequencer in many settings (field stations, biodiversity labs, citizen science labs, schools). We also provide amplicon coverage recommendations based on several runs of the latest generation of MinION flow cells (“R10.3”), which suggest that each run can generate barcodes for > 10,000 specimens. Next, we present new software, ONTbarcoder, which overcomes the bioinformatics challenges posed by MinION reads. The software is compatible with Windows 10, Macintosh, and Linux, has a graphical user interface (GUI), and can generate thousands of barcodes on a standard laptop within hours based on only two input files (FASTQ, demultiplexing file). We document that MinION barcodes are virtually identical to Sanger and Illumina barcodes for the same specimens (> 99.99%) and provide evidence that MinION flow cells and reads have improved rapidly since 2018. Conclusions: We propose that barcoding with MinION is the way forward for government agencies, universities, museums, and schools because it combines low consumable and capital costs with scalability. Small projects can use the flow cell dongle (“Flongle”), while large projects can rely on MinION flow cells that can be stopped and reused after collecting sufficient data for a given project.
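Demultiplexing assigns each read to a specimen through the tag pair attached to its amplicon. The sketch below illustrates the idea only. It assumes that each read starts with the forward tag and ends with the reverse complement of the reverse tag, that reads can come from either strand, and that tags are matched by Hamming distance. The table layout, `assign_read`, and all tag sequences are hypothetical and are not ONTbarcoder's file format or algorithm. Real MinION reads carry indels and adapter sequence, so practical tools need indel-tolerant tag searches.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions (assumes equal length)."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read: str, tag_table: dict[tuple[str, str], str],
                tag_len: int, max_mismatch: int = 1) -> Optional[str]:
    """Return the specimen whose tag pair matches the read ends, or None.

    Both orientations are tried because reads can come from either strand;
    reads that match no tag pair, or more than one, stay unassigned.
    """
    hits = set()
    for oriented in (read, revcomp(read)):
        start = oriented[:tag_len]
        end = revcomp(oriented[-tag_len:])  # reverse tag as read on the other strand
        for (fwd_tag, rev_tag), specimen in tag_table.items():
            if (hamming(start, fwd_tag) <= max_mismatch
                    and hamming(end, rev_tag) <= max_mismatch):
                hits.add(specimen)
    return hits.pop() if len(hits) == 1 else None


# Hypothetical demultiplexing table: (forward tag, reverse tag) -> specimen.
tags = {("ACGTACGT", "TTGCAAGC"): "specimen_1",
        ("TGCATGCA", "GACTGACT"): "specimen_2"}
read = "ACGTACGT" + "GGGGGGGGGGGG" + revcomp("TTGCAAGC")
print(assign_read(read, tags, tag_len=8))
print(assign_read(revcomp(read), tags, tag_len=8))
```

Discarding reads with no match or an ambiguous match trades some yield for fewer misassigned reads in each specimen's bin.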
Background: Rapid habitat loss and degradation are responsible for population decline in a growing number of species. Understanding the natural history of these species is important for designing conservation strategies, such as habitat enhancements or ex-situ conservation. The acquisition of observational data may be difficult for rare and declining species, but metagenomics and metabarcoding can provide novel kinds of information. Here we use these methods for analysing fecal samples from an endangered population of a colobine primate, the banded leaf monkey (Presbytis femoralis). Results: We conducted metagenomics via shotgun sequencing on six fecal samples obtained from a remnant population of P. femoralis in a species-rich rainforest patch in Singapore. Shotgun sequencing and identification against a plant barcode reference database reveal a broad dietary profile consisting of at least 53 plant species from 33 families. The diet includes exotic plant species and is broadly consistent with > 2 years of observational data. Metagenomics identified 15 of the 24 plant genera for which there are observational data, but also revealed at least 36 additional species. DNA traces of the diet species were recovered and identifiable in the feces despite long digestion times and a large number of potential food plants in the rainforest habitat (> 700 species). We also demonstrate that metagenomics provides greater taxonomic resolution of food plant species by utilizing multiple genetic markers, as compared to single-marker metabarcoding. In addition, full mitochondrial genomes of P. femoralis individuals were reconstructed from fecal metagenomic shotgun reads, showing very low levels of genetic diversity in the focal population, and the presence of gut parasites could also be confirmed.
Metagenomics thus allows for the simultaneous assessment of diet, population genetics and gut parasites based on fecal samples. Conclusions: Our study demonstrates that metagenomic shotgun sequencing of fecal samples can be successfully used to rapidly obtain natural history data for understudied species with a complex diet. We predict that metagenomics will become a routinely used tool in conservation biology once the cost per sample falls to ~100 USD, which we expect within the next few years. Electronic supplementary material: The online version of this article (doi:10.1186/s12983-016-0150-4) contains supplementary material, which is available to authorized users.
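One common way to turn database hits into taxonomic assignments is a lowest-common-ancestor (LCA) rule. Each read is assigned to the deepest rank shared by all of its best hits. The abstract does not say that this was the method used, so the sketch below is a generic, swapped-in technique. The lineages and plant names are hypothetical hits. Under this rule, a read that matches several congeners equally well is resolved only to genus. This is one reason why markers that discriminate more finely, or several markers used together, can raise taxonomic resolution.

```python
def lowest_common_lineage(lineages: list[tuple[str, ...]]) -> tuple[str, ...]:
    """Longest shared prefix of (family, genus, species) lineages of a read's best hits."""
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return tuple(shared)


# Hypothetical best hits of one read against a plant barcode reference database.
hits = [("Moraceae", "Ficus", "Ficus benjamina"),
        ("Moraceae", "Ficus", "Ficus microcarpa")]
print(lowest_common_lineage(hits))  # ('Moraceae', 'Ficus'): genus-level only
```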
DNA barcodes are useful for species discovery and species identification, but obtaining barcodes currently requires a well-equipped molecular laboratory and is time-consuming and/or expensive. We here address these issues by developing a barcoding pipeline for Oxford Nanopore MinION™ and demonstrating that one flow cell can generate barcodes for ~500 specimens despite the high basecall error rates of MinION™ reads. The pipeline overcomes these errors by first summarizing all reads for the same tagged amplicon as a consensus barcode. Consensus barcodes are largely mismatch-free but retain indel errors that are concentrated in homopolymeric regions. These indel errors are addressed with an optional error correction pipeline that is based on conserved amino acid motifs from publicly available barcodes. The effectiveness of this pipeline is documented by analysing reads from three MinION™ runs that represent three different stages of MinION™ development. They generated data for (i) 511 specimens of a mixed Diptera sample, (ii) 575 specimens of ants and (iii) 50 specimens of Chironomidae. The run based on the latest chemistry yielded MinION™ barcodes for 490 of the 511 specimens, which were assessed against reference Sanger barcodes (N = 471). Overall, the MinION™ barcodes have an accuracy of 99.3%–100%, with the number of ambiguous bases after correction ranging from <0.01% to 1.5% depending on which correction pipeline is used. We demonstrate that ~2 h of sequencing is required to gather all information needed for obtaining reliable barcodes for most specimens (>90%). We estimate that up to 1,000 barcodes can be generated in one flow cell and that the cost per barcode can be […]
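The consensus step summarizes many error-prone reads of one tagged amplicon into a single barcode. The sketch below shows a column-wise majority rule. It assumes the reads have already been aligned to equal length, for example with a multiple sequence aligner. The function name, the 0.5 threshold, and the reads are hypothetical, and this is not the published pipeline.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str], min_frequency: float = 0.5) -> str:
    """Column-wise majority-rule consensus of reads already aligned to equal length.

    A column contributes its most common character only if that character occurs
    in more than `min_frequency` of the reads; otherwise it becomes N. Columns
    whose majority character is a gap ('-') are dropped.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be aligned to equal length")
    consensus = []
    for column in zip(*aligned_reads):
        base, count = Counter(column).most_common(1)[0]
        if count / len(column) <= min_frequency:
            consensus.append("N")  # no clear majority
        elif base != "-":
            consensus.append(base)
    return "".join(consensus)


reads = ["ACGT-ACGT",
         "ACGTTACGT",  # insertion error
         "ACGT-ACGA",  # substitution error
         "ACCT-ACGT"]  # substitution error
print(majority_consensus(reads))  # ACGTACGT
```

Random substitutions are outvoted, but a systematic error shared by most reads, such as an indel in a homopolymer, survives majority voting. That is the kind of residual error the amino-acid-based correction step described above targets.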