High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure accuracy of sequencing platforms and data analysis pipelines. To assess the ability of sequencing platforms and data processing pipelines using fungal internal transcribed spacer (ITS) amplicons, we created two ITS spike-in control mock communities composed of cloned DNA in plasmids: a biological mock community, consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: (1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, (2) pre-clustering steps for variable length amplicons are critically important, (3) a major source of bias is attributed to the initial polymerase chain reaction (PCR) and thus HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution equipped to deal with variable length amplicons and quality filter HTAS data based on spike-in controls. While we describe herein a non-biological SynMock community for ITS sequences, the concept and AMPtk software can be widely applied to any HTAS dataset to improve data quality.
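One way to picture quality filtering based on a spike-in control is sketched below. It is a minimal illustration, not AMPtk's actual implementation. It assumes that any read in the spike-in sample that does not belong to a spike-in sequence arrived through cross-sample contamination, and it uses that fraction as a per-sample cutoff. The function names, the dictionary layout, and the example counts are all hypothetical.

```python
from typing import Dict, Set


def bleed_rate(spike_counts: Dict[str, int], spike_otus: Set[str]) -> float:
    """Fraction of reads in the spike-in sample that are NOT spike-in sequences."""
    total = sum(spike_counts.values())
    if total == 0:
        return 0.0
    foreign = sum(n for otu, n in spike_counts.items() if otu not in spike_otus)
    return foreign / total


def filter_sample(counts: Dict[str, int], rate: float) -> Dict[str, int]:
    """Zero out OTUs whose count is at or below rate * (total reads in sample)."""
    threshold = rate * sum(counts.values())
    return {otu: (n if n > threshold else 0) for otu, n in counts.items()}


# Hypothetical counts: the spike-in sample contains 100 reads of a real OTU.
spike_sample = {"mock1": 4900, "mock2": 5000, "OTU7": 100}
rate = bleed_rate(spike_sample, {"mock1", "mock2"})

# A biological sample filtered with the rate estimated from the spike-in.
real_sample = {"OTU7": 1980, "OTU9": 15, "mock1": 5}
filtered = filter_sample(real_sample, rate)
```

Because the spike-in sequences in SynMock are non-biological, any spike-in read found in a real sample, or any real OTU found in the spike-in sample, can be treated as contamination rather than as a true community member.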
DNA analysis of predator faeces using high-throughput amplicon sequencing (HTS) enhances our understanding of predator-prey interactions. However, conclusions drawn from this technique are constrained by biases that occur in multiple steps of the HTS workflow. To better characterize insectivorous animal diets, we used DNA from a diverse set of arthropods to assess PCR biases of commonly used and novel primer pairs for the mitochondrial gene, cytochrome oxidase C subunit 1 (COI). We compared diversity recovered from HTS of bat guano samples using a commonly used primer pair "ZBJ" to results using the novel primer pair "ANML." To parameterize our bioinformatics pipeline, we created an arthropod mock community consisting of single-copy (cloned) COI sequences. To examine biases associated with both PCR and HTS, mock community members were combined in equimolar amounts both pre- and post-PCR. We validated our system using guano from bats fed known diets and using composite samples of morphologically identified insects collected in pitfall traps. In PCR tests, the ANML primer pair amplified 58 of 59 arthropod taxa (98%), whereas ZBJ amplified 24-40 of 59 taxa (41%-68%). Furthermore, in an HTS comparison of field-collected samples, the ANML primers detected nearly fourfold more arthropod taxa than the ZBJ primers. The additional arthropods detected include medically and economically relevant insect groups such as mosquitoes. Results revealed biases at both the PCR and sequencing levels, demonstrating the pitfalls associated with using HTS read numbers as proxies for abundance. The use of an arthropod mock community allowed for improved bioinformatics pipeline parameterization.
KEYWORDS: AMPtk, arthropod mock community, bat guano, dietary analysis, insectivore, next-generation sequencing
*Indicates shared first authorship based on equal contributions.
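The amplification rates quoted in this abstract follow directly from the taxon counts. A short check, using a hypothetical helper name, is:

```python
def percent_amplified(amplified: int, total: int) -> int:
    """Share of taxa that amplified, rounded to the nearest whole percent."""
    return round(100 * amplified / total)


anml = percent_amplified(58, 59)      # ANML primer pair
zbj_low = percent_amplified(24, 59)   # ZBJ, lower bound
zbj_high = percent_amplified(40, 59)  # ZBJ, upper bound
```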
DNA analysis of predator feces using high-throughput amplicon sequencing (HTS) enhances our understanding of predator-prey interactions. However, conclusions drawn from this technique are constrained by biases that occur in multiple steps of the HTS workflow. To better characterize insectivorous animal diets, we used DNA from a diverse set of arthropods to assess PCR biases of commonly used and novel primer pairs for the mitochondrial gene, cytochrome oxidase C subunit 1 (CO1). We compared diversity recovered from HTS of bat guano samples using a commonly used primer pair “ZBJ” to results using the novel primer pair “ANML”. To parameterize our bioinformatics pipeline, we created an arthropod mock community consisting of single-copy (cloned) CO1 sequences. To examine biases associated with both PCR and HTS, mock community members were combined in equimolar amounts both pre- and post-PCR. We validated our system using guano from bats fed known diets and using composite samples of morphologically identified insects collected in pitfall traps. In PCR tests, the ANML primer pair amplified 58 of 59 arthropod taxa (98%) whereas ZBJ amplified 24 of 59 taxa (41%). Furthermore, in an HTS comparison of field-collected samples, the ANML primers detected nearly four-fold more arthropod taxa than the ZBJ primers. The additional arthropods detected include medically and economically relevant insect groups such as mosquitoes. Results revealed biases at both the PCR and sequencing levels, demonstrating the pitfalls associated with using HTS read numbers as proxies for abundance. The use of an arthropod mock community allowed for improved bioinformatics pipeline parameterization.
Thousands of species of ambrosia beetles excavate tunnels in wood to farm fungi. They maintain associations with particular lineages of fungi, but the phylogenetic extent and mechanisms of fidelity are unknown. We test the hypothesis that selectivity of their mycangium enforces fidelity at coarse phylogenetic scales, while permitting promiscuity among closely related fungal mutualists. We confirm a single evolutionary origin of the Xylosandrus complex, a group of several xyleborine genera that farm fungi in the genus Ambrosiella. Multi-level co-phylogenetic analysis revealed frequent symbiont switching within major Ambrosiella clades, but not between clades. The loss of the mycangium in Diuncus, a genus of evolutionary cheaters, was commensurate with the loss of fidelity to fungal clades, supporting the hypothesis that the mycangium reinforces fidelity. Finally, in vivo experiments tracked symbiotic compatibility throughout the symbiotic life cycle of Xylosandrus compactus and demonstrated that closely related Ambrosiella symbionts are interchangeable, but the probability of fungal uptake in the mycangium was significantly lower in more phylogenetically distant species of symbionts. Symbiont loads in experimental subjects were similar to wild-caught beetles. We conclude that partner choice in ambrosia beetles is achieved in the mycangium, and co-phylogenetic inferences can be used to predict the likelihood of specific symbiont switches.
High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure accuracy of sequencing platforms and data analysis pipelines. To assess the ability of sequencing platforms and data processing pipelines using fungal ITS amplicons, we created two ITS spike-in control mock communities composed of cloned DNA in plasmids: a biological mock community (BioMock), consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: 1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, 2) pre-clustering steps for variable length amplicons are critically important, and 3) a major source of bias is attributed to the initial PCR and thus HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution equipped to deal with variable length amplicons and quality filter HTAS data based on spike-in controls. While we describe herein a non-biological synthetic mock community for ITS sequences, the concept and AMPtk software can be widely applied to any HTAS dataset to improve data quality.
Species of Ganoderma, commonly called reishi (in Japan) or lingzhi (in China), have been used in traditional medicine for thousands of years, and their use has gained interest from pharmaceutical industries in recent years. Globally, the taxonomy of Ganoderma species is chaotic, and the taxon name Ganoderma lucidum has been used for most laccate (shiny) Ganoderma species. However, it is now known that G. lucidum sensu stricto has a limited native distribution in Europe and some parts of China. It is likely that differences in the quality and quantity of medicinally relevant chemicals occur among Ganoderma species. To determine what species are being sold in commercially available products, twenty manufactured products (e.g., pills, tablets, teas) and seventeen grow-your-own (GYO) kits labeled as containing G. lucidum were analyzed. DNA was extracted, and the internal transcribed spacer (ITS) region and translation elongation factor 1-alpha (tef1α) were sequenced with specific fungal primers. The majority (93%) of the manufactured reishi products and almost half of the GYO kits were identified as Ganoderma lingzhi. G. lingzhi is native to Asia and is the most widely cultivated and studied taxon for medicinal use. Illumina MiSeq sequencing of the ITS1 region was performed to determine if multiple Ganoderma species were present. None of the manufactured products tested contained G. lucidum sensu stricto, and it was detected in only one GYO kit. G. lingzhi was detected in most products, but other Ganoderma species were also present, including G. applanatum, G. australe, G. gibbosum, G. sessile, and G. sinense. Our results indicate that the content of these products varies and that better labeling is needed to inform consumers before these products are ingested or marketed as medicine. Of the 17 GYO kits tested, 11 kits contained Ganoderma taxa that are not native to the United States. If fruiting bodies of exotic Ganoderma taxa are cultivated, these GYO kits will likely end up in the environment. The effects of these exotic species on natural ecosystems need investigation.
Metabarcoding studies provide a powerful approach to estimate the diversity and abundance of organisms in mixed communities in nature. While strategies exist for optimizing sample and sequence library preparation, best practices for bioinformatic processing of amplicon sequence data are lacking in animal diet studies. Here we evaluate how decisions made in core bioinformatic processes, including sequence filtering, database design, and classification, can influence animal metabarcoding results. We show that denoising methods have lower error rates compared to traditional clustering methods, although these differences are largely mitigated by removing low‐abundance sequence variants. We also found that available reference datasets from GenBank and BOLD for the animal marker gene cytochrome oxidase I (COI) can be complementary, and we discuss methods to improve existing databases to include versioned releases. Taxonomic classification methods can dramatically affect results. For example, the commonly used Barcode of Life Database (BOLD) Classification API assigned fewer names to samples from order through species levels using both a mock community and bat guano samples compared to all other classifiers (vsearch‐SINTAX and q2‐feature‐classifier's BLAST + LCA, VSEARCH + LCA, and Naive Bayes classifiers). The lack of consensus on bioinformatics best practices limits comparisons among studies and may introduce biases. Our work suggests that biological mock communities offer a useful standard to evaluate the myriad computational decisions impacting animal metabarcoding accuracy. Further, these comparisons highlight the need for continual evaluations as new tools are adopted to ensure that the inferences drawn reflect meaningful biology instead of digital artifacts.
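Removing low-abundance sequence variants, the step this abstract credits with narrowing the gap between denoising and clustering, can be illustrated with a short sketch. The table layout, the function name, and the `min_reads` cutoff are assumptions made for illustration; the abstract does not state which threshold the authors used.

```python
from typing import Dict

# Hypothetical table: variant ID -> {sample ID -> read count}
Table = Dict[str, Dict[str, int]]


def drop_low_abundance(table: Table, min_reads: int = 2) -> Table:
    """Keep only variants whose summed read count across samples is >= min_reads."""
    return {
        variant: per_sample
        for variant, per_sample in table.items()
        if sum(per_sample.values()) >= min_reads
    }


example = {
    "var_a": {"s1": 120, "s2": 80},
    "var_b": {"s1": 1, "s2": 0},   # singleton, likely a sequencing artifact
    "var_c": {"s1": 0, "s2": 2},
}
kept = drop_low_abundance(example, min_reads=2)
```

A stricter cutoff removes more spurious variants but also risks discarding genuinely rare taxa, which is one reason a mock community of known composition is useful for choosing the value.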