DNA metabarcoding is an important tool for molecular ecology. However, its effectiveness hinges on the quality of reference sequence databases and classification parameters employed. Here we evaluate the performance of MiFish 12S taxonomic assignments using a case study of California Current Large Marine Ecosystem fishes to determine best practices for metabarcoding. Specifically, we use a taxonomy cross‐validation by identity framework to compare classification performance between a global database comprised of all available sequences and a curated database that only includes sequences of fishes from the California Current Large Marine Ecosystem. We demonstrate that the regional database provides higher assignment accuracy than the comprehensive global database. We also document a tradeoff between accuracy and misclassification across a range of taxonomic cutoff scores, highlighting the importance of parameter selection for taxonomic classification. Furthermore, we compared assignment accuracy with and without the inclusion of additionally generated reference sequences. To this end, we sequenced tissue from 597 species using the MiFish 12S primers, adding 252 species to GenBank's existing 550 California Current Large Marine Ecosystem fish sequences. We then compared species and reads identified from seawater environmental DNA samples using global databases with and without our generated references, and the regional database. The addition of new references allowed for the identification of 16 additional native taxa representing 17.0% of total reads from eDNA samples, including species with vast ecological and economic value. Together these results demonstrate the importance of comprehensive and curated reference databases for effective metabarcoding and the need for locus‐specific validation efforts.
DNA metabarcoding is an important tool for molecular ecology. However, its effectiveness hinges on the quality of reference sequence databases and classification parameters employed. Here we evaluate the performance of MiFish 12S taxonomic assignments using a case study of California Current Large Marine Ecosystem fishes to determine best practices for metabarcoding. Specifically, we use a taxonomy cross-validation by identity framework to compare classification performance between a global database comprised of all available sequences and a curated database that only includes sequences of fishes from the California Current Large Marine Ecosystem. We demonstrate that the curated, regional database provides higher assignment accuracy than the comprehensive global database. We also document a tradeoff between accuracy and misclassification across a range of taxonomic cutoff scores, highlighting the importance of parameter selection for taxonomic classification. Furthermore, we compared assignment accuracy with and without the inclusion of additionally generated reference sequences. To this end, we sequenced tissue from 605 species using the MiFish 12S primers, adding 253 species to GenBank’s existing 550 California Current Large Marine Ecosystem fish sequences. We then compared species and reads identified from seawater environmental DNA samples using global databases with and without our generated references, and the regional database. The addition of new references allowed for the identification of 16 native taxa and 17.0% of total reads from eDNA samples, including species with vast ecological and economic value. Together these results demonstrate the importance of comprehensive and curated reference databases for effective metabarcoding and the need for locus-specific validation efforts.
DNA metabarcoding is an important tool for molecular ecology. However, metabarcoding effectiveness hinges on the quality of reference databases for taxa and loci of interest. This limitation applies to metabarcoding of marine fishes in the California Current Large Marine Ecosystem, where there is a paucity of reference 12S barcodes. Here we present FishCARD, a 12S reference barcode database specific to California Current fishes. We barcoded 612 species using the MiFish metabarcoding primers, adding 258 species to the 459 California Current fish species with existing 12S barcodes in GenBank. The resulting FishCARD database covers 82.7% of California Current fishes, and it includes virtually all fishes sampled by large marine monitoring programs such as the Partnership for Interdisciplinary Studies of Coastal Oceans and the California Cooperative Oceanic Fisheries Investigations. To demonstrate the importance of complete reference databases for eDNA metabarcoding, we compared species and reads identified from three 1 L seawater samples collected off Santa Cruz Island, CA using GenBank sequences with and without our generated barcodes, as well as the FishCARD database curated here. The inclusion of our generated barcodes allowed the additional identification of 15 native taxa and 21.8% of total reads from eDNA samples. However, we found that half of all amplicon sequence variants (ASVs) generated by MiFish 12S primers were of non-vertebrate 16S origin, demonstrating a clear limitation of this widely employed fish metabarcoding primer set. Despite these limitations, FishCARD provides an important genetic resource to enhance the effectiveness of marine metabarcoding efforts in the California Current Large Marine Ecosystem.
We found a startling correlation (Pearson ρ > 0.97) between a single event in daily sea surface temperatures each spring and peak fish egg abundance measurements the following summer, in 7 years of approximately weekly fish egg abundance data collected at Scripps Pier in La Jolla, California. Even more surprising was that this event-based result persisted despite the large and variable number of fish species involved (up to 46), and the large and variable time interval between trigger and response (up to ~3 months). To mitigate potential over-fitting, we made an out-of-sample prediction, independent of the publication process, for the peak summer egg abundance observed at Scripps Pier in 2020 (available on bioRxiv). During peer review, the prediction failed. While it would be tempting to explain this away as a result of the record-breaking toxic algal bloom that occurred during the spring (9x higher concentration of dinoflagellates than ever previously recorded), a re-examination of our methodology revealed a potential source of over-fitting that had not been evaluated for robustness. This cautionary tale highlights the importance of testable, true out-of-sample predictions of future values that cannot (even accidentally) be used in model fitting, and that can therefore catch model assumptions that may otherwise escape notice. We believe that this example can benefit the current push towards ecology as a predictive science and support the notion that predictions should live and die in the public domain, along with the models that made them.
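The over-fitting risk described above can be illustrated with a minimal sketch. It is not the authors' analysis and uses no real data: it assumes, purely for illustration, that many candidate predictor series of pure random noise are screened against a 7-point response series and that the best in-sample match is kept. The function names `pearson_r` and `best_spurious_r` are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_spurious_r(n_years=7, n_candidates=500, seed=0):
    """Largest |r| found when screening noise predictors against a noise response.

    Everything here is synthetic (assumption): the response and every
    candidate predictor are independent Gaussian noise, so any strong
    correlation is an artifact of selecting the best of many candidates.
    """
    rng = random.Random(seed)
    response = [rng.gauss(0, 1) for _ in range(n_years)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_years)]
        best = max(best, abs(pearson_r(candidate, response)))
    return best


if __name__ == "__main__":
    # The best in-sample |r| can only grow as more candidates are screened.
    for k in (1, 10, 100, 1000):
        print(k, round(best_spurious_r(n_candidates=k), 3))
```

With only 7 points, the best in-sample correlation grows as more candidates are screened, even though none of them has any predictive value. A prediction registered before the outcome is observed is not exposed to this selection effect.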
Ichthyoplankton studies can be used to assess the abundance, distribution, and reproductive activity of marine fishes, but few studies have monitored spawning activity at inshore sites. This study utilized weekly plankton sampling to construct a year-long time series of fish spawning at 6 pier sites along the California coast—Santa Cruz, San Luis Obispo, Santa Barbara, Santa Monica, Newport Beach, and La Jolla; sampling at the La Jolla site continues ongoing monitoring initiated in 2012. Fish eggs were sorted from the collected plankton and identified to species level using DNA barcoding of the COI and 16S genes. While only one year of data has been collected from 5 of the sites, the 2 sites north of Point Conception show markedly reduced diversity compared to the southern sites. Although the species observed reflect the local environment of each site, this pattern of reduced diversity at the northern sites is consistent with the well-documented decline in species richness with latitude along the California coast. The 7-year time series from La Jolla has revealed that spawning activity varies greatly among years, both in terms of egg production and species diversity, with a continuing trend of highest egg numbers in years with colder average winter sea surface temperature.