Genomic data are being produced and archived at a prodigious rate, and current studies could become historical baselines for future global genetic diversity analyses and monitoring programs. However, when we evaluated the potential utility of genomic data from wild and domesticated eukaryote species in the world’s largest genomic data repository, we found that most archived genomic datasets (86%) lacked the spatiotemporal metadata necessary for genetic biodiversity surveillance. Labor-intensive scouring of a subset of published papers yielded geospatial coordinates and collection years for only 33% (39% if place names were considered) of these genomic datasets. Streamlined data input processes, updated metadata deposition policies, and enhanced scientific community awareness are urgently needed to preserve these irreplaceable records of today’s genetic biodiversity and to plug the growing metadata gap.
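As a rough illustration of the kind of check that sits behind a figure like the 86% above, the sketch below tests whether a sample record carries both geospatial coordinates and a collection year. The attribute names `lat_lon` and `collection_date` follow common INSDC BioSample conventions. The record structure, the list of placeholder values, and the function names are assumptions made for illustration; this is not the authors' pipeline.

```python
import re

# Placeholder strings submitters often enter instead of real values (assumed list).
MISSING = {"", "none", "missing", "not applicable", "not collected",
           "not provided", "na", "n/a", "unknown"}

def has_value(record, key):
    """True when the attribute exists and is not a known placeholder."""
    value = str(record.get(key, "")).strip().lower()
    return value not in MISSING

def has_spatiotemporal_metadata(record):
    """True when the record has coordinates and a four-digit collection year."""
    if not (has_value(record, "lat_lon") and has_value(record, "collection_date")):
        return False
    return re.search(r"\b(1[89]|20)\d{2}\b", str(record["collection_date"])) is not None

def fraction_lacking(records):
    """Share of records without usable spatiotemporal metadata."""
    lacking = sum(not has_spatiotemporal_metadata(r) for r in records)
    return lacking / len(records)

# Toy records, invented for this example.
samples = [
    {"lat_lon": "27.85 N 80.45 W", "collection_date": "2019-06-14"},
    {"lat_lon": "not collected", "collection_date": "2018"},
    {"collection_date": "missing"},
    {"lat_lon": "12.1 S 45.3 E"},
]
print(f"{fraction_lacking(samples):.0%} of toy records lack spatiotemporal metadata")
```

A check of this kind only shows that a field is populated; it says nothing about whether the coordinates or dates are accurate.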
Findings from eDNA metabarcoding are strongly influenced by experimental approach, yet the effect of pre-PCR sample processing on taxon detection and estimates of biodiversity across different water types is still poorly resolved. To fill this data gap, we investigated the impact of sampling effort, extraction method, and filter pore size on DNA yield, PCR inhibition, and 16S rDNA metabarcoding results for fishes in water samples collected from inshore turbid- and offshore clear-water environments. The turbid-water samples had high concentrations of suspended organic and/or inorganic material; compared with the low-turbidity, clear-water samples, they yielded ~3.2× more DNA and exhibited high levels of PCR inhibition. Importantly, there were no striking differences in the results of our metabarcoding experiments based on extraction method or filter pore size. While a small number of unique species of relatively low read count were detected in all turbid-water treatments, most species were consistently detected across samples. Results for the clear-water samples were strikingly different, with low DNA yield, high levels of variation across replicates, and a high number of non-overlapping species across treatments. These findings indicate a patchy distribution of eDNA in offshore environments, which means higher volumes of water (≥ 2 L per replicate) must be filtered in habitats where target DNA is likely to be sparse. In semi-closed systems such as estuaries, higher concentrations of target DNA are expected, and we found that either a 1.0 or 3.0 µm filter pore size was sufficient to capture standing diversity while decreasing the risk of clogging. For economical DNA extraction and inhibitor removal, we recommend the Omega Bio-tek E.Z.N.A. Tissue DNA Kit followed by a PCR inhibitor removal step using the Zymo kit. Finally, we emphasize that pilot studies should be undertaken whenever sampling in a new environment to identify which protocol is most appropriate.
Metabarcoding of environmental DNA is increasingly used for biodiversity assessments in aquatic communities. The efficiency and outcome of these efforts depend on either designing primers de novo or selecting an appropriate primer set from the dozens that have already been published. Unfortunately, few studies have directly compared the efficacy of different metabarcoding primers in marine and estuarine systems. Here we evaluate five commonly used primer sets designed to amplify rRNA barcoding genes in fishes and compare their performance using water samples collected from estuarine sites in the highly biodiverse Indian River Lagoon in Florida. Three of the five primer sets amplify a portion of the mitochondrial 12S gene (MiFish_12S, 171 bp; Riaz_12S, 106 bp; Valentini_12S, 63 bp), one amplifies 219 bp of the mitochondrial 16S gene (Berry_16S), and the other amplifies 271 bp of the nuclear 18S gene (MacDonald_18S). The vast majority of the metabarcoding reads (>99%) generated using the 18S primer set were assigned to non-target (non-fish) taxa, and this primer set was therefore omitted from most analyses. Using a conservative 99% similarity threshold for species-level assignments, we detected a comparable number of species (55 and 49, respectively) and similarly high Shannon’s diversity values for the Riaz_12S and Berry_16S primer sets. Meanwhile, just 34 and 32 species were detected using the MiFish_12S and Valentini_12S primer sets, respectively. We were able to amplify both bony and cartilaginous fishes using the four primer sets, with the vast majority of reads (>99%) assigned to the former. We detected the greatest number of elasmobranchs (six species) with the Riaz_12S primer set, suggesting that it may be a suitable candidate set for the detection of sharks and rays. Of the total 76 fish species that were identified across all datasets, the three 12S primer sets combined detected 85.5% (65 species), while the combination of the Riaz_12S and Berry_16S primers detected 93.4% (71 species). These results highlight the importance of employing multiple primer sets as well as using primers that target different genomic regions. Moreover, our results suggest that the widely adopted MiFish_12S primers may not be the best choice; rather, we found that the Riaz_12S primer set was the most effective for eDNA-based fish surveys in our system.
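Each coverage percentage above is a combination's species count divided by the 76 species pooled across all datasets. The sketch below reproduces that arithmetic from the reported counts and shows how a combined coverage could be computed from per-primer species lists. The function names are hypothetical, and the species labels in the toy example are placeholders, not detections from the study.

```python
def coverage(detected, total):
    """Percentage of the pooled species list recovered, to one decimal place."""
    return round(100 * detected / total, 1)

TOTAL_SPECIES = 76  # pooled across all primer sets (from the text)
print(coverage(65, TOTAL_SPECIES))  # three 12S primer sets combined
print(coverage(71, TOTAL_SPECIES))  # Riaz_12S plus Berry_16S

def combined_coverage(detections, primer_sets):
    """Coverage of a primer-set combination relative to species found by any set."""
    pooled = set().union(*detections.values())
    combo = set().union(*(detections[p] for p in primer_sets))
    return coverage(len(combo), len(pooled))

# Toy detections with placeholder species labels (not study data).
toy = {
    "Riaz_12S": {"sp1", "sp2", "sp3", "sp4"},
    "Berry_16S": {"sp3", "sp4", "sp5"},
    "MiFish_12S": {"sp1", "sp6"},
}
print(combined_coverage(toy, ["Riaz_12S", "Berry_16S"]))
```

Taking the set union matters because species detected by more than one primer set are counted once, so the combined count is generally smaller than the sum of the per-primer counts.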
Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies produce large amounts of genome-scale genetic diversity data for wild populations, but most (87%) do not include the associated spatial and temporal metadata necessary for them to be reused in monitoring programs or for acknowledging the sovereignty of nations or Indigenous peoples. We undertook a distributed datathon to quantify the …
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species and population resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies of evolutionary biology, molecular ecology and conservation genetics produce large amounts of genome-scale genetic diversity data for wild populations. While open data policies have ensured an abundance of freely available genomic data stored in the databases of the International Nucleotide Sequence Database Collaboration (INSDC), only about 13% of current accessions have the associated spatial and temporal metadata in INSDC necessary for reuse in monitoring programs and macrogenetic studies or for acknowledging the sovereignty of nations or Indigenous Peoples. We undertook a “distributed datathon” to quantify the availability of these missing metadata in sources external to the INSDC and to test the hypothesis that these metadata decay with time. We also worked to remediate these missing metadata by extracting them, when present, from associated published papers, online repositories, and/or from direct communication with authors. Starting with 848 programmatically identified candidate datasets (INSDC BioProjects), we manually determined that 492 contained samples from wild populations. We successfully restored spatiotemporal metadata (locality name and/or geospatial coordinates and collection year) for 82% of these 492 datasets (N = 401 BioProjects comprising 42,104 individuals or BioSamples). We also quantified the availability of 33 additional categories of metadata in sources external to the INSDC. Information about associated publications and the type of habitat from which the samples were taken was the most easily found; information about sampling permits was the most challenging to locate. Looking at papers and online repositories was much more fruitful than contacting authors, who replied to only 45% of our email requests. Overall, 23% of our email queries to authors yielded useful metadata. Importantly, we found that the probability of retrieving spatiotemporal metadata declines significantly with the age of the dataset, with a 13.5% yearly decrease for metadata located in published papers or online repositories and up to a 22% yearly decrease for metadata that were only available from authors. This observable metadata decay, mirrored in studies of other types of biological data, should motivate swift updates to data sharing policies and researcher practices to ensure that the valuable context provided by metadata is not lost forever.
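To give a sense of how quickly a constant yearly decline compounds, the sketch below applies the reported rates multiplicatively. This is an assumption: the abstract does not state whether the decrease applies to the probability or to the odds of retrieval, nor what the baseline is. The starting value of 1.0 and the function name are illustrative only.

```python
def remaining_fraction(yearly_decrease, years):
    """Fraction of the starting retrieval chance left after compounding
    a fixed yearly decrease (assumed multiplicative)."""
    return (1.0 - yearly_decrease) ** years

for years in (1, 5, 10):
    papers = remaining_fraction(0.135, years)   # papers / online repositories
    authors = remaining_fraction(0.22, years)   # author-only metadata (upper figure)
    print(f"{years:>2} y: papers/repos {papers:.2f}, authors {authors:.2f}")
```

Under this assumption, a little under half of the starting chance remains after five years for metadata held in papers or repositories, and the author-only figure falls faster still.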