Our world is in the midst of unprecedented change: climate shifts and sustained, widespread habitat degradation have led to dramatic declines in biodiversity rivaling historical extinction events. At the same time, new approaches to publishing and integrating previously disconnected data resources promise to help provide the evidence needed for more efficient and effective conservation and management. Stakeholders have invested considerable resources in contributing to online databases of species occurrences. However, estimates suggest that only 10% of biocollections are available in digital form. The biocollections community must therefore continue to promote digitization efforts, which in part requires demonstrating compelling applications of the data. Our overarching goal is to determine trends in the use of mobilized species occurrence data since 2010, during which time online systems have grown to provide over one billion records. To do this, we characterized 501 papers that use openly accessible biodiversity databases. Our standardized tagging protocol was based on key topics of interest, including the database(s) used, taxa addressed, general uses of data, other data types linked to species occurrence data, and data quality issues addressed. We found that the most common uses of online biodiversity databases have been to estimate species distribution and richness, to outline data compilation and publication, and to assist in developing species checklists or describing new species. Only 69% of papers in our dataset addressed one or more aspects of data quality, which is low considering the common errors and biases known to exist in opportunistic datasets. Globally, we find that biodiversity databases are still in the initial stages of data compilation. Novel and integrative applications are restricted to certain taxonomic groups and regions with higher numbers of quality records. Continued data digitization, publication, enhancement, and quality control efforts are necessary to make biodiversity science more efficient and relevant in our fast-changing environment.
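A standardized tagging protocol amounts to one tag set per topic for each paper, after which summary figures such as the share of papers addressing data quality reduce to simple counts. A minimal sketch, assuming a hypothetical `TaggedPaper` schema and three made-up records (not data from the study):

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TaggedPaper:
    """One characterized paper; each attribute holds the tags for one topic (hypothetical schema)."""
    paper_id: str
    databases: set[str] = field(default_factory=set)
    taxa: set[str] = field(default_factory=set)
    uses: set[str] = field(default_factory=set)
    linked_data: set[str] = field(default_factory=set)
    data_quality: set[str] = field(default_factory=set)


def use_frequencies(papers: list[TaggedPaper]) -> Counter:
    """Count how many papers carry each 'general use' tag."""
    counts: Counter = Counter()
    for paper in papers:
        counts.update(paper.uses)
    return counts


def share_addressing_quality(papers: list[TaggedPaper]) -> float:
    """Fraction of papers tagged with at least one data-quality issue."""
    if not papers:
        return 0.0
    flagged = sum(1 for paper in papers if paper.data_quality)
    return flagged / len(papers)


# Toy records for illustration only; these are not results from the study.
papers = [
    TaggedPaper("p1", {"GBIF"}, {"plants"}, {"distribution"}, {"climate"}, {"georeferencing"}),
    TaggedPaper("p2", {"VertNet"}, {"mammals"}, {"checklist"}),
    TaggedPaper("p3", {"GBIF", "iDigBio"}, {"insects"}, {"distribution", "richness"}, set(), {"taxonomy"}),
]

print(use_frequencies(papers).most_common(1))      # [('distribution', 2)]
print(round(share_addressing_quality(papers), 2))  # 0.67
```

Keeping each topic as its own set means a paper can carry several tags per topic (for example, more than one database), and any topic can be tallied the same way.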
Massive strides have been made in technologies for collecting genome-scale data. However, tools for efficiently and flexibly assembling raw outputs into downstream analytical workflows are still nascent. aTRAM 1.0 was designed to assemble any locus from genome sequencing data but was neither optimized for efficiency nor able to serve as a single toolkit for all assembly needs. We have completely re-implemented aTRAM and redesigned its structure for faster read retrieval while adding a number of key features to improve flexibility and functionality. The software can now (1) assemble single- or paired-end data, (2) utilize both read directions in the database, (3) use an additional de novo assembly module, and (4) leverage new built-in pipelines to automate common workflows in phylogenomics. We demonstrate that, owing to its reimplemented databasing strategies, aTRAM 2.0 is much faster than the previous version across all applications.
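Broadly, aTRAM assembles a target locus iteratively: it retrieves reads that match the current query, assembles them, and reuses the resulting contig as the next query. The sketch below illustrates only that recruit-and-extend loop. It swaps exact k-mer lookups for aTRAM's actual sequence search and a naive greedy overlap merge for its external assemblers; every function name and parameter (`k`, `min_overlap`, `max_iterations`) is hypothetical, and the reads are tiled from a made-up reference.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reads: list[str], k: int) -> dict[str, set[int]]:
    """Map each k-mer to the reads containing it, indexing both strands of every read."""
    index: dict[str, set[int]] = defaultdict(set)
    for read_id, read in enumerate(reads):
        for strand in (read, revcomp(read)):
            for kmer in kmers(strand, k):
                index[kmer].add(read_id)
    return index


def recruit(contig: str, index: dict[str, set[int]], k: int) -> set[int]:
    """Ids of reads sharing at least one k-mer with the current contig."""
    hits: set[int] = set()
    for kmer in kmers(contig, k):
        hits |= index.get(kmer, set())
    return hits


def extend(contig: str, read: str, min_overlap: int) -> str:
    """Grow the contig with a read (either strand) that overlaps one of its ends exactly."""
    for candidate in (read, revcomp(read)):
        if candidate in contig:
            return contig
        if contig in candidate:
            return candidate
        for olen in range(len(candidate) - 1, min_overlap - 1, -1):
            if contig.endswith(candidate[:olen]):     # read hangs off the right end
                return contig + candidate[olen:]
            if contig.startswith(candidate[-olen:]):  # read hangs off the left end
                return candidate[:-olen] + contig
    return contig


def iterative_assembly(seed: str, reads: list[str], k: int = 5,
                       min_overlap: int = 5, max_iterations: int = 10) -> str:
    """Alternate read recruitment and contig extension until the contig stops growing."""
    index = build_index(reads, k)
    contig = seed
    for _ in range(max_iterations):
        previous = contig
        for read_id in sorted(recruit(contig, index, k)):
            contig = extend(contig, reads[read_id], min_overlap)
        if contig == previous:
            break
    return contig


reference = "ACGGCAGCCAGGACCGAGCAAGGCCACG"
reads = [reference[i:i + 10] for i in range(0, 19, 3)]  # 10-bp toy reads tiled every 3 bp
reads[1] = revcomp(reads[1])  # some reads arrive on the opposite strand
reads[4] = revcomp(reads[4])
seed = reference[10:18]       # stands in for a target sequence from a related taxon
print(iterative_assembly(seed, reads) == reference)  # True
```

Each pass pulls in reads that reach a little past the current contig ends, which is why several iterations are needed before the whole toy locus is recovered.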
Premise: Large phylogenetic data sets have often been restricted to small numbers of loci from GenBank, and a vetted sampling‐to‐sequencing phylogenomic protocol scaling to thousands of species is not yet available. Here, we report a high‐throughput collections‐based approach that empowers researchers to explore more branches of the tree of life with numerous loci.
Methods: We developed an integrated Specimen‐to‐Laboratory Information Management System (SLIMS), connecting sampling and wet-lab efforts with progress tracking at each stage. Using unique identifiers encoded in QR codes and a taxonomic database, a research team can sample herbarium specimens, efficiently record the sampling event, and capture specimen images. After sampling in herbaria, images are uploaded to a citizen science platform for metadata generation, and tissue samples are moved through a simple, high‐throughput, plate‐based herbarium DNA extraction and sequencing protocol.
Results: We applied this sampling‐to‐sequencing workflow to ~15,000 species, producing for the first time a data set with ~50% taxonomic representation of the “nitrogen‐fixing clade” of angiosperms.
Discussion: The approach we present is appropriate at any taxonomic scale and is extensible to other collection types. The widespread use of large‐scale sampling strategies repositions herbaria as accessible but largely untapped resources for broad taxonomic sampling with thousands of species.
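The tracking idea in SLIMS, a unique identifier per sample that follows it from the herbarium sheet through the plate-based lab steps, can be sketched as a small record that advances through ordered stages, plus a helper that places samples on a 96-well plate. Everything here is hypothetical: the stage names, the `TissueSample` fields, the use of a UUID as the string a QR label would encode, and column-wise plate filling are assumptions for illustration, not the system's actual schema.

```python
import uuid
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Ordered workflow stages (hypothetical labels loosely following the Methods above)."""
    SAMPLED = 1
    IMAGED = 2
    METADATA = 3
    EXTRACTED = 4
    SEQUENCED = 5


@dataclass
class TissueSample:
    taxon: str
    herbarium: str
    # Unique identifier: the string a printed QR label could encode (assumption).
    sample_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    stage: Stage = Stage.SAMPLED
    history: list[Stage] = field(default_factory=lambda: [Stage.SAMPLED])

    def advance(self) -> Stage:
        """Move to the next stage; stages cannot be skipped, and nothing follows SEQUENCED."""
        if self.stage == Stage.SEQUENCED:
            raise ValueError(f"{self.sample_id} is already sequenced")
        self.stage = Stage(self.stage + 1)
        self.history.append(self.stage)
        return self.stage


ROWS = "ABCDEFGH"


def plate_layout(samples: list[TissueSample], columns: int = 12) -> dict[str, str]:
    """Assign samples to plate wells column by column (A1, B1, ..., H1, A2, ...)."""
    capacity = len(ROWS) * columns
    if len(samples) > capacity:
        raise ValueError(f"{len(samples)} samples exceed plate capacity of {capacity}")
    layout: dict[str, str] = {}
    for i, sample in enumerate(samples):
        col, row = divmod(i, len(ROWS))
        layout[f"{ROWS[row]}{col + 1}"] = sample.sample_id
    return layout


samples = [TissueSample(taxon=f"Taxon {n}", herbarium="HERB") for n in range(10)]
for s in samples:
    s.advance()  # SAMPLED -> IMAGED
layout = plate_layout(samples)
print(list(layout)[:3], list(layout)[-2:])  # ['A1', 'B1', 'C1'] ['A2', 'B2']
```

Storing the stage history alongside the current stage is one way to support the per-stage progress tracking described above; a real system would persist these records in a database rather than in memory.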
For vast areas of the globe and large parts of the tree of life, data on trait diversity are incomplete. Such trait data, when fully assembled, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases, which provide only species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based. Perhaps under-appreciated is that digitized biocollection records collectively contain a vast trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed and deployed a suite of openly accessible software tools to collate a full set of trait descriptions and extract two key traits, body length and mass, from >18 million specimen records in VertNet, a global biodiversity data publisher and aggregator. We tested the success rate of these tools against hand-checked validation data sets and characterized the quality and quantity of the extracted data. A post-processing toolkit was developed to standardize and harmonize data sets and to integrate this improved content into VertNet for broadest reuse. This work added more than 1.5 million harmonized measurements of vertebrate body mass and length directly to specimen records. Rates of false positives and negatives for extracted data were extremely low. We also created new tools for filtering, querying, and assembling this research-ready vertebrate trait content for viewing and download. Our work has yielded a novel database and platform for harmonized trait content that will grow as the tools introduced here become part of publication workflows. We close by noting how this effort extends to new communities already developing similar digitized content.
Database URL: http://portal.vertnet.org/search?advanced=1
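At its simplest, extracting body length and mass from free-text specimen fields is pattern matching followed by unit harmonization. A minimal sketch, assuming a handful of hypothetical regular expressions and example strings; the tools described above handle far more heterogeneous notation, and nothing here reproduces their actual rules. The conversion factors are exact definitions (1 in = 25.4 mm, 1 lb = 453.59237 g, 1 oz = 28.349523125 g).

```python
import re

# Exact conversion factors to the harmonized units (millimetres and grams).
LENGTH_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}
MASS_TO_G = {"mg": 0.001, "g": 1.0, "kg": 1000.0, "oz": 28.349523125, "lb": 453.59237}

# Hypothetical patterns: a trait label, an optional ':' or '=', a number, then a unit.
NUMBER = r"(?P<value>\d+(?:\.\d+)?)"
LENGTH_RE = re.compile(
    r"\b(?:total\s+length|body\s+length|TL)\s*[:=]?\s*" + NUMBER + r"\s*(?P<unit>mm|cm|m|in)\b",
    re.IGNORECASE,
)
MASS_RE = re.compile(
    r"\b(?:body\s+mass|weight|wt|mass)\s*[:=]?\s*" + NUMBER + r"\s*(?P<unit>mg|kg|g|oz|lb)\b",
    re.IGNORECASE,
)


def extract_traits(text: str) -> dict[str, float]:
    """Return the first body length (mm) and body mass (g) found in a free-text field."""
    traits: dict[str, float] = {}
    length = LENGTH_RE.search(text)
    if length:
        mm = float(length["value"]) * LENGTH_TO_MM[length["unit"].lower()]
        traits["body_length_mm"] = round(mm, 2)
    mass = MASS_RE.search(text)
    if mass:
        grams = float(mass["value"]) * MASS_TO_G[mass["unit"].lower()]
        traits["body_mass_g"] = round(grams, 2)
    return traits


for remark in ["total length 152 mm; weight 12.5 g", "TL=6 in, wt 0.5 lb", "body mass: 1.2 kg"]:
    print(remark, "->", extract_traits(remark))
```

Keeping only the first match per trait and rounding to 0.01 are simplifications made for brevity.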
Root nodule symbiosis (RNS) allows plants to access atmospheric nitrogen converted into usable forms through a mutualistic relationship with soil bacteria. RNS is a complex trait requiring coordination from both the plant host and the bacterial symbiont, and pinpointing the evolutionary origins of root nodules is critical for understanding the genetic basis of RNS. This endeavor is complicated by data limitations and the intermittent presence of RNS in a single clade of ca. 30,000 species of flowering plants, i.e., the nitrogen-fixing clade (NFC). We developed the most extensive de novo phylogeny for all major lineages of the NFC and an enhanced root nodule trait database to reconstruct the evolution of RNS. Through identification of the evolutionary pathway to RNS gain, we show that shifts among heterogeneous evolutionary rates can explain how a complex trait such as RNS can arise many times across a large phylogeny. Our analysis identifies a two-step process in which an ancestral precursor state gave rise to a more labile state from which RNS was quickly gained at specific points in the NFC. Our rigorous reconstruction of ancestral states illustrates how a two-step pathway could have led to multiple independent gains and losses of RNS, contrary to recent hypotheses invoking just a single gain and numerous losses. RNS may be an example of multi-level convergent evolution, thus requiring a broader phylogenetic and genetic scope for genome-phenome mapping to elucidate mechanisms enabling fully functional RNS.
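The two-step idea can be made concrete with a small continuous-time Markov model in which RNS cannot arise directly from the ancestral state, only via an intermediate precursor state. This is an illustrative sketch, not the authors' model: the three states, the rate values in `Q`, and the branch length `t` are hypothetical, and real analyses estimate such rates (often with hidden-rate classes) from a phylogeny rather than assuming them. Transition probabilities follow from P(t) = exp(Qt), computed here by scaling and squaring a truncated Taylor series so that the block needs only the standard library.

```python
Matrix = list[list[float]]

ABSENT, PRECURSOR, RNS = 0, 1, 2

# Hypothetical instantaneous rates per unit branch length; each row sums to zero.
# There is no direct ABSENT -> RNS transition: gains must pass through PRECURSOR.
Q: Matrix = [
    [-0.01, 0.01, 0.00],  # ABSENT: slow origin of the precursor
    [0.05, -0.55, 0.50],  # PRECURSOR: labile, RNS gained quickly
    [0.00, 0.10, -0.10],  # RNS: occasional loss back to the precursor
]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(q: Matrix, t: float, squarings: int = 10, terms: int = 20) -> Matrix:
    """exp(Q t): Taylor series of exp(Q t / 2**s), then squared s times."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


P = expm(Q, t=5.0)
print(f"P(RNS at t | ABSENT at 0)    = {P[ABSENT][RNS]:.3f}")
print(f"P(RNS at t | PRECURSOR at 0) = {P[PRECURSOR][RNS]:.3f}")
```

With these made-up rates, a lineage that starts in the precursor state is far more likely to carry RNS after the same branch length than one that starts in the ancestral state, which mirrors the intuition that gains cluster where the precursor has already evolved.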
A reciprocal subtraction differential RNA display (RSDD) approach has been developed that permits the rapid and efficient identification and cloning of both abundant and rare differentially expressed genes. RSDD comprises reciprocal subtraction of cDNA libraries followed by differential RNA display. The RSDD strategy was applied to analyze the gene expression changes that occur during cancer progression, as adenovirus-transformed rodent cells developed an aggressive transformed state, documented by elevated anchorage independence and enhanced in vivo oncogenesis in nude mice. This approach identified and cloned both known sequences and a high proportion (>65%) of unknown sequences, including cDNAs whose expression is elevated as a function of progression (progression-elevated genes) and cDNAs whose expression is suppressed as a function of progression (progression-suppressed genes). Sixteen differentially expressed genes, including five unknown progression-elevated genes and six unknown progression-suppressed genes, have been characterized. The RSDD scheme should find wide application for the effective detection and isolation of differentially expressed genes.
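The logic of subtracting each library from the other and then screening what survives can be illustrated with a loose computational analogy; it does not model the hybridization chemistry. In this sketch, transcript counts stand in for cDNA libraries, `Counter` subtraction stands in for removing shared sequences, and a fold-change screen stands in for the display step. The gene names, counts, twofold threshold, and pseudocount are all hypothetical.

```python
from collections import Counter


def reciprocal_subtraction(parental: Counter, progressed: Counter) -> tuple[Counter, Counter]:
    """Subtract each library from the other. Counter subtraction keeps only positive
    excesses, loosely mimicking removal of sequences shared by tester and driver."""
    return progressed - parental, parental - progressed


def screen(parental: Counter, progressed: Counter,
           min_fold: float = 2.0, pseudocount: float = 1.0) -> tuple[list[str], list[str]]:
    """Keep enriched transcripts whose abundance differs by at least min_fold."""
    elevated_pool, suppressed_pool = reciprocal_subtraction(parental, progressed)

    def fold(gene: str) -> float:
        # progressed / parental abundance; the pseudocount avoids division by zero
        return (progressed[gene] + pseudocount) / (parental[gene] + pseudocount)

    elevated = sorted(g for g in elevated_pool if fold(g) >= min_fold)
    suppressed = sorted(g for g in suppressed_pool if fold(g) <= 1 / min_fold)
    return elevated, suppressed


parental = Counter({"housekeeping": 100, "geneB": 12, "geneC": 3})
progressed = Counter({"housekeeping": 104, "geneB": 2, "geneD": 9})
print(screen(parental, progressed))  # (['geneD'], ['geneB', 'geneC'])
```

The toy `housekeeping` transcript survives subtraction only because of a small excess and is then rejected by the fold-change screen, which is why a second, confirmatory step follows the subtraction.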