1. Species occurrence records from online databases are an indispensable resource in ecological, biogeographical and palaeontological research. However, issues with data quality, especially incorrect geo-referencing or dating, can diminish their usefulness. Manual cleaning is time-consuming, error-prone, difficult to reproduce and limited to known geographical areas and taxonomic groups, making it impractical for datasets with thousands or millions of records.
2. Here, we present CoordinateCleaner, an R package to scan datasets of species occurrence records for geo-referencing and dating imprecisions and data entry errors in a standardized and reproducible way. CoordinateCleaner is tailored to problems common in biological and palaeontological databases and can handle datasets with millions of records. The software includes (a) functions to flag potentially problematic coordinate records based on geographical gazetteers, (b) a global database of 9,691 geo-referenced biodiversity institutions to identify records that are likely from horticulture or captivity, (c) novel algorithms to identify datasets with rasterized data, conversion errors and strong decimal rounding and (d) spatio-temporal tests for fossils.
3. We describe the individual functions available in CoordinateCleaner and demonstrate them on more than 90 million occurrences of flowering plants from the Global Biodiversity Information Facility (GBIF) and 19,000 fossil occurrences from the Palaeobiology Database (PBDB). We find that in GBIF more than 3.4 million records (3.7%) are potentially problematic and that 179 of the tested contributing […]
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
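Gazetteer-based flagging, as in point (a), can be illustrated with a few per-record tests. The following is a minimal Python sketch under stated assumptions, not CoordinateCleaner's API (the package is written in R). The test names, the 1 km buffer, and the `CENTROIDS` gazetteer entry are hypothetical. The sketch flags coordinates that are out of range, at 0/0, have identical latitude and longitude, or lie near an administrative centroid.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass
class Record:
    species: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny float overshoot above 1 for antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def flag_record(rec: Record, centroids: dict[str, tuple[float, float]],
                buffer_km: float = 1.0) -> list[str]:
    """Names of the (hypothetical) tests a record fails; [] means unflagged."""
    if not (-90 <= rec.lat <= 90 and -180 <= rec.lon <= 180):
        return ["invalid"]  # the remaining tests assume valid coordinates
    flags = []
    if rec.lat == 0 and rec.lon == 0:
        flags.append("zero")  # 0/0 is often a placeholder for missing values
    if rec.lat == rec.lon:
        flags.append("equal")  # identical lat/lon hints at a data entry error
    for name, (clat, clon) in centroids.items():
        if haversine_km(rec.lat, rec.lon, clat, clon) <= buffer_km:
            flags.append(f"centroid:{name}")  # close to an administrative midpoint
    return flags


# Hypothetical gazetteer entry; real centroids would come from a curated gazetteer.
CENTROIDS = {"example_country": (-10.0, -55.0)}
```

Returning flags instead of dropping records mirrors the abstract's wording of flagging "potentially problematic" records. The final decision stays with the user.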
Significance
Amazonia is not only the world’s most diverse rainforest but is also the region in tropical America that has contributed most to its total biodiversity. We show this by estimating and comparing the evolutionary history of a large number of animal and plant species. We find that there has been extensive interchange of evolutionary lineages among different regions and biomes, over the course of tens of millions of years. Amazonia stands out as the primary source of diversity, which can be mainly explained by the total amount of time Amazonian lineages have occupied the region. The exceedingly rich and heterogeneous diversity of the American tropics could only be achieved by high rates of dispersal events across the continent.
Aim
Massive digitization of natural history collections is now leading to a steep accumulation of publicly available species distribution data. However, the scientific community now acknowledges taxonomic errors and geographical uncertainty in species occurrence records, which calls into question the extent to which such data can reveal correct patterns of biodiversity and distribution. We explore this question through quantitative and qualitative analyses of uncleaned versus manually verified datasets of species distribution records across different spatial scales.
Location
The American tropics.
Methods
As a test case, we used the plant tribe Cinchoneae (Rubiaceae). We compiled four datasets of species occurrences: one created manually and verified through classical taxonomic work, and three derived from GBIF under different cleaning and filling schemes. We used new bioinformatic tools to code species into grids, ecoregions and biomes following WWF's classification. We analysed species richness and the altitudinal ranges of the species.
Results
Altitudinal ranges for species and genera were correctly inferred even without manual data cleaning and filling. However, erroneous records affected spatial patterns of species richness. They led to an overestimation of species richness in certain areas outside the centres of diversity in the clade. Many of these areas corresponded to the geographical midpoints of countries and political subdivisions, coordinates that were assigned long after the specimens had been collected.
Main conclusion
Open databases and integrative bioinformatic tools allow a rapid approximation of large-scale patterns of biodiversity across space and altitudinal ranges. We found that geographic inaccuracy affects diversity patterns more than taxonomic uncertainty does, often leading to false positives, i.e. overestimated species richness in relatively species-poor regions. Public databases for species distribution are valuable and should be explored further, but under scrutiny and validation by taxonomic experts. We suggest that database managers implement easy ways for the community to give feedback on data quality.
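The sketch below codes occurrences into grid cells and computes cell richness and per-species altitudinal ranges. It is a minimal Python illustration, not the tools used in the study. The 1° equal-angle grid, the `(species, lat, lon, elevation_m)` record layout, and the `RECORDS` values are assumptions. The last record imitates a hypothetical specimen geo-referenced to an administrative midpoint. It adds a species to a cell on its own, which is the mechanism behind the inflated richness described above. It also contributes nothing to altitudinal ranges when elevation is missing.

```python
import math
from collections import defaultdict


def grid_cell(lat: float, lon: float, cell_deg: float = 1.0) -> tuple[int, int]:
    """Index of the equal-angle grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def richness_per_cell(records, cell_deg: float = 1.0) -> dict[tuple[int, int], int]:
    """Number of distinct species recorded in each grid cell."""
    species_in_cell = defaultdict(set)
    for species, lat, lon, _elev in records:
        species_in_cell[grid_cell(lat, lon, cell_deg)].add(species)
    return {cell: len(spp) for cell, spp in species_in_cell.items()}


def altitudinal_ranges(records) -> dict[str, tuple[float, float]]:
    """(min, max) elevation per species, skipping records without elevation."""
    ranges = {}
    for species, _lat, _lon, elev in records:
        if elev is None:
            continue
        lo, hi = ranges.get(species, (elev, elev))
        ranges[species] = (min(lo, elev), max(hi, elev))
    return ranges


# Hypothetical records: (species, lat, lon, elevation_m)
RECORDS = [
    ("A", -1.5, -78.2, 2500),
    ("B", -1.7, -78.9, 2800),
    ("A", -2.1, -78.4, 1900),
    ("C", -10.0, -55.0, None),  # e.g. a record placed at an administrative midpoint
]
```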
Biogeographical regions (bioregions) reveal how different sets of species are spatially grouped and are therefore important units for conservation, historical biogeography, ecology and evolution. Several methods have been developed to identify bioregions based on species distribution data rather than expert opinion. One approach successfully applies network theory to simplify and highlight the underlying structure in species distributions. However, this method lacks tools for simple and efficient analysis. Here, we present Infomap Bioregions, an interactive web application that takes species distribution data as input and generates bioregion maps. Species distributions may be provided as georeferenced point occurrences or range maps, and can be of local, regional or global scale. The application uses a novel adaptive resolution method to make the best use of often incomplete species distribution data. The results can be downloaded as vector graphics, shapefiles or tables. We validate the tool by processing large, publicly available species distribution datasets for the world's amphibians (using species ranges) and mammals (using point occurrences). We then calculate the fit between the inferred bioregions and WWF ecoregions. The resulting bioregions can be used, for example, to reconstruct ancestral ranges in historical biogeography or to identify indicator species for targeted conservation.
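One way to picture an adaptive resolution method is a quadtree. Cells are split while they hold many records and stay coarse where data are sparse. The following Python sketch is an assumption about the general idea, not Infomap Bioregions' code. The parameter names `max_records` and `min_size` and the 360° root square are hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    lon: float   # west edge, degrees
    lat: float   # south edge, degrees
    size: float  # side length, degrees
    points: list = field(default_factory=list)  # (lon, lat) tuples


def adaptive_cells(points, lon0: float = -180.0, lat0: float = -90.0,
                   size: float = 360.0, max_records: int = 100,
                   min_size: float = 1.0) -> list[Cell]:
    """Split square cells into four quadrants (quadtree) until each holds at
    most max_records points or its children would be smaller than min_size.
    Returns the non-empty leaf cells."""
    leaves, stack = [], [Cell(lon0, lat0, size, list(points))]
    while stack:
        cell = stack.pop()
        if not cell.points:
            continue  # empty cells are dropped
        if len(cell.points) <= max_records or cell.size / 2 < min_size:
            leaves.append(cell)  # sparse enough, or already at finest resolution
            continue
        half = cell.size / 2
        # children[ix * 2 + iy]: ix = 0 west / 1 east, iy = 0 south / 1 north
        children = [Cell(cell.lon + dx * half, cell.lat + dy * half, half)
                    for dx in (0, 1) for dy in (0, 1)]
        for lon, lat in cell.points:
            ix = 1 if lon >= cell.lon + half else 0
            iy = 1 if lat >= cell.lat + half else 0
            children[ix * 2 + iy].points.append((lon, lat))
        stack.extend(children)
    return leaves
```

Dense areas end up in small cells and sparse areas in large ones, so each cell carries a more comparable amount of data. In a network approach, each resulting cell would then be linked to the species recorded in it.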
Article impact statement: An automated conservation assessment with deep learning reveals global centers of orchid extinction risk.
Amazonia is an environmentally heterogeneous and biologically megadiverse region, and its biodiversity varies considerably over space. However, existing knowledge on Amazonian biodiversity and its environmental determinants stems almost exclusively from studies of macroscopic above‐ground organisms, notably vertebrates and trees. In contrast, diversity patterns of most other organisms remain elusive, although some of them, for instance microorganisms, constitute the overwhelming majority of taxa in any given location, both in terms of diversity and abundance. Here, we use DNA metabarcoding to estimate prokaryote and eukaryote diversity in environmental soil and litter samples from 39 survey plots in a longitudinal transect across Brazilian Amazonia using 16S and 18S gene sequences, respectively. We characterize richness and community composition based on operational taxonomic units (OTUs) and test their correlation with longitude and habitat. We find that prokaryote and eukaryote OTU richness and community composition differ significantly among localities and habitats, and that prokaryotes are more strongly structured by locality and habitat type than eukaryotes. Our results 1) provide a first large‐scale mapping of Amazonian soil biodiversity, suggesting that OTU richness patterns might follow substantially different patterns from those observed for macro‐organisms; and 2) indicate that locality and habitat factors interact in determining OTU richness patterns and community composition. This study shows the potential of DNA metabarcoding in unveiling Amazonia's outstanding diversity, despite the lack of complete reference sequence databases for the organisms sequenced.
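OTU-based richness and community composition can be summarized from per-sample read counts. The following is a minimal Python sketch under stated assumptions. The abstract does not state which dissimilarity index or statistical tests were used. Here richness is the count of OTUs with at least one read, and composition is compared with presence/absence Jaccard dissimilarity. The sample names and counts in `SAMPLES` are hypothetical.

```python
from itertools import combinations


def otu_richness(counts: dict[str, int]) -> int:
    """Number of OTUs with at least one read in a sample."""
    return sum(1 for n in counts.values() if n > 0)


def jaccard_dissimilarity(a: dict[str, int], b: dict[str, int]) -> float:
    """1 - |A ∩ B| / |A ∪ B| on OTU presence/absence."""
    sa = {otu for otu, n in a.items() if n > 0}
    sb = {otu for otu, n in b.items() if n > 0}
    union = sa | sb
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1 - len(sa & sb) / len(union)


def pairwise_dissimilarity(samples: dict[str, dict[str, int]]):
    """Jaccard dissimilarity for every unordered pair of samples."""
    return {(x, y): jaccard_dissimilarity(samples[x], samples[y])
            for x, y in combinations(sorted(samples), 2)}


# Hypothetical read counts per OTU for three samples
SAMPLES = {
    "plot1_soil": {"OTU1": 12, "OTU2": 3, "OTU3": 0},
    "plot1_litter": {"OTU1": 5, "OTU4": 7},
    "plot2_soil": {"OTU2": 1, "OTU3": 9},
}
```

A dissimilarity matrix like this is the usual input for ordination or for testing whether composition differs among localities and habitats.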
The unparalleled biodiversity found in the American tropics (the Neotropics) has attracted the attention of naturalists for centuries. Despite major advances in recent years in our understanding of the origin and diversification of many Neotropical taxa and biotic regions, many questions remain to be answered. Additional biological and geological data are still needed, as well as methodological advances that are capable of bridging these research fields. In this review, aimed primarily at advanced students and early-career scientists, we introduce the concept of “trans-disciplinary biogeography,” which refers to the integration of data from multiple areas of research in biology (e.g., community ecology, phylogeography, systematics, historical biogeography) and the Earth and physical sciences (e.g., geology, climatology, palaeontology), as a means to reconstruct the giant puzzle of Neotropical biodiversity and evolution in space and time. We caution against extrapolating results derived from the study of one or a few taxa to convey general scenarios of Neotropical evolution and landscape formation. We urge more coordination and integration of data and ideas among disciplines, transcending their traditional boundaries, as a basis for advancing tomorrow’s ground-breaking research. Our review highlights the great opportunities for studying the Neotropical biota to understand the evolution of life.
Understanding the processes that have generated the latitudinal biodiversity gradient and the continental differences in tropical biodiversity remains a major goal of evolutionary biology. Here we estimate the timing and direction of range shifts of extant flowering plants (angiosperms) between tropical and non-tropical zones, and into and out of the major tropical regions of the world. We then calculate rates of speciation and extinction, taking into account incomplete taxonomic sampling. We use a recently published fossil-calibrated phylogeny and apply novel bioinformatic tools to code species into user-defined polygons. We reconstruct biogeographic history using stochastic character mapping to compute relative numbers of range shifts in proportion to the number of available lineages through time. Our results, based on the analysis of c. 22,600 species and c. 20 million geo-referenced occurrence records, show no significant differences between the speciation and extinction rates of tropical and non-tropical angiosperms. This suggests that, at least in plants, the latitudinal biodiversity gradient derives primarily from factors other than differential rates of diversification. In contrast, the outstanding species richness found today in the American tropics (the Neotropics), as compared to tropical Africa and tropical Asia, is associated with significantly higher speciation and extinction rates. This suggests an exceedingly rapid evolutionary turnover, i.e., Neotropical species being formed and replaced by one another at unparalleled rates. In addition, tropical America stands out from other continents by having “pumped out” more species than it received through most of the last 66 million years. These results imply that the Neotropics have acted as an engine for global plant diversity.
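Coding species into user-defined polygons reduces to a point-in-polygon test per occurrence. The following is a minimal Python sketch, not the bioinformatic tools used in the study. It uses planar even-odd ray casting, which ignores antimeridian crossings and treats polygon edges as straight lines in longitude/latitude. The `REGIONS` polygons are hypothetical rectangles bounded at roughly ±23.44° (the tropics of Cancer and Capricorn). The `min_records` threshold is likewise an assumption.

```python
def point_in_polygon(lon: float, lat: float, polygon: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting: True if (lon, lat) lies inside polygon,
    given as a list of (lon, lon) -> (lon, lat) vertices (planar approximation)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the horizontal ray (so y1 != y2)
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def code_species(occurrences, regions, min_records: int = 1) -> dict[str, set[str]]:
    """Map each species to the regions holding at least min_records of its points."""
    counts: dict[str, dict[str, int]] = {}
    for species, lon, lat in occurrences:
        for name, poly in regions.items():
            if point_in_polygon(lon, lat, poly):
                by_region = counts.setdefault(species, {})
                by_region[name] = by_region.get(name, 0) + 1
    return {sp: {r for r, n in by_region.items() if n >= min_records}
            for sp, by_region in counts.items()}


# Hypothetical regions (lon, lat vertices) and occurrences (species, lon, lat)
REGIONS = {
    "tropics": [(-180, -23.44), (180, -23.44), (180, 23.44), (-180, 23.44)],
    "north_temperate": [(-180, 23.44), (180, 23.44), (180, 66.56), (-180, 66.56)],
}
OCCURRENCES = [("sp1", -60.0, -3.0), ("sp1", 10.0, 45.0), ("sp2", 100.0, 5.0)]
```

The resulting species-to-region sets are the kind of discrete geographic states that a character-mapping analysis can take as tip data.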