Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic filtering is a scalable and reproducible means to identify potentially problematic records and tailor datasets from public databases such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org), for biodiversity analyses. However, it is unclear how much data may be lost by filtering, whether the same filters should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants downloaded from GBIF. We find that a total of 44.3% of the records are potentially problematic, with large variation across taxonomic groups (25–90%). A small fraction of records was identified as erroneous in the strict sense (4.2%), and a much larger proportion as unfit for most downstream analyses (41.7%). Filters of duplicated information, collection year, and basis of record, as well as coordinates in urban areas, or for terrestrial taxa in the sea or marine taxa on land, have the greatest effect. Automated filtering can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough recording and exploration of the meta-data associated with species records for biodiversity research.
Species occurrence records provide the basis for many biodiversity studies. They derive from geo-referenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic flagging and filtering are a scalable and reproducible means to identify potentially problematic records in datasets from public databases such as the Global Biodiversity Information Facility (GBIF; www.gbif.org). However, it is unclear how much data may be lost by filtering, whether the same tests should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants, downloaded from GBIF. We find that 29–90% of the records are potentially erroneous, with large variation across taxonomic groups. Tests for duplicated information, collection year, and basis of record, as well as for coordinates in urban areas and for terrestrial taxa in the sea or marine taxa on land, have the greatest effect. While many flagged records might not be de facto erroneous, they could be overly imprecise and increase uncertainty in downstream analyses. Automated flagging can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough exploration of the meta-data associated with species records for biodiversity research.
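To make the idea of automated flagging concrete, the following is a minimal sketch of three of the tests named above (duplicated information, collection year, and basis of record). It is not the authors' pipeline: the `Record` class, the `flag_records` function, the cutoff year of 1945, and the set of accepted basis-of-record values are all assumptions chosen for illustration, and the species names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """A simplified occurrence record (hypothetical structure)."""
    species: str
    lat: float
    lon: float
    year: Optional[int]
    basis_of_record: str


# Illustrative defaults only; suitable values depend on taxon and region.
DEFAULT_MIN_YEAR = 1945
DEFAULT_ALLOWED_BASIS = frozenset({"PRESERVED_SPECIMEN", "HUMAN_OBSERVATION"})


def flag_records(records, min_year=DEFAULT_MIN_YEAR,
                 allowed_basis=DEFAULT_ALLOWED_BASIS):
    """Return a list of (record, flags) pairs; an empty flag list means
    the record passed all three tests."""
    seen = set()
    results = []
    for rec in records:
        flags = []
        # Collection year: missing or older than the chosen cutoff.
        if rec.year is None or rec.year < min_year:
            flags.append("collection_year")
        # Basis of record: not in the accepted set.
        if rec.basis_of_record not in allowed_basis:
            flags.append("basis_of_record")
        # Duplicated information: same species, coordinates, and year
        # as a record seen earlier in the input.
        key = (rec.species, rec.lat, rec.lon, rec.year)
        if key in seen:
            flags.append("duplicate")
        else:
            seen.add(key)
        results.append((rec, flags))
    return results


if __name__ == "__main__":
    demo = [
        Record("Species A", -3.1, -60.0, 2005, "PRESERVED_SPECIMEN"),
        Record("Species A", -3.1, -60.0, 2005, "PRESERVED_SPECIMEN"),
        Record("Species B", -2.0, -59.0, 1900, "FOSSIL_SPECIMEN"),
    ]
    for rec, flags in flag_records(demo):
        print(rec.species, rec.year, flags or "ok")
```

Returning flags rather than dropping records keeps the decision about which tests and thresholds to apply in the hands of the analyst, in line with the customization the abstract calls for.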
Publicly available species distribution data have become a crucial resource in biodiversity research, including studies in ecology, biogeography, systematics and conservation biology. In particular, the availability of digitized collections from museums and herbaria, and of citizen science observations, has increased drastically over the last few years. As of today, the largest public aggregator for geo-referenced species occurrence data, the Global Biodiversity Information Facility (www.gbif.org), provides access to more than 1.3 billion geo-referenced occurrence records for species from across the globe and the tree of life.
A central challenge to the use of these publicly available species occurrence data in research is erroneous geographic coordinates (Anderson et al. 2016). Errors mostly arise because public databases integrate records collected with different methodologies in different places and at different times, often without centralized curation and with only rudimentary meta-data. For instance, erroneous coordinates caused by data-entry errors or automated geo-referencing from vague locality descriptions are common (Maldonado et al. 2015; Yesson et al. 2007)...
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
The Restinga forests of southern Bahia state, Brazil, grow on sandy coastal Quaternary sediments. As their floras are relatively poorly known, the present study assessed their floristic compositions. We surveyed four sites in the municipalities of Maraú and Itacaré and identified 302 angiosperm species belonging to 184 genera of 75 families. The most species-rich families were Fabaceae (35 species), Myrtaceae (25), Rubiaceae (21), Sapotaceae (13), Bromeliaceae (12), Annonaceae (11), Erythroxylaceae (10), Melastomataceae (9), and Apocynaceae (8). Local floras include elements with distributions restricted to the Atlantic Forest domain, those disjunct between the Amazon and Atlantic Forest domains, and those also occurring in moist forests and dry vegetation of central Brazil. The hypothesis that the floristic compositions of restinga forests are influenced by neighboring wet forests was tested using cluster and principal component analyses of eleven restinga forest sites and nine Atlantic wet forest sites. The results supported five main groups, most of them including both restinga forests and their adjacent wet forest sites, thus corroborating the hypothesis that wet forests in geographical proximity greatly influence the floristic compositions of restinga forests.
Keywords: coastal vegetation, Atlantic Forest domain, flora, similarity.
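The comparison of floristic composition between sites can be illustrated with a small sketch. The abstract does not say which similarity measure the cluster analysis used, so the Jaccard index below is an assumption, as are the function names and the toy site lists (placeholder species labels, not data from the study).

```python
def jaccard(a, b):
    """Jaccard similarity of two species sets: shared species divided by
    the total number of distinct species across both sites."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_site(target, sites):
    """Return the name of the site (other than `target`) whose species
    list is most similar to that of `target`."""
    others = {name: spp for name, spp in sites.items() if name != target}
    return max(others, key=lambda name: jaccard(sites[target], others[name]))


if __name__ == "__main__":
    # Hypothetical species lists for illustration only.
    sites = {
        "restinga_1": {"sp1", "sp2", "sp3", "sp4"},
        "wet_forest_near": {"sp2", "sp3", "sp4", "sp5"},
        "wet_forest_far": {"sp4", "sp6", "sp7", "sp8"},
    }
    print(nearest_site("restinga_1", sites))
```

A pairwise matrix of such similarities is the usual input to a cluster analysis; if a restinga site consistently groups with its neighboring wet forest rather than with other restinga sites, that is the pattern the hypothesis predicts.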
The Brazilian Caatinga is considered the richest nucleus of the Seasonally Dry Tropical Forests (SDTF) in the Neotropics and also exhibits high levels of endemism, but the timing of origin and the evolutionary causes of its plant diversification are still poorly understood. In this study, we integrate comprehensively sampled, dated molecular phylogenies of multiple flowering plant groups and estimations of ancestral areas to elucidate the forces driving diversification and historical assembly in the Caatinga flowering plants. Our results show a pervasive floristic exchange between the Caatinga and other Neotropical regions, particularly adjacent ones. While some Caatinga lineages arose in the Eocene/Oligocene, most dry-adapted endemic plant lineages found in the region emerged from the middle to late Miocene until the Pleistocene, indicating that only during this period did the Caatinga start to coalesce into an SDTF like the one we see today. Our findings are temporally congruent with global and regional aridification events and extensive denudation of thick layers of sediments in Northeast (NE) Brazil. We hypothesize that global aridification processes have played an important role in the ancient plant assembly and long-term stability of the Caatinga SDTF biome, whereas climate-induced vegetation shifts and newly opened habitats have largely acted as drivers of in situ diversification in the region. Patterns of phylogenetic relatedness of Caatinga endemic clades revealed that much modern species diversity originated in situ and likely evolved via recent (Pliocene/Pleistocene) ecological specialization triggered by increased environmental heterogeneity and the exhumation of edaphically disparate substrates.
The continuous assembly of the dry-adapted flora of the Caatinga has been complex, adding to growing evidence that the origins and historical assembly of the distinct SDTF patches are idiosyncratic across the Neotropics, driven not just by continental-scale processes but also by unique features of regional-scale geological history.