2020
DOI: 10.1101/2020.03.12.974543
Preprint

No one-size-fits-all solution to clean GBIF

Abstract: Species occurrence records provide the basis for many biodiversity studies. They derive from geo-referenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic flagging and filtering are a scalable and reproducible means to identify potentially problematic records in datasets from public datab…

Cited by 24 publications (45 citation statements)
References 22 publications (25 reference statements)
“…Data quality. We compared the accuracy of ConR at three levels of data curation of the occurrence records (1) "raw", the data downloaded from GBIF scrubbed taxonomically, (2) "intermediate", the raw data subjected to automated removal of records with common geographic errors (Zizka et al 2020b), and (3) "filtered", the intermediate data with additional removal of records outside the known occurrence range of species from WCSP (WCSP 2019). We only ran this test with ConR, since we expect the two other index-based methods to yield similar results and because we expect IUC-NN to be more robust against erroneous coordinates, since the features used for IUC-NN prediction are mean values across all records of a species.…”
Section: The Reliability Of AA
Citation type: mentioning
confidence: 99%
“…We compared the accuracy of ConR assessments based on different datasets of species occurrences, representing three levels of data curation (1) "raw", the data downloaded from GBIF scrubbed taxonomically, (2) "intermediate", the raw data subjected to automated removal of records with common geographic errors (Zizka et al 2020b), and (3) "filtered", the intermediate data with additional removal of records outside the known occurrence range of species from WCSP (WCSP 2019). We only ran this test with ConR, since we expect the two other index-based methods to respond similarly to the issue and because we expect IUC-NN to be more robust against erroneous coordinates, since the features used for IUC-NN prediction are mean values across all records of a species.…”
Section: Data Quality
Citation type: mentioning
confidence: 99%
“…3) How robust are AA to limitations with data availability, variable data quality and geographic sampling biases? Digitally available species occurrence records are often biased towards certain geographic regions (Meyer et al 2016; Zizka et al 2020a) and contain erroneous or very imprecise coordinates (Zizka et al 2019, 2020b). Therefore, we expect a bias of AA towards well-sampled geographic areas and life forms, and an increase in accuracy with geographic cleaning.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Compared to Praptosuwiryo (2020) as the case study in our paper, the number of occurrence records stored in GBIF is more than three times of that in Praptosuwiryo (2020) who mainly used data from Herbarium Bogoriense. The use of GBIF records in Red List assessment, however, has to be performed carefully as the data might have errors in the geographic coordinates (Zizka et al 2020). Therefore, cleaning the data is necessary before mapping the occurrence points and conducting the assessment.…”
Section: The Value Of Open Access Species Occurrence Database
Citation type: mentioning
confidence: 99%
“…To use these occurrence records, data cleaning is an essential first step as the data might contain errors in the geographic coordinates. While manual data cleaning based on expert knowledge is feasible on small taxonomic or geographic areas, automated flagging methods can be used for a huge dataset of occurrence records (Zizka et al 2020).…”
Section: Suitable Habitat Of Cibotium Arachnoideum
Citation type: mentioning
confidence: 99%