2022
DOI: 10.1111/2041-210x.13966
fossilbrush: An R package for automated detection and resolution of anomalies in palaeontological occurrence data

Abstract: 1. Fossil occurrence databases are indispensable resources to the palaeontological community, yet present unique data cleaning challenges. Many studies devote significant attention to cleaning fossil occurrence data prior to analysis, but such efforts are typically bespoke and difficult to reproduce. There are also no standardised methods to detect and resolve errors despite the development of an ecosystem of cleaning tools fuelled by the concurrent growth of neontological occurrence databases. 2. As fossil occ…

Citations: cited by 10 publications (6 citation statements)
References: 69 publications
“…gr ., “” ) with identical primary and accepted generic names were discarded, while those with different names (i.e., the species was re-assigned to another genus) were retained. In addition, absolute ages (maximum and minimum ages) of the occurrences were updated based on ages in the Geological Time Scale 2020 [93] using the chrono_scale function of the fossilbrush R package [94]. Next, occurrences with a high temporal uncertainty (>10 Myr) but not from an international stage were removed from the dataset.…”
Section: Methods (mentioning)
confidence: 99%
“…In tax_check, Jaro distances are calculated via the stringdistmatrix function from the stringdist package (van der Loo, 2014). This function is provided to help researchers perform a spell check on their dataset, with additional functionality available in the fossilbrush package (Flannery‐Sutherland, Raja, et al, 2022). However, it should be made clear that this is no replacement for thorough taxonomic vetting.…”
Section: Package Description (mentioning)
confidence: 99%
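The Jaro distance mentioned in the statement above can be illustrated with a minimal pure-Python sketch. This is not the stringdist or fossilbrush implementation, only a from-scratch version of the standard Jaro definition; the helper `flag_similar_names` and the 0.1 threshold are hypothetical choices for illustration.

```python
from itertools import combinations


def jaro_similarity(a: str, b: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1 means identical strings."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    a_seq = [a[i] for i in range(len(a)) if a_matched[i]]
    b_seq = [b[j] for j in range(len(b)) if b_matched[j]]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (
        matches / len(a)
        + matches / len(b)
        + (matches - transpositions) / matches
    ) / 3


def jaro_distance(a: str, b: str) -> float:
    """Jaro distance: 0 for identical strings, 1 for no matching characters."""
    return 1.0 - jaro_similarity(a, b)


def flag_similar_names(names, threshold=0.1):
    """Hypothetical helper: return pairs of distinct names closer than threshold."""
    return [
        (x, y)
        for x, y in combinations(names, 2)
        if x != y and jaro_distance(x, y) < threshold
    ]
```

Pairs flagged this way are only candidate misspellings; as the quoted statement stresses, they still need checking by someone with taxonomic expertise.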
“…Today, palaeobiologists commonly use code to clean (e.g. Flannery‐Sutherland, Raja, et al, 2022; Zizka et al, 2019), analyse (e.g. Guillerme, 2018; Kocsis et al, 2019), and visualise data (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…Species were assigned the speciation and extinction datums in accordance with Aze et al (2011) and Fenton and Woodhouse et al (2021), and all species occurrences located outside of these assigned stratigraphic ranges were removed. This range trimming was applied to eliminate much of the occurrence data likely attributable to misidentification and/or reworking that may create artificial "tails" within speciation and extinction data (Liow et al, 2010; Lazarus et al, 2012; Flannery-Sutherland et al, 2022). The trimming of taxa resulted in a dataset of 239 317 planktonic foraminiferal occurrences.…”
Section: Global Data Analysis (mentioning)
confidence: 99%
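The range trimming described in the statement above can be sketched as a simple filter. This is a minimal illustration, not the citing authors' code: the `Occurrence` record, the `trim_to_ranges` function, the species labels, and the choice to keep species that have no assigned range are all assumptions.

```python
from typing import Dict, List, NamedTuple, Tuple


class Occurrence(NamedTuple):
    species: str   # hypothetical species label
    age_ma: float  # occurrence age in millions of years before present


def trim_to_ranges(
    occurrences: List[Occurrence],
    ranges: Dict[str, Tuple[float, float]],
) -> List[Occurrence]:
    """Keep occurrences whose age lies within the species' assigned range.

    `ranges` maps species -> (speciation_ma, extinction_ma), where the
    speciation datum is the older (larger) age. Species without an assigned
    range are kept unchanged, which is an assumption of this sketch.
    """
    kept = []
    for occ in occurrences:
        if occ.species not in ranges:
            kept.append(occ)
            continue
        speciation_ma, extinction_ma = ranges[occ.species]
        if extinction_ma <= occ.age_ma <= speciation_ma:
            kept.append(occ)
    return kept


# Illustrative data only; the names and ages are made up.
example_ranges = {"sp_A": (20.0, 5.0)}
example_occurrences = [
    Occurrence("sp_A", 25.0),  # older than the speciation datum: removed
    Occurrence("sp_A", 10.0),  # inside the range: kept
    Occurrence("sp_A", 2.0),   # younger than the extinction datum: removed
    Occurrence("sp_B", 50.0),  # no assigned range: kept by assumption
]
trimmed = trim_to_ranges(example_occurrences, example_ranges)
```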