2016
DOI: 10.1371/journal.pone.0151232

From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories

Abstract: Accuracy of taxonomic identifications is crucial to data quality in online repositories of species occurrence data, such as the Global Biodiversity Information Facility (GBIF), which have accumulated several hundred million records over the past 15 years. These data serve as the basis for large-scale analyses of macroecological and biogeographic patterns and for documenting environmental changes over time. However, taxonomic identifications are often unreliable, especially for non-vascular plants and fungi including l…

Cited by 32 publications (29 citation statements)
References 66 publications
“…As the main global biodiversity database, a large proportion of the >1 billion records comprise observations rather than specimens (see below). Smith, Johnston & Lücking (2016) and Yesson et al (2007) discuss issues regarding data quality in GBIF, such as unreliable taxonomic identifications in the absence of voucher specimens, and non-global coverage of species distribution data. Verifying suspect occurrences through niche modelling, based on verified and geo- and DNA-referenced occurrences of the same species, is a step toward identifying unreliable records.…”
Section: Distribution Redundancy and Digitization of Collections (mentioning, confidence: 99%)
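The idea of checking a suspect occurrence against a niche model fitted to verified occurrences can be illustrated with a deliberately simple stand-in. The sketch below is not the phylogeny-based predictive niche modelling named in the title; it swaps in a BIOCLIM-style percentile envelope. It assumes each record has already been annotated with climate values at its coordinates; the variable labels `bio1` (annual mean temperature) and `bio12` (annual precipitation), the 2.5th/97.5th percentile cut-offs, and all function names are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: climate variable name -> value at the record's
# coordinates (labels such as "bio1"/"bio12" are assumed, not prescribed).
Record = Dict[str, float]
Envelope = Dict[str, Tuple[float, float]]


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile() needs at least one value")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_envelope(verified: List[Record], lower_q: float = 2.5,
                 upper_q: float = 97.5) -> Envelope:
    """Per-variable [lower_q, upper_q] percentile range of verified occurrences."""
    if not verified:
        raise ValueError("need at least one verified occurrence")
    envelope: Envelope = {}
    for var in verified[0]:
        values = [r[var] for r in verified]
        envelope[var] = (percentile(values, lower_q), percentile(values, upper_q))
    return envelope


def variables_outside(record: Record, envelope: Envelope) -> List[str]:
    """Names of variables for which a record falls outside the fitted envelope."""
    return [var for var, (lo, hi) in envelope.items() if not lo <= record[var] <= hi]


# Made-up verified occurrences for one species.
verified = [
    {"bio1": 8.0, "bio12": 800.0},
    {"bio1": 9.0, "bio12": 900.0},
    {"bio1": 10.0, "bio12": 1000.0},
    {"bio1": 11.0, "bio12": 1100.0},
    {"bio1": 12.0, "bio12": 1200.0},
]
env = fit_envelope(verified)
print(variables_outside({"bio1": 25.0, "bio12": 300.0}, env))  # ['bio1', 'bio12']
print(variables_outside({"bio1": 10.5, "bio12": 950.0}, env))  # []
```

A record flagged here is only a candidate for re-examination of its identification; an envelope of this kind ignores interactions between variables and is sensitive to how representative the verified records are.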
“…However, bioclimatic modelling of lichens has been used for a wide variety of purposes, and many such studies have explored the relationship of lichen distribution to baseline climate only, alongside a range of other covariables, and without projection to climate change scenarios (Table 1). Thus, lichen bioclimatic modelling has been used to test taxonomic hypotheses [32–36], improve understanding of threatened species [37–45] including through conservation design [46,47], identify indicator species [48–51], or to test and improve the practical application of bioclimatic methods [37,52–54]. These studies at the baseline highlight three key interrelated decisions characterising the development process for any bioclimatic model: 1.…”
Section: Bioclimatic Analysis of Lichens (mentioning, confidence: 99%)
“…Lichen distributions are dynamic [55] and any shifts within this baseline period are discounted; there is an important balance to be struck between a time period that extends to provide a reliable distribution, minimising issues of spatial bias [56,57], while constraining this period to represent as stable a distribution as is possible with respect to prevailing climate. Furthermore, field occurrence records provide 'presence-only' data which present an additional statistical challenge that has been handled along a continuum, as follows: (i) by generating a constrained set of pseudo-absences [58,59] for use with standard forms of regression such as generalised linear or additive models [33,39] or with alternative methods that facilitate nested interactions such as classification and regression trees including random forest [33], (ii) using a controlled selection of 'background' pseudo-absence points as applied in MAXENT [34,35,40,43–45], or alternatively (iii) using presence-only statistical methods that compare occurrences to the properties of an entire environmental 'background' [38,46,47]. Lichen bioclimatic models have thus used a rich variety of statistical techniques (Table 1), extending to include nonparametric multiplicative regression that has been applied to >50% of studies with abundance or presence-absence data [36,48,50,51].…”
Section: Bioclimatic Analysis of Lichens (mentioning, confidence: 99%)
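The difference between options (i) and (ii) above, constrained pseudo-absences versus a random 'background' sample, comes down to which cells of the study area are eligible to be drawn. The sketch below is a generic illustration under stated assumptions, not the procedure of any cited study or of MAXENT itself: cells are hypothetical `(row, col)` indices on an environmental grid, the constraint is an assumed buffer measured in grid cells (Chebyshev distance), and `sample_background` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical raster-cell representation: (row, col) indices on the climate grid.
Cell = Tuple[int, int]


def sample_background(
    study_area: Sequence[Cell],
    presences: Set[Cell],
    k: int,
    buffer: Optional[int] = None,
    seed: int = 0,
) -> List[Cell]:
    """Draw k cells (without replacement) from the study area.

    buffer=None: plain random background; presence cells may be drawn.
    buffer=b >= 0: constrained pseudo-absences; cells within Chebyshev
    distance b of any presence are excluded (b=0 excludes only the
    presence cells themselves).
    """

    def near_presence(cell: Cell) -> bool:
        r, c = cell
        return any(max(abs(r - pr), abs(c - pc)) <= buffer for pr, pc in presences)

    # near_presence is only evaluated when a buffer is given.
    pool = [cell for cell in study_area if buffer is None or not near_presence(cell)]
    if k > len(pool):
        raise ValueError(f"requested {k} cells but only {len(pool)} are eligible")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.sample(pool, k)


# Usage: a 10 x 10 grid with one presence cell.
grid = [(r, c) for r in range(10) for c in range(10)]
pseudo_absences = sample_background(grid, {(5, 5)}, k=20, buffer=1, seed=42)
background = sample_background(grid, {(5, 5)}, k=20, seed=42)
```

Real workflows usually measure buffers geographically (e.g. in kilometres) and may bias background sampling to mirror collecting effort; both refinements are omitted here to keep the contrast between the two strategies visible.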
“…The responsibility for accurate initial specimen identification and sequence data quality lies solely with the data generator for standard submissions to GenBank and there is often very little information upon which veracity can be assessed (e.g., voucher specimen meta-data, raw sequencing data files). Reliance on data that has not been derived from well curated and vouchered specimens poses risks of misidentification, an issue that is well recognized for GenBank data (Crocetta et al, 2015; Smith et al, 2016; Balakirev et al, 2017). A recent study of fish barcode sequences found evidence for potential errors in ∼4% of sequences (Li et al, 2018).…”
Section: Introduction (mentioning, confidence: 99%)