2018
DOI: 10.1093/database/bay103

Toward a service-based workflow for automated information extraction from herbarium specimens

Abstract: Over the past years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although the imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and develops into the principal cost factor. In order to streamline the process of capturing herbarium specimen metadata, we specified a formal extensible workflow integrating a wide range of automated specimen image analysis services. We implemented the workf…

Cited by 11 publications
(6 citation statements)
references
References 23 publications
“…Gathering the records for the plant occurrence data was a long-winded process, with each of the large databases presenting their own challenges: GBIF, for example, holds relatively few verified observations of naturally occurring Magnolia populations but offers excellent data transfer capabilities, Global Plants hosts a large number of records but makes data transfer challenging, whilst the Chinese Virtual Herbarium requires translation from Mandarin and an iterative process of positive identification and filtering to derive accurate records. The archival research in herbaria was highly effective but corroborated reports that only a small fraction of plant records are hosted by online databases (Harris and Marsico, 2017; Kirchhoff et al, 2018), and as a result future applications of this methodology should factor in the extensive desktop research. By contrast, climate data was straightforward to derive, with the principal short-comings being the grain and accuracy of the data, especially when assessing urban environments in comparison to their rural hinterlands.…”
Section: Clear Differences Between Literature Sources
Citation type: supporting
confidence: 57%
“…Furthermore, machine learning technologies (e.g. OCR, optical character recognition) can be applied to the extraction of metadata from herbarium specimens, for example, collectors names (Silva, 2016), plant traits from textual descriptions (Dagtekin et al., 2018), and label content (Heidorn & Wei, 2008; Kirchhoff et al., 2018; Walton et al., 2020), enhancing integrative and cross‐linked research on complex phenomena. Tiered data gathering and analysis from – to date – unconnected data expand the information that herbarium collections can provide (Soltis et al., 2018; Lendemer et al., 2019; Theeten et al., 2019).…”
Section: The Emerging Role Of Machine Learning In Extracting Informat...
Citation type: mentioning
confidence: 99%
“…not actually implemented) (e.g. Haston et al, 2015; Kirchhoff et al, 2018; Moen et al, 2010). Some investigations (e.g.…”
Section: Introduction
Citation type: mentioning
confidence: 99%