2024
DOI: 10.1038/s41598-023-50179-0

A novel automated label data extraction and data base generation system from herbarium specimen images using OCR and NER

Atsuko Takano,
Theodor C. H. Cole,
Hajime Konagai

Abstract: Digital extraction of label data from natural history specimens along with more efficient procedures of data entry and processing is essential for improving documentation and global information availability. Herbaria have made great advances in this direction lately. In this study, using optical character recognition (OCR) and named entity recognition (NER) techniques, we have been able to make further advancements towards fully automatic extraction of label data from herbarium specimen images. This system can…

Citations: cited by 2 publications (1 citation statement)
References: 21 publications
“…At the same time, through online databases or other shared resources (Lendemer et al 2020; Monfils et al 2022; Hardisty et al 2023), digitisation makes museum collections more accessible to the wider scientific community and to researchers from disadvantaged or distant countries who may not have the opportunity to see the specimen in person (open science concept). The Extended Specimen Approach (ESA; Webster 2017; Lendemer et al 2020) is a method of digitisation that goes beyond the physical specimen, e.g., photographs, X-rays, CT scans (Stoev et al 2013; Akkari et al 2015, 2018), but also includes all its attributes, such as historical information stored in the collection in the form of acquisition and inventory books, inventory cards and labels (Haston et al 2012; Albano et al 2018; Price et al 2018; Zahiri et al 2021; Bogutskaya et al 2022; Takano et al 2024).…”
Section: Introduction (mentioning)
Confidence: 99%