2022
DOI: 10.3897/bdj.10.e89481

BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain

Abstract: Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need has resulted in numerous works being published in this field. With this, a large amount of textual data (publications) and metadata (e.g. dataset description) has been generated. To support the management and analy…

Cited by 4 publications (8 citation statements)
References 21 publications (33 reference statements)
“…It consists of abstract-length documents: 250 from BHL, 150 from journal articles, and 100 from government reports. The most recent addition to the list of biodiversity corpora is BiodivNERE (Abdelmageed et al, 2022). Drawn from biodiversity dataset metadata and abstracts, it comes with two datasets, one supporting NER and the other supporting Relation Extraction (RE).…”
Section: Related Work (mentioning)
confidence: 99%
“…Some models have been pre-trained using scientific corpora such as SciBERT, trained on a random sample of 1. Despite their demonstrated use in biomedical sciences, large language models are just beginning to be adopted in ecology and evolution [1,15,53,54], and to our knowledge there is currently only one large language model, BiodivBERT, trained explicitly on biodiversity-related texts [53].…”
Section: Language Models: Deep Learning in NLP (mentioning)
confidence: 99%
“…Despite many promising applications of NLP in ecology [1,18], we currently have few domain-specific NLP tools [15,53]. This is in part because ecology is a low-resource domain [57], meaning there are few large open-access text databases available for training foundation models [1], and there are few gold standard databases with task-specific labels needed for supervised learning.…”
Section: State-of-the-art In Ecology and Evolution: Learning In A Low... (mentioning)
confidence: 99%
“…Previous approaches in ecology have focused heavily on named entity recognition of species and taxonomy (Akella et al., 2012; Gerner et al., 2010; Le Guillarme & Thuiller, 2022; Millard et al., 2020) as well as geographical locations or population trends (Cornford et al., 2022) but also include document classification (Cornford et al., 2021) and relation extraction (Kaur et al., 2019). Furthermore, various gold standard databases of species names and taxonomy have been published to aid the evaluation of NER approaches in ecology (Abdelmageed et al., 2022; Nguyen et al., 2019).…”
Section: Introduction (mentioning)
confidence: 99%