SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation

Palen-Michel, Chester; Holley, Nolan; Lignos, Constantine

doi:10.18653/v1/2021.eval4nlp-1.5

Cited by 4 publications

(4 citation statements)

References 28 publications


“…This section describes the features of SeqScore, focusing on the newest features that enable it to assist in many NER data workflows. Previous work (Palen-Michel et al, 2021) has described the scoring features of SeqScore, so they are not discussed in detail in this paper. SeqScore is released via PyPI (https://pypi.org/project/seqscore/) and development occurs on GitHub (https://github.com/bltlab/seqscore).…”

Section: SeqScore's Features (mentioning)

confidence: 99%

“…SeqScore supports several options to work with a wide variety of data files: setting the file encoding (older files often use ISO-8859-1), ignoring comment lines (which some files use for sentence provenance information), and automatic detection of field delimiters (older files use space, newer ones use tabs). Different strategies can be set regarding how to deal with invalid label transitions like O I-PER in BIO (for more details see Palen-Michel et al, 2021). SeqScore can maintain or discard the document boundaries specified using -DOCSTART- sentences inside CoNLL-format files, which enables scoring a reference with document boundaries against system output without them.…”

Section: Overview (mentioning)

confidence: 99%
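
The invalid-transition case mentioned in the statement above (O followed by I-PER in BIO) can be illustrated with a minimal sketch. This is not SeqScore's code or API: the function names `invalid_bio_transitions` and `repair_conlleval_style` are hypothetical, and the repair shown assumes that a conlleval-style strategy treats an invalid I-X as the start of a new entity, i.e. rewrites it to B-X.

```python
from typing import List


def invalid_bio_transitions(labels: List[str]) -> List[int]:
    """Return the indices of I-X labels not preceded by B-X or I-X.

    Hypothetical helper for illustration; labels are for one sentence.
    """
    bad = []
    prev = "O"  # the start of a sentence behaves like a preceding O
    for i, label in enumerate(labels):
        if label.startswith("I-"):
            ent_type = label[2:]
            if prev not in (f"B-{ent_type}", f"I-{ent_type}"):
                bad.append(i)
        prev = label
    return bad


def repair_conlleval_style(labels: List[str]) -> List[str]:
    """Rewrite each invalid I-X as B-X so it begins a new entity.

    Assumption: this mirrors how conlleval-style chunking treats such labels.
    """
    repaired = list(labels)
    for i in invalid_bio_transitions(labels):
        repaired[i] = "B-" + labels[i][2:]
    return repaired


if __name__ == "__main__":
    sentence = ["O", "I-PER", "I-PER", "O"]
    print(invalid_bio_transitions(sentence))
    print(repair_conlleval_style(sentence))
```

An alternative strategy would drop the invalid labels instead of promoting them. Which strategy is used changes the extracted entities, which is why the choice needs to be reported alongside scores.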

“…This paper describes the SeqScore toolkit and its applications for validating, summarizing, and transforming NER data. A previous publication (Palen-Michel et al, 2021), introduced SeqScore and described its value as a reproducibility-focused NER scorer. While this paper is also about SeqScore, it has a different focus.…”

Section: Introduction (mentioning)

confidence: 99%


Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

2023


We introduce calamanCy, an open-source toolkit for constructing natural language processing (NLP) pipelines for Tagalog. It is built on top of spaCy, enabling easy experimentation and integration with other frameworks. calamanCy addresses the development gap by providing a consistent API for building NLP applications and offering general-purpose multitask models with out-of-the-box support for dependency parsing, parts-of-speech (POS) tagging, and named entity recognition (NER). calamanCy aims to accelerate the progress of Tagalog NLP by consolidating disjointed resources in a unified framework. The calamanCy toolkit is available on GitHub: https://github.com/ljvmiranda921/calamanCy.


Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

2023


“…Evaluation for all models required extracted spans to match the annotation exactly in span and type to be correct. Evaluation was performed with SeqScore (Palen-Michel et al, 2021), using conlleval-style repair for invalid label sequences. All models were trained using an AMD 2990WX CPU and a single RTX 2080 Ti GPU.…”

Section: Licensing (mentioning)

confidence: 99%
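
The exact-match criterion in the statement above (a predicted span counts as correct only if both its boundaries and its type match the annotation) can be sketched as follows. This is an illustrative sketch, not SeqScore's implementation: `bio_spans` and `exact_match_prf` are hypothetical names, the input is a single sentence of BIO labels, and an I-X label that does not continue an entity of the same type is assumed to start a new span.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(labels: List[str]) -> Set[Span]:
    """Extract typed spans from one sentence of BIO labels."""
    spans: Set[Span] = set()
    start = None
    ent_type = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, label in enumerate(labels + ["O"]):
        if label.startswith("I-") and start is not None and label[2:] == ent_type:
            continue  # the current span continues
        if start is not None:
            spans.add((start, i, ent_type))
            start, ent_type = None, None
        if label.startswith("B-") or label.startswith("I-"):
            # Assumption: an I-X that cannot continue a span opens a new one.
            start, ent_type = i, label[2:]
    return spans


def exact_match_prf(reference: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 where a span must match in boundaries and type."""
    ref, pred = bio_spans(reference), bio_spans(predicted)
    true_pos = len(ref & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC"]
    system = ["B-PER", "I-PER", "O", "B-ORG"]
    print(exact_match_prf(gold, system))
```

In the usage example, the PER span matches exactly, while the last token has the right boundaries but the wrong type, so it counts as both a false positive and a false negative.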

Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling

Álvarez-Mellado, Lignos

2022

Preprint

Self Cite


This work presents a new resource for borrowing identification and analyzes the performance and errors of several models on this task. We introduce a new annotated corpus of Spanish newswire rich in unassimilated lexical borrowings (words from one language that are introduced into another without orthographic adaptation) and use it to evaluate how several sequence labeling models (CRF, BiLSTM-CRF, and Transformer-based models) perform. The corpus contains 370,000 tokens and is larger, more borrowing-dense, OOV-rich, and topic-varied than previous corpora available for this task. Our results show that a BiLSTM-CRF model fed with subword embeddings along with either Transformer-based embeddings pretrained on codeswitched data or a combination of contextualized word embeddings outperforms results obtained by a multilingual BERT-based model.


Reproducibility in Named Entity Recognition: A Case Study Analysis

Cuevas Villarmin, Cohen-Boulakia, Naderi

2024

2024 IEEE 20th International Conference on E-Science (E-Science)
