Proceedings of the Natural Legal Language Processing Workshop 2022
DOI: 10.18653/v1/2022.nllp-1.22

E-NER — An Annotated Named Entity Recognition Corpus of Legal Text

Abstract: Identifying named entities, such as persons, locations, or organizations, in documents can highlight key information for readers. Training Named Entity Recognition (NER) models requires an annotated data set, which can be time-consuming and labour-intensive to produce. Nevertheless, there are publicly available NER data sets for general English. Recently there has been interest in developing NER for legal text. However, prior work and experimental results reported here indicate that there is a significant degradation in p…
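The degradation the abstract describes can be probed directly. The sketch below is an illustration, not code from the paper: it runs a publicly available general-English NER model over a legal-style sentence using the Hugging Face pipeline API. The model checkpoint and example sentence are assumptions chosen for demonstration.

```python
# A minimal sketch (not from the paper): applying an off-the-shelf
# general-English NER model to a legal sentence, to see how generic
# entity types behave outside the newswire domain.
from transformers import pipeline

# "dslim/bert-base-NER" is a publicly available CoNLL-2003-style model,
# used here purely for illustration; the paper's models may differ.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

sentence = (
    "Pursuant to Section 10(b) of the Securities Exchange Act, "
    "the SEC filed suit against Enron Corp. in the Southern District of Texas."
)

for entity in ner(sentence):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```

Legal-specific mentions such as statute names and court designations tend to be missed or mislabeled by models trained only on general English, which is the gap an annotated legal corpus like E-NER is meant to close.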

Cited by 3 publications (2 citation statements)
References 34 publications (36 reference statements)
“…This has given rise to a number of adaptation techniques (Daumé III, 2007; Wiese et al., 2017; Ma et al., 2019; Cooper Stickland et al., 2021; Grangier and Iter, 2022; Ludwig et al., 2022). In the pretrain-fine-tune paradigm, for pretrained models to generalize over a task in a specific domain, it is advised to fine-tune them on domain-specific datasets, which requires domain-specific annotated resources (Tsatsaronis et al., 2015; Zhu et al., 2022; Au et al., 2022; Li et al., 2021). In this paper, we test whether in-domain pretraining improves performance on a domain-specific task, but we additionally try to gain a better understanding of these models' weaknesses by examining their generalization abilities.…”
Section: Domain Adaptation (mentioning, confidence: 99%)
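For context, here is a minimal sketch of the pretrain-fine-tune recipe this excerpt describes, using the Hugging Face transformers API. The checkpoint, label set, and hyperparameters are illustrative assumptions, not the cited papers' configurations.

```python
# A hedged sketch of the pretrain-fine-tune paradigm: adapting a generic
# pretrained encoder to a domain-specific, annotated NER dataset.
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed legal-domain tag set, for illustration only.
labels = ["O", "B-PERSON", "I-PERSON", "B-COURT", "I-COURT"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
)

args = TrainingArguments(
    output_dir="legal-ner",          # where checkpoints are written
    learning_rate=2e-5,              # a common fine-tuning rate
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

# `train_dataset` / `eval_dataset` would be tokenized, label-aligned splits
# of a legal NER corpus such as E-NER (data loading omitted here).
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```

The point of the excerpt is that this last, domain-specific step depends on annotated in-domain resources, which is exactly what E-NER supplies for legal text.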
“…In particular, Named Entity Recognition (NER) is a canonical information extraction task which consists of detecting text spans and classifying them into a predetermined set of entity types (Tjong Kim Sang and De Meulder, 2003; Lample et al., 2016a; Chiu and Nichols, 2016; Ni et al., 2017). In the past decade, numerous benchmark datasets have enabled researchers to compare and improve the performance of NER models within specific domains such as science (Luan et al., 2018), medicine (Jin and Szolovits, 2018), law (Au et al., 2022), finance (Salinas Alvarado et al., 2015), and social media (Ushio et al., 2022); in some cases, these datasets have spanned multiple domains (Liu et al., 2020b) or languages (Tjong Kim Sang and De Meulder, 2003). Such datasets are crucial for building models capable of handling a wide range of downstream applications.…”
Section: Introduction (mentioning, confidence: 99%)
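The span-detection-plus-classification framing in this excerpt is typically realized as BIO tagging. The small self-contained sketch below (an illustration, not code from any cited paper) decodes a BIO tag sequence into typed spans.

```python
# Decoding BIO tags into (start, end, type) spans: the standard way NER
# models turn per-token predictions into typed text spans.
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end_exclusive, entity_type) spans."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and etype != tag[2:]):
            if etype is not None:
                spans.append((start, i, etype))   # close the open span
            start, etype = i, tag[2:]             # open a new span
        elif tag == "O" and etype is not None:
            spans.append((start, i, etype))       # close the open span
            start, etype = None, None
    if etype is not None:
        spans.append((start, len(tags), etype))   # close a span at sequence end
    return spans

tokens = ["The", "SEC", "sued", "Enron", "Corp", "."]
tags = ["O", "B-ORG", "O", "B-ORG", "I-ORG", "O"]
for s, e, t in bio_to_spans(tags):
    print(t, " ".join(tokens[s:e]))   # prints: ORG SEC / ORG Enron Corp
```

Benchmark corpora like those the excerpt lists provide exactly these token-level annotations, so that models trained on them can be scored span-by-span against gold entities.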