Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.258

Predicting Degrees of Technicality in Automatic Terminology Extraction

Abstract: While automatic term extraction is a well-researched area, computational approaches to distinguish between degrees of technicality are still understudied. We semi-automatically create a German gold standard of technicality across four domains, and illustrate the impact of a web-crawled general-language corpus on predicting technicality. When defining a classification approach that combines general-language and domain-specific word embeddings, we go beyond previous work and align vector spaces to gain comparativ…
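The abstract's key technical step is aligning a general-language and a domain-specific embedding space so that a word's two vectors become directly comparable. Below is a minimal sketch of one standard way to do this, orthogonal Procrustes over the shared vocabulary; the choice of alignment method and the function names are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def align_spaces(general_vecs: np.ndarray, domain_vecs: np.ndarray) -> np.ndarray:
    """Rotate the domain-specific space onto the general-language space.

    Both matrices hold row vectors for the same shared-vocabulary words,
    in the same order; the rotated domain matrix is returned.
    """
    # Orthogonal Procrustes: minimize ||domain @ W - general||_F over
    # orthogonal W, solved via SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(domain_vecs.T @ general_vecs)
    return domain_vecs @ (u @ vt)
```

Once aligned, the cosine similarity between a word's general vector and its rotated domain vector yields a comparability signal: a large gap suggests domain-specific usage, which is the kind of evidence a technicality classifier can build on.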

Citations: cited by 16 publications (15 citation statements)
References: 22 publications (25 reference statements)
“…We adapt the following models on relevant tasks in our setting with additional inputs (e.g., domain-specific corpora): (Amjadian et al., 2016, 2018). • Multi-Channel (MC): Multi-Channel (Hätty et al., 2020) is the state-of-the-art model for automatic term extraction, which is based on a multi-channel neural network that takes domain-specific and general corpora as input.…”
Section: Methods (mentioning)
confidence: 99%
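The statement above gives enough detail for a rough sketch of the multi-channel idea: one channel per embedding space, combined before a technicality classifier. The layer sizes, activation, and combination-by-concatenation below are illustrative assumptions, not the published architecture (see Hätty et al., 2020, for the actual model).

```python
import torch
import torch.nn as nn

class MultiChannelClassifier(nn.Module):
    """Toy two-channel classifier: one channel per embedding space."""

    def __init__(self, emb_dim: int = 300, hidden: int = 128, n_classes: int = 4):
        super().__init__()
        # Separate encoders so the general and domain spaces keep their own weights.
        self.general_channel = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())
        self.domain_channel = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, general_vec: torch.Tensor, domain_vec: torch.Tensor) -> torch.Tensor:
        # Encode each channel, concatenate, and score the technicality classes.
        combined = torch.cat(
            [self.general_channel(general_vec), self.domain_channel(domain_vec)], dim=-1
        )
        return self.classifier(combined)
```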
“…Hätty proposes two novel models to exploit general- vs. domain-specific comparisons: a simple neural network model with pre-computed comparative-embedding information as input, and a multi-channel model computing the comparison internally. Both models outperform previous approaches, with the multi-channel model performing best (Hätty, Schlechtweg, Dorna, & Im Walde, 2020). Among these methods, Long Short-Term Memory networks (LSTM) (Zhao, Du, & Shi, 2018), CRF (Wang, Wang, Deng, & Wu, 2016), and their variants achieve the best performance.…”
Section: Related Work (mentioning)
confidence: 99%
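The quote distinguishes two model variants; the first feeds a pre-computed general/domain comparison to a plain feed-forward network. A hedged sketch follows: the specific comparison features (element-wise difference plus a cosine-similarity scalar) and the MLP shape are assumptions for illustration, not the published feature set.

```python
import numpy as np
import torch.nn as nn

def comparative_features(general_vec: np.ndarray, aligned_domain_vec: np.ndarray) -> np.ndarray:
    """Pre-computed comparison of a word's two (already aligned) vectors."""
    cos = float(
        general_vec @ aligned_domain_vec
        / (np.linalg.norm(general_vec) * np.linalg.norm(aligned_domain_vec))
    )
    # Element-wise difference plus one similarity scalar (assumed feature set).
    return np.concatenate([general_vec - aligned_domain_vec, [cos]])

# The "simple neural network" is then just an MLP over these fixed features,
# e.g. for 300-dimensional embeddings (300 + 1 = 301 inputs, 4 technicality degrees).
mlp = nn.Sequential(nn.Linear(301, 64), nn.ReLU(), nn.Linear(64, 4))
```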
“…Several approaches build on word embeddings to perform ATE on specific domains, such as medicine (e.g., Bay et al., 2020), or to separate general-language from domain-specific embeddings (Hätty et al., 2020). In contrast, our models perform ATE on four domains and in three languages, utilizing a pretrained language model and a pretrained NMT model.…”
Section: Related Work (mentioning)
confidence: 99%