Terminology Extraction with Term Variant Detection

Cram, Damien; Daille, Béatrice

doi:10.18653/v1/p16-4003

Cited by 33 publications

(34 citation statements)

References 8 publications

“…A Standard Term Extraction Measure. We selected one of the simplest standard contrastive term extraction measures, the Weirdness Ratio (WEIRD) (Ahmad et al., 1994), which is still commonly used or adapted (Moreno-Ortiz and Fernández-Cruz, 2015; Cram and Daille, 2016; Roesiger et al., 2016; Hätty et al., 2017, i.a.). It encompasses just the basic ingredients for termhood prediction, a comparison of word frequencies in relation to corpus sizes: WEIRD(x) = (f_spec / s_spec) / (f_gen / s_gen), where f_spec and f_gen correspond to the frequencies of a term candidate x in a domain-specific and a general corpus, and s_spec and s_gen are the respective sizes of the corpora.…”

Section: Incorporating Meaning Shifts Into Automatic Term Extraction (mentioning)

confidence: 99%

SURel: A Gold Standard for Incorporating Meaning Shifts into Term Extraction

Hätty, Schlechtweg, Schulte im Walde, 2019

Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

We introduce SURel, a novel dataset for German with human-annotated meaning shifts between general-language and domain-specific contexts. We show that meaning shifts of term candidates cause errors in term extraction, and demonstrate that the SURel annotation reflects these errors. Furthermore, we illustrate that SURel enables us to assess optimisations of term extraction techniques when incorporating meaning shifts.
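The Weirdness Ratio named in the citation statement above can be sketched in a few lines of Python. This is a minimal illustration that assumes the commonly used formulation, the relative frequency of a candidate in the domain-specific corpus divided by its relative frequency in the general corpus. The function name `weirdness_ratio`, the handling of candidates unseen in the general corpus, and the counts in the usage example are assumptions made for illustration, not details taken from the cited papers.

```python
import math


def weirdness_ratio(f_spec, s_spec, f_gen, s_gen):
    """Weirdness Ratio of a term candidate (illustrative sketch).

    f_spec, f_gen: frequency of the candidate in the domain-specific
                   and in the general corpus.
    s_spec, s_gen: sizes (token counts) of the two corpora.
    """
    if s_spec <= 0 or s_gen <= 0:
        raise ValueError("corpus sizes must be positive")
    rel_spec = f_spec / s_spec  # relative frequency in the domain corpus
    rel_gen = f_gen / s_gen     # relative frequency in the general corpus
    if rel_gen == 0:
        # Assumption: an unseen general-corpus candidate gets an infinite
        # score if it occurs in the domain corpus, and 0.0 otherwise.
        return math.inf if rel_spec > 0 else 0.0
    return rel_spec / rel_gen


# Hypothetical counts: 50 hits in a 10,000-token domain corpus versus
# 20 hits in a 1,000,000-token general corpus.
score = weirdness_ratio(f_spec=50, s_spec=10_000, f_gen=20, s_gen=1_000_000)
print(round(score, 2))
```

A score well above 1 means the candidate is relatively more frequent in the domain corpus than in general language. Practical systems often smooth the general-corpus count rather than returning infinity.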

“…First we extract the terms that are most relevant to the domain, a task referred to as automatic term recognition (ATR). Current approaches to this task have employed a varied suite of methods for extracting terms from text based on parts of speech and metrics for assessing 'termhood' [15,29], domain modelling [11], and the composition of multiple metrics in an unsupervised manner [5]. More recently, these methods have been combined into off-the-shelf tools such as ATR4S [7] and JATE [31], and our system is a similar implementation to ATR4S.…”

Section: Related Work (mentioning)

confidence: 99%

Taxonomy Extraction for Customer Service Knowledge Base Construction

Pereira, Robin, Daudert, et al., 2019

Lecture Notes in Computer Science

Customer service agents play an important role in bridging the gap between customers' vocabulary and business terms. In a scenario where organisations are moving into semi-automatic customer service, semantic technologies with capacity to bridge this gap become a necessity. In this paper we explore the use of automatic taxonomy extraction from text as a means to reconstruct a customer-agent taxonomic vocabulary. We evaluate our proposed solution in an industry use case scenario in the financial domain and show that our approaches for automated term extraction and using in-domain training for taxonomy construction can improve the quality of automatically constructed taxonomic knowledge bases.

“…Let V be the vocabulary of input seed terms (e.g., apple, orange, and Spain in Figure 4); H is the noisy hypernym graph constructed in Section 2.2 (cf. Figure 4(a)); w(x,y) is the weight of the edge x→y in H; D_x is the set of descendants of term x in H (e.g., apple is a descendant of food); R is the set of given roots⁶ (e.g., food in Figure 4). The construction of the flow network F proceeds as follows (cf.…”

Section: Taxonomy Construction (mentioning)

confidence: 99%

“…If the vocabulary contains only accurate terms, α is set to 1. For a given α, we run the network simplex algorithm with d = α|V| to compute the minimum-cost flow for F. (Footnote 6: If roots are not provided, a small set of upper terms can be used as roots [38].)…”

Section: Taxonomy Construction (mentioning)

confidence: 99%

Taxonomy Induction Using Hypernym Subsequences

Gupta, Lebret, Harkous, et al., 2017

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

We propose a novel, semi-supervised approach towards domain taxonomy induction from an input vocabulary of seed terms. Unlike all previous approaches, which typically extract direct hypernym edges for terms, our approach utilizes a novel probabilistic framework to extract hypernym subsequences. Taxonomy induction from extracted subsequences is cast as an instance of the minimum-cost flow problem on a carefully designed directed graph. Through experiments, we demonstrate that our approach outperforms state-of-the-art taxonomy induction approaches across four languages. Importantly, we also show that our approach is robust to the presence of noise in the input vocabulary. To the best of our knowledge, this robustness has not been empirically proven in any previous approach.
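The notation in the citation statements above (a weighted hypernym graph H with edge weights w(x,y) and descendant sets D_x) can be made concrete with a small sketch. It is only an illustration and rests on assumptions the quoted text does not confirm: an edge x→y is taken to mean "y is a hypernym of x", the graph is stored as a dictionary from edge to weight, and the toy edges, weights, and function name `descendants` are hypothetical. It does not reproduce the flow-network construction, which is truncated in the quote.

```python
from collections import defaultdict


def descendants(edges, x):
    """Return D_x, the set of terms that reach x via hypernym edges.

    edges: dict mapping (hyponym, hypernym) -> weight w(hyponym, hypernym).
    Assumption: an edge (a, b) means "b is a hypernym of a".
    """
    # Invert the edges so each term maps to its direct hyponyms.
    children = defaultdict(set)
    for (child, parent) in edges:
        children[parent].add(child)

    # Depth-first traversal; the 'seen' set keeps it safe on noisy,
    # possibly cyclic graphs.
    seen = set()
    stack = [x]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in seen and child != x:
                seen.add(child)
                stack.append(child)
    return seen


# Toy graph with made-up weights, loosely following the quoted example terms.
H = {
    ("apple", "fruit"): 0.9,
    ("orange", "fruit"): 0.7,
    ("fruit", "food"): 0.8,
    ("Spain", "country"): 0.6,
}
print(sorted(descendants(H, "food")))
```

Under the stated edge convention, `apple` is in the descendant set of `food`, matching the example in the quoted passage.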
