Proceedings of ACL-2016 System Demonstrations 2016
DOI: 10.18653/v1/p16-4003

Terminology Extraction with Term Variant Detection

Abstract: We introduce TermSuite, a Java- and UIMA-based toolkit for building terminologies from corpora. TermSuite follows the two classic steps of terminology extraction tools, the identification of term candidates and their ranking, but implements new features. It is designed to be multilingual and scalable, and it handles term variants. We focus on the main components: UIMA Tokens Regex for defining term and variant patterns over word annotations, and the grouping component for clustering terms and variants that works both at mor…

Citations: cited by 33 publications (34 citation statements)
References: 8 publications
“…A Standard Term Extraction Measure. We selected one of the simplest standard contrastive term extraction measures, the Weirdness Ratio (WEIRD) (Ahmad et al., 1994), which is still commonly used or adapted (Moreno-Ortiz and Fernández-Cruz, 2015; Cram and Daille, 2016; Roesiger et al., 2016; Hätty et al., 2017, i.a.). It encompasses just the basic ingredients for termhood prediction, a comparison of word frequencies in relation to corpus sizes: $\mathrm{WEIRD}(x) = \frac{f_{spec}(x) / s_{spec}}{f_{gen}(x) / s_{gen}}$, where $f_{spec}$ and $f_{gen}$ correspond to the frequencies of a term candidate $x$ in a domain-specific and a general corpus, and $s_{spec}$ and $s_{gen}$ are the respective sizes of the corpora.…”
Section: Incorporating Meaning Shifts Into Automatic Term Extraction (mentioning)
confidence: 99%
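
The measure described in this statement compares a candidate's relative frequency in a domain-specific corpus with its relative frequency in a general corpus. A minimal sketch follows, assuming whitespace-tokenised toy corpora and single-word candidates. The function name `weirdness`, the toy sentences, and the handling of candidates that never occur in the general corpus (returning infinity) are illustrative assumptions, not taken from the cited papers or from TermSuite.

```python
import math
from collections import Counter


def weirdness(candidate, spec_counts, gen_counts):
    """Ratio of relative frequencies: (f_spec / s_spec) / (f_gen / s_gen).

    spec_counts and gen_counts map tokens to raw frequencies in the
    domain-specific and the general corpus; corpus sizes are the sums
    of those frequencies.
    """
    s_spec = sum(spec_counts.values())
    s_gen = sum(gen_counts.values())
    if s_spec == 0 or s_gen == 0:
        raise ValueError("both corpora must be non-empty")

    rel_spec = spec_counts.get(candidate, 0) / s_spec
    rel_gen = gen_counts.get(candidate, 0) / s_gen

    # Assumption: a candidate unseen in the general corpus is treated as
    # maximally domain-specific instead of being smoothed.
    if rel_gen == 0:
        return math.inf if rel_spec > 0 else 0.0
    return rel_spec / rel_gen


# Hypothetical toy corpora, for illustration only.
spec_counts = Counter("the rotor blade of the wind turbine rotor".split())
gen_counts = Counter("the cat sat on the mat near the rotor".split())

for word in ("rotor", "the", "blade"):
    print(word, weirdness(word, spec_counts, gen_counts))
```

A score above 1 means the candidate is relatively more frequent in the domain corpus than in the general one. Real systems usually smooth the general-corpus frequency rather than returning infinity, so that unseen candidates can still be ranked against each other.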
“…First we extract the terms that are most relevant to the domain, a task referred to as automatic term recognition (ATR). Current approaches to this task have employed a varied suite of methods for extracting terms from text based on parts of speech and metrics for assessing 'termhood' [15,29], domain modelling [11], and the composition of multiple metrics in an unsupervised manner [5]. More recently, these methods have been combined into off-the-shelf tools such as ATR4S [7] and JATE [31], and our system is a similar implementation to ATR4S.…”
Section: Related Work (mentioning)
confidence: 99%
“…Let V be the vocabulary of input seed terms (e.g., apple, orange, and Spain in Figure 4); H is the noisy hypernym graph constructed in Section 2.2 (cf. Figure 4(a)); w(x, y) is the weight of the edge x→y in H; D_x is the set of descendants of term x in H (e.g., apple is a descendant of food); R is the set of given roots⁶ (e.g., food in Figure 4). The construction of the flow network F proceeds as follows (cf.…”
Section: Taxonomy Construction (mentioning)
confidence: 99%
“…If the vocabulary contains only accurate terms, α is set to 1. For a given α, we run the network simplex algorithm with d = α|V| to compute the minimum-cost flow for F. (Footnote 6: If roots are not provided, a small set of upper terms can be used as roots [38].)…”
Section: Taxonomy Construction (mentioning)
confidence: 99%