Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)
DOI: 10.18653/v1/e17-4012

Evaluating the Reliability and Interaction of Recursively Used Feature Classes for Terminology Extraction

Abstract: Feature design and selection is a crucial aspect when treating terminology extraction as a machine learning classification problem. We designed feature classes which characterize different properties of terms, and propose a new feature class for components of term candidates. By using random forests, we infer optimal features which are later used to build decision tree classifiers. We evaluate our method using the ACL RD-TEC dataset. We demonstrate the importance of the novel feature class for downgrading term…
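The abstract's pipeline (random-forest feature ranking feeding a decision-tree classifier) can be illustrated with a minimal scikit-learn sketch. The data, the features, and the importance threshold below are placeholders, not the paper's actual feature classes or setup:

```python
# Minimal sketch of the abstract's two-step pipeline: rank features with a
# random forest, then train a decision tree on the best-ranked features.
# All data and thresholds here are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 20))                 # placeholder term-candidate features
y = (X[:, 3] + X[:, 7] > 1).astype(int)   # placeholder term/non-term labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: the random forest infers feature importances.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Step 2: keep the most important features (mean importance as a cutoff
# is an assumption, not the paper's criterion).
keep = rf.feature_importances_ > np.mean(rf.feature_importances_)

# Step 3: train a single decision tree on the selected features.
dt = DecisionTreeClassifier(max_depth=5, random_state=0)
dt.fit(X_tr[:, keep], y_tr)
print("held-out accuracy:", dt.score(X_te[:, keep], y_te))
```

In this design the forest is used only to rank features; the single tree trained on the surviving features remains easy to inspect.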

Cited by 8 publications (4 citation statements); references 18 publications (11 reference statements).

Citation statements (ordered by relevance):
“…The categories Cosine, Domain and Head perform worst and do in most cases not even significantly improve over the baseline. The modifier features are better than the head features, which is in line with the results in (Hätty et al., 2017) where the modifier features are more important for detecting termhood than head features. For both the binary and the four-class tasks, the groups General, Compound and Contrastive perform best, with Compound as the winner for the binary task and Contrastive as the winner for the four-class task.…”
Section: Classification With Term and Compound Features (supporting, confidence: 87%)
“…Contrastive Selection via Heads (CSvH) (Basili et al., 2001) is a corpus-comparing measure that computes termhood for a complex term by biasing the termhood score with the general-language frequency of the head. Hätty et al. (2017) combine termhood measures within a random forest classifier to extract single and multiword terms and apply the measures recursively to the components. Hätty and Schulte im Walde (2018) demonstrate that propagating constituent information through neural networks improves the prediction of compound termhood.…”
Section: Related Work (mentioning, confidence: 99%)
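As a rough illustration of the head-biasing idea described above, a contrastive scorer might downweight a complex term whose head is also common in general language. This is a simplified stand-in, not the exact CSvH formula from Basili et al. (2001); the counts and the head-final assumption are hypothetical:

```python
from collections import Counter

def contrastive_termhood(term, domain_freq, general_freq):
    # Illustrative stand-in for a head-biased contrastive measure: a
    # complex term scores high when frequent in the domain corpus and is
    # downgraded when its head is common in general language.
    # NOT the exact CSvH formula from Basili et al. (2001).
    head = term.split()[-1]  # assumption: head-final terms (English NPs)
    return domain_freq[term] / (1.0 + general_freq[head])

domain = Counter({"neural network": 40, "big network": 5})
general = Counter({"network": 120})
print(contrastive_termhood("neural network", domain, general))  # ~0.33
```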
“…Statistical methods use statistical information, e.g., frequency of terms, to identify terms from a corpus (Frantzi et al., 2000; Nakagawa and Mori, 2002; Velardi et al., 2001; Drouin, 2003; Meijer et al., 2014). Machine learning methods learn a classifier, e.g., logistic regression classifier, with manually labeled data (Conrado et al., 2013; Fedorenko et al., 2014; Hätty et al., 2017). There also exists some work on automatic term extraction with Wikipedia (Vivaldi et al., 2012; Wu et al., 2012).…”
Section: Related Work (mentioning, confidence: 99%)
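Among the statistical measures cited above, C-value (Frantzi et al., 2000) has a compact standard form: a multi-word candidate's frequency is weighted by the log of its length, minus the average frequency of longer candidates that contain it. A sketch with hypothetical counts:

```python
import math

def c_value(term, freq, containing_freqs):
    # C-value (Frantzi et al., 2000) for multi-word candidates (length >= 2):
    # weight the candidate's frequency by log2 of its length in words and
    # subtract the average frequency of longer candidates that contain it.
    length = len(term.split())
    if not containing_freqs:              # term is not nested in longer terms
        return math.log2(length) * freq
    penalty = sum(containing_freqs) / len(containing_freqs)
    return math.log2(length) * (freq - penalty)

# e.g. "floating point" occurring 100 times, nested inside
# "floating point arithmetic" (60) and "floating point number" (20):
print(c_value("floating point", 100, [60, 20]))  # log2(2) * (100 - 40) = 60.0
```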
“…None of the existing work satisfies all of them. Previously, research has been conducted on automatic keyword extraction (Hätty et al., 2017; Meng et al., 2017; Alzaidy et al., 2019; Wang et al., 2020) and phrase mining (Liu et al., 2015; Shang et al., 2018). However, their main focus is to extract terms from single/multiple documents without considering whether the extracted terms are distinctive to a target domain contrastive with a context.…”
Section: Introduction (mentioning, confidence: 99%)