2015
DOI: 10.1186/s13173-015-0025-0

Evaluation of cutoff policies for term extraction

Abstract: Background: This paper presents a policy for choosing cutoff points to identify potentially relevant terms in a given domain. Term extraction methods usually generate term lists ordered according to a relevance criterion, and the literature offers many different relevance indices. However, very few studies address how many terms should be kept, i.e., the cutoff policy. Methods: Our proposed policy provides an estimation of the portion of this list which preserves a good balance between re…

Cited by 8 publications (3 citation statements)
References 23 publications
“…This has been a standard methodology for some time (Daille, 1994) and is still used by state-of-the-art systems such as TermoStat (Drouin, 2003) and TExSIS (Macken et al., 2013). However, the problem with these methodologies is determining the cut-off point (Lopes and Vieira, 2015) and combining multiple features (e.g., separate measures for termhood and unithood). It has become clear that multiple evidence (i.e.…”
Section: Introduction (mentioning)
confidence: 99%
“…Approaches, tools, algorithms, and methods for automatic term extraction: A systematic literature mapping 2015 Gaizauskas et al (2015), Khumalo (2015), Saneifar et al (2015), Gupta (2015), Pan and Zhao (2015), Kochetkova (2015), Lopes and Vieira (2015), Periñán-Pascual (2015), Gonçalves et al (2015), Guo et al (2015), Lahbib et al (2015), Astrakhantsev et al (2015), Bakar et al (2015), Liu et al (2015) 12,39% 2016…”
Section: Results (mentioning)
confidence: 99%
“…Manual evaluation of the entire sorted list would avoid the removal of real terms with low C-values, but it might be too laborious especially for large corpora. To minimize both laborious work and the number of true terms wrongly discarded, this study adopts a relative cut-off policy proposed by Lopes and Vieira (2015) which is based on the optimal trade-off point between wrong discard of true domain terms and wrong inclusion of irrelevant ones. The policy suggests that the bottom 85% of the ranked list should be discarded.…”
Section: Np Ranking and Term Selection (mentioning)
confidence: 99%
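
The relative cut-off described in the last citation statement (discard the bottom 85% of a ranked term list) can be illustrated with a minimal Python sketch. This is not code from Lopes and Vieira (2015) or from the citing study: the function name `apply_relative_cutoff`, the `discard_percent` parameter, and the choice to round the discarded count down are all assumptions made for illustration, and the input is assumed to be already sorted from most to least relevant.

```python
def apply_relative_cutoff(ranked_terms, discard_percent=85):
    """Keep the top of a ranked term list, dropping the bottom discard_percent.

    ranked_terms is assumed to be sorted from most to least relevant.
    The name, the parameter, and the rounding rule are illustrative assumptions.
    """
    if not 0 <= discard_percent <= 100:
        raise ValueError("discard_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point surprises; the number of
    # discarded terms is rounded down (an assumption, not from the source).
    n_discard = (len(ranked_terms) * discard_percent) // 100
    n_keep = len(ranked_terms) - n_discard
    return ranked_terms[:n_keep]


# Hypothetical ranked list of 20 placeholder candidates, best first.
candidates = [f"term_{i}" for i in range(1, 21)]
kept = apply_relative_cutoff(candidates)
print(kept)
```

With 20 candidates, 17 are discarded and the top 3 are kept. A different rounding rule would change the result slightly for list lengths that are not multiples of 20.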