2001
DOI: 10.1075/nlp.2.16nak

Experimental evaluation of ranking and selection methods in term extraction

Abstract: An automatic term extraction system consists of a term candidate extraction subsystem, a ranking subsystem, and a selection subsystem. In this paper, we experimentally evaluate two ranking methods and two selection methods. For ranking, the dichotomy of unithood and termhood is a key notion. We evaluate these two notions experimentally by comparing an Imp-based ranking method, which is based directly on termhood, with a C-value-based method, which is based indirectly on both termhood and unithood. As for selection, we co…

Cited by 7 publications (7 citation statements)
References 13 publications (4 reference statements)
“…Following the assumption that a multi-word term carries a key concept and is thus expected to behave like an atomic text unit, various statistical measures are applied to explore such unity or structural stability, termed "unithood" in Kageura and Umino (1996). Among them the popular ones are mutual information (MI) (Church and Hanks 1990; Damerau 1993), T-test (Church et al 1991), log-likelihood ratio (Dunning 1993), C-value (Frantzi and Ananiadou 1996) and NC-value (Frantzi et al 1998, 2000), and imp function (Nakagawa 2001a, 2001b), which is reformulated as GM function in Nakagawa and Mori (2003). Xu et al (2002) apply a modified tf-idf measure (Salton 1992), named KFIDF, to identify domain relevant single-word terms from a collection of classified documents.…”
Section: Statistical Approach (mentioning)
confidence: 99%
“…The syntactic patterns are first applied to identify term candidates, by filtering out those unqualified ones, and then a statistical measure is applied to validate the true terms among them. For example, the imp function (Nakagawa 2001a, 2001b) is applied only to noun compounds each consisting of a number of simple nouns. It calculates the termhood of a compound candidate in terms of the termhood of its component nouns, which is measured by the number of nouns to conjoin with it to make compounds in a given corpus.…”
Section: Statistical Approach (mentioning)
confidence: 99%
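
The scoring idea described in the statement above can be sketched roughly as follows. This is an illustration only, not the cited paper's exact definition: the function names `neighbor_counts` and `compound_score`, the use of distinct-neighbor counts, the +1 smoothing, and the geometric-mean combination (borrowed from the later GM reformulation mentioned elsewhere on this page) are all assumptions.

```python
from collections import defaultdict
from math import prod


def neighbor_counts(compounds):
    """Collect, for every simple noun, the distinct nouns seen directly
    to its left and to its right inside the given noun compounds."""
    left = defaultdict(set)
    right = defaultdict(set)
    for compound in compounds:
        for a, b in zip(compound, compound[1:]):
            right[a].add(b)  # b conjoins to the right of a
            left[b].add(a)   # a conjoins to the left of b
    return left, right


def compound_score(compound, left, right):
    """Assumed scoring: geometric mean of (left + 1) * (right + 1)
    over the component nouns of the compound."""
    factors = [
        (len(left.get(n, ())) + 1) * (len(right.get(n, ())) + 1)
        for n in compound
    ]
    return prod(factors) ** (1.0 / (2 * len(compound)))


# Hypothetical toy corpus of already-extracted noun compounds.
corpus = [
    ("term", "extraction"),
    ("information", "extraction"),
    ("term", "ranking"),
]
left, right = neighbor_counts(corpus)
for c in corpus:
    print(" ".join(c), round(compound_score(c, left, right), 3))
```

In this toy corpus, "term extraction" scores above "information extraction" because "term" combines with more distinct right-hand nouns than "information" does.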
“…Other methods: in addition to the methods described above, other statistical association measures such as dice coefficient, odds ratio and Jaccard (J), Normalized Expectation (NE), Mutual Dependency (MD), and Mutual Expectation (ME) are also used. These methods are widely used in the collocation extraction [6]-[9], [17], [24], [25], [32], [34]. These methods are formulated below: …”
Section: Log Likelihood Ratio (LLR) (mentioning)
confidence: 99%
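
The formulas themselves are not reproduced in the excerpt above. As a rough illustration of two of the named measures, the sketch below uses the standard textbook definitions of the Dice coefficient and the Jaccard index over co-occurrence counts; the function names and the example counts are hypothetical, and the citing paper's own formulation may differ.

```python
def dice(f_xy, f_x, f_y):
    """Dice coefficient: 2 * f(x, y) / (f(x) + f(y))."""
    total = f_x + f_y
    return 2 * f_xy / total if total else 0.0


def jaccard(f_xy, f_x, f_y):
    """Jaccard index: f(x, y) / (f(x) + f(y) - f(x, y))."""
    union = f_x + f_y - f_xy
    return f_xy / union if union else 0.0


# Hypothetical counts: word x occurs 20 times, word y 30 times,
# and the pair (x, y) co-occurs 10 times.
print(dice(10, 20, 30))     # 0.4
print(jaccard(10, 20, 30))  # 0.25
```

The two measures rank candidate pairs identically, since J = D / (2 - D) is monotonic in D; they differ only in scale.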
“…The results show that the new method significantly improves the performance of multiword expression extraction in comparison with a classic MI extraction method. Chakraborty [24] and Dandapat, Mitra et al [25] have used statistical measurements to extract Noun-Noun (N-N) and Noun-Verb (N-V) collocations as MWE in Bengali Corpus respectively. Kunchukuttan and Damani [26] developed a system for Hindi compound noun MWE extraction from a Hindi corpus.…”
Section: Related Work (mentioning)
confidence: 99%
“…Other methods: in addition to the methods described above, other statistical association measures such as dice coefficient, odds ratio and Jaccard (J), Normalized Expectation (NE), Mutual Dependency (MD), and Mutual Expectation (ME) are also used. These methods are widely used in the collocation extraction [6]-[9], [17], [24], [25], [32], [34]. These methods are formulated below: …”
Section: Chi-square Test (χ²-Test) (mentioning)
confidence: 99%