Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04 2004
DOI: 10.3115/1218955.1218991

Finding predominant word senses in untagged text

Abstract: In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first sense heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of hand-tagged data. Whilst there are a few hand-tagged corpora available for some languages, one would expect the frequency distribution of the senses of words, particu…
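The first sense heuristic described in the abstract can be illustrated with a minimal sketch. It assumes a small hand-tagged sample is available, which is exactly the dependency the abstract identifies as the problem. The sample data, the sense labels, and the function names `predominant_senses` and `first_sense_tagger` are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged sample of (word, sense label) pairs.
# Real sense inventories and counts would come from a tagged corpus.
TAGGED = [
    ("star", "celestial_body"),
    ("star", "celestial_body"),
    ("star", "celebrity"),
    ("pipe", "tube"),
    ("pipe", "tobacco_pipe"),
    ("pipe", "tube"),
]


def predominant_senses(tagged_pairs):
    """Map each word to the sense it was tagged with most often."""
    counts = defaultdict(Counter)
    for word, sense in tagged_pairs:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def first_sense_tagger(words, predominant, default=None):
    """Tag every word with its predominant sense, ignoring context.

    Words absent from the hand-tagged sample get `default`.
    """
    return [predominant.get(w, default) for w in words]


FIRST_SENSE = predominant_senses(TAGGED)
```

Calling `first_sense_tagger(["star", "bank"], FIRST_SENSE)` tags "star" with its most frequent sense in the sample and leaves "bank" untagged, since no hand-tagged examples of it exist. That gap is what motivates estimating predominant senses from untagged text.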

Citations: cited by 193 publications (241 citation statements)
References: 17 publications (17 reference statements)
“… Minimum frequency of 50 on SemCor corpus. As McCarthy [9] pointed out, SemCor comprises a relatively small sample of words. Consequently, there are words where the first sense in WordNet is counter-intuitive.…”
Section: Selecting Test Words
Citation type: mentioning
confidence: 99%
“…While the researchers have started exploring the temporal and spatial scopes of word senses (Cook and Stevenson, 2010; Gulordava and Baroni, 2011; Kulkarni et al, 2015; Jatowt and Duh, 2014; Mitra et al, 2014; Mitra et al, 2015), corpora-specific senses have remained mostly unexplored. Our contributions: Motivated by the above applications, this paper studies corpora-specific senses for the first time and makes the following contributions [footnote 1: The code and evaluation results are available at: http://tinyurl.com/h4onyww]: (i) we take two different methods for novel sense discovery (Mitra et al, 2014; Lau et al, 2014) and one for predominant sense identification (McCarthy et al, 2004) and adapt these in an automated and unsupervised manner to identify corpus-specific sense for a given word (noun), and (ii) perform a thorough manual evaluation to rigorously compare the corpus-specific senses obtained using these methods. Manual evaluation conducted using 60 candidate words for each method indicates that ∼45-60% of the corpus-specific senses identified by the adapted algorithms are genuine.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…McCarthy et al (2004) present one particularly promising approach. Given a corpus that includes multiple occurrences of a particular target noun, they use Lin's (1998) distributional method to identify a set of word types that are contextually and syntactically related to that target word.…”
Section: Distributional Methods
Citation type: mentioning
confidence: 99%
“…As shown by McCarthy et al (2004), a method of disambiguation that relies on identifying the most frequent sense of a word for a particular domain can perform nearly as well as systems that are based on manually sense-tagged examples, and better than unsupervised systems that are based on un-annotated corpora or knowledge-rich resources.…”
Section: Clustering By Committee (CBC)
Citation type: mentioning
confidence: 99%