1998
DOI: 10.1145/267954.267957

Corpus-based stemming using cooccurrence of word variants

Abstract: Stemming is used in many information retrieval (IR) systems to reduce variant word forms to common roots. It is one of the simplest applications of natural-language processing to IR and one of the most effective in terms of user acceptance and consistency, though it yields only small retrieval improvements. Current stemming techniques do not, however, reflect the language use in specific corpora, and this can lead to occasional serious retrieval failures. We propose a technique for using corpus-based word variant cooccurr…

Cited by 229 publications (134 citation statements)
References 4 publications
“…Xu and Croft introduce the use of co-occurrence data to improve stemming algorithms (Xu and Croft, 1998). The premise of the system described in this paper is to use contextual (e.g., co-occurrence) information to improve the equivalence classes produced by an aggressive stemmer, such as the Porter stemmer.…”
Section: Discussion (mentioning)
confidence: 99%
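
The idea quoted above, using co-occurrence information to refine the equivalence classes of an aggressive stemmer, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the method of Xu and Croft (1998): the function names, the fixed non-overlapping token windows, the Dice coefficient used as the association score, the 0.3 threshold, and the use of connected components to split a class. The stem class is supplied by hand instead of coming from a real stemmer.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(docs, window=10):
    """Count, over non-overlapping token windows, how many windows
    contain each word and each unordered word pair."""
    word_n = defaultdict(int)
    pair_n = defaultdict(int)
    for tokens in docs:
        for start in range(0, len(tokens), window):
            seen = set(tokens[start:start + window])
            for w in seen:
                word_n[w] += 1
            # sorted() guarantees each pair key is stored as (smaller, larger)
            for a, b in combinations(sorted(seen), 2):
                pair_n[(a, b)] += 1
    return word_n, pair_n


def dice(a, b, word_n, pair_n):
    """Dice coefficient as a stand-in association score (an assumption,
    not the measure used in the cited paper)."""
    n_ab = pair_n.get((min(a, b), max(a, b)), 0)
    total = word_n.get(a, 0) + word_n.get(b, 0)
    return 2.0 * n_ab / total if total else 0.0


def refine_class(words, word_n, pair_n, threshold=0.3):
    """Split one stem class into groups of words linked by an
    association score at or above the (hypothetical) threshold."""
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if dice(a, b, word_n, pair_n) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for w in words:
        groups[find(w)].add(w)
    return sorted(sorted(g) for g in groups.values())


# Toy corpus: "policy" and "policies" share windows, "police" does not.
docs = [
    ["policy", "policies", "reform"],
    ["policies", "policy", "tax"],
    ["police", "arrest", "suspect"],
]
word_n, pair_n = cooccurrence_counts(docs)
# A class an aggressive stemmer might plausibly produce (supplied by hand here).
classes = refine_class(["police", "policies", "policy"], word_n, pair_n)
print(classes)  # [['police'], ['policies', 'policy']]
```

On this toy corpus the hand-built class is split so that "police" no longer conflates with "policy" and "policies". A real system would need a tuned threshold and a much larger corpus before the scores mean anything.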
“…were not separated out. This was referred to as the stringing effect in [13]. The proposed method split all of them into separate classes.…”
Section: Results (mentioning)
confidence: 99%
“…While stemming schemes are normally designed to work with general texts, some may also be especially designed for a specific domain (e.g., in medicine) or a given document collection, such as that developed by Xu and Croft (1998), which used a corpus-based approach. This more closely reflects language usage (including word frequencies and other co-occurrence statistics), instead of a set of morphological rules in which the frequency of each rule (and therefore its underlying importance) is not precisely known.…”
Section: Related Work (mentioning)
confidence: 99%