2005
DOI: 10.1007/11590316_108

Distribution Based Stemmer Refinement

Abstract: Stemming is a common preprocessing task applied to text corpora. Errors in this process may be refined either manually or based on a corpus. We describe a novel corpus-based stemming technique which models the given words as being generated from a multinomial distribution over the topics available in the corpus. A sequential-hypothesis-testing-like procedure helps us group together distributionally similar words. This stemmer refines any given stemmer and its strength can be controlled with the help …
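The grouping idea in the abstract can be illustrated with a small sketch. This is not the paper's procedure, which the truncated abstract does not spell out: it assumes each word comes with a vector of per-topic occurrence counts, swaps in a standard log-likelihood-ratio (G) test for "same multinomial distribution", and merges words greedily within each class produced by a base stemmer. The names `g_statistic`, `refine_stem_classes`, `stem_of`, `topic_counts`, and `threshold` are all hypothetical.

```python
import math
from collections import defaultdict


def g_statistic(a, b):
    """Log-likelihood-ratio statistic for the hypothesis that count
    vectors a and b were drawn from the same multinomial distribution."""
    total_a, total_b = sum(a), sum(b)
    total = total_a + total_b
    if total == 0:
        return 0.0
    g = 0.0
    for x, y in zip(a, b):
        pooled = (x + y) / total  # pooled estimate of the topic probability
        if x > 0:
            g += 2 * x * math.log(x / (total_a * pooled))
        if y > 0:
            g += 2 * y * math.log(y / (total_b * pooled))
    return g


def refine_stem_classes(stem_of, topic_counts, threshold):
    """Split each base-stemmer class into groups of words whose topic
    distributions look alike (assumption: greedy, order-dependent merging)."""
    by_stem = defaultdict(list)
    for word in topic_counts:
        by_stem[stem_of(word)].append(word)

    groups = []
    for words in by_stem.values():
        class_groups = []  # each entry: [member words, pooled topic counts]
        for word in words:
            counts = topic_counts[word]
            for group in class_groups:
                if g_statistic(group[1], counts) <= threshold:
                    group[0].append(word)
                    group[1] = [p + c for p, c in zip(group[1], counts)]
                    break
            else:
                # No existing group is similar enough: start a new one.
                class_groups.append([[word], list(counts)])
        groups.extend(members for members, _ in class_groups)
    return groups
```

In this sketch a larger `threshold` merges more words and a smaller one keeps more of them apart; whether the paper controls stemmer strength in the same way is not stated in the visible part of the abstract.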

Citations: cited by 2 publications (1 citation statement)
References: 11 publications (10 reference statements)
“…This is not the case since there are many documents that are returned for a word, but they are not returned for the conflated forms. Many stemming papers [27] point out that stemming reduces the size of index files considerably. They propose to use one index file to hold pointers to all documents having any of the conflated forms of a word.…”
Section: Query: Croutons Eztk Fabbos
Citation type: mentioning
confidence: 99%
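The single-index idea mentioned in the statement above can be sketched as an inverted index keyed by stem, so that every conflated form of a word points to the same posting set. This is a minimal illustration, not the cited papers' implementation; `toy_stem`, `build_stem_index`, and `search` are hypothetical names, and the stemmer is a deliberately crude stand-in.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer: lowercase and strip one trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def build_stem_index(docs, stem=toy_stem):
    """Map each stem to the set of ids of documents containing any
    surface form that reduces to that stem."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(index, query_word, stem=toy_stem):
    """Return the sorted ids of documents matching any conflated form."""
    return sorted(index.get(stem(query_word), set()))
```

Because "cat" and "cats" share one entry here, the index holds fewer keys than a surface-form index would, and a query for either form returns the same documents.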