Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2000
DOI: 10.1145/345508.345650

Stemming and its effects on TFIDF ranking (poster session)

Cited by 40 publications (28 citation statements)
References 8 publications
“…Stemming removes inflections (e.g., "scrolls" and "scrolling" both reduce to "scroll"). Stemming allows for a more precise comparison between bug reports by creating a more normalized corpus; our experiments used the common Porter stemming algorithm (e.g., [7]). …”
Section: Textual Analysis (mentioning)
confidence: 99%
“…Since the storage efficiency is not a concern for our experiments and there is no available stopword list constructed for Ottoman language, a stopword list is not used in our framework. Stemming is another method that not only shrinks the vocabulary of the dataset, but may also increase the effectiveness of an IR environment depending on design factors such as the stemming algorithm and the language used [15]. For highly inflected languages, such as Arabic and Ottoman, developing effective stemmers is a hard task and not within the scope of this thesis.…”
Section: Typical Components Of An Ir System (mentioning)
confidence: 99%
“…The effects of stemming and lemmatization as preprocessing operations of the input vector space model for LSA are controversial (see, e.g., Denhière & Lemaire, 2004; Kantrowitz, Mohit, & Mittal, 2000) and probably depend, on the one hand, on the quality of this type of preprocessing and, on the other hand, on the size of the corpora used. Stemming and lemmatization are different techniques that use language-dependent word morphology for the very same sought-after effect: Semantically similar words of the vocabulary are merged to create an equivalence class (the stem or the lemma), traditionally called the term, of the vector space model with less statistical noise; as a consequence of the merging, the vector space dimension is reduced.…”
Section: B Co-triggered Lemmatization (mentioning)
confidence: 99%
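
The citation statements above describe stemming as merging inflected forms into one term, which shrinks the vocabulary of the vector space model. A minimal Python sketch of that effect follows. The `toy_stem` function, its suffix list, and the sample tokens are hypothetical and chosen only for illustration; this is a crude suffix stripper, not the Porter algorithm used in the cited work.

```python
from collections import Counter

# Hypothetical suffix list for illustration only; the real Porter
# algorithm applies a much larger, ordered set of rewrite rules.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(tokens, stem=False):
    """Count terms, optionally mapping each token to its toy stem first."""
    return Counter(toy_stem(t) if stem else t.lower() for t in tokens)


tokens = ["scroll", "scrolls", "scrolling", "scrolled", "window", "windows"]
raw = term_counts(tokens)
stemmed = term_counts(tokens, stem=True)

# Distinct terms before and after stemming.
print(len(raw), len(stemmed))
```

With these sample tokens the six surface forms collapse into two terms, so any term-frequency weighting built on top operates on a smaller vocabulary. Whether that helps or hurts ranking depends, as the statements note, on the stemmer and the language.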