“…The effects of stemming and lemmatization as preprocessing operations of the input vector space model for LSA are controversial (see, e.g., Denhière & Lemaire, 2004; Kantrowitz, Mohit, & Mittal, 2000) and probably depend, on the one hand, on the quality of this type of preprocessing and, on the other hand, on the size of the corpora used. Stemming and lemmatization are different techniques that use language-dependent word morphology to achieve the same effect: Semantically similar words of the vocabulary are merged into an equivalence class (the stem or the lemma), traditionally called the term, of a vector space model with less statistical noise; as a consequence of the merging, the vector space dimension is reduced.…”
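The merging of word forms into equivalence classes, and the resulting drop in vector space dimension, can be illustrated with a minimal sketch. It assumes a crude, hypothetical suffix-stripping function (`toy_stem`) standing in for a real stemmer such as Porter's or for a dictionary-based lemmatizer, and a hypothetical helper `term_counts` that builds raw term-frequency vectors. It covers only the preprocessing step; in LSA the resulting term-document matrix would then typically be weighted and decomposed by SVD, which is not shown here.

```python
from collections import Counter

# Hypothetical toy suffix list (longest first); a real system would use a
# proper stemmer (e.g. Porter) or a dictionary-based lemmatizer instead.
SUFFIXES = ("ations", "ation", "ions", "ion", "ings", "ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(docs, normalize=None):
    """Build term-frequency vectors, optionally merging words into classes.

    Returns the sorted vocabulary (one dimension per term) and a
    document-by-term count matrix.
    """
    vectors = []
    for doc in docs:
        tokens = doc.lower().split()
        if normalize is not None:
            # Map each word form to its equivalence class (here: a toy stem).
            tokens = [normalize(t) for t in tokens]
        vectors.append(Counter(tokens))
    vocabulary = sorted(set().union(*vectors))
    matrix = [[vec[term] for term in vocabulary] for vec in vectors]
    return vocabulary, matrix


if __name__ == "__main__":
    docs = [
        "connect connected connecting",
        "connection connections",
        "train trains training",
    ]
    raw_vocab, _ = term_counts(docs)
    stem_vocab, stem_matrix = term_counts(docs, normalize=toy_stem)
    print(len(raw_vocab), "terms without stemming")
    print(len(stem_vocab), "terms with stemming:", stem_vocab)
    print(stem_matrix)
```

On this toy input the eight distinct word forms collapse into two terms, so every document vector shrinks from eight to two dimensions and the counts for related forms are pooled. Suffix stripping cannot capture irregular forms (e.g. "better" and "good"), which is one reason why the quality of this preprocessing matters.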