Ethel Lloyd-Yemoh scite author profile

Ethel Lloyd-Yemoh

2Publications

56Citation Statements Received

34Citation Statements Given

How they've been cited

168

How they cite others

Affiliations

Information Technology University, University of Malaya

Publications

Order By: Most citations

Stemming and Lemmatization: A Comparison of Retrieval Performances

Balakrishnan¹,

Lloyd-Yemoh²

2014

LNSE

154

View full text Add to dashboard Cite

Abstract-The current study proposes to compare document retrieval precision performances based on language modeling techniques, particularly stemming and lemmatization. Stemming is a procedure to reduce all words with the same stem to a common form whereas lemmatization removes inflectional endings and returns the base or dictionary form of a word. Comparisons were also made between these two techniques with a baseline ranking algorithm (i.e. with no language processing). A search engine was developed and the algorithms were tested based on a test collection. Both mean average precisions and histograms indicate stemming and lemmatization to outperform the baseline algorithm. As for the language modeling techniques, lemmatization produced better precision compared to stemming, however the differences are insignificant. Overall the findings suggest that language modeling techniques improves document retrieval, with lemmatization technique producing the best result.

show abstract

Improving Document Relevancy Using Integrated Language Modeling Techniques

Balakrishnan

Humaidi

Lloyd-Yemoh

2016

MJCS

View full text Add to dashboard Cite

This paper presents an integrated language model to improve document relevancy for text-queries. To be precise, an integrated stemming-lemmatization (S-L) model was developed and its retrieval performance was compared at three document levels, that is, at top 5, 10 and 15. A prototype search engine was developed and fifteen queries were executed. The mean average precisions revealed the S-L model to outperform the baseline (i.e. no language processing), stemming and also the lemmatization models at all three levels of the documents. These results were also supported by the histogram precisions which illustrated the integrated model to improve the document relevancy. However, it is to note that the precision differences between the various models were insignificant. Overall the study found that when language processing techniques, that is, stemming and lemmatization are combined, more relevant documents are retrieved.Keywords: Information retrieval, document relevancy, language modeling, stemming, lemmatization, mean average precision INTRODUCTIONThe use of internet all over the world has caused information size to increase, hence making it possible for large volumes of information to be retrieved by the users. However, this phenomenon also makes it difficult for users to find relevant information, therefore proper information retrieval techniques are needed. Information retrieval can be defined as "a problem-oriented discipline concerned with the problem of the effective and efficient transfer of desired information between human generator and human user" [1]. In short, information retrieval aims to provide users with those documents that will satisfy their information need.Many information retrieval algorithms were proposed, and some of the popular ones include the traditional Boolean model (i.e. based on binary decisions), vector space model (i.e. compares user queries with documents found in collections and computes their similarities), and probabilistic model (i.e. based on the probability theory to model uncertainties involved in retrieving data), among others. Over the years, information retrieval has evolved to include text retrieval in different languages, and thus giving birth to language models. The language model is particularly concerned with identifying how likely it is for a particular string in a specific language to be repeated [2]. A popular technique used in the language model is the N-gram model which predicts a preceding word based on previous N-1 words [3]. Other popular techniques include stemming and lemmatization.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.