2010
DOI: 10.1145/1838745.1838748

Comparative Study of Indexing and Search Strategies for the Hindi, Marathi, and Bengali Languages

Abstract: The main goal of this article is to describe and evaluate various indexing and search strategies for the Hindi, Bengali, and Marathi languages. These three languages are ranked among the world's 20 most spoken languages and they share similar syntax, morphology, and writing systems. In this article we examine these languages from an Information Retrieval (IR) perspective through describing the key elements of their inflectional and derivational morphologies, and suggest a light and more aggressive stemming app…

Cited by 30 publications (8 citation statements)
References 32 publications
“…A number of works have reported the effectiveness of light stemming approaches [20,4] for a variety of languages both European and Asian. As opposed to the stemmers like Porter, light stemming methods focus on removing only inflectional variants from the end of a word.…”
Section: Language Specific Methods (mentioning)
confidence: 99%
“…al. [4] is chosen as the rule based approach for Marathi and Bengali. The Hungarian stemmer is by Savoy [21] and the Czech stemmer is by Dolamic et.…”
Section: Baseline Stemmers (mentioning)
confidence: 99%
“…The adjectives, on the other hand, are normally inflected with some primary and secondary adjectival suffixes denoting degree, quality, quantity, and similar other attributes. As a result, to build up a complete and robust system for WSD for all types of morphologically derived forms tagged with lexical information and semantic relations is a real challenge for a language like Bengali [15][16][17][18][19][20][21][22][23][24][25].…”
Section: Key Features Of Bengali Morphology (mentioning)
confidence: 99%
“…In our approach, we have therefore used three types of learning sets that are built up according to three different meanings of the ambiguous word (māthā) "head", which are collected from the Bengali WordNet. There are five types of dictionary definition (glosses) for the word with twenty five (25) ), 2nd and 3rd category represent second type of meaning ( = ), 4th and 5th category represent third type of meaning ( = = । ). Based on the information types we have built up three specific categories of senses of (māthā): (a) " , ", (b) " , " and (c) " , , । ".…”
Section: Learning Procedures (mentioning)
confidence: 99%
“…Some of the most frequent Bengali suffixes are 'ই', 'গুবলা', 'র্া', 'টি', 'রা', 'হীন'. We have used the system described in the [4], with some simple additional modifications.…”
Section: Stemming (mentioning)
confidence: 99%
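
The citation statements above describe light stemming as removing only inflectional endings from the end of a word. A minimal Python sketch of that idea follows. It is not the stemmer from the cited article: the suffix list contains only those suffixes quoted above that survive legibly, and the names `SUFFIXES`, `MIN_STEM_LEN`, and `light_stem` are hypothetical, introduced here for illustration.

```python
# Illustrative suffix list: only the legible suffixes quoted in the
# citation statement above. This is NOT the rule set of any published stemmer.
SUFFIXES = ("হীন", "টি", "রা", "ই")

# Assumed lower bound on the remaining stem, counted in Unicode code points.
MIN_STEM_LEN = 2


def light_stem(word: str) -> str:
    """Strip at most one listed suffix from the end of the word.

    Longer suffixes are tried first so that a short suffix does not
    pre-empt a longer one that also matches.
    """
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("ছেলেরা", "বইটি", "বই"):
        print(w, "->", light_stem(w))
```

The minimum stem length is a common safeguard against over-stemming short words; here it keeps "বই" intact even though it ends in a listed suffix. A real system would use a linguistically motivated suffix list and evaluate it on retrieval data.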