2016
DOI: 10.1007/s10462-016-9527-1

A survey on Urdu and Urdu like language stemmers and stemming techniques

Citations: cited by 17 publications (7 citation statements)
References: 35 publications
“…3. A document is tokenized using space or punctuation symbols [14], [37]. Non-language characters, special symbols, numeric values, and URLs are removed so that a document contains only words of the target language.…”
Section: Preprocessing Of Text Documents (citation type: mentioning)
confidence: 99%
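
The preprocessing step quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the cited authors' implementation: the function name `preprocess`, the URL pattern, and the choice of Arabic-script letter ranges as a stand-in for "words of the target language" are all assumptions made for this example.

```python
import re
from typing import List

# Assumption: URLs start with http://, https:// or www. and run to the next space.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")

# Assumption: a "word of the target language" is a run of Arabic-script letters
# and combining marks. The ranges below skip Arabic-Indic digits (U+0660-U+0669,
# U+06F0-U+06F9) and punctuation such as the Urdu full stop (U+06D4).
WORD_RE = re.compile(
    "[\u0621-\u064A\u064B-\u065F\u0670\u066E-\u06D3\u06D5\u06FA-\u06FF]+"
)


def preprocess(text: str) -> List[str]:
    """Return target-language tokens with URLs, digits, and symbols removed."""
    # Drop URLs first so their fragments cannot be mistaken for words.
    without_urls = URL_RE.sub(" ", text)
    # Collecting runs of letters splits on spaces and punctuation and, in the
    # same pass, discards numerals, Latin text, and special symbols.
    return WORD_RE.findall(without_urls)


if __name__ == "__main__":
    sample = "یہ کتاب ہے۔ http://example.com 123 abc!"
    print(preprocess(sample))
```

Matching runs of letters, rather than splitting and then filtering, keeps tokenization and cleaning in one pass. A real pipeline would likely need a wider character set and language-specific rules for word boundaries.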
“…The agglutination nature of the Urdu language means that the prefix, lemma, and suffix are added to the root (stem) word with multiple different combinations making a more complicated word structure (morphology) [42]. A token may either change a word’s NE type or the word may not be classified as NEs when agglutinated with other words.…”
Section: Challenges Of Urdu Named Entity Recognition (citation type: mentioning)
confidence: 99%
“…This makes Urdu a complex and highly rich morphological language. Further, it is one of the most important languages in South Asia, as it is spoken by more than 175 million people in Pakistan, India, and other South Asian countries [3][4][5].…”
Section: Introduction (citation type: mentioning)
confidence: 99%