2018
DOI: 10.4108/eai.19-12-2018.156081

Unsupervised Machine Learning based Documents Clustering in Urdu

Abstract: The volume of data on the web is growing rapidly due to the proliferation of news sources, blogs, journals, and other content. Like other languages, Urdu has also seen tremendous growth on the internet. As the volume of data expands, information retrieval (IR) becomes more complicated. Document clustering is an unsupervised machine learning (ML) approach used to group a huge number of dispersed documents into a small number of significant and consistent clusters, thus providing a base for indexing, IR an…

citations
Cited by 7 publications
(2 citation statements)
references
References 33 publications
“…Recent studies have demonstrated effectiveness in clustering large volumes of online news items [ 14 , 15 , 16 ]. Text clustering can be defined as keeping similar group documents together, marking the outliers texts outside the group based on similarity estimation between them [ 17 ].…”
Section: Methods (mentioning)
confidence: 99%
“…The most popular part-of-speech tagging would be identifying words as nouns, verbs, adjectives, etc. The Albanian language has some properties that pose difficulties in creating a part-of-speech tag set [6]. A challenge faced in building a dictionary for low resource language (in this case for Albanian language) is that a part-of-speech tag set that can adequately represent the underlying linguistic phenomena is difficult to build.…”
Section: Part-of-speech (POS) Tagging (mentioning)
confidence: 99%