Unsupervised Machine Learning based Documents Clustering in Urdu

Rahman, Atta Ur; Khan, Khairullah; Khan, Wahab; Khan, Amin; Saqia, Bibi

doi:10.4108/eai.19-12-2018.156081

Cited by 7 publications

(2 citation statements)

References 33 publications


“…Recent studies have demonstrated effectiveness in clustering large volumes of online news items [14, 15, 16]. Text clustering can be defined as keeping similar group documents together, marking the outliers texts outside the group based on similarity estimation between them [17].…”

Section: Methods (mentioning)

confidence: 99%

Similarity Analysis in Understanding Online News in Response to Public Health Crisis

Cezario, Marques, Pinto et al. 2022, IJERPH


Background: The “Syphilis No!” campaign, which the Brazilian Ministry of Health (MoH) launched between November 2018 and March 2019, brought forward the concept “Test, Treat and Cure” to remind the population of the importance of syphilis prevention. In this context, this study aims to analyze the similarity of online syphilis news to understand how public health communication interventions influence media coverage of the syphilis issue. Methods: This paper presents a computational approach to assess the effectiveness of communication actions on a public health problem. Data were collected between January 2015 and December 2019 and processed using the Hermes ecosystem, which uses text mining and machine learning algorithms to cluster similar content. Results: Hermes identified 1049 Google-indexed web pages containing the term “syphilis” in Brazil. Of these, 619 were categorized as news stories. In total, 157 were grouped into clusters of at least two similar news items and a single cluster with 462 news items classified as “single” for not featuring similar news items. From these, 19 clusters were identified in the pre-campaign period, 23 during the campaign, and 115 in the post-campaign period. Conclusions: The findings presented in this study show that the volume of syphilis-related news reports has increased in recent years and gained popularity after the SNP started, having been boosted during the campaign and escalating even after its completion.


“…The most popular part-of-speech tagging would be identifying words as nouns, verbs, adjectives, etc. The Albanian language has some properties that pose difficulties in creating a part-of-speech tag set [6]. A challenge faced in building a dictionary for low resource language (in this case for Albanian language) is that a part-of-speech tag set that can adequately represent the underlying linguistic phenomena is difficult to build.…”

Section: Part-of-speech (POS) Tagging (mentioning)

confidence: 99%

Building Dictionaries for Low Resource Languages: Challenges of Unsupervised Learning

Mati, Hamiti, Susuri et al. 2021, AETiC


The development of natural language processing resources for Albanian has grown steadily in recent years. This paper presents research conducted on unsupervised learning: the challenges associated with building a dictionary for the Albanian language and creating part-of-speech tagging models. The majority of languages have their own dictionary, but languages with low resources suffer from a lack of resources. It facilitates the sharing of information and services for users and whole communities through natural language processing. The experimentation corpus for the Albanian language includes 250K sentences from different disciplines, with a proposal for a part-of-speech tag set that can adequately represent the underlying linguistic phenomena. Contributing to the development of Albanian is the purpose of this paper. The results of experiments with the Albanian language corpus revealed that its use of articles and pronouns resembles that of more high-resource languages. According to this study, the total expected frequency as a means for correctly tagging words has been proven effective for populating the Albanian language dictionary.


Evaluation of clustering techniques on Urdu News head-lines: a case of short length text

Nasim, Haider 2022, Journal of Experimental & Theoretical Artificial Intelligen
