Improving Document Relevancy Using Integrated Language Modeling Techniques

Balakrishnan, Vimala; Humaidi, Norshima; Lloyd-Yemoh, Ethel

doi:10.22452/mjcs.vol29no1.4

Cited by 14 publications (5 citation statements)

References: 21 publications


“…For European languages there are two types: Stemming, which reduces words to a common stem by clipping off unnecessary morphemes, and Lemmatisation, which groups words by morpheme, guided by the program's knowledge of the dictionary and morphology of the language (Singh, Gupta, 2017:159). Although in principle one might expect lemmatisation to offer more reliable results, various studies, such as those carried out by Kettunen, Kunttu and Järvelin (2005) and Balakrishnan, Humaidi and Lloyd-Yemoh (2016), point out that the differences between the results of the two methods across languages are insignificant. In any case, stemming, the most widely used method in computer programs for Latin languages, can produce two types of errors in certain cases: false positives and false negatives, respectively, with words that have almost identical morphology but different meanings, or with polysemous words (Hajeer, Ismail, Badr, Tolba, 2017).…”

Section: Objectives and Methods (mentioning)


Occupational accidents and their prevention in the Spanish digital press

García, Martínez, Carabel et al. (2017)


Introduction. Occupational accidents and their prevention have become a serious day-to-day human, social and economic problem for businesses, while the news media have traditionally tended to give central coverage to fatal work accidents. The objective of this article is to study how born-digital newspapers address this issue. Methods. The study is based on a content analysis of media framing, using text mining techniques, of a sample of news pieces related to work accidents and their prevention, published over a period of two years and seven months by five of the main digital newspapers in Spain. Results and conclusions. Evidence shows that digital news media respond reactively to workplace accidents, just like their print counterparts, although there are important differences in the treatment of this subject according to the editorial line of each online newspaper.
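The stemming versus lemmatisation contrast in the first citation statement can be made concrete with a deliberately tiny sketch. Everything in it is an assumption for illustration: the suffix list, the three-character minimum stem, and the LEMMAS lookup table are hypothetical stand-ins, not the method of any paper cited here; real systems use Porter/Snowball-style stemmers or lexicon-backed lemmatisers. It shows a false positive (unrelated words clipped to the same stem) and one common kind of false negative (irregular forms of one word left with different stems), while the dictionary lookup keeps the first pair apart and joins the second.

```python
# Toy illustration only: a crude suffix-stripping stemmer versus a
# dictionary-lookup lemmatiser. Real stemmers and lemmatisers use far
# richer rules and lexicons.

# Hypothetical suffix list; longer suffixes are tried first.
SUFFIXES = ("ities", "ity", "ing", "ed", "es", "s", "e")

# Hypothetical mini-lexicon mapping inflected forms to dictionary forms.
LEMMAS = {
    "ran": "run",
    "running": "run",
    "runs": "run",
    "mice": "mouse",
    "universities": "university",
}


def stem(word: str, min_stem: int = 3) -> str:
    """Clip the longest matching suffix, keeping at least `min_stem` chars."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Return the dictionary form if the lexicon knows the word, else the word."""
    word = word.lower()
    return LEMMAS.get(word, word)


# False positive: unrelated words collapse to the same stem.
print(stem("university"), stem("universe"))            # univers univers
print(lemmatize("university"), lemmatize("universe"))  # university universe
# False negative: irregular forms of one word get different stems.
print(stem("ran"), stem("run"))                        # ran run
print(lemmatize("ran"), lemmatize("run"))              # run run
```

Because the stemmer only clips strings, it cannot know that "ran" and "run" are the same verb; the lemmatiser can, but only for words its lexicon covers.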


“…After that, all the tokens were converted to lowercase before applying the lemmatization technique. Lemmatization, in general, uses vocabulary and morphological analysis of words to remove inflectional endings and convert them to their dictionary form (Balakrishnan et al. [46]). A stopword list was applied to the lemmatized words, and then the length of each tweet was normalized using the L2 norm.…”

Section: Study Procedures (mentioning)


Early-stage pregnancy recognition on microblogs: Machine learning and lexicon-based approaches

Sarsam, Alzahrani, Al-Samarraie (2023). Heliyon.
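The preprocessing order described in the statement above (lowercase, lemmatise, remove stop words, then L2-normalise each tweet) can be sketched as follows. This is a minimal sketch under stated assumptions: the regex tokeniser, the LEMMAS table, the STOPWORDS set, and the use of raw term counts as the tweet vector are all hypothetical; the cited study may well have used a library lemmatiser and TF-IDF weights. After normalisation every non-empty tweet vector has unit Euclidean length, so long and short tweets become comparable.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical mini-lexicon and stop-word list (illustration only).
LEMMAS = {"feeling": "feel", "felt": "feel", "mornings": "morning",
          "is": "be", "am": "be", "was": "be"}
STOPWORDS = {"i", "the", "a", "and", "be", "this", "every", "in"}


def preprocess(tweet: str) -> list[str]:
    """Tokenise, lowercase, lemmatise, then drop stop words."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", tweet)]
    lemmas = [LEMMAS.get(t, t) for t in tokens]  # dictionary-form lookup
    return [t for t in lemmas if t not in STOPWORDS]


def l2_normalise(tokens: list[str]) -> dict[str, float]:
    """Term-count vector scaled to unit Euclidean (L2) length."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}  # empty tweet after filtering: nothing to normalise
    return {term: c / norm for term, c in counts.items()}


tokens = preprocess("Feeling sick every morning. I felt sick again this morning!")
print(tokens)  # ['feel', 'sick', 'morning', 'feel', 'sick', 'again', 'morning']
vector = l2_normalise(tokens)
print(round(sum(v * v for v in vector.values()), 6))  # 1.0
```

Lemmatising before stop-word removal, as the quoted procedure does, lets a single stop-word entry such as "be" also catch inflected forms like "was" and "is".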


“…All the tokens were transformed to lowercase before applying the lemmatisation technique. Lemmatisation, in general, uses vocabulary and morphological analysis of words and removes inflectional endings to convert words to their dictionary form (Balakrishnan, Humaidi, & Lloyd-Yemoh, 2016). Stop-word removal was applied to the lemmatised words.…”

Section: Data Pre-processing and Text Clustering (mentioning)


Disease discovery-based emotion lexicon: a heuristic approach to characterise sicknesses in microblogs

Sarsam, Al-Samarraie, Al-Sa’di (2020). Netw Model Anal Health Inform Bioinforma.


The analysis of microblogging data has been widely used to discover valuable resources for the timely identification of critical illness-related incidents and serious epidemics. Despite numerous efforts in this field, making accurate and timely predictions of incidents and outbreaks based on certain clinical symptoms remains a great challenge. Hence, an investigative method can be crucial in characterising a disease state. This study proposes a heuristic mechanism that uses an unsupervised learning technique to efficiently detect disease incidents and outbreaks from tweet content. We categorised the types of emotions that are highly linked to a specific disease and its related terminology. Emotions (anger, fear, sadness, and joy) and diabetes-related terms were extracted using the NRC Affect Intensity Lexicon and a part-of-speech tagging tool. A two-cluster solution was established and validated. The classification results showed that K-means clustering with 2 centroids had the highest classification accuracy (96.53%). The relationship between diabetes-related terms (in the form of tweets) and emotions was established and assessed using the association rule mining technique. The results showed that diabetes-related terms were exclusively associated with fear. This study offers a novel mechanism for disease recognition and outbreak detection in microblogs, which is useful for making informed decisions about a disease state.
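The abstract reports K-means clustering with two centroids. A plain-Python sketch of Lloyd's algorithm with k = 2 is given below as an illustration of the technique only: the four-dimensional (anger, fear, sadness, joy) intensity vectors are invented toy values, and the initialisation from the first k points is a simplification chosen for determinism, not the procedure used in the cited study.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int = 2,
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm; returns (cluster label per point, centroids)."""
    if len(points) < k:
        raise ValueError("need at least k points")
    # Assumption: deterministic initialisation from the first k points.
    # Practical implementations usually use k-means++ or random restarts.
    centroids: List[Vector] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids will not move either
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels, centroids


# Toy (anger, fear, sadness, joy) intensities for six hypothetical tweets.
tweets = [
    (0.1, 0.8, 0.6, 0.0),
    (0.0, 0.1, 0.1, 0.9),
    (0.2, 0.9, 0.5, 0.1),
    (0.1, 0.0, 0.2, 0.8),
    (0.0, 0.7, 0.7, 0.0),
    (0.1, 0.2, 0.0, 0.7),
]
labels, centroids = kmeans(tweets, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

With these toy values the fear/sadness-heavy tweets and the joy-heavy tweets separate into the two clusters; k-means itself has no notion of which cluster is "correct", so any accuracy figure like the one reported requires labelled data to compare against.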
