Word by Word Labelling of Romanized Sindhi Text by using Online Python Tool

Sodhar, Irum Naz; Buller, Abdul Hafeez; Sulaiman, Suriani; Sodhar, Anam Naz

doi:10.14569/ijacsa.2022.0130831

Cited by 3 publications

(4 citation statements)

References 12 publications


“…Figures 3 and 4 summarize the settings and the training steps of the language models. Machine Learning classifiers deployed, trained, and tested in this work were Logistic regression [30], Support Vector Machine (SVM) [31], K-nearest neighbors (KNN) [32], Decision Tree [33], Stochastic Gradient Descent (SGD) [34], and Multinomial Naive Bayes [35]. In the ensemble learning category, several models were applied to do the same task such as Voting Classifiers [36], Random Forest [37], Bagging Meta-Estimator [38], AdaBoost [39], XGBoost [40], Gradient Boosting [41], and Light Gradient Boosting Machine (LightGBM) [42].…”

Section: The Proposed Approach (mentioning)

confidence: 99%

Detection and prediction of Future Mental disorder from Social Media Data using Machine Learning, Ensemble Learning, and Large Language Models.

Abdullah, Negied. 2024. IEEE Access.


Social media platforms are widely used to express feelings, opinions, and emotional states. Billions of people worldwide use them daily to share what they think and feel. Facebook alone has around three billion personal accounts. In this work, a Reddit dataset is used to automatically detect mental illness from social media posts. The study is not limited to early detection of existing mental disorders such as depression and anxiety; more importantly, it extends to predicting mental illness that may develop in the future. Nineteen models are deployed to assess their ability to detect and predict mental disorders from social media posts: six classical machine learning classifiers, nine ensemble learning models, and four large language models (LLMs). Among the six machine learning classifiers, logistic regression performed best. Among the nine ensemble methods, VC2, LightGBM, the Bagging estimator, and XGBoost performed best. Among the four large language models, RoBERTa and OpenAI GPT outperformed the rest. All models were built, trained, tested, and compared with previous work in the literature. The study covers four main mental disorders: ADHD, anxiety, bipolar disorder, and depression. The proposed work surpasses prior work in the number of mental disorders addressed, the number of models used and tested, and the size of the dataset used to validate results. It also noticeably outperformed the only previous attempt in the literature that addressed all of these disorders in both detection and prediction. For detection of existing mental disorders, it achieved an F1-score of 0.80 on clinical data and 0.52 on non-clinical data; for prediction of future mental disorders, it achieved an F1-score of 0.43 on non-clinical data.
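The F1-scores reported in this abstract combine precision and recall into a single number. As a minimal sketch of the standard binary definition, the function below computes F1 from true-positive, false-positive, and false-negative counts. The function name and the example counts are hypothetical, and the abstract does not say how scores were averaged across the four disorders, so this is not a reproduction of the authors' evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard binary F1: harmonic mean of precision and recall."""
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not data from the study).
example_f1 = f1_score(tp=8, fp=2, fn=2)
print(round(example_f1, 2))
```

With these illustrative counts, precision and recall are both 0.8, so the harmonic mean is also 0.8 up to floating-point rounding.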


“…Sentiment analysis of RST has been done on the online Python tool for 100 sentences. But during, before, and after performing the task of sentiment analysis on RST, faced issues with the completion of this task [18,19]. While performing the task of sentiment analysis on RST, positive sentences were not identified by the tool (Python), but after the characters of the Romanized text were changed, and then the results came.…”

Section: Issues Of Sentiment Analysis Of Romanized Sindhi Text (mentioning)

confidence: 99%

Hybrid Approach Used to Analyze the Sentiments of Romanized Text (Sindhi)

Sodhar, Sulaiman, Buller et al. 2023. IJACSA.


Sentiment analysis is an important part of natural language processing (NLP). This study evaluated the sentiment of Romanized Sindhi Text (RST) using a hybrid approach and ground truth values. The methodology involves three major steps: data input, processing with the tool, and analysis of the data and evaluation of the results. One hundred RST sentences, each of which can be positive, neutral, or negative, were used. The sentences in the corpus are simple to understand and come from everyday use. An online Python tool was used to process the text and produce the results. Of the sentences analyzed, 86% were labelled neutral, 9% negative, and only 5% positive. Accuracy, measured against ground truth values with an online confusion-matrix calculator, was 87.02%, which gives an error ratio of 12.98%.
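The accuracy and error ratio in this abstract are related by simple arithmetic: the error ratio is 100% minus the accuracy. The sketch below shows the standard overall-accuracy calculation from a three-class confusion matrix and that complement. The abstract does not state which formula the online calculator applies, and the matrix counts and function names here are hypothetical, chosen only for illustration.

```python
def accuracy_from_confusion(matrix: list[list[int]]) -> float:
    """Overall accuracy: diagonal (correct) counts divided by all counts.

    Rows are ground-truth labels, columns are predicted labels.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def error_ratio(accuracy: float) -> float:
    """Error ratio is the complement of accuracy."""
    return 1.0 - accuracy


# Hypothetical counts (order: positive, neutral, negative); not the study's data.
example = [
    [3, 1, 0],
    [1, 40, 2],
    [0, 1, 2],
]

print(accuracy_from_confusion(example))        # 45 correct out of 50
print(round(error_ratio(0.8702) * 100, 2))     # complement of the reported 87.02%
```

The second print applies the complement to the reported accuracy and recovers the reported error ratio of 12.98%.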


“…Sindhi is a complex language with a rich morphology that allows for word borrowing and lending (9,10). It has a high rate of ambiguity due to similar patterns and vowel deletions.…”

Section: Sindhi Morphology (mentioning)

confidence: 99%

“…Although they are recognized as part of great languages, the variety of affixes in Sindhi makes it more complex. The significant variety in Sindhi's morphology caused by different prefix, suffix, and stem placements in words makes it difficult to computerize (10).…”

Section: Sindhi Morphology (mentioning)

confidence: 99%

Morphology-Assisted Sindhi Text Analysis for Natural Language Processing Applications

Sodhar, Buller, Sulaiman. 2023. IJST.


Objectives: Understanding word construction and internal structure, especially in the Sindhi language, requires knowledge of the linguistic field known as morphology. This study examines Sindhi morphology with particular attention to its structure, function, nature, word categories, and writing system. Natural Language Processing (NLP) relies on morphological analysis to identify words and their grammatical features, enabling applications such as spell checkers and machine translation. A comparative analysis is carried out to understand how Sindhi morphology developed. Because research on morphological analysis lacks proper classification and spans both modern and conventional methodologies, variation in Sindhi morphology presents difficulties for computerization. Methods: Morphological analysis is crucial in NLP domains such as spell checking and machine translation. It studies word formation and word structure using morphemes, the smallest grammatical elements in a language. Morphemes are the building blocks of words and are divided into free and bound morphemes. Findings: Sindhi's rich morphology and complexity enable borrowing and lending of words, but ambiguity is high due to similar patterns and vowel deletions. Morphological analysis influences semantic and syntactic analysis. Computerization is challenging because of the varied positions of prefixes, suffixes, and stems. Primary and secondary words can be subdivided into compound and complicated terms. The language uses initial, middle, and end writing styles. Novelty: This research aims to develop an automatic Sindhi morphological analyzer for future NLP applications that is compatible with existing information technology applications. It will help in understanding Sindhi word structure and benefit software developers building Sindhi natural language and speech processing applications.
