Toward an Enhanced Bengali Text Classification Using Saint and Common Form

Ria, Nushrat Jahan; Khushbu, Sharun Akter; Yousuf, Mohammad Abu; Masum, Abu Kaisar Mohammad; Abujar, Sheikh; Hossain, Syed Akhter

doi:10.1109/icccnt49239.2020.9225358

Cited by 9 publications

(2 citation statements)

References 11 publications


“…List of punctuations that we removed from our dataset are given in Table 3. − Data tokenization: tokenization is the method of splitting or tokenizing a string [9]. Words are the token of a sentence and the sentences are the token of a paragraph.…”

Section: Data Pre-processing (mentioning)

confidence: 99%
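The two levels of tokenization described in the statement above (sentences as tokens of a paragraph, words as tokens of a sentence) can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function names `split_sentences` and `split_words` are hypothetical, and the regular expressions assume English-like text in which sentences end with `.`, `!`, or `?`.

```python
import re


def split_sentences(paragraph):
    """Split a paragraph into sentences (assumed to end with '.', '!' or '?')."""
    # Split on whitespace that directly follows a sentence-ending mark.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Split a sentence into word tokens, dropping punctuation."""
    # \w+ matches runs of letters, digits and underscores.
    return re.findall(r"\w+", sentence)


paragraph = "I feel tired. Nothing helps!"
sentences = split_sentences(paragraph)
words = [split_words(s) for s in sentences]
```

Library tokenizers such as those in NLTK handle abbreviations, contractions, and other scripts more carefully; the regex version only shows the idea.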

“…Though they have used six classifiers, among all of those accuracy of NB was much efficient than other classifiers. Furthermore, count-vectorizer, tokenizing words, removal of stop words, part-of-speech (POS) tagging were the major steps for data preprocessing [9]-[11]. Different libraries and tools such as natural language toolkit (NLTK), TextBlob, Waikato environment for knowledge analysis (WEKA), and Beautiful Soup had been used for data preprocessing [12], [13].…”

Section: Introduction (mentioning)

confidence: 99%


Depression prognosis using natural language processing and machine learning from social media status

Hossain; Talukder; Jahan. 2022. IJECE


Depression is an acute problem throughout the world. Severe and prolonged depression causes many deaths every year. The problem is that most people are not aware that they are suffering from depression. In this research, our aim was to determine whether an individual is depressed by analyzing social media statuses. Therefore, we focused on real data. Our dataset consists of 2000 sentences collected from the social media platforms Facebook, Twitter, and Instagram. We then performed five data pre-processing approaches for natural language processing (NLP): tokenization, removal of stop words, removal of empty strings, removal of punctuation, and stemming and lemmatization. The processed data served as the input to our selected models. Finally, we applied six machine learning (ML) classifiers (multinomial Naive Bayes (NB), logistic regression, linear support vector classifier, random forest, K-nearest neighbour, and decision tree) to achieve better accuracy on our dataset. Among the six algorithms, multinomial NB and logistic regression performed well on our dataset and obtained 98% accuracy.
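The pipeline in the abstract (pre-process each status, then classify it) can be illustrated with a small from-scratch sketch. Everything below is an assumption made for illustration: the abstract does not say which libraries or settings the authors used, the stop-word list and the four training sentences are invented toy data, stemming and lemmatization are omitted, and the `MultinomialNB` class is a hand-written Laplace-smoothed version rather than the authors' model.

```python
import math
import string
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (not from the paper).
STOP_WORDS = {"i", "am", "the", "a", "is", "so", "and", "to", "my"}


def preprocess(text):
    """Lowercase, remove punctuation, tokenize, drop empty strings and stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split(" ")
    return [t for t in tokens if t and t not in STOP_WORDS]


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus summed log likelihoods of the known tokens.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples, only to exercise the code.
train = [
    ("I am so hopeless and sad.", "depressed"),
    ("Nothing matters, I feel empty", "depressed"),
    ("What a happy, sunny day!", "not_depressed"),
    ("I enjoyed the movie with friends", "not_depressed"),
]
model = MultinomialNB().fit([preprocess(t) for t, _ in train], [y for _, y in train])
prediction = model.predict(preprocess("Feeling sad and empty"))
```

With four training sentences the prediction says nothing about real accuracy; the sketch only shows how the pre-processing output feeds the classifier.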

