Social media and microblogging platforms generally contain elements of figurative and nonliteral language, including satire. The identification of figurative language is a fundamental task for sentiment analysis. It will not be possible to obtain sentiment analysis methods with high classification accuracy if elements of figurative language have not been properly identified. Satirical text is a kind of figurative language, in which irony and humor have been utilized to ridicule or criticize an event or entity. Satirical news is a pervasive issue on social media platforms, which can be deceptive and harmful. This paper presents an ensemble scheme for satirical news identification in Turkish news articles. In the presented scheme, linguistic and psychological feature sets have been utilized to extract the feature sets (i.e. linguistic, psychological, personal, spoken categories, and punctuation). In the classification phase, accuracy rates of five supervised learning algorithms (i.e. naive Bayes algorithm, logistic regression, support vector machines, random forest, and k-nearest neighbor algorithm) with three widely utilized ensemble methods (i.e. AdaBoost, bagging, and random subspace) have been considered. Based on the results, we concluded that the random forest algorithm yielded the highest performance, with a classification accuracy of 96.92% for satire detection in Turkish. For deep learning-based architectures, we have achieved classification accuracy of 97.72% with the recurrent neural network architecture with attention mechanism.
Massive open online courses (MOOCs) are recent and widely studied distance learning approaches aimed at providing learning material to learners from geographically dispersed locations without age, gender, or race-related constraints. MOOCs generally enriched by discussion forums to provide interactions among students, professors, and teaching assistants. MOOC discussion forum posts provide feedback regarding the students' learning processes, social interactions, and concerns. The purpose of our research is to present a How to cite this article: Onan A, Toçoğlu MA. Weighted word embeddings and clustering-based identification of question topics in MOOC discussion forum posts.
This study presents a new dataset to be used in emotion extraction studies in Turkish text. We consider emotion extraction as a supervised text classification problem, which thereby requires a dataset for the training process. To satisfy this requirement, we aim to create a new dataset containing data for the six emotion categories: happiness, fear, anger, sadness, disgust and surprise. To gather this dataset, we conducted a survey and collected 27,350 entries from 4709 individuals. In the next step, we performed a validation process in which annotators validated each entry one by one by assigning a related emotion category. As a result of this process, we obtained two datasets, one raw and the other validated. Subsequently, we generated four versions of these two datasets using two different stemming methods and then modelled them using a vector space model. Then, we ran machine learning algorithms, including complement naive Bayes (CNB), random forest (RF), decision tree C4.5 (J48) and an updated version of support vector machines (SVMs), on the models to calculate the accuracy, precision, recall and F-measure values. Based on the results we obtained, we concluded that the SVM classifier yielded the highest performance value and that the models trained with a validated dataset provide more accurate results than the models trained with a non-validated dataset.
Text data analysis of social media is becoming more and more important since it includes the most recent information on what people think about. Likewise, emotion is one of the most valuable parts of human communication, emotion analysis is a type of information extraction process which identifies the emotional states of a given text. In this study, we investigated the performance of deep neural networks on emotion analysis from Turkish tweets. For this, we examined three different deep learning architectures including artificial neural network (ANN), convolutional neural network (CNN) and recurrent neural network (RNN) with long short-term memory (LSTM). Besides, we curated a dataset of Turkish tweets and annotated each tweet automatically for six emotion categories using a lexicon-based approach. For the evaluation, we conducted a set of experiments for each architecture. The results showed that the lexiconbased automatic annotation of tweets is valid. Secondly, ANN produced the worst result as expected, and CNN resulted in the highest score of 0.74 in terms of accuracy measure. Experiments also showed that our proposed approach for emotion analysis of tweets in Turkish performs better than state-of-the-art in this topic.
In this paper, we proposed a lexicon for emotion analysis in Turkish for six emotional categories happiness, fear, anger, sadness, disgust, and surprise. Besides, we also investigated the effects of a lemmatizer and a stemmer, two term-weighting schemes, four lexicon enrichment methods, and a term selection approach for lexicon construction. To do this, we generated Turkish emotion lexicon based on a dataset, TREMO, containing 25,989 documents. We then preprocessed the documents to obtain dictionary and stem forms of each term using a lemmatizer and a stemmer. Afterwards, we proposed two different weighting schemes where term frequency, term-class frequency and mutual information (MI) values for six emotion categories are taken into consideration. We then enriched the lexicon by using bigram and concept hierarchy methods, and performed term selection for efficiency issues. Then, we compared the performance of lexicon-based approach with machine learning based approach by using our proposed lexicon. The experiments showed that the use of the proposed lexicon efficiently produces comparable results in emotion analysis in Turkish text.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.