Social media platforms allow users to share thoughts, experiences, and beliefs. These platforms represent a rich resource for natural language processing techniques to make inferences in the context of cognitive psychology. Some inaccurate and biased thinking patterns are defined as cognitive distortions. Detecting these distortions helps users restructure how to perceive thoughts in a healthier way. This paper proposed a machine learning-based approach to improve cognitive distortions' classification of the Arabic content over Twitter. One of the challenges that face this task is the text shortness, which results in a sparsity of co-occurrence patterns and a lack of context information (semantic features). The proposed approach enriches text representation by defining the latent topics within tweets. Although classification is a supervised learning concept, the enrichment step uses unsupervised learning. The proposed algorithm utilizes a transformer-based topic modeling (BERTopic). It employs two types of document representations and performs averaging and concatenation to produce contextual topic embeddings. A comparative analysis of F1-score, precision, recall, and accuracy is presented. The experimental results demonstrate that our enriched representation outperformed the baseline models by different rates. These encouraging results suggest that using latent topic distribution, obtained from the BERTopic technique, can improve the classifier's ability to distinguish between different CD categories.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.