2022
DOI: 10.2196/34492

Traditional Machine Learning Models and Bidirectional Encoder Representations From Transformer (BERT)–Based Automatic Classification of Tweets About Eating Disorders: Algorithm Development and Validation Study

Abstract: Background: Eating disorders affect an increasing number of people. Social networks provide information that can help. Objective: We aimed to find machine learning models capable of efficiently categorizing tweets in the eating disorders domain. Methods: We collected tweets related to eating disorders for 3 consecutive months. After preprocessing, a subset of 2000 tweets was labeled: (1) messages written by peo…

Cited by 25 publications (21 citation statements)
References 55 publications (87 reference statements)
“…The size of the data set is quite similar to those of Kummervold et al [17] (1633 tweets for training and 544 for testing) and Benítez-Andrades et al [26] (n=1400 for training and n=600 for testing). Furthermore, the benefit of using a pretrained model such as the CamemBERT is that a large data set is not required to obtain good results.…”
Section: Methods; citation type: mentioning
confidence: 91%
“…This accuracy is slightly higher than that obtained by BERT for the same topic (vaccines) [17] and in the same range as previous findings [16, 29]. However, CamemBERT obtained a better accuracy (78.7%-87.8%) in a study using dichotomous labels for tweets about eating disorders and using a preprocessing step, reducing the initial number of tweets by 2 [26]. However, by limiting the analysis to long tweets (170 or more characters, in accordance with the statistical analysis conducted on the performance of the model), the accuracy of classification model 2 improved significantly (from 62.9% to 72.4% for the F1-score).…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…We note that prior to using BERT, we have attempted K-means clustering technique that failed at clustering the documents into interpretable topics. Also, in previous studies (Benitez-Andrades et al, 2022; Bilal and Almazroi, 2022) the authors observed that BERT-based classifiers outperform bag-of-words approaches. Though BERT-based models can be computationally expensive (Bhattacharjee et al, 2020), we utilized BERTopic to have a better accuracy in classifying documents into interpretable topics.…”
Section: Topic Modeling To Identify Subfields Within the Covid-vaccin...; citation type: mentioning
confidence: 85%
“…• when other sources of information are not freely available, such as in languages other than English [7,9,19,20,22,38,43,47]; • when researchers investigate questions related to patients and population, while these questions are not discussed with medical doctors or require large population samples. We can mention for instance sentiment analysis on medication and vaccines [7,20,33,38,43,47], and adverse drug effects [76,77]; • when mental health of patients is concerned in cases like depression [9, 4, 78], eating disorders [79], suicide detection and prevention [19,80], quality of life of patients [31,81], and drug misuse [82][83][84].…”
Section: Social Media As the Preferred Source Of Information; citation type: mentioning
confidence: 99%