Proceedings of the Seventh Workshop on Noisy User-Generated Text (W-NUT 2021), 2021
DOI: 10.18653/v1/2021.wnut-1.3

Detecting Depression in Thai Blog Posts: a Dataset and a Baseline

Abstract: We present the first openly available corpus for detecting depression in Thai. Our corpus is compiled from expert-verified cases of depression in several online blogs. We experiment with two different LSTM-based models and two different BERT-based models. We achieve a 77.53% accuracy with a Thai BERT model in detecting depression. This establishes a good baseline for future research on the same corpus. Furthermore, we identify a need for Thai embeddings that have been trained on a more varied corpus than Wikipedia…
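As a rough illustration of the kind of baseline described in the abstract, the sketch below fine-tunes a pretrained Thai BERT checkpoint for binary depression detection with Hugging Face Transformers. The checkpoint name (WangchanBERTa), hyperparameters, and toy in-memory data are assumptions for illustration only, not the authors' exact configuration.

```python
# Hedged sketch: fine-tuning a Thai BERT model for binary depression detection.
# Model name, hyperparameters, and data are illustrative assumptions.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

MODEL_NAME = "airesearch/wangchanberta-base-att-spm-uncased"  # assumed Thai BERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy in-memory examples; the real corpus would be loaded from the released dataset.
train = Dataset.from_dict({
    "text": ["ตัวอย่างข้อความจากบล็อก ...", "another blog post ..."],
    "label": [1, 0],  # 1 = depression, 0 = control
})

def tokenize(batch):
    # Pad to a fixed length so the default collator can batch the examples.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

train = train.map(tokenize, batched=True)

args = TrainingArguments(output_dir="thai-depression-baseline",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)

Trainer(model=model, args=args, train_dataset=train).train()
```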

Cited by 6 publications (3 citation statements) | References 22 publications (13 reference statements)
“…As the model we are basing our work on (i.e., multilingual BERT) is trained on a generic encyclopedia corpus (Wikipedia) and has little exposure to Islamic and Qur'anic concepts, we continue training the multilingual BERT model to adapt it to the domain of the task here. In our previous research (Hämäläinen et al., 2021a; Hämäläinen et al., 2021b), we have found that BERT-based models tend to work better if their training data has included text of a similar domain as the downstream task the model is fine-tuned for. Therefore, we believe that domain adaptation is beneficial in this case as well.…”
Section: Domain Adaptation (mentioning)
confidence: 99%
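The domain adaptation this citing work describes is continued masked-language-model pretraining of multilingual BERT on in-domain text before fine-tuning. A minimal sketch of that recipe, with a hypothetical corpus file and illustrative hyperparameters, might look like this:

```python
# Hedged sketch of domain-adaptive continued pretraining of multilingual BERT
# with masked language modelling. The corpus path and hyperparameters are
# assumptions, not the cited authors' exact setup.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Plain-text in-domain corpus, one document per line; the file name is hypothetical.
corpus = load_dataset("text", data_files={"train": "in_domain_corpus.txt"})["train"]
corpus = corpus.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
                    batched=True, remove_columns=["text"])

# The collator handles dynamic padding and random masking of 15% of tokens.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(output_dir="mbert-domain-adapted",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)

Trainer(model=model, args=args, train_dataset=corpus, data_collator=collator).train()
# The adapted checkpoint in "mbert-domain-adapted" can then be fine-tuned on the downstream task.
```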
“…These metrics were defined using Eqs. (3)–(6), respectively. This study proposed a novel stacking ensemble strategy consisting of five main stages, as denoted in Fig.…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
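The metrics referenced as Eqs. (3)–(6) are not reproduced in the snippet; assuming they are the usual accuracy, precision, recall, and F1, a small scikit-learn sketch of computing them for a stacking ensemble could look like the following. The base learners and synthetic data are arbitrary stand-ins, not the cited study's five-stage pipeline.

```python
# Illustrative sketch: a stacking ensemble evaluated with accuracy, precision,
# recall, and F1. Base learners and data are assumptions for demonstration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary-classification data standing in for a real text feature matrix.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ensemble = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", LinearSVC(random_state=0))],
    final_estimator=LogisticRegression(),
)
ensemble.fit(X_tr, y_tr)
pred = ensemble.predict(X_te)

precision, recall, f1, _ = precision_recall_fscore_support(y_te, pred, average="binary")
print(f"accuracy={accuracy_score(y_te, pred):.3f} "
      f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```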
“…The remaining datasets that have appeared in studies have been kept private, and their details have not been disclosed. Moreover, many of them contain a very small number of trainable examples, such as the Thai sentence Wiki dataset [18], which comprises 600 samples, and the Thai depression dataset with only 944 samples [5].…”
Section: Introduction (mentioning)
confidence: 99%