Proceedings of the Sixth Workshop on Noisy User-Generated Text (W-NUT 2020), 2020
DOI: 10.18653/v1/2020.wnut-1.53
UIT-HSE at WNUT-2020 Task 2: Exploiting CT-BERT for Identifying COVID-19 Information on the Twitter Social Network

Abstract: Recently, COVID-19 has affected many aspects of everyday life around the world and has led to dreadful consequences. More and more tweets about COVID-19 have been shared publicly on Twitter. However, the majority of those tweets are uninformative, which makes it challenging to build automatic systems that detect the informative ones for useful AI applications. In this paper, we present our results at the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets. In particular, we propose our simple …
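The shared task the abstract describes is binary classification of tweets as INFORMATIVE or UNINFORMATIVE. Below is a minimal sketch of fine-tuning CT-BERT for that task with the Hugging Face Transformers library; the checkpoint name, example tweets, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: fine-tuning CT-BERT as a binary classifier for the
# informative-vs-uninformative tweet task. Checkpoint name and
# hyperparameters are assumptions, not taken from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "digitalepidemiologylab/covid-twitter-bert-v2"  # assumed HF checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

tweets = [
    "Official: 1,200 new COVID-19 cases reported in the city today.",  # informative
    "I miss going outside so much...",                                  # uninformative
]
labels = torch.tensor([1, 0])  # 1 = INFORMATIVE, 0 = UNINFORMATIVE

batch = tokenizer(tweets, padding=True, truncation=True, max_length=128,
                  return_tensors="pt")

# One illustrative training step: cross-entropy loss over the two classes.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```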

Citations: cited by 5 publications (2 citation statements)
References: 11 publications (6 reference statements)
“…Not surprisingly, CT-BERT, obtained by continuing pre-training from the pre-trained BERT-large model on a corpus of 22.5M COVID-19-related tweets, is utilized in a large number of the highly-ranked systems. In particular, all of the top 6 teams, including NutCracker, NLP North, UIT-HSE (Tran et al., 2020), #GCDH (Varachkina et al., 2020), Loner and Phonemer (Wadhawan, 2020), utilize CT-BERT. That is why we find slight differences in their obtained F1 scores.…”
Section: Results
Confidence: 99%
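The statement describes domain-adaptive pre-training: CT-BERT continues masked-language-model (MLM) training of BERT-large on a COVID-19 tweet corpus. The sketch below shows what that continued-pre-training step looks like with Transformers; the corpus file name and all settings here are illustrative assumptions, not the actual CT-BERT recipe.

```python
# Sketch of continued MLM pre-training of BERT-large on COVID-19 tweets,
# the idea behind CT-BERT. Corpus path and hyperparameters are assumptions.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-large-uncased")

# Hypothetical text file with one COVID-19 tweet per line.
ds = load_dataset("text", data_files={"train": "covid_tweets.txt"})["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=96),
            batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ct-bert-continued",
                           per_device_train_batch_size=8),
    train_dataset=ds,
    data_collator=collator,
)
trainer.train()
```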
“…Our approaches to text preprocessing are various combinations of the following steps, most of which have been inspired by [8,20]:…”
Section: Data Preprocessing
Confidence: 99%
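The concrete list of preprocessing steps is elided in this excerpt. The steps below (URL masking, user-mention masking, whitespace cleanup, lowercasing) are common choices for tweet normalization and are assumptions for illustration, not the cited recipe.

```python
# Hypothetical tweet-normalization steps; not the steps from the cited work.
import re

def preprocess_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+", "HTTPURL", text)  # mask URLs
    text = re.sub(r"@\w+", "@USER", text)            # mask user mentions
    text = re.sub(r"\s+", " ", text).strip()         # collapse whitespace
    return text.lower()

print(preprocess_tweet("BREAKING: @WHO update at https://t.co/abc123  #COVID19"))
# -> "breaking: @user update at httpurl #covid19"
```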