Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.148

TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

Abstract: The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TWEETEVAL) consisting of seven heterogeneou…

Cited by 321 publications (282 citation statements)
References 24 publications (21 reference statements)
“…Additionally, we excluded common English positive words that have been found to be negatively correlated with well-being and happiness at the regional level, namely "love", "good", "LOL", "better", "well" and "like" (Jaidka et al, 2020). We test the robustness of the LIWC analysis by comparing with the results of a machine learning model for emotion categorization (Barbieri et al, 2020) based on a RoBERTa (Liu et al, 2019) supervised classifier trained with the annotated tweets from the SemEval dataset (Mohammad et al, 2018).…”
Section: Methods
confidence: 99%
“…For two example countries (USA, Spain), we checked the robustness of our dictionary-based emotion measures by comparing it with measures based on a machine learning model for emotion categorization. For this, we finetuned the pretrained masked language model from the TweetEval Benchmark (Barbieri et al, 2020) -a model based on RoBERTa-base (Liu et al, 2019) -on English and Spanish tweets from the SemEval dataset (Mohammad et al, 2018). We then correlated the percentage of emotional tweets based on dictionary-based (anxiety, anger, sadness, positive LIWC) and machine-learning based emotion labels (fear, anger, sadness and joy + optimism + love predicted by RoBERTa).…”
Section: Duration of Changes in Emotional Expression
confidence: 99%
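The robustness check described in the statement above — correlating the share of emotional tweets under dictionary-based (LIWC) labels against classifier-based labels — amounts to a simple Pearson correlation over per-period percentages. A minimal sketch in plain Python; the daily percentages below are hypothetical, not data from the cited studies:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily percentages of "sad" tweets, one value per day:
# LIWC dictionary counts vs. labels from a RoBERTa emotion classifier.
liwc_sad    = [2.1, 2.4, 3.0, 2.8, 3.5, 3.1, 2.6]
roberta_sad = [1.8, 2.2, 2.9, 2.7, 3.6, 3.0, 2.4]

print(round(pearson_r(liwc_sad, roberta_sad), 3))
```

A high correlation between the two series is what supports the claim that the dictionary-based measure is robust.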
“…Comparison with SOTA: The SST-2, SE17-4A, and SM20-5 tasks have been deployed on the GLUE, TweetEval, and Codalab benchmarks respectively; therefore, state-of-the-art (SOTA) results are available. The current SOTA performance on SST-2 is obtained by (Sun et al, 2019) and (Raffel et al, 2020) (tied), SOTA for SE17-4A is reported by (Barbieri et al, 2020), and SOTA for SM20-5 is reported by (Bai and Zhou, 2020), as shown in Figure 10.…”
Section: All Modules Active
confidence: 87%
“…The tweets are classified as Negative, Neutral, and Positive. The performance for this task is measured by the macro-average of recall scores for the positive, negative, and neutral classes, and is evaluated on the TweetEval benchmark (Barbieri et al, 2020).…”
Section: Sentiment Analysis
confidence: 99%
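The macro-averaged recall metric named in the statement above treats every class equally, regardless of how many examples it has. A self-contained sketch of the computation; the gold/predicted labels are illustrative:

```python
def macro_recall(y_true, y_pred, labels):
    """Average per-class recall: each class contributes equally,
    so a large majority class cannot dominate the score."""
    recalls = []
    for label in labels:
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        support  = sum(1 for t in y_true if t == label)
        recalls.append(true_pos / support if support else 0.0)
    return sum(recalls) / len(labels)

# Illustrative gold and predicted sentiment labels.
gold = ["pos", "pos", "neg", "neg", "neu", "neu", "neu"]
pred = ["pos", "neg", "neg", "neg", "neu", "pos", "neu"]

# Per-class recall: pos 1/2, neg 2/2, neu 2/3 -> macro average ~0.722.
print(round(macro_recall(gold, pred, ["pos", "neg", "neu"]), 3))
```

This matches the behavior of `sklearn.metrics.recall_score(..., average="macro")`, which is the usual implementation choice.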
“…TWEETEVAL (Barbieri et al, 2020) is a pretrained RoBERTa-base model, further trained on 60M randomly collected tweets, resulting in a Twitter-domain-adapted version. We use a selection of four TWEETEVAL models, each fine-tuned for a Twitter-specific downstream task: hate speech, emotion, and irony detection, and offensive language identification.…”
Section: BERTweet
confidence: 99%