Creating and Comparing Dictionary, Word Embedding, and Transformer-Based Models to Measure Discrete Emotions in German Political Text

Widmann, Tobias; Wich, Maximilian

doi:10.1017/pan.2022.15

Cited by 27 publications (23 citation statements)

References 62 publications

“…We find that advanced supervised machine learning classification methods using transformer language models can approach the performance of human analysis when it comes to inference on various internal states from short texts. Our results, thus, echo recent suggestions about the potential of deep learning methods in social science applications (van Atteveldt et al. 2021; Bonikowski, Luo, and Stuhler 2022; Do, Ollion, and Shen 2022; Widmann and Wich 2022). Yet, we also suggest that increased method complexity does not always warrant a large improvement in performance: simple supervised machine learning methods such as logistic regression can sometimes perform almost as well as more complex algorithms.…”

Section: Introduction (supporting; confidence: 89%)
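To make the "simple baseline" point concrete, here is a minimal bag-of-words logistic regression classifier written with the Python standard library only. It is a sketch under stated assumptions: lowercase whitespace tokenization, binary labels, no regularization, and plain per-example gradient descent. The four training sentences are invented toy data, not drawn from any of the cited datasets.

```python
import math
from collections import Counter


def featurize(text):
    """Bag-of-words counts from lowercase whitespace tokens (a simplifying assumption)."""
    return Counter(text.lower().split())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Fit a binary logistic regression with per-example gradient steps on the log-loss."""
    weights, bias = {}, 0.0
    features = [featurize(t) for t in texts]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(weights.get(w, 0.0) * c for w, c in x.items())
            err = sigmoid(z) - y  # derivative of the log-loss with respect to z
            bias -= lr * err
            for w, c in x.items():
                weights[w] = weights.get(w, 0.0) - lr * err * c
    return weights, bias


def predict(weights, bias, text):
    """Return 1 if the predicted probability of the positive class is at least 0.5."""
    z = bias + sum(weights.get(w, 0.0) * c for w, c in featurize(text).items())
    return 1 if sigmoid(z) >= 0.5 else 0


# Invented toy data: 1 = anger expressed, 0 = not.
TEXTS = [
    "we are angry and furious",
    "this is outrageous and angry",
    "we are happy and glad",
    "this is wonderful and glad",
]
LABELS = [1, 1, 0, 0]
weights, bias = train_logreg(TEXTS, LABELS)
print(predict(weights, bias, "angry voters"), predict(weights, bias, "glad voters"))
```

A real comparison would use an established implementation (for example scikit-learn's `LogisticRegression`) and evaluate on held-out texts rather than on the training data.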

“…An epoch denotes an iteration during which the model has used all the relevant data for learning coding patterns once; when completed, the model is updated based on the data it has seen and then the data can be passed through the updated model again, training it for another epoch and letting it learn from the data even further. We then choose the best-performing epoch for each model (Widmann and Wich 2022).…”

Section: SML Classification Methods (mentioning; confidence: 99%)
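The epoch-selection step described in this statement can be sketched as a loop that trains for a fixed number of epochs, scores the model on held-out data after each one, and keeps a snapshot of the best-scoring state. The `train_one_epoch` and `evaluate` callables and the toy scores are hypothetical placeholders for a real training and validation routine; nothing here reproduces the authors' code.

```python
import copy


def select_best_epoch(model, train_one_epoch, evaluate, n_epochs):
    """Train for n_epochs and return (best_epoch, best_score, best_model).

    train_one_epoch(model) updates the model in place after one full pass over
    the training data; evaluate(model) returns a validation score where higher
    is better. Both are hypothetical callables supplied by the caller.
    """
    best_epoch, best_score, best_model = 0, float("-inf"), None
    for epoch in range(1, n_epochs + 1):
        train_one_epoch(model)          # one pass over all training data
        score = evaluate(model)         # score on held-out validation data
        if score > best_score:          # keep a snapshot of the best epoch so far
            best_epoch, best_score = epoch, score
            best_model = copy.deepcopy(model)
    return best_epoch, best_score, best_model


# Toy usage: validation score rises, then falls as the model starts to overfit.
toy_model = {"epochs_seen": 0}
toy_scores = [0.61, 0.70, 0.74, 0.72, 0.69]


def toy_train(m):
    m["epochs_seen"] += 1


def toy_eval(m):
    return toy_scores[m["epochs_seen"] - 1]


best = select_best_epoch(toy_model, toy_train, toy_eval, n_epochs=5)
print(best)  # (3, 0.74, {'epochs_seen': 3})
```

Selecting the epoch on a separate validation split, rather than on the final test set, avoids an optimistically biased performance estimate.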

“…Guided by this framework, we present a comprehensive benchmark of approaches from three families of automatic text analysis methods: dictionary methods, supervised machine learning classification methods, and unsupervised machine learning clustering methods. Building on existing work (Barberá et al. 2021; Nelson et al. 2021; Widmann and Wich 2022), we survey the performance of various algorithms within these three families across four different datasets and coding tasks.…”

Section: Conclusion and Final Remarks (mentioning; confidence: 99%)
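A dictionary method, the first of the three families named in this statement, reduces to counting matches between a document's tokens and per-category word lists. The sketch below assumes lowercase whitespace tokenization (no stemming, lemmatization, wildcards, or punctuation stripping) and uses a tiny hypothetical two-category German word list; it is not the dictionary built by Widmann and Wich or any published lexicon.

```python
from collections import Counter

# Hypothetical mini-dictionary for illustration only.
EMOTION_DICT = {
    "anger": {"wut", "empörung", "zorn"},
    "fear": {"angst", "sorge", "bedrohung"},
}


def dictionary_scores(text, dictionary=EMOTION_DICT):
    """Share of tokens in `text` that match each category's word list."""
    tokens = text.lower().split()
    if not tokens:
        return {cat: 0.0 for cat in dictionary}
    counts = Counter(tokens)
    return {
        cat: sum(counts[w] for w in words) / len(tokens)
        for cat, words in dictionary.items()
    }


print(dictionary_scores("Die Wut und die Angst wachsen"))
```

Normalizing by document length keeps long and short texts comparable; raw match counts are a common alternative.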

A Systematic Evaluation of Text Mining Methods for Short Texts: Mapping Individuals’ Internal States from Online Posts

Macanovic, Przepiorka

2022

Preprint

Sociologists have successfully used text mining to investigate discourse using news articles, official documents, and other sources. Yet, the potential of exploring millions of short texts generated spontaneously by individuals in online environments has remained untapped within the field. To fill this gap, we show how such texts can inform sociologists about individual internal states such as norms, motives, and stances, which thus far have been mainly elicited using surveys. We assess the performance of 581 variations of three text mining approaches (dictionary methods, supervised machine learning, and unsupervised machine learning) against the benchmark of texts coded by humans for complex schemes capturing individuals’ internal states. Our analysis includes coding feedback texts from an online market for motives for leaving feedback (N = 2,000) and tweet texts for moral values expressed in text (N = 3,832). We describe challenges arising with these different approaches and provide best-practice advice for future applications.
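Benchmarking automated methods against human-coded texts typically means comparing predicted labels with the human codes using a metric such as macro-averaged F1, which weights every category equally regardless of its frequency. A minimal standard-library sketch follows; the category names and codes are invented, and the abstract does not say that macro-F1 is the metric the authors report.

```python
def macro_f1(human, predicted):
    """Unweighted mean of per-class F1, treating the human codes as ground truth."""
    if len(human) != len(predicted):
        raise ValueError("label lists must have equal length")
    classes = sorted(set(human) | set(predicted))
    f1s = []
    for c in classes:
        tp = sum(1 for h, p in zip(human, predicted) if h == c and p == c)
        fp = sum(1 for h, p in zip(human, predicted) if h != c and p == c)
        fn = sum(1 for h, p in zip(human, predicted) if h == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)


# Hypothetical human codes and model predictions for four texts.
HUMAN = ["care", "care", "fairness", "fairness"]
PREDICTED = ["care", "fairness", "fairness", "fairness"]
print(round(macro_f1(HUMAN, PREDICTED), 3))  # 0.733
```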

“…Second, we assess its performance against two widely used document scaling techniques: Wordscores (Laver, Benoit, and Garry 2003) and Wordfish (Lo, Proksch, and Slapin 2016). Finally, as transformer architectures are considered the current state-of-the-art technique in NLP (see, e.g., Widmann and Wich 2022), the performance of our dictionary is compared to the performance of the newly released ConfliBERT model (Hu et al. 2022). As a measure of performance, we investigate alignment with conflict trends over time and correlations with our variable of interest.…”

Section: Introduction (mentioning; confidence: 99%)
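Wordscores (Laver, Benoit, and Garry 2003), one of the scaling baselines mentioned in this statement, scores each word by the reference texts it occurs in and then scores a new ("virgin") text as the frequency-weighted mean of its word scores. The sketch below computes raw scores only: it omits the rescaling step and uncertainty estimates of the original method, assumes whitespace tokenization, and uses invented reference texts and positions.

```python
from collections import Counter


def rel_freqs(text):
    """Relative frequency F_wr of each word in one text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(reference_texts, reference_positions):
    """S_w = sum_r P(r | w) * A_r, where P(r | w) = F_wr / sum_r' F_wr'."""
    freqs = [rel_freqs(t) for t in reference_texts]
    vocab = set().union(*freqs)
    scores = {}
    for w in vocab:
        f = [fr.get(w, 0.0) for fr in freqs]
        total = sum(f)
        scores[w] = sum(fi / total * a for fi, a in zip(f, reference_positions))
    return scores


def score_virgin_text(text, scores):
    """Raw score: frequency-weighted mean of word scores, over scored words only."""
    counts = Counter(w for w in text.lower().split() if w in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text shares no words with the reference texts")
    return sum(c / total * scores[w] for w, c in counts.items())


# Invented reference texts with assumed positions +1 and -1.
REFERENCE_TEXTS = ["tax cuts growth", "welfare spending growth"]
REFERENCE_POSITIONS = [1.0, -1.0]
scores = word_scores(REFERENCE_TEXTS, REFERENCE_POSITIONS)
print(round(score_virgin_text("tax cuts and growth", scores), 3))  # 0.667
```

Words shared equally by both reference texts (here "growth") receive a neutral score of 0, and words absent from all reference texts ("and") are ignored.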

“…They are able to show considerable improvements of their approach compared with other dictionaries. Similarly, Widmann and Wich (2022) apply a word embedding model and manual coding to extend an existing German-language sentiment dictionary. They compare this dictionary to word embeddings and transformer models, finding that transformer models outperform the other approaches.…”

Section: Introduction (mentioning; confidence: 99%)
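The dictionary-extension step described here, in which a word embedding model proposes candidate terms that are then coded manually, can be sketched as a nearest-neighbour search by cosine similarity around seed words. The three-dimensional vectors below are invented toy values rather than a trained embedding model, and the `top_k` and `min_sim` thresholds are arbitrary assumptions; the returned candidates would go to human coders rather than being accepted automatically.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expansion_candidates(seeds, vectors, top_k=3, min_sim=0.8):
    """Rank non-seed words by their highest cosine similarity to any seed word."""
    ranked = []
    for word, vec in vectors.items():
        if word in seeds:
            continue
        sim = max((cosine(vec, vectors[s]) for s in seeds if s in vectors), default=0.0)
        if sim >= min_sim:
            ranked.append((word, sim))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Invented toy vectors for illustration only.
TOY_VECTORS = {
    "wut": [0.9, 0.1, 0.0],
    "zorn": [0.85, 0.15, 0.05],
    "empörung": [0.8, 0.2, 0.1],
    "freude": [0.0, 0.9, 0.1],
    "haus": [0.1, 0.1, 0.9],
}
print([w for w, _ in expansion_candidates({"wut"}, TOY_VECTORS)])  # ['zorn', 'empörung']
```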

Introducing an Interpretable Deep Learning Approach to Domain-Specific Dictionary Creation: A Use Case for Conflict Prediction

Häffner, Hofer, Nagl, et al.

2023

Polit. Anal.

Recent advancements in natural language processing (NLP) methods have significantly improved their performance. However, more complex NLP models are more difficult to interpret and computationally expensive. Therefore, we propose an approach to dictionary creation that carefully balances the trade-off between complexity and interpretability. This approach combines a deep neural network architecture with techniques to improve model explainability to automatically build a domain-specific dictionary. As an illustrative use case of our approach, we create an objective dictionary that can infer conflict intensity from text data. We train the neural networks on a corpus of conflict reports and match them with conflict event data. This corpus consists of over 14,000 expert-written International Crisis Group (ICG) CrisisWatch reports between 2003 and 2021. Sensitivity analysis is used to extract the weighted words from the neural network to build the dictionary. In order to evaluate our approach, we compare our results to state-of-the-art deep learning language models, text-scaling methods, as well as standard, nonspecialized, and conflict event dictionary approaches. We are able to show that our approach outperforms other approaches while retaining interpretability.
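The abstract builds its dictionary by extracting weighted words from a trained network through sensitivity analysis. One simple technique that illustrates the general idea is occlusion: delete one word at a time and record how much the model's output drops. The sketch below applies occlusion to a hypothetical stand-in scoring function instead of a neural network. It is not the authors' sensitivity-analysis procedure, whose details the abstract does not give, and the example texts are invented.

```python
from collections import defaultdict


def occlusion_weights(texts, model_score):
    """Average drop in model_score when each word type is removed from a text.

    model_score(tokens) -> float stands in for a trained model's output,
    e.g. a predicted conflict intensity; here it is a hypothetical callable.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for text in texts:
        tokens = text.lower().split()
        base = model_score(tokens)
        for word in set(tokens):
            reduced = [t for t in tokens if t != word]
            totals[word] += base - model_score(reduced)  # positive = word raises the score
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}


# Hypothetical stand-in model: share of tokens that are violent terms.
VIOLENT = {"clashes", "killed", "attack"}


def toy_score(tokens):
    return sum(t in VIOLENT for t in tokens) / max(len(tokens), 1)


weights = occlusion_weights(["Clashes killed five", "Talks resumed calmly"], toy_score)
dictionary = sorted(weights, key=weights.get, reverse=True)[:2]
print(sorted(dictionary))  # ['clashes', 'killed']
```

Because the toy score is length-normalized, removing a neutral word like "five" raises it, so that word receives a negative weight.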

Methodology

Gallant, van der Noll

2024

Jews and Muslims in German Print Media
