2021
DOI: 10.1017/pan.2021.15

Multi-Label Prediction for Political Text-as-Data

Abstract: Political scientists increasingly use supervised machine learning to code multiple relevant labels from a single set of texts. The current “best practice” of individually applying supervised machine learning to each label ignores information on inter-label association(s), and is likely to under-perform as a result. We introduce multi-label prediction as a solution to this problem. After reviewing the multi-label prediction framework, we apply it to code multiple features of (i) access to information requests m…
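The contrast the abstract draws can be illustrated with a minimal sketch (not the authors' implementation): a "binary relevance" baseline that fits one independent classifier per label, against a classifier chain, one common multi-label method in which each classifier also conditions on predictions for earlier labels and can therefore exploit inter-label association. The data here are synthetic stand-ins for document features and correlated labels.

```python
# Sketch comparing independent per-label classifiers ("binary relevance")
# with a classifier chain that can capture inter-label associations.
# Synthetic data; not the authors' models or datasets.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Stand-in for document features X and a matrix Y of co-occurring labels.
X, Y = make_multilabel_classification(
    n_samples=1000, n_features=20, n_classes=5, random_state=0
)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Baseline: one independent logistic regression per label.
independent = MultiOutputClassifier(LogisticRegression(max_iter=1000))
independent.fit(X_tr, Y_tr)

# Multi-label alternative: each classifier in the chain also sees the
# predictions for the labels earlier in the chain.
chain = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0)
chain.fit(X_tr, Y_tr)

print("independent micro-F1:",
      f1_score(Y_te, independent.predict(X_te), average="micro"))
print("chain micro-F1:",
      f1_score(Y_te, chain.predict(X_te), average="micro"))
```

Which approach wins depends on how strongly the labels co-occur in a given coding scheme; the point of the sketch is only that the chain has access to association information the independent models discard.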

Cited by 5 publications (3 citation statements) · References 50 publications
“…Overall, we find that transformer classification models can tackle coding schemes of varying complexity well. In line with some recent research, we do find that it can be beneficial to use supervised machine learning models designed for category co-occurrence when working with particularly complex coding schemes (Erlich et al., 2022). Other methods, including dictionaries, logistic regression, and even zero-shot classification, tend to capture co-occurrence patterns less well.…”
Section: Conclusion and Final Remarks (supporting)
confidence: 88%
“…We train separate RF and SVM models for each coding category of interest in applications where categories can co-occur across texts. For a comprehensive overview of other solutions for tackling co-occurrence with SML algorithms, see Erlich and colleagues (2022).…”
(mentioning)
confidence: 99%
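The per-category setup this footnote describes, a separate random forest and a separate SVM fitted for each coding category, can be sketched as follows, with hypothetical feature and label arrays standing in for the real data:

```python
# Sketch of a per-category ("binary relevance") setup: one RF and one
# linear SVM per coding category. Hypothetical data, not the cited study's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # document feature matrix
Y = (rng.random((200, 3)) < 0.3).astype(int)   # 3 co-occurring binary categories

rf_models, svm_models = [], []
for k in range(Y.shape[1]):
    # Each category gets its own independently trained pair of models.
    rf_models.append(
        RandomForestClassifier(n_estimators=100, random_state=0).fit(X, Y[:, k])
    )
    svm_models.append(LinearSVC().fit(X, Y[:, k]))

# One prediction per document per category, stacked back into a label matrix.
rf_preds = np.column_stack([m.predict(X) for m in rf_models])
```

Because each model sees only its own category's labels, any co-occurrence structure across categories is ignored, which is exactly the limitation the multi-label methods discussed above are meant to address.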
“…Usually, researchers assign these texts multiple labels according to the hierarchical classes above through time-consuming manual coding before analyzing this information and extracting knowledge about poverty governance [6,7]. A classification model based on natural language processing is therefore the primary method for automatic multi-label classification [8].…”
Section: Introduction (mentioning)
confidence: 99%