2023
DOI: 10.1017/pan.2022.29

Cross-Lingual Classification of Political Texts Using Multilingual Sentence Embeddings

Abstract: Established approaches to analyze multilingual text corpora require either a duplication of analysts’ efforts or high-quality machine translation (MT). In this paper, I argue that multilingual sentence embedding (MSE) is an attractive alternative approach to language-independent text representation. To support this argument, I evaluate MSE for cross-lingual supervised text classification. Specifically, I assess how reliably MSE-based classifiers detect manifesto sentences’ topics and positions compared to clas…

Cited by 11 publications (11 citation statements)
References 43 publications
“…More recently, the field has turned to large language models in an effort to improve classification of e.g. ideological placement (Rheault & Cochrane 2020), political emotions (Widmann & Wich 2023) and political manifestos (Licht 2023; Laurer et al 2024). Miller, Linder & Mebane (2020) explored an "active" labelling strategy where manual classification is aided by a text algorithm to select relevant documents (see also Alshami et al 2023).…”
Section: Chatgpt For Text Analysis (mentioning)
confidence: 99%
“…A second challenge that arises with cross-country studies is a rapid increase of cases and very often a language barrier (Licht and Lind 2023). Expert surveys and existing content-analytic strategies for measuring parties' anti-elite strategies commonly struggle with one or both of these challenges (cf.…”
Section: Existing Studies and Data Sources (mentioning)
confidence: 99%
“…Accordingly, we implemented measures to facilitate the label class balance in our final annotated dataset. First, we distributed only tweets that would likely contain political content for annotation, which we determined by applying a pre-trained supervised text classifier (Licht 2020). Second, we mitigated repeated annotation of posts with very similar content by clustering the remaining tweets into 500 groups based on their multilingual embeddings and sampling tweets from all these "strata".…”
Section: Crowd-sourced Coding (mentioning)
confidence: 99%
“…Machine translation (MT) is a popular strategy for researchers who want to apply quantitative text analysis methods to such multilingual text collections (e.g., Baum and Zhukov 2019; Dancygier and Margalit 2020; Barberá et al 2022; cf. Baden et al 2022, Dolinksy et al 2022, Licht and Lind 2023). It allows bridging language barriers by transferring documents written in different languages into a single target language and thus enables researchers to analyze the resulting monolingual documents with standard text-as-data methods (e.g.…”
Section: Introduction (mentioning)
confidence: 99%