2021
DOI: 10.1080/19312458.2021.1955845

Machine Translation vs. Multilingual Dictionaries: Assessing Two Strategies for the Topic Modeling of Multilingual Text Collections

Abstract: The goal of this paper is to evaluate two methods for the topic modeling of multilingual document collections: (1) machine translation (MT), and (2) the coding of semantic concepts using a multilingual dictionary (MD) prior to topic modeling. We empirically assess the consequences of these approaches based on both a quantitative comparison of models and a qualitative validation of each method's potentials and weaknesses. Our case study uses two text collections (of tweets and news articles) in three languages …
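
The multilingual-dictionary (MD) strategy described in the abstract can be illustrated with a minimal sketch, assuming a toy dictionary that maps word forms from several languages to shared, language-independent concept IDs. The dictionary entries, the concept labels, the function name `code_concepts`, and the choice of English, German, and Spanish are all hypothetical; the paper's actual dictionary and languages are not reproduced here. Once each document is expressed as concept counts, all documents share one vocabulary and could be passed to a standard topic model.

```python
import re
from collections import Counter

# Hypothetical toy multilingual dictionary (an illustrative assumption, not
# the paper's dictionary): each word form, in any language, maps to a
# language-independent concept ID.
MULTILINGUAL_DICT = {
    "government": "C_GOVERNMENT", "regierung": "C_GOVERNMENT", "gobierno": "C_GOVERNMENT",
    "border": "C_BORDER", "grenze": "C_BORDER", "frontera": "C_BORDER",
    "refugees": "C_REFUGEE", "flüchtlinge": "C_REFUGEE", "refugiados": "C_REFUGEE",
}


def code_concepts(text: str, dictionary: dict[str, str]) -> Counter:
    """Map each known word to its concept ID and count the concepts.

    Words missing from the dictionary are dropped, so every document ends up
    in the same concept vocabulary regardless of its source language.
    """
    tokens = re.findall(r"\w+", text.lower())  # Unicode-aware word tokens
    return Counter(dictionary[t] for t in tokens if t in dictionary)


docs = [
    "The government closed the border to refugees",      # English
    "Die Regierung schließt die Grenze für Flüchtlinge",  # German
    "El gobierno cierra la frontera a los refugiados",    # Spanish
]
coded = [code_concepts(d, MULTILINGUAL_DICT) for d in docs]
for c in coded:
    print(dict(c))
```

In this toy example all three sentences reduce to the same three concept counts, which is the property that lets a single topic model treat them as comparable documents. The trade-off is that any word not covered by the dictionary is silently discarded.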

Cited by 22 publications (9 citation statements)
References: 38 publications
“…It is a methodology similar to, but different from, Naive Bayes algorithms, in which hateful phrases are identified, and it is machine learning that searches for the words and relationships between them as a way of training and learning to then be applied to new phrases, as used by Arcila-Calderón et al. (2021) in Spanish. While there are methodological advances in multilingual dictionaries (Maier et al., 2022), hate detection in Spanish is still in its infancy and needs further development. -Probabilistic determination of bot behaviour through Kearney's (2018) Tweetbotornot algorithm.…”
Section: Methods (mentioning; confidence: 99%)
“…Full-text translated documents can then be pre-processed and tokenized into words and phrases (n-gram tokens) to obtain monolingual BoW representations of originally multilingual documents.² This approach has been shown to enable reliable dictionary analysis (Windsor, Cupit, and Windsor 2019), topic modeling (de Vries, Schoonvelde, and Schumacher 2018; Lucas et al. 2015; Maier et al. 2021; Reber 2019), and supervised text classification (Courtney et al. 2020; Lind et al. 2021b).…”
Section: Approaches to Cross-lingual Quantitative Text Analysis (mentioning; confidence: 99%)
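
The machine-translation route summarized in the citation statement above (translate the full text, then tokenize it into words and n-grams to obtain a monolingual bag-of-words) can be sketched as follows, assuming the documents have already been machine-translated into English. The function names `ngram_bow` and `document_term_matrix`, the `n_max` parameter, and the example sentences are hypothetical; a real pipeline would typically also apply stop-word removal, stemming or lemmatization, and frequency trimming before topic modeling.

```python
import re
from collections import Counter


def ngram_bow(text: str, n_max: int = 2) -> Counter:
    """Lowercase and tokenize a (translated) document, then count all 1..n_max-grams."""
    tokens = re.findall(r"\w+", text.lower())
    bow = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of length n over the token list.
        for i in range(len(tokens) - n + 1):
            bow[" ".join(tokens[i:i + n])] += 1
    return bow


def document_term_matrix(docs: list[str], n_max: int = 2) -> tuple[list[str], list[list[int]]]:
    """Build a shared vocabulary and a dense document-term count matrix."""
    bows = [ngram_bow(d, n_max) for d in docs]
    vocab = sorted(set().union(*bows))  # union of all n-gram keys across documents
    matrix = [[bow[term] for term in vocab] for bow in bows]  # Counter returns 0 if absent
    return vocab, matrix


# Hypothetical English MT output for documents originally written in different languages.
translated = [
    "The government closes the border",
    "The government helps refugees",
]
vocab, dtm = document_term_matrix(translated)
print(vocab)
print(dtm)
```

Because both documents are now in English, shared phrases such as "the government" fall into the same column of the matrix, which is what allows one monolingual topic model to be fit to an originally multilingual collection.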
“…To solve this issue, some scholars have focused on other techniques to examine multilingual news content, such as (machine) translation (see e.g., Courtney, Breen, McMenamin, & McNulty, 2020; De Vries et al., 2018; Lind et al., 2021; Maier, Baden, Stoltenberg, De Vries-Kedem, & Waldherr, 2022) or multilingual dictionaries (Maier et al., 2022). They show that such techniques can efficiently and effectively be employed for the classification of multilingual data.…”
Section: Distant Political News Classification (mentioning; confidence: 99%)