2018
DOI: 10.1080/19312458.2018.1555798

Overcoming Language Barriers: Assessing the Potential of Machine Translation and Topic Modeling for the Comparative Analysis of Multilingual Text Corpora

Abstract: This study assesses the potential of topic models coupled with machine translation for comparative communication research across language barriers. From a methodological point of view, the robustness of a combined approach is examined. For this purpose, the results of different machine translation services (Google Translate vs. DeepL) as well as methods (full-text vs. term-by-term) are compared. From a substantive point of view, the integrability of the approach into comparative study designs is tested. For t…
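The abstract describes comparing topic model results across translation services (Google Translate vs. DeepL). One simple way to check whether two topic solutions agree is to pair topics by the overlap of their top words. The sketch below is a hypothetical illustration of that idea, not the procedure used in the study; the function names `jaccard` and `match_topics` and the toy topics are assumptions made for this example.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of terms common to both sets, relative to their union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_topics(
    topics_a: list[list[str]], topics_b: list[list[str]]
) -> list[tuple[int, int, float]]:
    """For each topic in A, return (index_a, best_index_b, jaccard_score).

    Greedy per-topic best match: several A topics may map to the same B topic.
    """
    if not topics_b:
        raise ValueError("topics_b must contain at least one topic")
    matches = []
    for i, words_a in enumerate(topics_a):
        set_a = set(words_a)
        scores = [jaccard(set_a, set(words_b)) for words_b in topics_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches.append((i, j, scores[j]))
    return matches


if __name__ == "__main__":
    # Hypothetical top words from two models fitted on differently
    # translated versions of the same corpus (not results from the study).
    google = [["election", "vote", "party", "campaign"],
              ["climate", "energy", "emissions", "coal"]]
    deepl = [["climate", "energy", "emission", "coal"],
             ["vote", "election", "party", "candidate"]]
    for i, j, score in match_topics(google, deepl):
        print(f"topic A{i} -> topic B{j} (Jaccard = {score:.2f})")
```

Each topic is matched greedily to its most similar counterpart, so two topics from one solution can map to the same topic in the other; a one-to-one assignment (for example with the Hungarian algorithm) would be a stricter alternative. In the toy data, a lemmatization difference ("emissions" vs. "emission") already lowers the overlap, which illustrates why preprocessing choices matter when comparing translations.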

Cited by 39 publications
(36 citation statements)
References 45 publications
“…This was done to avoid a bias in favor of the duplicated documents. As topic models only work with monolingual text material, we translated all the vocabulary of the English web pages into German before calculating the model [76]. We applied several common preprocessing steps in order to extract as much information from the corpus as possible.…”
Section: Plos One; citation type: mentioning
confidence: 99%
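The citing study above translated the vocabulary of its English pages into German before fitting a single topic model. A minimal sketch of such a term-by-term step for a bag-of-words representation, assuming a bilingual lexicon has already been built (for example by sending each unique vocabulary term once to a machine translation service); the lexicon, the function name `translate_bag_of_words`, and the `keep_unknown` option are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def translate_bag_of_words(
    doc_counts: Counter[str],
    lexicon: dict[str, str],
    keep_unknown: bool = False,
) -> Counter[str]:
    """Map each source-language term to its target-language translation.

    Counts of source terms that share the same translation are summed.
    Terms missing from the (hypothetical) lexicon are dropped unless
    keep_unknown is True, in which case they are kept untranslated.
    """
    translated: Counter[str] = Counter()
    for term, count in doc_counts.items():
        target = lexicon.get(term)
        if target is None:
            if keep_unknown:
                translated[term] += count
            continue
        translated[target] += count
    return translated


if __name__ == "__main__":
    # Toy English document (already tokenized and lowercased) and a
    # hypothetical English-to-German lexicon.
    english_doc = Counter({"election": 2, "elections": 1, "vote": 3, "blog": 1})
    lexicon = {"election": "wahl", "elections": "wahl", "vote": "stimme"}
    print(dict(translate_bag_of_words(english_doc, lexicon)))
    # {'wahl': 3, 'stimme': 3}
```

Because the vocabulary rather than each document is translated, every unique term needs to be translated only once, and several source terms can collapse onto one target term (their counts are summed). The general trade-off is that isolated terms lose the sentence context that full-text translation can use to resolve ambiguous words.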
“…The 25 NRRPs officially submitted are indicated by the European Commission Recovery and Resilience Facility here: https://ec.europa.eu/info/business-economy-euro/recoverycoronavirus/recovery-and-resilience-facility_en […] context of the EU (e.g., Lucas et al., 2015; Traber et al., 2020), which have been assessed to yield results as good as human translation, in a bag-of-words context, even through the use of a far less advanced method such as Google Translate (de Vries et al., 2018; Reber, 2019).…”
Section: Methods; citation type: mentioning
confidence: 99%
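The claim that machine translation can match human translation "in a bag-of-words context" rests on the fact that such representations ignore word order. A deliberately simplified way to see this is to compare the term-frequency vectors of a human and a machine translation with cosine similarity. This is an illustrative proxy only, not the validation procedure used in the cited studies; the example sentences and the function names `bag_of_words` and `cosine_similarity` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def bag_of_words(text: str) -> Counter[str]:
    """Lowercase whitespace tokenization; real pipelines preprocess more."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter[str], b: Counter[str]) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sentences: a human reference and a machine translation
    # that differs only in word order and one function word.
    human = bag_of_words("the government announced new climate targets")
    machine = bag_of_words("the government announced new targets for climate")
    print(f"{cosine_similarity(human, machine):.3f}")  # 0.926
```

Reordering the words leaves the vectors unchanged; only the extra word "for" lowers the score. Differences in word choice (synonyms, inflections) would lower it further, which is where translation quality matters for a bag-of-words model.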
“…Details on the preparation process for topic modeling can be found in the Online Appendix. Due to the language heterogeneity of our corpus, we followed recent methodological research that found machine translation services to offer a viable and valid solution (e.g., Lucas et al., 2015; Reber, 2019).⁴ Following these suggestions, we translated Arabic/Hebrew tweets into English using the Google Translate API.…”
Section: Automated Topic Modeling; citation type: mentioning
confidence: 99%