2017
DOI: 10.31219/osf.io/cuxha
Preprint

No Longer Lost in Translation: Evidence that Google Translate Works for Comparative Bag-of-Words Text Applications

Abstract: Automated text analysis allows researchers to analyze large quantities of text. Yet, comparative researchers are presented with a big challenge: across countries people speak different languages. To address this issue, some analysts have suggested using Google Translate to convert all texts into English before starting the analysis (Lucas et al., 2015). But in doing so, do we get lost in translation? This paper evaluates the usefulness of machine translation for bag-of-words models – such as topic models. We u…

Cited by 36 publications (42 citation statements)
References: 11 publications
“…The speeches were originally sourced from active or archived public websites and are partially auto‐translated to English (De Vries et al.; Schumacher et al.).…”
Section: Methods (mentioning)
confidence: 99%
“…To compare speeches in different languages, we used Google Translate to translate all non-English texts to English, as this was the language of a large majority of speeches. De Vries et al. (2018) demonstrate that Google Translate can be used for our purposes. In particular, they compared the output of topic models (an automated text analysis technique) of a text corpus of European Parliament proceedings translated to English by professional translators to a text corpus of the same proceedings translated to English by Google Translate.…”
Section: Selection Of Documents (mentioning)
confidence: 99%
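
The comparison described in the statement above, between a professionally translated corpus and a machine-translated one, can be illustrated at the document level with a minimal sketch. This is not the procedure of De Vries et al., who compared topic model output; it swaps in a simpler check, the cosine similarity of bag-of-words count vectors for paired documents. The example sentences, the tokenizer, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math
import re


def bag_of_words(text):
    """Lowercase the text, keep runs of letters, and count them (word order is discarded)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy data: the same two documents in two English translations.
human = [
    "The committee approved the budget proposal",
    "Members debated the new fisheries regulation",
]
machine = [
    "The committee approved the proposal for the budget",
    "The members discussed the new regulation on fisheries",
]

# One similarity score per paired document.
similarities = [
    cosine_similarity(bag_of_words(h), bag_of_words(m))
    for h, m in zip(human, machine)
]
```

Because a bag-of-words representation ignores word order, reordered phrases such as "budget proposal" and "proposal for the budget" still yield largely overlapping count vectors, which is the intuition behind expecting machine translation to be adequate for this class of models.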
“…However, because the French, Italian and German answers were translated into a fourth language, the loss of information affects all the answers similarly, which likely increases the answers’ across‐language comparability (de Vries et al.).…”
Section: Data and Estimation (mentioning)
confidence: 99%