Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015
DOI: 10.3115/v1/n15-1170

Cross-lingual Text Classification Using Topic-Dependent Word Probabilities

Abstract: Cross-lingual text classification is a major challenge in natural language processing, since training data is often available in only one language (the target language) but not in the language of the documents we want to classify (the source language). Here, we propose a method that requires only a bilingual dictionary to bridge the language gap. Our proposed probabilistic model allows us to estimate translation probabilities that are conditioned on the whole source document. The assumption of our probabilistic model…
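
As a rough illustration of that assumption, here is a minimal sketch of how a per-document topic distribution can re-weight the translation candidates that a bilingual dictionary provides. All names and numbers are hypothetical toy values, not taken from the paper:

```python
import numpy as np

# Toy setup (assumed, not from the paper): 2 topics, and a dictionary entry
# where German "Bank" may translate to English "bank" or "bench".
candidates = ["bank", "bench"]

# p(topic | document), estimated from the WHOLE source document,
# e.g. by a topic model such as LDA.
p_topic_given_doc = np.array([0.9, 0.1])  # topic 0 ~ finance, topic 1 ~ furniture

# p(translation | source word, topic): a topic-dependent translation table,
# restricted to the candidates listed in the bilingual dictionary.
p_trans_given_topic = np.array([
    [0.95, 0.05],  # under the finance topic, "Bank" mostly means "bank"
    [0.10, 0.90],  # under the furniture topic, "Bank" mostly means "bench"
])

# Marginalize over topics:
# p(translation | word, doc) = sum_z p(z | doc) * p(translation | word, z)
p_trans_given_doc = p_topic_given_doc @ p_trans_given_topic

for cand, p in zip(candidates, p_trans_given_doc):
    print(f"p({cand!r} | 'Bank', doc) = {p:.3f}")
```

In a finance-heavy document the finance reading dominates (0.865 vs. 0.135 here), which is how conditioning on the whole document disambiguates single-word translations.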

Cited by 9 publications (6 citation statements) · References 5 publications
“…Cross-lingual document classification has been extensively studied. Previous methods transfer knowledge with cross-lingual resources, such as bilingual dictionaries (Wu et al., 2008; Shi et al., 2010), parallel text (Xu and Yang, 2017), labeled data from related languages (Zhang et al., 2018), multilingual topic models (Ni et al., 2011; Andrade et al., 2015), machine translation systems (Banea et al., 2008; Wan, 2009; Zhou et al., 2016), and cross-lingual word embeddings (CLWE) (Klementiev et al., 2012). Our method instead brings a bilingual speaker into the loop to actively provide cross-lingual knowledge, which is more reliable in low-resource settings.…”
Section: Related Work (mentioning)
confidence: 99%
“…Previous CLDC methods are typically word-based and rely on one of the following cross-lingual signals to transfer knowledge: large bilingual lexicons (Shi, Mihalcea, and Tian 2010; Andrade et al. 2015), MT systems (Banea et al. 2008; Wan 2009; Zhou, Wan, and Xiao 2016), or CLWE (Klementiev, Titov, and Bhattarai 2012). One exception is the recently proposed multilingual BERT (Devlin et al. 2019; Wu and Dredze 2019), which uses a subword vocabulary.…”
Section: Related Work (mentioning)
confidence: 99%
“…CLDC works when it can find a shared representation for documents from both languages: train a classifier on source-language documents and apply it to target-language documents. Previous work uses a bilingual lexicon (Shi, Mihalcea, and Tian 2010; Andrade et al. 2015), machine translation (Banea et al. 2008; Wan 2009; Zhou, Wan, and Xiao 2016, MT), topic models (Mimno et al. 2009; Yuan, Van Durme, and Boyd-Graber 2018), cross-lingual word embeddings (Klementiev, Titov, and Bhattarai 2012, CLWE), or multilingual contextualized embeddings (Wu and Dredze 2019) to extract cross-lingual features. But these methods may be impossible in low-resource languages, as they require some combination of large parallel or comparable text, high-coverage dictionaries, and monolingual corpora from a shared domain.…”
Section: Introduction: Classifiers Across Languages (mentioning)
confidence: 99%
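
The train-on-source, apply-to-target pipeline described in the statement above can be sketched in a few lines. Everything below (the toy embeddings, documents, and labels) is a hypothetical stand-in for real cross-lingual word embeddings and corpora:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy shared space: in practice these would be pretrained cross-lingual
# word embeddings (CLWE); here they are tiny hand-made stand-ins.
clwe = {
    "economy":    np.array([1.0, 0.0]),
    "wirtschaft": np.array([0.9, 0.1]),
    "football":   np.array([0.0, 1.0]),
    "fussball":   np.array([0.1, 0.9]),
}

def doc_vector(tokens):
    """Average the embeddings of in-vocabulary tokens."""
    vecs = [clwe[t] for t in tokens if t in clwe]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

# Train on labeled English (source-language) documents ...
en_docs, en_labels = [["economy"], ["football"]], ["business", "sports"]
clf = LogisticRegression().fit(np.stack([doc_vector(d) for d in en_docs]), en_labels)

# ... and apply the same classifier to German (target-language) documents,
# which works only because both languages live in one shared feature space.
de_docs = [["wirtschaft"], ["fussball"]]
print(clf.predict(np.stack([doc_vector(d) for d in de_docs])))  # ['business' 'sports']
```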
“…Our work is closely related to the work of Ježek and Toman [9], in which they developed a multimodal multilingual classification technique. In their work, they used a language recognition algorithm to inform the classifier about the appropriate language module to call and work on, through the use of EuroWordNet [10]. Andrade et al. [13] also used a bilingual dictionary, where they proposed a probabilistic model that estimates translation probabilities conditioned on the whole source document. The underlying assumption of their probabilistic model was based on topic models, namely that each document can be characterized by a distribution over topics, which helps to resolve the ambiguity that may arise in the translation of single words.…”
Section: Related Work (mentioning)
confidence: 99%