Multilingual Extractive Reading Comprehension by Runtime Machine Translation
Preprint, 2018
DOI: 10.48550/arxiv.1809.03275

Abstract: Despite recent work in Reading Comprehension (RC), progress has been mostly limited to English due to the lack of large-scale datasets in other languages. In this work, we introduce the first RC system for languages without RC training data. Given a target language without RC training data and a pivot language with RC training data (e.g. English), our method leverages existing RC resources in the pivot language by combining a competitive RC model in the pivot language with an attentive Neural Machine Translati…

Cited by 36 publications (36 citation statements). References 30 publications.
“…we still often see improvement over the baseline, but the improvement is always less than when using XLDA only over the context channel. This is in keeping with the findings of Asai et al (2018), which show that the ability to correctly translate questions is crucial for question answering. In other words, SQuAD is extremely sensitive to the translation quality of the question, and it is not surprising that machine translations of the questions are less effective than translating the context, which is less sensitive.…”
Section: XLDA for SQuAD (supporting)
confidence: 89%
“…However, all this work focuses on English attacks. Further, although many multilingual QA datasets exist (He et al, 2017; Asai et al, 2018; Mozannar et al, 2019; Artetxe et al, 2020; Lewis et al, 2020), no prior work has explored adversarial evaluation and exposed vulnerabilities over large pre-trained multilingual language models.…”
Section: Related Work (mentioning)
confidence: 99%
“…Unlike NBC and Fox, CBS does not have a Spanish-language outlet of its own that could broadcast the game (though per league policy, a separate Spanish play-by-play call was carried on CBS's second audio program channel for over-the-air viewers). […] We here evaluate the performance of the fine-tuned multilingual BERT on them and compare the results to a baseline [3].…”
Section: Fine-tuning BERT on English SQuAD (mentioning)
confidence: 99%