Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.80

Modeling Code-Switch Languages Using Bilingual Parallel Corpus

Abstract: Language modeling is the technique to estimate the probability of a sequence of words. A bilingual language model is expected to model the sequential dependency for words across languages, which is difficult due to the inherent lack of suitable training data as well as diverse syntactic structure across languages. We propose a bilingual attention language model (BALM) that simultaneously performs language modeling objective with a quasi-translation objective to model both the monolingual as well as the cross-l…
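The truncated abstract indicates that BALM trains a single model on a language-modeling objective together with a quasi-translation objective over bilingual parallel data. As an illustration only (the exact form of the quasi-translation term is not given in the excerpt, and the names below are not from the paper), such a multi-task loss can be sketched as a weighted sum of two cross-entropy terms:

```python
# Minimal sketch (assumption, not the paper's released code): a multi-task loss
# in the spirit of BALM, combining a language-modeling term with a
# quasi-translation term. `lambda_qt` is a hypothetical mixing weight.
import torch
import torch.nn.functional as F

def joint_loss(lm_logits, lm_targets, qt_logits, qt_targets, lambda_qt=1.0):
    """Sum next-word prediction loss and a quasi-translation prediction loss."""
    lm_loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)), lm_targets.view(-1))
    qt_loss = F.cross_entropy(qt_logits.view(-1, qt_logits.size(-1)), qt_targets.view(-1))
    return lm_loss + lambda_qt * qt_loss
```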

Cited by 19 publications (18 citation statements) · References 36 publications

“…A sentence-encoder is needed to map the individual utterances within D onto the vector space. Firstly, we fine-tune a RoBERTa-base pre-trained language model (Liu et al., 2019) with training data of the target dialogue domain, because task-adaptive fine-tuning of the pre-trained language model on the target domain data benefits the final performance (Gururangan et al., 2020; Lee and Li, 2020). Next, the mean pooling operation is performed on the token embeddings within each utterance of D to derive their respective utterance-level representations.…”
Section: Dialogue Utterance Representation (mentioning)
confidence: 99%
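The excerpt above describes a standard recipe: fine-tune RoBERTa-base on the target dialogue domain, then mean-pool token embeddings to obtain utterance-level vectors. A minimal sketch of the pooling step, assuming the Hugging Face transformers library and leaving the domain fine-tuning aside (the checkpoint name and helper function are illustrative, not the cited authors' code):

```python
# Mean-pooled utterance embeddings from RoBERTa token states (illustrative sketch).
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")  # in practice, a domain fine-tuned checkpoint
model.eval()

def utterance_embedding(utterance: str) -> torch.Tensor:
    """Mean-pool token embeddings into a single utterance-level vector."""
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        token_states = model(**inputs).last_hidden_state  # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1)         # ignore padding positions
    summed = (token_states * mask).sum(dim=1)
    return summed / mask.sum(dim=1)                       # (1, hidden)

vec = utterance_embedding("could you book a table for two tonight?")
print(vec.shape)  # torch.Size([1, 768])
```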
“…We show that the proposed approach outperforms naive MLM rescoring (i.e., without the conversion mentioned above) by a 7.23% relative WER reduction on Mainland China Code-Switch (MLCCS). We also achieve over a 7.08% relative WER reduction compared with the Bilingual Attention Language Model (BALM) [4], which achieves state-of-the-art performance on the SEAME [15] code-switch dataset.…”
Section: Introduction (mentioning)
confidence: 91%
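The "naive MLM rescoring" baseline mentioned in this excerpt is commonly implemented as pseudo-log-likelihood scoring: mask each token of an ASR hypothesis in turn and sum the log-probabilities the masked LM assigns to the original tokens. A sketch under that assumption (the multilingual checkpoint and example hypotheses are placeholders, and the "conversion" step from the cited paper is not shown):

```python
# Pseudo-log-likelihood rescoring of ASR hypotheses with a masked LM (illustrative sketch).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
mlm.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Mask each token in turn and sum the log-prob of the original token."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for pos in range(1, len(ids) - 1):            # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0, pos]
        total += torch.log_softmax(logits, dim=-1)[ids[pos]].item()
    return total

# Pick the hypothesis the masked LM prefers from an ASR n-best list.
hypotheses = ["we need to 开会 tomorrow", "we need to 开会 to marrow"]
print(max(hypotheses, key=pseudo_log_likelihood))
```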
“…It's becoming fairly common in today's globalized world, not only in bilingual societies but also in predominantly monolingual ones, and more and more speakers use a second language in professional contexts. Code-switch speech recognition poses a significant challenge [1] even as recent ASR systems reach outstanding performance [2,3,4]. It introduces more vocabulary choices at each prediction step because of the words from two languages, and it appears freely and sparingly without strict syntactic or grammatical rules.…”
Section: Introduction (mentioning)
confidence: 99%
“…The main challenge addressed in these works is the limited availability of code-mixed sentences. Gonen and Goldberg (2019) and Lee and Li (2020) propose different methods of training LMs for CM sentences without explicitly creating synthetic CM data, but another popular strategy is to first create synthetic CM data and train the LM on it. We next summarize existing approaches to generating synthetic CM data: one line of work proposes to learn switching patterns from code-mixed data using GAN-based adversarial training.…”
Section: Related Work (mentioning)
confidence: 99%