Interspeech 2018
DOI: 10.21437/interspeech.2018-52

Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech

Abstract: In this paper, we describe several techniques for improving the acoustic and language model of an automatic speech recognition (ASR) system operating on code-switching (CS) speech. We focus on the recognition of Frisian-Dutch radio broadcasts where one of the mixed languages, namely Frisian, is an underresourced language. In previous work, we have proposed several automatic transcription strategies for CS speech to increase the amount of available training speech data. In this work, we explore how the acoustic…

Cited by 41 publications (44 citation statements)
References 25 publications
“…CS ASR employs a bilingual acoustic model that captures the phonetic characteristics of both languages and a bilingual language model (LM) which can assign probabilities to code-mixed word sequences as well as monolingual word sequences from both languages. The current system uses data-augmented models described in [24]. The acoustic model is trained on automatically transcribed data from the same archive and a large amount of monolingual data from the high-resourced language (Dutch) together with the manually transcribed data from the FAME!…”
Section: Baseline Approach: Time Alignment of CS ASR Output | Citation type: mentioning
confidence: 99%
“…training data is the only source of CS text and contains 140k words. The remaining CS text is automatically generated as described in [24].…”
Section: Speech and Text Data | Citation type: mentioning
confidence: 99%
“…Project, we have developed a spoken document retrieval system for the radio broadcast archives of Omrop Fryslân (Frisian Broadcast), the regional public broadcaster of the province Fryslân in the Netherlands. This system relies on automatically generated transcriptions hypothesized by a code-switching automatic speech recognition system [16] and speaker labels generated by a modern speaker recognition system developed using the resources [17] with the ultimate goal of making these archives searchable.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…The upper panel summarizes the number of words from each language subset. The middle panel provides the results of state-of-the-art ANN architectures (Yılmaz et al., 2018) for reference purposes and the lower panel presents the results achieved by the ANN and SNN models in this work (AM: acoustic model).…”
Citation type: mentioning
confidence: 99%