Interspeech 2018
DOI: 10.21437/Interspeech.2018-1241

Neural Language Codes for Multilingual Acoustic Models

Abstract: Multilingual speech recognition is one of the most costly AI problems, because each of the world's 7,000+ languages, and even different accents, requires its own acoustic model to obtain the best recognition performance. Even though they all use the same phoneme symbols, each language and accent imposes its own coloring or "twang". Many adaptive approaches have been proposed, but they require further training and additional data, and are generally inferior to monolingually trained models. In this paper, we propose a different appro…

Cited by 6 publications (6 citation statements)
References 16 publications
“…In the world of speech recognition, training a single recognizer for multiple languages is not a thematic stranger [3] from Hidden Markov Model (HMM) based models [17,18], hybrid models [19] to end-to-end neural based models with CTC [20,21] or sequence-to-sequence models [22,5,23,24,25,26], with the last approach being inspired by the success of multilingual machine translation [1,2]. The literature especially mentions the merits of disclosing the language identity (when the utterance is supposed to belong to a single language) to the model, whose architecture is designed to incorporate the language information.…”
Section: Related Work and Comparison
confidence: 99%
“…One of the manifestations is language gating from either language embeddings [21] or language codes [20,27] that aim at selecting a subset of the neurons in the network hidden layer. In our current approach, this effect can be achieved by factorizing further Equation 15 [15]:…”
Section: Related Work and Comparison
confidence: 99%
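The language-gating idea quoted above — a language embedding or language code selecting a subset of the neurons in a shared hidden layer — can be sketched with NumPy. This is a minimal illustration, not the cited papers' exact architecture; the dimensions, the sigmoid gate, and all parameter names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 4 languages, 8-dim language embedding, 16 hidden units.
n_langs, emb_dim, hidden_dim = 4, 8, 16

# Parameters that would be learned jointly with the acoustic model;
# randomly initialized here purely for illustration.
lang_embeddings = rng.normal(size=(n_langs, emb_dim))  # one embedding per language
W_gate = rng.normal(size=(emb_dim, hidden_dim))        # projects embedding to gate logits

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_hidden(hidden, lang_id):
    """Modulate a shared hidden layer with a language-dependent gate in (0, 1).

    A gate value near 0 effectively switches a neuron off for that language,
    so each language selects its own subset of the shared hidden units.
    """
    gate = sigmoid(lang_embeddings[lang_id] @ W_gate)  # shape: (hidden_dim,)
    return hidden * gate

hidden = rng.normal(size=(hidden_dim,))
out_lang0 = gated_hidden(hidden, lang_id=0)
out_lang1 = gated_hidden(hidden, lang_id=1)
# The same shared layer produces differently modulated activations per language.
```

The multiplicative gate is what makes this "gating" rather than plain feature concatenation: the language signal scales existing neurons instead of adding new inputs.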
“…In studies using popular end-to-end architectures such as RNN-T or Listen, Attend and Spell (LAS) [15], multilingual ASR performance is usually enhanced by providing auxiliary language inputs to the model [16][17][18][19][20][21]. Depending on whether or not the language spoken is known beforehand at runtime, language inputs can be provided in the form of constant one-hot vectors or on-the-fly prediction vectors (e.g., posteriors from a streaming LID model), respectively.…”
Section: Introduction
confidence: 99%
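The auxiliary language input described above — a constant one-hot vector when the language is known, or LID posteriors predicted on the fly when it is not — amounts to appending a small vector to each acoustic feature frame. A minimal sketch, assuming 4 languages and 40-dim features (both numbers, and the function name, are made up for illustration):

```python
import numpy as np

n_langs, feat_dim = 4, 40  # hypothetical: 4 candidate languages, 40-dim features

def language_input(lang_id=None, lid_posteriors=None, n_langs=n_langs):
    """Build the auxiliary language vector fed to the multilingual model.

    If the spoken language is known beforehand at runtime, use a constant
    one-hot vector; otherwise fall back to on-the-fly prediction vectors,
    e.g. posterior probabilities from a streaming LID model.
    """
    if lang_id is not None:
        vec = np.zeros(n_langs)
        vec[lang_id] = 1.0
        return vec
    assert lid_posteriors is not None, "need either lang_id or lid_posteriors"
    return np.asarray(lid_posteriors, dtype=float)

frame = np.random.randn(feat_dim)

# Known language: constant one-hot vector appended to every frame.
x_known = np.concatenate([frame, language_input(lang_id=2)])

# Unknown language: soft LID posteriors appended instead.
x_unknown = np.concatenate([frame, language_input(lid_posteriors=[0.7, 0.1, 0.1, 0.1])])
```

In practice the same trained network consumes both variants, since the one-hot case is just a degenerate (fully confident) posterior.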
“…Automatic speech recognition (ASR) systems are becoming increasingly ubiquitous in today's world as more and more mobile devices, home appliances and automobiles add ASR capabilities. Although many improvements have been made in multi-dialect [1,2], multi-accent [3,4] and even truly multilingual [5,6,7] ASR in recent years, they often only support a small subset of languages [8]. In order to get a satisfactory Word Error Rate (WER) for a larger range of languages, language identification (LID) models have been combined with monolingual ASR systems to allow utterance-level switching for a larger set of languages [9] with reasonable accuracy, even over a set of up to 8 candidate languages.…”
Section: Introduction
confidence: 99%
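The utterance-level switching scheme mentioned in that statement — an LID model picking which monolingual recognizer handles each utterance — reduces to a simple dispatch. The stub LID and recognizer functions below are placeholders standing in for real models, not any cited system's API:

```python
def transcribe(utterance, lid_model, recognizers):
    """Utterance-level switching: run LID once per utterance,
    then route the utterance to the matching monolingual ASR system."""
    lang = lid_model(utterance)          # e.g. returns a language tag like "en"
    return recognizers[lang](utterance)

# Toy stand-ins for a real LID model and monolingual recognizers.
lid_model = lambda utt: "de" if "ü" in utt else "en"
recognizers = {
    "en": lambda utt: f"[en] {utt}",
    "de": lambda utt: f"[de] {utt}",
}

print(transcribe("über", lid_model, recognizers))  # routed to the German recognizer
```

This design keeps each monolingual model's accuracy intact, but its overall WER is bounded by the LID model's accuracy over the candidate-language set, which is why it degrades as the set grows.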