ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053057
Improving Language Identification for Multilingual Speakers

Abstract: Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language. One aspect that has been mostly neglected, however, is discrimination of languages for multilingual speakers, despite being a primary target audience of many systems that utilize LID technologies. As we show in this work, LID systems can have a high average accuracy for most combinations of languages whil…
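As a hypothetical numeric illustration of the abstract's claim (the figures below are invented, not taken from the paper), a high overall accuracy can coexist with much lower accuracy on the subset of utterances from multilingual speakers:

# Hypothetical illustration: overall accuracy can mask a weak subgroup.
def accuracy(results):
    return sum(results) / len(results)

# 1 = correct language prediction, 0 = error (invented counts)
monolingual = [1] * 950 + [0] * 50    # 95% accurate
multilingual = [1] * 70 + [0] * 30    # 70% accurate

overall = accuracy(monolingual + multilingual)
print(f"overall:      {overall:.3f}")                 # ~0.927
print(f"multilingual: {accuracy(multilingual):.3f}")  # 0.700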

Cited by 6 publications (4 citation statements)
References 15 publications
“…They work with 11 different models and use ReLU, dropout, Adam, batch normalization, and various other techniques to get good results. Literature [25] proposed a system with an acoustic model and a context-aware model. They built the model with 4 convolutional layers of 128 units, 4 fully connected layers of 1024 units, 1 fully connected layer of 512 units, 1 temporal pooling layer computing a mean and standard deviation, 1 fully connected layer of 1024 units, and finally a softmax output layer.…”
Section: Related Work
Mentioning confidence: 99%
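The layer stack quoted above reads as a frame-level acoustic model followed by utterance-level statistics pooling. The following is a minimal PyTorch sketch of that stack as described in the citation statement; the input feature dimension, convolution kernel size, number of target languages, and the final projection to the language classes before the softmax are assumptions for illustration, not values from the paper.

import torch
import torch.nn as nn

class AcousticLIDModel(nn.Module):
    """Sketch of the quoted acoustic-model stack. Layer widths follow the
    quote; n_feats, kernel_size, and n_langs are assumed."""
    def __init__(self, n_feats=40, n_langs=8):
        super().__init__()
        # 4 convolutional layers with 128 units each
        convs, in_ch = [], n_feats
        for _ in range(4):
            convs += [nn.Conv1d(in_ch, 128, kernel_size=3, padding=1), nn.ReLU()]
            in_ch = 128
        self.convs = nn.Sequential(*convs)
        # 4 fully connected layers with 1024 units, then 1 with 512 units
        fcs, in_dim = [], 128
        for _ in range(4):
            fcs += [nn.Linear(in_dim, 1024), nn.ReLU()]
            in_dim = 1024
        fcs += [nn.Linear(in_dim, 512), nn.ReLU()]
        self.frame_fcs = nn.Sequential(*fcs)
        # after pooling, 1 fully connected layer with 1024 units, then a
        # softmax over the languages (output projection is an assumption)
        self.utt_fc = nn.Sequential(nn.Linear(2 * 512, 1024), nn.ReLU(),
                                    nn.Linear(1024, n_langs))

    def forward(self, x):                      # x: (batch, n_feats, frames)
        h = self.convs(x)                      # (batch, 128, frames)
        h = self.frame_fcs(h.transpose(1, 2))  # (batch, frames, 512)
        # temporal pooling: mean and standard deviation over frames
        stats = torch.cat([h.mean(dim=1), h.std(dim=1)], dim=-1)  # (batch, 1024)
        return self.utt_fc(stats).softmax(dim=-1)  # per-language posteriors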
“…Wan et al. [21] and Mazzawi et al. [22] also investigate LSTM-based architectures for this dataset. Titus et al. [23] explore the effect of accent on language identification performance and train models robust to accented speech.…”
Section: Related Work
Mentioning confidence: 99%
“…Although E2E multilingual ASR [2][3][4] and language identification [5][6][7][8][9][10][11][12][13][14] can be studied separately, there exists a large body of previous work on using LID to improve multilingual E2E ASR. One body of work shows that using oracle LID information helps multilingual models [15][16][17][18][19][20].…”
Section: Introduction
Mentioning confidence: 99%
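A common way to feed oracle LID information to a multilingual ASR model, as referenced in the statement above, is to condition the encoder on a one-hot language vector. The sketch below is a generic, assumed formulation rather than the specific method of any cited paper; the LSTM encoder and all sizes are placeholders.

import torch
import torch.nn as nn

class LIDConditionedEncoder(nn.Module):
    """Generic sketch: append a one-hot language ID to every acoustic frame
    before encoding. Encoder type and dimensions are assumptions."""
    def __init__(self, n_feats=80, n_langs=4, hidden=512):
        super().__init__()
        self.encoder = nn.LSTM(n_feats + n_langs, hidden, batch_first=True)

    def forward(self, feats, lang_id):
        # feats: (batch, frames, n_feats); lang_id: (batch,) integer labels
        n_langs = self.encoder.input_size - feats.size(-1)
        one_hot = nn.functional.one_hot(lang_id, n_langs).float()     # (batch, n_langs)
        one_hot = one_hot.unsqueeze(1).expand(-1, feats.size(1), -1)  # repeat per frame
        return self.encoder(torch.cat([feats, one_hot], dim=-1))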