Interspeech 2018
DOI: 10.21437/interspeech.2018-1836

Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages

Abstract: Automatic speech recognition (ASR) systems often need to be developed for extremely low-resource languages to serve end uses such as audio content categorization and search. While universal phone recognition is natural to consider when no transcribed speech is available to train an ASR system in a language, adapting universal phone models using very small amounts (minutes rather than hours) of transcribed speech also needs to be studied, particularly with state-of-the-art DNN-based acoustic models. The DARPA LO…
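The abstract describes adapting a multilingual "universal phone" acoustic model with only minutes of transcribed target-language speech. The paper's actual recipe is not reproduced on this page; the following is a minimal, hypothetical sketch of that style of adaptation in PyTorch, where a (pretend-pretrained) shared encoder is frozen and only the phone output layer is fine-tuned with CTC loss. The architecture, feature and phone dimensions, and the stand-in tensors are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's recipe): adapt a "universal phone" acoustic
# model to a new language with minutes of transcribed speech by freezing the
# shared encoder and fine-tuning only the output layer with CTC loss.
# All sizes and the toy data below are illustrative assumptions.
import torch
import torch.nn as nn

N_FEATS, N_PHONES = 40, 100          # assumed: 40-dim features, 100 universal phones

class PhoneRecognizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.LSTM(N_FEATS, 256, num_layers=3, batch_first=True)
        self.output = nn.Linear(256, N_PHONES + 1)      # +1 for the CTC blank

    def forward(self, feats):
        hidden, _ = self.encoder(feats)
        return self.output(hidden).log_softmax(dim=-1)

model = PhoneRecognizer()             # stand-in for a multilingually pretrained model
for p in model.encoder.parameters():  # freeze the shared encoder
    p.requires_grad = False

ctc = nn.CTCLoss(blank=N_PHONES)      # blank is the last output index
opt = torch.optim.Adam(model.output.parameters(), lr=1e-4)

# Stand-in for a few minutes of adaptation data: (features, phone labels).
feats = torch.randn(4, 300, N_FEATS)                    # 4 utterances, 300 frames each
labels = torch.randint(0, N_PHONES, (4, 50))            # 50 phone labels per utterance
feat_lens = torch.full((4,), 300, dtype=torch.long)
label_lens = torch.full((4,), 50, dtype=torch.long)

for step in range(100):               # a few passes over the tiny adaptation set
    log_probs = model(feats).transpose(0, 1)            # CTC expects (T, N, C)
    loss = ctc(log_probs, labels, feat_lens, label_lens)
    opt.zero_grad(); loss.backward(); opt.step()
```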

Cited by 8 publications (12 citation statements)
References 22 publications
“…Appropriate pretraining of the encoder and decoder reduced the WER by 20% absolute in the 4h Italian set, to 56.2%. This performance has been shown to still be usable for some downstream tasks such as topic identification in low-resource settings [29].…”
Section: Results (mentioning, confidence: 99%)
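The WER figures quoted in this citation (a 20% absolute reduction, down to 56.2%) follow the standard definition of word error rate: word-level Levenshtein distance divided by the reference length. A small self-contained sketch of that computation, with invented example sentences:

```python
# Word error rate (WER) = (substitutions + deletions + insertions) / reference length,
# computed with standard Levenshtein distance over words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("send water to the shelter", "send what are the shelter"))  # 0.4
```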
“…To solve the LORELEI task, prior work [8] used a mismatched ASR to directly decode IL speech, while [9] proposed sharing common phonemic representation among languages and transferring acoustic models trained on higher-resource (potentially related) language(s). After ASR, [8,9] translated both development (dev) and incident languages into English words, used the translated dev language data along with the given topic label annotations to learn English-language topic models and then classify the translated IL data. Additionally, instead of using ASR to convert speech into sequences of words, [10,11,9] also investigated unsupervised techniques to automatically discover and decode IL speech segments into phone-like units via acoustic unit discovery (AUD), or into wordlike units via unsupervised term discovery (UTD).…”
Section: Evacuation Shelter (mentioning, confidence: 99%)
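This citation describes a cascade: decode incident-language (IL) speech with ASR, translate the output into English words, train topic models on the labeled development-language translations, and classify the translated IL data. The sketch below is not the cited systems' implementation; it only illustrates the final classification step with an assumed TF-IDF plus logistic-regression classifier from scikit-learn, over invented placeholder texts and topic labels.

```python
# Minimal sketch of topic identification over English translations of ASR output.
# The texts and topic labels are invented placeholders, not LORELEI data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

dev_translations = [
    "people need water and food after the flood",
    "families evacuated to the shelter near the river",
    "clinic requests medicine for the injured",
]
dev_topics = ["water supply", "evacuation", "medical assistance"]

il_translations = [
    "many moved to shelter when water rise",   # noisy ASR + MT output
]

topic_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
topic_clf.fit(dev_translations, dev_topics)
print(topic_clf.predict(il_translations))      # e.g. ['evacuation']
```

A real system would train on far more development-language data and contend with noisy ASR and translation output, which is exactly the robustness question these citation statements discuss.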