2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016
DOI: 10.1109/icassp.2016.7472824

Investigating techniques for low resource conversational speech recognition

Abstract: In this paper we investigate various techniques in order to build effective speech to text (STT) and keyword search (KWS) systems for low resource conversational speech. Subword decoding and graphemic mappings were assessed in order to detect out-of-vocabulary keywords. To deal with the limited amount of transcribed data, semi-supervised training and data selection methods were investigated. Robust acoustic features produced via data augmentation were evaluated for acoustic modeling. For language modeling, aut…

Citations: cited by 3 publications (4 citation statements)
References: 18 publications
“…iii. Can code-switching be detected using LID systems [17]? what is the minimum time span to be successful?…”
Section: Research Questions (mentioning)
confidence: 99%
“…First, the speech files were automatically segmented into acoustically homogeneous segments, which ideally correspond to speaker turns and/or to a given language or stable acoustic conditions (broad band/telephone band...). These segments were then automatically transcribed using different ASR systems [23,17] in parallel: a French system, a multi-dialect Arabic system (predominantly Lebanese) and an Algerian Arabic (dialect) system. The systems were trained on several hundreds of hours of speech from a large number of speakers.…”
Section: Speech Technologies For Code-switching (mentioning)
confidence: 99%
“…All of these speech modifications drastically degrade the performance of automatic speech recognition (ASR) systems when the speaker wears an oxygen mask [11]. Using recent speech recognition systems trained with normal speech [12,13,14], the Word Error Rate (WER) obtained for speech with the oxygen mask doubles in comparison to that of normal speech from the same speaker. In order to build accurate ASR models for military aircraft pilots, the speech variations needs to be clearly identified and quantified.…”
Section: Introduction (mentioning)
confidence: 99%
“…To this end, we propose to compare French vowel production variation in Algerian Arabic-French bilinguals and in French (FR) native speakers. Furthermore, we will compare their French productions (FR-Alg) to their speech productions of Algerian Arabic (AA) in CS context [6,7]. The aim of the study is to shed some light on the pronunciation variation Algerian Arabic-French bilinguals produce in both languages and in CS speech.…”
Section: Introduction (mentioning)
confidence: 99%