Development of the SRI/nightingale Arabic ASR system

Vergyri, Dimitra; Mandal, Arindam; Wang, Wen; Stolcke, Andreas; Zheng, Jing; Graciarena, Martin; Rybach, David; Gollan, Christian; Schlüter, Ralf; Kirchhoff, Katrin; Faria, Arlo; Morgan, Nelson

doi:10.21437/interspeech.2008-415

Cited by 20 publications (3 citation statements)

References 15 publications


“…The basic acoustic models are trained based on Maximum Likelihood (ML) method. Then, a discriminative training based on Minimum Phone Error (MPE) criterion is performed to enhance the models [15,16].…”

Section: Methods · mentioning

confidence: 99%

Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR

El-Desoky, Gollan, Rybach et al. 2009

Interspeech 2009

Self Cite
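For reference, the two training criteria named in the quote above are usually written as below. These are the standard forms from the discriminative-training literature, not equations taken from the cited papers; the notation (utterance $O_r$, reference transcript $W_r$, acoustic scale $\kappa$, phone accuracy $A$) is an assumption introduced here for illustration. ML maximizes the likelihood of each reference transcript, while MPE maximizes the expected phone accuracy over competing hypotheses:

$$
\mathcal{F}_{\mathrm{ML}}(\lambda) = \sum_{r=1}^{R} \log p_\lambda(O_r \mid W_r)
$$

$$
\mathcal{F}_{\mathrm{MPE}}(\lambda) = \sum_{r=1}^{R} \frac{\sum_{W} p_\lambda(O_r \mid W)^{\kappa}\, P(W)^{\kappa}\, A(W, W_r)}{\sum_{W'} p_\lambda(O_r \mid W')^{\kappa}\, P(W')^{\kappa}}
$$

Here $A(W, W_r)$ is the raw phone accuracy of hypothesis $W$ measured against $W_r$. In practice the sums over $W$ and $W'$ are restricted to the hypotheses in a recognition lattice, which is also why an ML-trained model is needed first: it generates those lattices.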


“…There has been a lot of [progress] on this task over the last couple of years, see e. g. [3,4,5,6,7]. This paper describes the progress of work at CMU since our initial efforts in 2006 [8], using the JRTk/ Ibis toolkit [9].…”

Section: The GALE Speech-to-Text Task · mentioning

confidence: 99%

The 2010 CMU GALE speech-to-text system

Metze, Hsiao, Qin et al. 2010

Interspeech 2010


“…Arabic spoken corpora have been primarily gathered from radio and television news broadcasts and phone calls [12]. Because of the limitations of the available spoken corpora, Arabic ASR research and applications have been limited to particular domains, such as Arabic digits [15] [16], broadcast news [19], command and control [15], The Holy Qur'an [15] [24], and Arabic proverbs [20]. Limited text and speech Arabic corpora are also a major problem for Arabic ASR researchers who are seeking to apply Arabic ASR to a broader range of applications.…”

Section: Introduction · mentioning

confidence: 99%

A Novel Human-Vehicle Interaction Assistive Device for Arab Drivers Using Speech Recognition

2022


About one-quarter of all car collisions in the United States are caused by distracted driving, and this ratio is expected to rise. As vehicles are equipped with more elaborate and complex technology, human-vehicle interaction via dashboard displays and controls will become more complex and distracting. Voice-based interaction offers a less distracting alternative. In this study we aim to develop a voice-based car assistant, with a focus on Arabic speech recognition. We prepare a new 4000-word domain-specific lexicon to comprehensively support driver-vehicle interactions, and we create corresponding text and speech corpora. We then extract acoustic feature vectors and train several acoustic models for speech recognition. The language model is an n-gram model. The acoustic model, language model, and lexicon are then combined into a decoding graph. The text corpus consists of 6110 elements, including words, phrases, and sentences. The speech corpus has more than 60000 recordings (almost 50 hours). For noise-free audio, a Deep Neural Network + Hidden Markov Model (DNN-HMM) achieved 94.832% accuracy, a Subspace Gaussian Mixture Model + HMM (SGMM-HMM) 94.2%, and the best Gaussian Mixture Model + HMM (GMM-HMM) 94.13%. For noisy audio, the DNN-HMM achieved 93.316% accuracy, the SGMM-HMM 92.62%, and the best GMM-HMM 91.82%. A usability study of the system was conducted with 10 participants, and almost all of its usability ratings exceeded 4.0 out of 5.0. These ratings indicate that participants saw the proposed system as important and useful for reducing driver distraction.
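The abstract does not define "accuracy". A common ASR convention is word accuracy = (N − S − D − I) / N = 1 − WER, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment. The sketch below assumes that convention; the function names `edit_distance` and `word_accuracy` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp.

    Under an optimal alignment this equals S + D + I.
    """
    # prev[j] = distance between the current ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))  # empty ref prefix: j insertions
    for i, r in enumerate(ref, start=1):
        cur = [i]  # ref[:i] vs empty hyp: i deletions
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            ))
        prev = cur
    return prev[-1]


def word_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word accuracy = 1 - WER (assumed convention); can be negative with many insertions."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    ref = "open the window please".split()
    hyp = "open window please now".split()  # one deletion ("the"), one insertion ("now")
    print(f"word accuracy: {word_accuracy(ref, hyp):.1%}")
```

Because insertions are counted against a fixed reference length N, word accuracy is not bounded below by zero, which is one reason papers usually report WER alongside or instead of it.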
