A Speech Endpoint Detection Based on Dynamically Updated Threshold of Box-Counting Dimension

Gao, Hongbin; Weiyi, Pang; Chunru, Huang; Zhang, Yongqiang

doi:10.1109/ifita.2009.381

Cited by 5 publications

(2 citation statements)

References 4 publications

“…The VAD block is used to detect the beginning and end of speech waveforms and to exclude nonspeech segments [3][4][5][6][7]. This technique is used in this work because nonspeech segments can degrade recognition performance, especially at a low signal-to-noise ratio (SNR) [3][4][5].…”

Section: Introduction (mentioning)

confidence: 99%

“…This technique is used in this work because nonspeech segments can degrade recognition performance, especially at a low signal-to-noise ratio (SNR) [3][4][5]. In noisy environments, the noise will smear speech waveforms, thus a robust speech recognition algorithm, such as cepstrum mean subtraction (CMS) [10], running spectrum filtering (RSF), and dynamic range adjustment (DRA) [8][9][10][11], is required.…”

Section: Introduction (mentioning)

confidence: 99%

Dynamic Time Warping for Speech Recognition with Training Part to Reduce the Computation

Sun; Miyanaga; Sai (2014). Journal of Signal Processing

In this paper, we propose a dynamic time warping (DTW) method with a training part. DTW is a popular automatic speech recognition (ASR) method based on template matching. Conventional DTW is fast and of low complexity; however, its recognition accuracy is limited. Recently, a DTW with multireferences (mDTW) algorithm has also been developed to improve the recognition accuracy to be comparable to that of the hidden Markov model (HMM) algorithm under noisy conditions. However, the mDTW algorithm increases the calculation cost. Therefore, to reduce the calculation cost, a training part is added to the DTW-based ASR system, unlike the mDTW, which tries to find appropriate reference utterances to replace the increasing utterances. The results show that the average recognition accuracy of the proposed method is similar to that of the mDTW, while the calculation cost is reduced by 41.6%.
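As a rough illustration of the template matching that the abstract attributes to conventional DTW, the sketch below aligns an input sequence against stored reference templates and picks the closest one. It is not the training-part method or mDTW described above. The function names `dtw_distance` and `recognize`, the scalar features, and the absolute-difference local cost are all assumptions made for brevity; a real ASR front end would typically compare frame-wise feature vectors.

```python
from math import inf


def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D sequences (hypothetical helper).

    Assumes scalar features and an absolute-difference local cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed steps: advance in a, advance in b, or advance in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label whose reference template has the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


if __name__ == "__main__":
    # Toy reference templates; the labels and values are illustrative only.
    templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
    print(recognize([1, 2, 3, 3], templates))
```

Each comparison fills an n-by-m cost table, so matching against many references per word multiplies the work, which is consistent with the abstract's remark that using multiple references raises the calculation cost.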