2018
DOI: 10.1121/1.5039837
Acoustic landmarks contain more information about the phone string than other frames for automatic speech recognition with deep neural network acoustic model

Abstract: Most mainstream automatic speech recognition (ASR) systems consider all feature frames equally important. However, acoustic landmark theory is based on a contradictory idea that some frames are more important than others. Acoustic landmark theory exploits quantal nonlinearities in the articulatory-acoustic and acoustic-perceptual relations to define landmark times at which the speech spectrum abruptly changes or reaches an extremum; frames overlapping landmarks have been demonstrated to be sufficient for speec…


Cited by 11 publications (9 citation statements)
References 33 publications (48 reference statements)
“…Landmark-based ASR has been shown to slightly reduce the WER of a large-vocabulary speech recognizer, but only in a rescoring paradigm using a very small test set [18]. Landmarks can reduce computational load for DNN/HMM hybrid models [12,13] and can improve recognition accuracy [11]. Previous works [11,12,13,19] annotated landmark positions mostly following experimental findings presented in [20,21].…”
Section: Acoustic Landmarks
confidence: 99%
“…Many efforts have been made to augment acoustic modeling with acoustic landmarks [11,12,13], which are located using accurate time-aligned phonetic transcriptions. To the best of our knowledge, only TIMIT [14] (5.4 hours) provides such fine-grained transcriptions.…”
Section: Introduction
confidence: 99%
“…We extracted landmark training labels by referencing the TIMIT human-annotated phone boundaries. An example of the labeling, taken from [7], is presented in Fig. 2 and illustrates the labeling of the word "Symposium". The figure is generated using Praat [19].…”
Section: Defining and Marking Landmarks
confidence: 99%
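The boundary-referencing procedure quoted above can be sketched in a few lines. This is an illustrative assumption, not the authors' annotation code: the phone classes below are abbreviated, and the rule (place a landmark at every boundary where the manner class changes) is a simplification of the landmark definitions used in the cited work.

```python
# Hypothetical sketch: derive landmark times from TIMIT-style
# time-aligned phone segments (start_sec, end_sec, phone).
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uw", "uh",
          "ow", "ey", "ay", "oy", "aw", "er"}
STOPS = {"p", "t", "k", "b", "d", "g"}

def manner(phone):
    """Map a phone label to a coarse manner class (assumed taxonomy)."""
    if phone in VOWELS:
        return "vowel"
    if phone in STOPS:
        return "stop"
    return "other"

def landmark_times(segments):
    """Return the boundary times at which the manner class changes.

    segments: list of (start_sec, end_sec, phone), in time order.
    """
    times = []
    for (_, end1, p1), (_, _, p2) in zip(segments, segments[1:]):
        if manner(p1) != manner(p2):
            times.append(end1)  # boundary between the two segments
    return times
```

For example, `landmark_times([(0.0, 0.1, "t"), (0.1, 0.3, "iy")])` places one landmark at the stop-to-vowel boundary at 0.1 s.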
“…Automatic speech recognition (ASR) systems have been proposed that depend completely on landmarks, with no regard for the steady-state regions of the speech signal [5], and such systems have been demonstrated to be competitive with phone-based ASR under certain circumstances. Other studies have proposed training two separate sets of classifiers, one trained to recognize landmarks, another trained to recognize steady-state phone segments, and fusing the two for improved accuracy [6] or for reduced computational complexity [7]. It has been difficult to build cross-lingual ASR from such systems, however, because very few of the world's languages possess large corpora with the correct timing of consonant release and consonant closure landmarks manually coded.…”
Section: Introduction
confidence: 99%
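One common way to fuse two classifiers' frame posteriors, as in the landmark/segment fusion mentioned above, is log-linear interpolation. This is a minimal sketch under that assumption; the cited papers [6,7] may use a different combination rule, and the weight `w` is hypothetical.

```python
import numpy as np

def fuse_posteriors(p_a, p_b, w=0.5):
    """Log-linear interpolation of two posterior distributions per frame.

    p_a, p_b: arrays of shape (frames, classes); rows sum to 1.
    w: hypothetical interpolation weight between the two classifiers.
    """
    # Combine in the log domain, with a small floor to avoid log(0).
    log_p = w * np.log(p_a + 1e-12) + (1.0 - w) * np.log(p_b + 1e-12)
    p = np.exp(log_p)
    return p / p.sum(axis=-1, keepdims=True)  # renormalize each frame
```

With `p_a = [[0.7, 0.3]]` and `p_b = [[0.6, 0.4]]`, the fused distribution still favors the first class and sums to 1 per frame.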
“…The MTL approach is applied to neural networks by sharing some of the hidden layers between different tasks. Some research has improved the accuracy of CTC-based ASR by incorporating acoustic landmarks, which help CTC training converge more rapidly and smoothly [66,67]. Moreover, acoustic landmark information can serve as an additional information source to further improve the performance of the APED system [68].…”
confidence: 99%
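The shared-hidden-layer MTL idea described above can be sketched as a single forward pass: one shared representation feeds two task-specific output heads. Everything here is an illustrative assumption (layer sizes, a 40-class phone head, a binary landmark head), not the cited systems' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))            # batch of 4 feature frames

# Shared hidden layer: both tasks read the same representation.
W_shared = rng.standard_normal((16, 32)) * 0.1
h = np.maximum(x @ W_shared, 0.0)           # ReLU

# Task-specific heads on top of the shared layer.
W_phone = rng.standard_normal((32, 40)) * 0.1     # 40 phone classes (assumed)
W_landmark = rng.standard_normal((32, 2)) * 0.1   # landmark vs. non-landmark

phone_logits = h @ W_phone                  # shape (4, 40)
landmark_logits = h @ W_landmark            # shape (4, 2)
```

During training, gradients from both losses would flow back into `W_shared`, which is what lets the landmark task regularize and speed up the main recognition task.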