2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2012.6289081

An acoustic segment modeling approach to query-by-example spoken term detection

Cited by 50 publications (29 citation statements); citing publications span 2013–2020. References 16 publications.
“…Each ASM unit had 3 states with 16 Gaussian components at each state. Detailed training procedure of the ASM tokenizer can be found in [9]. Both the GMM tokenizer and the ASM tokenizer took in 39-dimensional MFCC features, which were processed with utterance-based mean and variance normalization (MVN) and vocal tract length normalization (VTLN).…”
Section: Tokenizer Implementation
confidence: 99%
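The utterance-based MVN step quoted above can be illustrated with a minimal Python sketch (NumPy assumed). The 39-dimensional MFCC input follows the statement; the function name and the epsilon guard are illustrative, and VTLN is not shown:

```python
import numpy as np

def utterance_mvn(mfcc: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Utterance-based mean and variance normalization.

    mfcc: (num_frames, 39) array of MFCC features for one utterance.
    Statistics are computed per utterance, as in the quoted setup.
    """
    mean = mfcc.mean(axis=0, keepdims=True)
    std = mfcc.std(axis=0, keepdims=True)
    return (mfcc - mean) / (std + eps)
```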
“…In the previous studies related to the posteriorgram-based template matching framework, many efforts were contributed to introducing novel modeling for the tokenizers, such as deep Boltzmann machine [7], discriminant GMM [8], acoustic segment model (ASM) [9], etc. However few works were dedicated to the combined use of different tokenizers.…”
Section: Introduction
confidence: 99%
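As a rough illustration of the posteriorgram-based template matching framework mentioned here, the sketch below converts feature frames into a Gaussian posteriorgram using scikit-learn's GaussianMixture; the component count and the commented usage lines are assumptions, not details taken from the cited papers:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_posteriorgram(features: np.ndarray, gmm: GaussianMixture) -> np.ndarray:
    """Return a (num_frames, num_components) posteriorgram.

    Each row holds the posterior probability of every GMM component
    given one feature frame; rows sum to 1.
    """
    return gmm.predict_proba(features)

# Hypothetical usage: fit an unsupervised GMM tokenizer on pooled training
# frames, then convert query and test utterances into posteriorgrams that a
# DTW search can compare.
# gmm = GaussianMixture(n_components=128, covariance_type="diag").fit(train_frames)
# query_pg = gmm_posteriorgram(query_mfcc, gmm)
```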
“…However, the proposed framework differs from previous studies in that features derived from a pre-index of spoken documents are used to train a classifier. Most previous studies used acoustic features [19], [20], ASR-related information [21], latticebased information [22], [23], and similar approaches. In contrast, we demonstrate the effectiveness of the proposed decision process, wherein STD outputs are verified using an SVM-based classifier trained with pre-indexed best match keyword-based features.…”
Section: Introduction
confidence: 99%
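A minimal sketch of the SVM-based verification idea described above, assuming hypothesis-level features such as a detection score, duration, and a best-match distance from a pre-built index; the feature set and the toy values are hypothetical and not taken from the cited work:

```python
import numpy as np
from sklearn.svm import SVC

# Toy example values only (not from the paper): each row is one STD
# hypothesis described by [detection score, duration, best-match distance].
X_train = np.array([[0.92, 0.41, 0.10],
                    [0.35, 0.77, 0.63],
                    [0.88, 0.39, 0.15],
                    [0.22, 0.80, 0.70]])
y_train = np.array([1, 0, 1, 0])  # 1 = true detection, 0 = false alarm

# Train a binary verifier and use it to accept or reject a new hypothesis.
verifier = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
hypothesis = np.array([[0.90, 0.40, 0.12]])
keep = verifier.predict(hypothesis)[0] == 1
```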
“…The input of the GMM tokenizer was 39-dimensional MFCC feature vectors, which had been processed with vocal tract length normalization (VTLN). The second is an ASM tokenizer [14], containing 256 ASM units. Each unit had 3 states with 16 Gaussian mixtures for each state.…”
Section: Restricted Systems
confidence: 99%
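For concreteness, the tokenizer settings quoted above (256 ASM units, 3 states per unit, 16 Gaussian mixtures per state, 39-dimensional MFCCs with VTLN) could be gathered into a simple configuration object; this container and its field names are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class TokenizerConfig:
    """Hypothetical container for the tokenizer settings quoted above."""
    num_units: int = 256          # number of ASM units
    states_per_unit: int = 3      # HMM states per unit
    mixtures_per_state: int = 16  # Gaussian mixtures per state
    feature_dim: int = 39         # MFCC dimension (VTLN applied upstream)

asm_cfg = TokenizerConfig()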
“…Five semi-supervised tokenizers were built from Czech, Hungarian, Russian, English and Mandarin phoneme recognizers, which were all in the split temporal context network structure [12,13]. The two unsupervised tokenizers were "GMM" and "ASM" (Acoustic Segment Modeling) [14], as described in Section 4.2. All these tokenizers were used to generate posteriorgrams, and Dynamic Time Warping (DTW) was applied for detection.…”
Section: Open Systems
confidence: 99%
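A compact sketch of the DTW detection step applied to posteriorgrams, as referenced in this statement; the negative log inner-product frame distance and the length normalization are common choices assumed here rather than details confirmed by the quoted text:

```python
import numpy as np

def dtw_distance(query_pg: np.ndarray, test_pg: np.ndarray) -> float:
    """Dynamic Time Warping cost between two posteriorgrams."""
    eps = 1e-12
    n, m = len(query_pg), len(test_pg)
    # Pairwise frame distances: -log of the inner product of posterior vectors.
    dist = -np.log(np.clip(query_pg @ test_pg.T, eps, None))
    # Standard DTW accumulation with match / insertion / deletion steps.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j - 1],
                                                 acc[i - 1, j],
                                                 acc[i, j - 1])
    return acc[n, m] / (n + m)  # length-normalized alignment cost
```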