Interspeech 2016 2016
DOI: 10.21437/interspeech.2016-1278
|View full text |Cite
|
Sign up to set email alerts
|

Subspace Detection of DNN Posterior Probabilities via Sparse Representation for Query by Example Spoken Term Detection

Abstract: We cast the query by example spoken term detection (QbE-STD) problem as subspace detection where query and background subspaces are modeled as union of low-dimensional subspaces. The speech exemplars used for subspace modeling are class-conditional posterior probabilities estimated using deep neural network (DNN). The query and background training exemplars are exploited to model the underlying lowdimensional subspaces through dictionary learning for sparse representation. Given the dictionaries characterizing… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

0
7
0

Year Published

2016
2016
2020
2020

Publication Types

Select...
4
3

Relationship

3
4

Authors

Journals

citations
Cited by 10 publications
(7 citation statements)
references
References 21 publications
(25 reference statements)
0
7
0
Order By: Relevance
“…This low-dimensional structure is the result of the constrained articulatory mechanism of human speech production [11], [12], which leads to the generation of linguistic units (e.g., phones, senones) lying on non-linear manifolds. As already shown in [13], [14], [15], these manifolds can be modeled as a union of low-dimensional subspaces and sparse representation is found to be a promising technique to model these subspaces. It is the goal of the present work to investigate how this sparsity property can be exploited to further improve state-of-the-art QbE-STD system.…”
Section: Introductionmentioning
confidence: 88%
See 1 more Smart Citation
“…This low-dimensional structure is the result of the constrained articulatory mechanism of human speech production [11], [12], which leads to the generation of linguistic units (e.g., phones, senones) lying on non-linear manifolds. As already shown in [13], [14], [15], these manifolds can be modeled as a union of low-dimensional subspaces and sparse representation is found to be a promising technique to model these subspaces. It is the goal of the present work to investigate how this sparsity property can be exploited to further improve state-of-the-art QbE-STD system.…”
Section: Introductionmentioning
confidence: 88%
“…In the context of speech processing, sparse recovery has already been studied for robust speech recognition [18], [19], [20], enhanced acoustic modeling [14], [15] as well as spoken query detection [13], [21], [22]. In our earlier work [13], we cast the query detection problem as subspace detection between query and non-query speech where the corresponding subspaces are modeled through dictionary learning for sparse representation. Given these dictionaries, detection of each frame is performed based on the ratio of the two corresponding sparse representation reconstruction errors, and the frame-level decisions are accumulated by counting the continuous number of frames detected as the query.…”
Section: Introductionmentioning
confidence: 99%
“…Exploiting this property enables a hierarchical speech classification and recognition framework based on structured sparse modeling of posterior exemplars [9,11]. In addition, the low-dimensional subspaces can be modeled through dictionary learning for sparse coding to enable unsupervised adaptation and enhanced acoustic modeling for speech recognition [10,12].…”
Section: State-of-the-art Solutions and Challengesmentioning
confidence: 99%
“…In addition, the low-dimensional subspaces can be modeled through dictionary learning for sparse coding to enable unsupervised adaptation and enhanced acoustic modeling for speech recognition [10,12]. Sparse subspace modeling of the posterior exemplars are also found promising for query-by-example spoken term detection (QbE-STD) [7,11,13].…”
Section: State-of-the-art Solutions and Challengesmentioning
confidence: 99%
“…Exploiting this property enables hierarchical speech recognition and classification frameworks based on sparse modeling of phonetic posterior exemplars [10,11]. Furthermore, the low-dimensional subspaces can be modeled through dictionary learning for sparse representation, and projection of the posteriors into the space characterized over the training data reduces the mismatch of the testing posteriors, and leads to enhanced acoustic modeling for speech recognition [8,9].…”
Section: Introductionmentioning
confidence: 99%