2018
DOI: 10.1109/taslp.2018.2815780

Sparse Subspace Modeling for Query by Example Spoken Term Detection

Abstract: This paper focuses on the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. Current state-of-the-art approaches to tackle this problem rely on dynamic programming based template matching techniques using phone posterior features extracted at the output of a deep neural network (DNN). Previously, it has been shown that the space of phone posteriors is highly structured, as a union of low-dimensional subspaces. To exploit the temporal and sparse structure of the speech…
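
The baseline pipeline named in the abstract, dynamic programming based template matching over DNN phone posteriors, can be pictured with a minimal subsequence-DTW sketch. The frame distance (negative log inner product), the length normalization, and the toy array shapes below are illustrative assumptions, not the paper's exact configuration, and the pure-Python loops are only meant to show the recursion.

```python
import numpy as np

def frame_distance(p, q, eps=1e-10):
    # Distance between two posterior vectors; -log of the inner product is a
    # common choice for phone posteriors (illustrative, not the paper's metric).
    return -np.log(np.dot(p, q) + eps)

def subsequence_dtw_score(query, utterance):
    """Best normalized alignment cost of `query` (Nq x D posteriors)
    against any subsequence of `utterance` (Nu x D posteriors)."""
    Nq, Nu = len(query), len(utterance)
    D = np.array([[frame_distance(q, u) for u in utterance] for q in query])
    acc = np.full((Nq, Nu), np.inf)
    acc[0, :] = D[0, :]                      # the query may start at any utterance frame
    for i in range(1, Nq):
        for j in range(Nu):
            best_prev = acc[i - 1, j]
            if j > 0:
                best_prev = min(best_prev, acc[i - 1, j - 1], acc[i, j - 1])
            acc[i, j] = D[i, j] + best_prev
    return acc[-1, :].min() / Nq             # lower score = better match

# Toy usage with random posterior-like features (rows sum to 1).
rng = np.random.default_rng(0)
query = rng.dirichlet(np.ones(40), size=25)       # 25 frames, 40 phone classes
utterance = rng.dirichlet(np.ones(40), size=300)  # 300 frames
print(subsequence_dtw_score(query, utterance))
```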

Cited by 20 publications (20 citation statements)
References 31 publications
“…This observation is opposite to the results previously obtained on a clean (simple) database [27] where incorporating more examples of the query were found more effective for the sparse method compared to the baseline DTW system. This issue can be attributed to the large variability and overlap present in the utterances of AMI corpus.…”
Section: QbE-STD Performance (contrasting)
confidence: 99%
“…Alternative to the average reconstruction error of the background phone dictionaries, the minimum of them can also be used as the background score [16,27]. However, we found that the average score yields better detection performance.…”
Section: Subspace Modeling and Detection (mentioning)
confidence: 99%
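
The statement above contrasts two ways of turning per-phone dictionary reconstruction errors into a background score: their average versus their minimum. Below is a minimal sketch of that scoring step, assuming each background phone has its own dictionary of posterior atoms; scikit-learn's SparseCoder with OMP is used as a stand-in solver, and all shapes and sparsity settings are illustrative rather than the cited papers' setup.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def reconstruction_error(frame, dictionary, n_nonzero=5):
    """Sparse-code one posterior frame against a phone dictionary and return
    the L2 reconstruction error (solver and sparsity level are assumptions)."""
    coder = SparseCoder(dictionary=dictionary, transform_algorithm="omp",
                        transform_n_nonzero_coefs=n_nonzero)
    codes = coder.transform(frame[None, :])          # shape: (1, n_atoms)
    return float(np.linalg.norm(frame - codes @ dictionary))

def background_score(frame, phone_dictionaries, mode="average"):
    # Average (or minimum) reconstruction error over all background phone
    # dictionaries, the two alternatives contrasted in the quoted statement.
    errors = [reconstruction_error(frame, D) for D in phone_dictionaries]
    return float(np.mean(errors)) if mode == "average" else float(np.min(errors))

# Toy usage: 30 background phones, 20 atoms each, 40-dimensional posteriors.
rng = np.random.default_rng(0)
phone_dicts = [rng.dirichlet(np.ones(40), size=20) for _ in range(30)]
frame = rng.dirichlet(np.ones(40))
print(background_score(frame, phone_dicts, mode="average"))
print(background_score(frame, phone_dicts, mode="min"))
```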
“…Recent exemplar based speech processing offers high flexibility in speech applications, partly attributed to the lack of complex statistical assumptions that facilitate exploiting "data deluge" with no prejudice on expected answers. Deep neural network (DNN) based class-conditional posterior probabilities (hereafter referred to as posteriors) have been found to be one of the best speech representations to enable exemplar based speech recognition [4] and spoken query detection [5,6,7]. In theory, if infinite number of exemplars of continuous probability density functions are provided, a simple nearest-neighbor rule leads to optimal classification [8].…”
Section: State-of-the-art Solutions and Challenges (mentioning)
confidence: 99%
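
The quoted passage appeals to the classical result that, with enough exemplars, a simple nearest-neighbor rule approaches optimal classification. The sketch below applies that rule to posterior exemplars; the symmetric KL distance and the synthetic data are illustrative choices, not taken from the cited works.

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-10):
    # Symmetric KL divergence, a natural distance between posterior vectors
    # (the specific metric is an assumption, not taken from the paper).
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def nearest_neighbor_label(frame, exemplars, labels):
    """Classify one posterior frame by the label of its closest exemplar."""
    distances = [symmetric_kl(frame, e) for e in exemplars]
    return labels[int(np.argmin(distances))]

# Toy usage: exemplars and labels are stand-ins for a real exemplar store.
rng = np.random.default_rng(1)
exemplars = rng.dirichlet(np.ones(40), size=200)   # 200 posterior exemplars
labels = rng.integers(0, 40, size=200)             # their phone labels
test_frame = rng.dirichlet(np.ones(40))
print(nearest_neighbor_label(test_frame, exemplars, labels))
```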
“…In addition, the low-dimensional subspaces can be modeled through dictionary learning for sparse coding to enable unsupervised adaptation and enhanced acoustic modeling for speech recognition [10,12]. Sparse subspace modeling of the posterior exemplars are also found promising for query-by-example spoken term detection (QbE-STD) [7,11,13].…”
Section: State-of-the-art Solutions and Challenges (mentioning)
confidence: 99%
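
The quoted passage refers to dictionary learning for sparse coding of posterior exemplars as a way to model their low-dimensional subspaces. Below is a minimal, self-contained sketch using scikit-learn's DictionaryLearning on synthetic posterior-like data; all hyperparameters and dimensions are placeholders rather than the settings used in [10,12] or [7,11,13].

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Synthetic posterior frames standing in for DNN posterior exemplars of one class.
rng = np.random.default_rng(2)
posteriors = rng.dirichlet(np.ones(40), size=500)   # 500 frames, 40 phone classes

# Learn a compact dictionary whose atoms span the low-dimensional subspace of
# these exemplars; every hyperparameter here is illustrative only.
dict_learner = DictionaryLearning(n_components=20, transform_algorithm="lasso_lars",
                                  transform_alpha=0.1, max_iter=50, random_state=0)
codes = dict_learner.fit_transform(posteriors)       # sparse codes, shape (500, 20)
atoms = dict_learner.components_                     # dictionary, shape (20, 40)

reconstruction = codes @ atoms
print("mean reconstruction error:", np.linalg.norm(posteriors - reconstruction, axis=1).mean())
print("mean nonzeros per frame:", (np.abs(codes) > 1e-8).sum(axis=1).mean())
```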