Abstract: This paper focuses on the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. Current state-of-the-art approaches to this problem rely on dynamic-programming-based template matching using phone posterior features extracted at the output of a deep neural network (DNN). Previously, it has been shown that the space of phone posteriors is highly structured, as a union of low-dimensional subspaces. To exploit the temporal and sparse structure of the spee…
“…This observation is opposite to the results previously obtained on a clean (simple) database [27], where incorporating more examples of the query was found to be more effective for the sparse method than for the baseline DTW system. This issue can be attributed to the large variability and overlap present in the utterances of the AMI corpus.…”
Section: QbE-STD Performance (contrasting)
confidence: 99%
“…As an alternative to the average reconstruction error over the background phone dictionaries, their minimum can also be used as the background score [16, 27]. However, we found that the average score yields better detection performance.…”
Section: Subspace Modeling and Detection (mentioning)
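The two scoring strategies contrasted in the snippet above can be sketched as follows. This is a minimal illustration: the error values are hypothetical and `background_score` is an illustrative helper, not code from the paper.

```python
import numpy as np

def background_score(errors, mode="average"):
    """Combine per-phone-dictionary reconstruction errors into a single
    background score. The snippet above reports that averaging yielded
    better detection performance than taking the minimum [16, 27]."""
    errors = np.asarray(errors, dtype=float)
    if mode == "average":
        return errors.mean()
    elif mode == "minimum":
        return errors.min()
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical reconstruction errors of one frame against three phone dictionaries
errs = [0.42, 0.10, 0.35]
print(round(background_score(errs, "average"), 2))  # 0.29
print(background_score(errs, "minimum"))            # 0.1
```

The minimum is dominated by the single best-fitting phone dictionary, whereas the average reflects how well the background model as a whole explains the frame, which the authors found more robust.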
We cast the query by example spoken term detection (QbE-STD) problem as subspace detection, where query and background subspaces are modeled as a union of low-dimensional subspaces. The speech exemplars used for subspace modeling are class-conditional posterior probabilities estimated using a deep neural network (DNN). The query and background training exemplars are exploited to model the underlying low-dimensional subspaces through dictionary learning for sparse representation. Given the dictionaries characterizing the query and background subspaces, QbE-STD is performed based on the ratio of the two corresponding sparse representation reconstruction errors. The proposed subspace detection method can be formulated as the generalized likelihood ratio test for composite hypothesis testing. The experimental evaluation demonstrates that the proposed method is able to detect the query given a single example and performs significantly better than a highly competitive QbE-STD baseline system based on dynamic time warping (DTW) for exemplar matching.
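The reconstruction-error-ratio test described in the abstract can be sketched as below. As a simplification, plain least squares stands in for the paper's sparse coding step, and the dictionaries and test frame are randomly generated stand-ins:

```python
import numpy as np

def recon_error(D, x):
    """Reconstruction error of x over the span of dictionary D
    (least-squares stand-in for the sparse coding step in the paper)."""
    a, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.linalg.norm(x - D @ a)

def detection_score(D_query, D_background, x):
    """Likelihood-ratio-style score: large when the query dictionary
    reconstructs x much better than the background dictionary."""
    eps = 1e-12  # guard against division by zero
    return recon_error(D_background, x) / (recon_error(D_query, x) + eps)

rng = np.random.default_rng(0)
D_q = rng.random((10, 4))   # hypothetical query dictionary atoms
D_b = rng.random((10, 4))   # hypothetical background dictionary atoms
x = D_q @ rng.random(4)     # frame lying in the query subspace
print(detection_score(D_q, D_b, x) > 1.0)  # True: query dictionary fits better
```

Thresholding this ratio corresponds to the generalized likelihood ratio test mentioned in the abstract: a frame is attributed to the query hypothesis when the query subspace explains it substantially better than the background subspace.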
“…Recent exemplar-based speech processing offers high flexibility in speech applications, partly attributed to the lack of complex statistical assumptions, which facilitates exploiting the "data deluge" with no prejudice on expected answers. Deep neural network (DNN) based class-conditional posterior probabilities (hereafter referred to as posteriors) have been found to be one of the best speech representations for enabling exemplar-based speech recognition [4] and spoken query detection [5, 6, 7]. In theory, if an infinite number of exemplars of the continuous probability density functions is provided, a simple nearest-neighbor rule leads to optimal classification [8].…”
Section: State-of-the-art Solutions and Challenges (mentioning)
confidence: 99%
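The nearest-neighbor rule referenced above [8] is straightforward to illustrate on posterior exemplars. The exemplars, labels, and phone names below are hypothetical:

```python
import numpy as np

def nearest_neighbor_label(exemplars, labels, x):
    """1-NN rule: assign x the label of its closest exemplar. With an
    unbounded supply of exemplars, this rule approaches optimal
    classification error, which motivates exemplar-based approaches."""
    d = np.linalg.norm(exemplars - x, axis=1)
    return labels[int(np.argmin(d))]

# Tiny hypothetical 3-class posterior exemplars (each row sums to 1)
E = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.7, 0.2],
              [0.1, 0.2, 0.7]])
y = ["aa", "iy", "uw"]
print(nearest_neighbor_label(E, y, np.array([0.75, 0.15, 0.10])))  # aa
```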
“…In addition, the low-dimensional subspaces can be modeled through dictionary learning for sparse coding to enable unsupervised adaptation and enhanced acoustic modeling for speech recognition [10, 12]. Sparse subspace modeling of the posterior exemplars has also been found promising for query-by-example spoken term detection (QbE-STD) [7, 11, 13].…”
Section: State-of-the-art Solutions and Challenges (mentioning)
confidence: 99%
“…We use max-sum dynamic programming to obtain a region of occurrence and the corresponding area under the curve is used as the score for query detection [7]. This procedure is illustrated in Fig.…”
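The max-sum dynamic programming step mentioned above amounts to finding the contiguous run of frame-level scores with the largest sum (a Kadane-style scan), whose sum then serves as the detection score. A minimal sketch, with hypothetical per-frame scores:

```python
def best_region(frame_scores):
    """Max-sum dynamic programming: return the contiguous region of
    frame-level scores with the largest sum, and that sum, which acts
    as the area-under-the-curve detection score."""
    best_sum, best_span = float("-inf"), (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(frame_scores):
        if cur_sum <= 0:          # restart the candidate region here
            cur_sum, cur_start = s, i
        else:                     # extend the current candidate region
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best_span = cur_sum, (cur_start, i)
    return best_span, best_sum

# Hypothetical per-frame query/background log-ratio scores
scores = [-1.0, 0.5, 2.0, 1.5, -0.5, 1.0, -3.0, 0.2]
print(best_region(scores))  # ((1, 5), 4.5)
```

The returned span marks the hypothesized region of occurrence of the query within the test utterance.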
State-of-the-art query by example spoken term detection (QbE-STD) systems in zero-resource conditions rely on representing speech as sequences of class-conditional posterior probabilities estimated by a deep neural network (DNN). The posteriors are often used for pattern matching or dynamic time warping (DTW). Exploiting posterior probabilities as a speech representation offers diverse advantages in a classification system. One key property of the posterior representations is that they admit a highly effective hashing strategy that enables indexing a large audio archive in divisions for reducing the search complexity. Moreover, posterior indexing leads to a compressed representation and enables pronunciation dewarping and partial detection with no need for DTW. We exploit these characteristics of the posterior space in the context of redundant hash addressing for query-by-example spoken term detection (QbE-STD). We evaluate the QbE-STD system on the AMI corpus and demonstrate that tremendous speedup and superior accuracy are achieved compared to the state-of-the-art pattern matching solution based on DTW. The system has the potential to enable massively large-scale spoken query detection.
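The hashing idea in this abstract can be illustrated with a much-simplified scheme: bucket each frame by the indices of its most probable phone classes, so that a query frame only needs to be compared against frames in the same bucket. This is an assumption-laden sketch, not the paper's actual redundant hash addressing method, and the Dirichlet-sampled archive is synthetic:

```python
import numpy as np
from collections import defaultdict

def posterior_hash(frame, k=2):
    """Hash a posterior frame by the indices of its k most probable
    phone classes (a toy stand-in for redundant hash addressing)."""
    return tuple(sorted(np.argsort(frame)[-k:]))

def build_index(archive):
    """Bucket archive frame positions by posterior hash, so search is
    restricted to one bucket instead of the whole archive."""
    index = defaultdict(list)
    for t, frame in enumerate(archive):
        index[posterior_hash(frame)].append(t)
    return index

rng = np.random.default_rng(1)
archive = rng.dirichlet(np.ones(5), size=100)  # 100 synthetic posterior frames
index = build_index(archive)
query = archive[42]                            # query frame from the archive
print(42 in index[posterior_hash(query)])      # True
```

Because the number of buckets is fixed, lookup cost is independent of archive length, which is the source of the speedup over frame-by-frame DTW matching claimed in the abstract.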