Query-by-Example Spoken Term Detection using Frequency Domain Linear Prediction and Non-Segmental Dynamic Time Warping

Mantena, Gautam Varma; Achanta, Sivanand; Prahallad, Kishore

doi:10.1109/taslp.2014.2311322

Cited by 53 publications

(22 citation statements)

References 25 publications


“…In addition, performance using the proposed approach is better than the Gaussian posteriorgram. This finding matches a previous study reported in [6]. This might be because of the increasing number of clusters better represents the speech signal at the frame-level.…”

Section: Number Of Gaussian (supporting)

confidence: 92%

“…The number of Gaussian components in Gaussian posteriorgram plays an important role in QbE-STD tasks [4], [6]. In this Section, we investigate the effect of the number of mixture components used in VTLN warping factor estimation on QbE-STD tasks.…”

Section: Number Of Gaussian (mentioning)

confidence: 99%

“…The SDTW needs to be executed multiple times to detect the presence of spoken query, which increases computational requirements. To overcome this computational requirement, subsequence Dynamic Time Warping (subDTW) [5] or non-segmental version of DTW were proposed in [6], [7].…”

Section: Introduction (mentioning)

confidence: 99%
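
The computational point in the statement above can be illustrated with a minimal sketch of generic subsequence DTW, in which the query may start and end anywhere in the reference, so a single pass over the cost matrix replaces repeated segment-wise runs. This is a textbook formulation, not the specific algorithm of [5], [6] or [7]; the function name `subsequence_dtw`, the `dist` argument, and the three-step local constraint are assumptions made for illustration.

```python
import math


def subsequence_dtw(query, reference, dist):
    """Align the whole query against any contiguous part of the reference.

    Returns (accumulated_cost, start_index, end_index) of the best match.
    Illustrative sketch only: no path-length normalisation is applied.
    """
    n, m = len(query), len(reference)
    cost = [[math.inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]

    # Free start: the first query frame may match any reference frame.
    for j in range(m):
        cost[0][j] = dist(query[0], reference[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Predecessors: vertical, horizontal and diagonal steps.
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = dist(query[i], reference[j]) + best_cost
            start[i][j] = best_start

    # Free end: pick the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end
```

With scalar "frames" and an absolute-difference distance, `subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0], lambda a, b: abs(a - b))` locates the query at reference indices 2 to 4. In practice the frames would be feature or posteriorgram vectors, and the accumulated cost is usually normalised by path length before thresholding.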


VTLN-warped Gaussian posteriorgram for QbE-STD

Madhavi

Patil

2017

2017 25th European Signal Processing Conference (EUSIPCO)


Abstract: Vocal Tract Length Normalization (VTLN) is a very important speaker normalization technique for speech recognition tasks. In this paper, we propose the use of Gaussian posteriorgram of VTLN-warped spectral features for a Query-by-Example Spoken Term Detection (QbE-STD). This paper presents the use of a Gaussian Mixture Model (GMM) framework for estimation of VTLN warping factor. This GMM framework does not require phoneme-level transcription and hence, it can be useful for unsupervised tasks. We propose the iterative approach for VTLN warping factor estimation with two GMM training approaches, namely, Expectation-Maximization (EM) and Deterministic Annealing-Expectation Maximization (DAEM). The VTLN-warped Gaussian posteriorgram gave the better QbE-STD performance. The performance of TIMIT QbE-STD was investigated with different evaluation factors, such as a number of Gaussian components in GMM, various local constraints, and a number of iterations in VTLN warping factor estimation. VTLN-warped Gaussian posteriorgram reduces the speaker-specific variation in Gaussian posteriorgram and hence, it is expected to give better performance than Gaussian posteriorgram.
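
For readers unfamiliar with the term, a Gaussian posteriorgram replaces each feature frame with the vector of posterior probabilities of the GMM components given that frame. The following is a minimal sketch under stated assumptions: the GMM is already trained, covariances are diagonal, and the names `gaussian_posteriorgram` and `log_gaussian_diag` are hypothetical. It does not include the VTLN warping step described in the abstract.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gaussian_posteriorgram(frames, weights, means, variances):
    """Return one posterior vector per frame: P(component k | frame).

    weights, means and variances describe an already-trained GMM
    (assumed given here; training is outside this sketch).
    """
    posteriorgram = []
    for x in frames:
        log_joint = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(log_joint)
        unnorm = [math.exp(lj - peak) for lj in log_joint]
        total = sum(unnorm)
        posteriorgram.append([u / total for u in unnorm])
    return posteriorgram
```

Each row of the result sums to one, so the number of mixture components fixes the dimensionality of the representation that is later compared frame by frame.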


“…DTW is employed [12] for this query and reference file at syllable level. From the figure 1, it can be observed that there is match from syllable number 14 to 16 of reference speech file to that of query file.…”

Section: Spoken Term Detection (mentioning)

confidence: 99%

Usage of acoustic cues in spoken term detection keyword spotting for zero low resource languages

Sreedhar

Suryakanth

2017

Fifth International Conference on Advances in Computing, Communication and Information Technology - CCIT 2017


Abstract: The proposed work exploits acoustic cues at various levels and incorporates them in the present Spoken Term Detection (STD) framework. A recently proposed syllabification method [1] for speech signals is used for STD. In STD, query and reference speech signals are provided; these signals are syllabified using the new syllabification method, and features such as Mel-frequency cepstral coefficients (MFCC) and posteriorgrams are extracted. These features are then matched using template-based matching techniques such as dynamic time warping (DTW) at the syllable level instead of the usual frame level. This essentially reduces the unwanted matching done at the frame level.


“…Using speech queries offers a big advantage for devices with limited text-based capabilities, which can be effectively used under the QbE STD paradigm. Other advantage is that QbE STD can be employed for building language-independent STD systems [7][8][9][10], since prior knowledge of the language involved in the speech data is not necessary.…”

Section: Introduction (mentioning)

confidence: 99%

Comparison of ALBAYZIN query-by-example spoken term detection 2012 and 2014 evaluations

Tejedor

Toledano

López-Otero

et al. 2016

J AUDIO SPEECH MUSIC PROC.


Query-by-example spoken term detection (QbE STD) aims at retrieving data from a speech repository given an acoustic query containing the term of interest as input. Nowadays, it is receiving much interest due to the large volume of multimedia information. This paper presents the systems submitted to the ALBAYZIN QbE STD 2014 evaluation held as a part of the ALBAYZIN 2014 Evaluation campaign within the context of the IberSPEECH 2014 conference. This is the second QbE STD evaluation in Spanish, which allows us to evaluate the progress in this technology for this language. The evaluation consists in retrieving the speech files that contain the input queries, indicating the start and end times where the input queries were found, along with a score value that reflects the confidence given to the detection of the query. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from workshops, which amount to about 7 h of speech. We present the database, the evaluation metric, the systems submitted to the evaluation, the results, and compare this second evaluation with the first ALBAYZIN QbE STD evaluation held in 2012. Four different research groups took part in the evaluations held in 2012 and 2014. In 2014, new multi-word and foreign queries were added to the single-word and in-language queries used in 2012. Systems submitted to the second evaluation are hybrid systems which integrate letter transcription- and template matching-based systems. Despite the significant improvement obtained by the systems submitted to this second evaluation compared to those of the first evaluation, results still show the difficulty of this task and indicate that there is still room for improvement.
