Combining State-Level Spotting and Posterior-Based Acoustic Match for Improved Query-by-Example Spoken Term Detection

Oishi, Shuji; Matsuba, Tatsuya; Makino, Mitsuaki; Kai, Atsuhiko

doi:10.21437/interspeech.2016-1259

Cited by 3 publications

(3 citation statements)

References 9 publications


“…Logistic regression-based fusion on DTW and phone-based systems is employed in [71][72][73][74]. DTW-based search at the HMM state-level from syllables obtained from a word-based speech recognizer and a deep neural network (DNN) posteriorgram-based rescoring are employed in [75], and [76] adds a logistic regression-based approach for detection rescoring. Finally, [77] employs a syllable-based speech recognizer and dynamic programming at the triphone state level to output detections and DNN posteriorgram-based rescoring.…”

Section: Hybrid Methods (mentioning)

confidence: 99%

Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation

Tejedor, Toledano, López-Otero, et al. 2019. J AUDIO SPEECH MUSIC PROC.


The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area nowadays. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given a spoken query. Research on this area is continuously fostered with the organization of QbE STD evaluations. This paper presents a multi-domain internationally open evaluation for QbE STD in Spanish. The evaluation aims at retrieving the speech files that contain the queries, providing their start and end times, and a score that reflects the confidence given to the detection. Three different Spanish speech databases that encompass different domains have been employed in the evaluation: MAVIR database, which comprises a set of talks from workshops; RTVE database, which includes broadcast television (TV) shows; and COREMAH database, which contains 2-people spontaneous speech conversations about different topics. The evaluation has been designed carefully so that several analyses of the main results can be carried out. We present the evaluation itself, the three databases, the evaluation metrics, the systems submitted to the evaluation, the results, and the detailed post-evaluation analyses based on some query properties (within-vocabulary/out-of-vocabulary queries, single-word/multi-word queries, and native/foreign queries). Fusion results of the primary systems submitted to the evaluation are also presented. Three different teams took part in the evaluation, and ten different systems were submitted. The results suggest that the QbE STD task is still in progress, and the performance of these systems is highly sensitive to changes in the data domain. Nevertheless, QbE STD strategies are able to outperform text-based STD in unseen data domains.


“…[38][39][40][41] use a logistic regression-based fusion on DTW- and phone-based systems. Oishi et al. [42] uses a DTW-based search at the HMM state-level from syllables obtained from a word-based speech recognizer and a deep neural network (DNN) posteriorgram-based rescoring, and [43] adds a logistic regression-based approach for detection rescoring. Obara et al. [44] employs a syllable-based speech recognizer and dynamic programming at the triphone-state level to output detections and DNN posteriorgram-based rescoring.…”

Section: Hybrid Approach (mentioning)

confidence: 99%

ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

Tejedor, Toledano, López-Otero, et al. 2018. J AUDIO SPEECH MUSIC PROC.


Query-by-example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given an acoustic (spoken) query containing the term of interest as the input. This paper presents the systems submitted to the ALBAYZIN QbE STD 2016 Evaluation held as a part of the ALBAYZIN 2016 Evaluation Campaign at the IberSPEECH 2016 conference. Special attention was given to the evaluation design so that a thorough post-analysis of the main results could be carried out. Two different Spanish speech databases, which cover different acoustic and language domains, were used in the evaluation: the MAVIR database, which consists of a set of talks from workshops, and the EPIC database, which consists of a set of European Parliament sessions in Spanish. We present the evaluation design, both databases, the evaluation metric, the systems submitted to the evaluation, the results, and a thorough analysis and discussion. Four different research groups participated in the evaluation, and a total of eight template matching-based systems were submitted. We compare the systems submitted to the evaluation and make an in-depth analysis based on some properties of the spoken queries, such as query length, single-word/multi-word queries, and in-language/out-of-language queries.


“…Query-by-example Spoken Term Detection aims to retrieve data from a speech repository (henceforth utterance) given an acoustic query containing the term of interest as input. QbE STD has been mainly addressed from three different approaches: methods based on the word/subword transcription of the query that typically employ a word/phone-based speech recognition system for query detection [12,13], methods based on template matching of features that are typically based on posteriorgram-based units and DTW-like search for query detections [14,15,16,17], and hybrid approaches that take advantage of both approaches [18,19,20,21].…”

Section: Introduction (mentioning)

confidence: 99%

AUDIAS-CEU: A Language-independent approach for the Query-by-Example Spoken Term Detection task of the Search on Speech ALBAYZIN 2018 evaluation

Cabello, Toledano, Tejedor. 2018. IberSPEECH 2018.


Query-by-Example Spoken Term Detection is the task of detecting query occurrences within speech data (henceforth utterances). Our submission is based on a language-independent template matching approach. First, queries and utterances are represented as phonetic posteriorgrams computed for the English language with the phoneme decoder developed by the Brno University of Technology. Next, the Subsequence Dynamic Time Warping algorithm, with a modified Pearson correlation coefficient as the cost measure, is employed to hypothesize detections. Results on development data showed an ATWV of 0.1774 on MAVIR data and an ATWV of 0.0365 on RTVE data.
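The template matching step described in this abstract can be illustrated with a minimal sketch of subsequence DTW over posteriorgram frames. This is not the authors' implementation. The abstract does not say how the Pearson correlation coefficient is modified, so the cost `(1 - r) / 2` used here is an assumption, and the function names `pearson_cost` and `subsequence_dtw` are hypothetical. Posteriorgrams are assumed to be lists of per-frame probability vectors.

```python
import math


def pearson_cost(x, y):
    """Frame-level cost from Pearson correlation r, mapped to [0, 1].

    Assumed form: cost = (1 - r) / 2. The exact modification used in the
    cited system is not given in the abstract.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.5  # correlation undefined; treat frames as uncorrelated
    r = cov / math.sqrt(vx * vy)
    return (1.0 - r) / 2.0


def subsequence_dtw(query, utterance):
    """Find the best-matching region of `utterance` for `query`.

    Returns (cost normalized by query length, start frame, end frame).
    The match may start and end at any utterance frame.
    """
    n, m = len(query), len(utterance)
    # acc[i][j]: minimum accumulated cost of aligning query[0..i]
    # with a path that ends at utterance frame j.
    acc = [[0.0] * m for _ in range(n)]
    # start[i][j]: utterance frame where that best path began.
    start = [[0] * m for _ in range(n)]

    # First query frame: a path may begin anywhere at no extra cost.
    for j in range(m):
        acc[0][j] = pearson_cost(query[0], utterance[j])
        start[0][j] = j

    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + pearson_cost(query[i], utterance[0])
        start[i][0] = start[i - 1][0]
        for j in range(1, m):
            # Choose the cheapest predecessor (vertical, horizontal, diagonal).
            best, origin = min(
                (acc[i - 1][j], start[i - 1][j]),
                (acc[i][j - 1], start[i][j - 1]),
                (acc[i - 1][j - 1], start[i - 1][j - 1]),
            )
            acc[i][j] = best + pearson_cost(query[i], utterance[j])
            start[i][j] = origin

    # The best detection ends at the cheapest cell of the last query row.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end] / n, start[n - 1][end], end
```

In a full system, every sufficiently low-cost end frame (not only the single minimum) would typically be kept as a candidate detection, and the normalized cost would be converted into a confidence score.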
