2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
DOI: 10.1109/icassp.2015.7178961

Language independent query-by-example spoken term detection using N-best phone sequences and partial matching

Cited by 14 publications
(5 citation statements)
References 22 publications
“…While our LVCSR-KWS work in Tamil and Vietnamese [24] focus on text queries, we have inspired strategies used in spoken term detection of audio queries. For example, [31] proposed partial-matching symbolic search, which complements popular pattern matching approaches using dynamic time warping in Query-by-Example Search on Speech (QUESST), formerly called Spoken Web Search (SWS), in MediaEval 2014.…”
Section: Discussion (mentioning)
confidence: 99%
“…In the 2014 Query-by-Example Speech Search Task (QUESST) [11], one task was non-exact matching, in which test occurrences could contain small morphological variations with regard to the lexical form of the query. To solve this problem, Xu et al [12] proposed a partial matching strategy in which all partial phone sequences of a query were used to search for matching instances; Proença et al [13] some of these works attempted to solve the non-exact matching problem, they all used DTW-based matching on frame-level representations, which has been shown to be outperformed by distance-based matching on acoustic word embeddings [2,3,4].…”
Section: Related Work (mentioning)
confidence: 99%
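
The partial matching strategy described in the statement above (using partial phone sequences of a query as search keys) can be illustrated with a minimal sketch. It assumes that "partial phone sequences" means contiguous sub-sequences of the decoded phone string; the function name `partial_sequences` and the `min_len` cutoff are hypothetical and do not come from the cited papers.

```python
from typing import List, Sequence, Tuple


def partial_sequences(phones: Sequence[str], min_len: int = 3) -> List[Tuple[str, ...]]:
    """Enumerate contiguous sub-sequences of a phone sequence.

    Assumption: a "partial phone sequence" is any contiguous run of at
    least `min_len` phones. `min_len` is an illustrative parameter.
    Longer sub-sequences are listed first; duplicates are dropped.
    """
    n = len(phones)
    seen = set()
    result: List[Tuple[str, ...]] = []
    for length in range(n, min_len - 1, -1):      # longest first
        for start in range(0, n - length + 1):    # every start position
            sub = tuple(phones[start:start + length])
            if sub not in seen:
                seen.add(sub)
                result.append(sub)
    return result


if __name__ == "__main__":
    # Toy phone sequence; the phone labels are made up for illustration.
    for seq in partial_sequences(["k", "ae", "t", "s"], min_len=3):
        print(" ".join(seq))
```

Listing longer sub-sequences first is a design choice made here so that a search could prefer the most complete match; the source does not state an ordering.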
“…Our partial matching DTW systems, including fixed-window [8,16] and phoneme-sequence [17] partial matching systems, were used to deal with T2 and T3 queries. In each fixed-window partial matching system, an analysis window between 70 and 90 frames long was defined.…”
Section: DTW Systems (mentioning)
confidence: 99%
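
The statement above is truncated, so how the analysis window is applied is not stated. As a rough illustration only, the sketch below assumes the window slides over the query's feature frames with a fixed hop and that each windowed segment would then be matched separately. The name `window_spans`, the default window length of 80 frames (within the quoted 70 to 90 range), and the hop of 10 frames are all hypothetical.

```python
from typing import List, Tuple


def window_spans(num_frames: int, win_len: int = 80, hop: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) frame spans of a fixed-length sliding window.

    Assumptions (not from the source): the window slides with a fixed
    hop, and a query shorter than the window is used whole.
    `end` is exclusive, as in Python slicing.
    """
    if num_frames <= 0:
        return []
    if num_frames <= win_len:
        return [(0, num_frames)]
    return [(s, s + win_len) for s in range(0, num_frames - win_len + 1, hop)]


if __name__ == "__main__":
    # A 100-frame query with an 80-frame window and a 10-frame hop.
    print(window_spans(100, win_len=80, hop=10))
```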
“…Weighted finite state transducer (WFST) based symbolic search systems were used to deal with T2 and T3 queries [8,16]. Such systems decoded a query utterance into N-best phone sequences, and the partial phone sequences were extracted and converted to WFST format.…”
Section: WFST-based Symbolic Search Systems (mentioning)
confidence: 99%
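
The last step in the statement above, converting a partial phone sequence to WFST format, might look like the following sketch. It assumes a simple linear-chain machine written in an OpenFst-style text layout (arc lines of the form `src dst ilabel olabel`, followed by a final-state line); the cited systems may use a different topology or toolkit. The function name `linear_fst_text` is hypothetical, and phone symbols are written as plain strings, so a matching symbol table would be needed to compile the output.

```python
from typing import Sequence


def linear_fst_text(phones: Sequence[str], final_weight: float = 0.0) -> str:
    """Render a phone sequence as a linear-chain FST in text form.

    Assumption: one arc per phone, with the same symbol on the input
    and output side, and a single final state carrying `final_weight`.
    """
    lines = []
    for i, phone in enumerate(phones):
        # Arc from state i to state i + 1 labelled with the phone.
        lines.append(f"{i} {i + 1} {phone} {phone}")
    # Final state line: state id followed by its weight.
    lines.append(f"{len(phones)} {final_weight}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy phone labels, made up for illustration.
    print(linear_fst_text(["k", "ae", "t"]))
```

In a fuller pipeline, one such machine would be built per partial sequence of each N-best hypothesis; combining them is left out here because the source does not describe it.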