2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6854983

Use of articulatory bottle-neck features for query-by-example spoken term detection in low resource scenarios

citations
Cited by 17 publications
(5 citation statements)
references
References 22 publications
“…Image processing techniques are also widely employed for verifying the presence of the query in an utterance [37–39]. Research studies in STD with focus on low resourced languages are in progress [16, 19].…”
Section: Related Work on STD
mentioning
confidence: 99%
“…However, the development of LVCSR system is possible only for well‐resourced languages, and it has difficulty in handling out‐of‐vocabulary (OOV) words. Later, predictive neural networks [12], phone lattice alignments [13, 14], multilayer perceptrons [15, 16], and deep neural networks (DNNs) [17–21] were introduced to keyword spotting. Variety features such as spectrographic seam patterns [22] and spectro–temporal patch features [23] have also experimented.…”
Section: Related Work on STD
mentioning
confidence: 99%
“…To deal with OOV problem, many approaches using a feature-based acoustic match have shown its effectiveness in low-resource STD tasks, as well as the robustness against the effects by difference of speaker and recording environments [1], [2], [3]. However, a feature-based approach is time-consuming and the approach alone couldn't outperform the STD performance of conventional ASR-based system for rich-resource language tasks.…”
Section: Introduction
mentioning
confidence: 99%
“…The most widely used methods rely on template matching of speech features (eg: Gaussian or phone posteriorgrams) using dynamic time warping (DTW) algorithms [6]-[8], and similarity search based on end-to-end neural networks [9]. There exist hybrid approaches that use discriminatively trained models to extract speech features (eg: multilingual, articulatory bottleneck features) [10]-[13], which are then used in a template matching framework.…”
mentioning
confidence: 99%