2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
DOI: 10.1109/icassp.2006.1660179

Keyword Spotting of Arbitrary Words Using Minimal Speech Resources

citations
Cited by 42 publications
(34 citation statements)
references
References 5 publications
“…The more recent work of Novotney et al. showed the potential for leveraging small amounts of annotated data to train initial models, which can then be used to automatically annotate much larger quantities of unannotated data [5]. The approach by Gish et al. requires even less supervision by creating initial acoustic models using unsupervised techniques on unannotated audio data [13]. This latter work is most similar to our ongoing research directed towards rapid portability of ASR technology to languages with limited linguistic resources.…”
Section: Introduction (mentioning)
confidence: 99%
“…One of the disadvantages of this approach, however, is poor generalization to arbitrary languages (or more general audio), since it typically requires a trained speech recognizer. Thus, for under-resourced languages, there is a time/cost issue to obtain enough annotated data to build a recognizer with acceptable recognition performance [2,3].…”
Section: Introduction (mentioning)
confidence: 99%
“…Converting all matched fragment pairs to a graph. Each numbered node corresponds to a temporal local maximum in fragment similarity in a particular utterance (e.g., 1-5). Each matching fragment is represented by a connection between two nodes in the graph (e.g., 1-4, 2-4, 3-5).…”
Section: Segmental DTW (mentioning)
confidence: 99%
“…Annotation of speech corpora is currently a very time-consuming and expensive endeavor and is a limiting factor in how quickly speech recognizers can be created for new problem areas and languages. Given the relative ease of creating and storing large quantities of audio-visual speech material these days, methods that can process vast quantities of unannotated data to enable keyword search [1], audio summarization, etc., could be quite useful.…”
Section: Introduction (mentioning)
confidence: 99%