2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) 2007
DOI: 10.1109/asru.2007.4430187

Fast audio search using vector space modelling

Abstract: Many techniques for retrieving arbitrary content from audio have been developed to leverage the important challenge of providing fast access to very large volumes of multimedia data. We present a two-stage method for fast audio search, where a vector-space modelling approach is first used to retrieve a short list of candidate audio segments for a query. The list of candidate segments is then searched using a word-based index for known words and a phone-based index for out-of-vocabulary words. We explore various…
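The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the TF-IDF weighting, the use of phone bigrams as vector-space terms, the toy data, and every name here (`features`, `two_stage_search`, `SEGMENTS`, `VOCABULARY`, `PRONUNCIATIONS`) are assumptions made for illustration. Stage 1 ranks segments by cosine similarity and keeps a short list; stage 2 looks up in-vocabulary words in the word transcript and out-of-vocabulary words in the phone transcript.

```python
import math
from collections import Counter

def features(words, phones):
    """Vector-space terms for a segment or query: words plus phone bigrams."""
    bigrams = [f"ph:{a}_{b}" for a, b in zip(phones, phones[1:])]
    return list(words) + bigrams

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

def find_sequence(tokens, pattern):
    """Start offsets at which `pattern` occurs contiguously in `tokens`."""
    m = len(pattern)
    return [i for i in range(len(tokens) - m + 1) if tokens[i:i + m] == pattern]

def two_stage_search(query, segments, vocabulary, pronunciations, shortlist_size=2):
    # Stage 1: vector-space ranking to get a short list of candidate segments.
    n = len(segments)
    seg_terms = [features(s["words"], s["phones"]) for s in segments]
    df = Counter(t for terms in seg_terms for t in set(terms))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vec(terms):
        # Terms never seen in any segment carry no weight.
        return {t: tf * idf[t] for t, tf in Counter(terms).items() if t in idf}

    in_vocab = [w for w in query if w in vocabulary]
    # OOV words without a known pronunciation are skipped in this sketch.
    oov = [w for w in query if w not in vocabulary and w in pronunciations]
    q_terms = list(in_vocab)
    for w in oov:
        q_terms += features([], pronunciations[w])
    q_vec = vec(q_terms)

    scored = [(cosine(q_vec, vec(terms)), i) for i, terms in enumerate(seg_terms)]
    scored = [(score, i) for score, i in scored if score > 0.0]
    shortlist = sorted(scored, key=lambda pair: -pair[0])[:shortlist_size]

    # Stage 2: detailed search restricted to the short list.
    hits = []
    for _, i in shortlist:
        seg = segments[i]
        for w in in_vocab:  # known words: word-level lookup
            hits += [(seg["id"], w, "word", pos)
                     for pos in find_sequence(seg["words"], [w])]
        for w in oov:       # OOV words: phone-level lookup
            hits += [(seg["id"], w, "phone", pos)
                     for pos in find_sequence(seg["phones"], pronunciations[w])]
    return hits

# Hypothetical toy data: recogniser word output plus a phone transcript per segment.
SEGMENTS = [
    {"id": "a", "words": "weather in boston".split(),
     "phones": "w eh dh er ih n b aa s t ah n".split()},
    {"id": "b", "words": "meeting with sucker burg".split(),
     "phones": "m iy t ih ng w ih dh z ah k er b er g".split()},
    {"id": "c", "words": "boston meeting notes".split(),
     "phones": "b aa s t ah n m iy t ih ng n ow t s".split()},
]
VOCABULARY = {"weather", "in", "boston", "meeting", "with", "sucker", "burg", "notes"}
PRONUNCIATIONS = {"zuckerberg": "z ah k er b er g".split()}

print(two_stage_search(["zuckerberg"], SEGMENTS, VOCABULARY, PRONUNCIATIONS))
```

In this toy setup the out-of-vocabulary query is misrecognised at the word level ("sucker burg") but is still found through the phone transcript, which is the motivation the abstract gives for keeping a phone-based index alongside the word-based one. A real system would search lattices or indexes rather than single best transcripts, and exact phone matching would likely be replaced by approximate matching.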

Cited by 7 publications (4 citation statements)
References 12 publications
“…. 10. In most cases, the query was detected correctly at least once the top 5 matches; however, the remaining locations were not found.…”
Section: Precision-recall Curve For Same-source Queries (mentioning)
confidence: 87%
“…Closely related tasks are the National Institute of Standards and Technology (NIST) tasks of spoken term detection (STD) [11,10] and spoken document retrieval (SDR) [6,3,9,5,4], where audio documents are searched in response to a text query. The STD task is to detect the query location, and the SDR task is to rank audio documents based on their relevance to the query (sometimes based on related words if the query term is not detected).…”
Section: Introduction (mentioning)
confidence: 99%
“…Audio transcripts generated by Automatic Speech Recognition (ASR) systems provide good content search cues, albeit imperfect coverage and varying accuracy, especially for salient key terms [1,2]. Search for content can be improved significantly through re-ranking or filtering speech segments by known speaker characteristics.…”
Section: Introduction (mentioning)
confidence: 99%
“…The third category of audio pattern retrieval techniques use speech recognizers to transcribe audio data into sub-word units or phonetic lattice instead of the conventional top-1 sequence. To address the OOV problems inherent in LVCSR systems, the vocabulary-independent approach [24,25,63,161-163] to speech indexing has been proposed. The speech data is first transcribed into either phonetic lattice or sub-word sequences for subsequent processing.…”
Section: Introduction (mentioning)
confidence: 99%