Fast audio search using vector space modelling

Matthews, Bryan; Chaudhari, Upendra V.; Ramabhadran, Bhuvana

doi:10.1109/asru.2007.4430187

Cited by 7 publications

(4 citation statements)

References 12 publications


“…. 10. In most cases, the query was detected correctly at least once in the top 5 matches; however, the remaining locations were not found.…”

Section: Precision-Recall Curve for Same-Source Queries (mentioning)

confidence: 87%

“…Closely related tasks are the National Institute of Standards and Technology (NIST) tasks of spoken term detection (STD) [11,10] and spoken document retrieval (SDR) [6,3,9,5,4], where audio documents are searched in response to a text query. The STD task is to detect the query location, and the SDR task is to rank audio documents based on their relevance to the query (sometimes based on related words if the query term is not detected).…”

Section: Introduction (mentioning)

confidence: 99%
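The SDR setting quoted above ranks audio documents against a text query. A common way to do this is a vector space model over ASR transcripts. The sketch below assumes plain-text transcripts and uses TF-IDF weights with cosine similarity. It illustrates the general technique and is not presented as the method of the paper indexed on this page. The function names (`tfidf_vectors`, `rank_documents`) are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: terms that occur in every document keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_documents(query, transcripts):
    """Return (document indices sorted by relevance, per-document scores)."""
    docs = [t.lower().split() for t in transcripts]
    vecs, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection carry no weight.
    q_vec = {t: c * idf[t] for t, c in Counter(query.lower().split()).items() if t in idf}
    scores = [cosine(q_vec, v) for v in vecs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

With the query "senate budget", a transcript that contains both words ranks above one that contains neither. Among transcripts that contain both, length normalisation favours the one with fewer other terms.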


Spoken Term Detection Using Visual Spectrogram Matching

Lazic

Aarabi

2008

2008 Tenth IEEE International Symposium on Multimedia


This work proposes a novel spoken term detection technique in which the query is in audio format. Detection and retrieval are performed by matching the spectrograms of the spoken document and the query as visual images, using ideas from computer vision. Local descriptors are computed on a dense grid over each spectrogram, and the query term is detected using deformable template matching of grids. Detection experiments are performed on an hour-long newscast recording, involving 10 query terms of 2-3 words each. When the query term comes from the document, nearly all other instances of the term in the document are detected; performance degrades when the query is recorded by the user.
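The abstract above describes grid descriptors on spectrograms and deformable template matching. The sketch below is a much-simplified illustration under these assumptions:

- Each grid cell's descriptor is just its mean energy, and each grid column is L2-normalised.
- The "deformation" lets each query column shift independently by up to `slack` columns, with no spring cost between parts.

It is not the descriptor or deformation model of Lazic and Aarabi. The cell sizes, threshold, and function names are hypothetical.

```python
import math

def grid_descriptors(spec, cell_t=4, cell_f=8):
    """Pool a spectrogram (list of frames, each a list of band energies) into grid
    columns; each column is an L2-normalised vector of cell mean energies."""
    n_frames, n_bands = len(spec), len(spec[0])
    cols = []
    for t0 in range(0, n_frames - cell_t + 1, cell_t):
        cells = []
        for f0 in range(0, n_bands - cell_f + 1, cell_f):
            block = [spec[t][f] for t in range(t0, t0 + cell_t) for f in range(f0, f0 + cell_f)]
            cells.append(sum(block) / len(block))
        norm = math.sqrt(sum(c * c for c in cells)) or 1.0
        cols.append([c / norm for c in cells])
    return cols

def match_score(query_cols, doc_cols, start, slack=1):
    """Mean best cosine similarity when each query column may move up to `slack`
    columns from its nominal position (a crude deformable template)."""
    total = 0.0
    for i, q in enumerate(query_cols):
        best = -1.0
        for j in range(start + i - slack, start + i + slack + 1):
            if 0 <= j < len(doc_cols):
                best = max(best, sum(a * b for a, b in zip(q, doc_cols[j])))
        total += best
    return total / len(query_cols)

def detect(query_spec, doc_spec, threshold=0.9, slack=1, cell_t=4, cell_f=8):
    """Slide the query grid over the document grid; return (start_column, score) hits."""
    q = grid_descriptors(query_spec, cell_t, cell_f)
    d = grid_descriptors(doc_spec, cell_t, cell_f)
    hits = []
    for s in range(len(d) - len(q) + 1):
        score = match_score(q, d, s, slack)
        if score >= threshold:
            hits.append((s, score))
    return hits
```

Because each column may shift, neighbouring start offsets around a true occurrence can all exceed the threshold. A practical detector would keep only local maxima.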



“…Audio transcripts generated by Automatic Speech Recognition (ASR) systems provide good content search cues, albeit with imperfect coverage and varying accuracy, especially for salient key terms [1,2]. Search for content can be improved significantly through re-ranking or filtering speech segments by known speaker characteristics.…”

Section: Introduction (mentioning)

confidence: 99%

Audio-based classification of speaker characteristics

Dutta

Haubold

2009

2009 IEEE International Conference on Multimedia and Expo


The human voice is primarily a carrier of speech, but it also contains non-linguistic features unique to a speaker and indicative of various speaker demographics, e.g. gender, nativity, and ethnicity. Such characteristics are helpful cues for audio/video search and retrieval. In this paper, we evaluate the effects of various low-, mid-, and high-level features for effective classification of speaker characteristics. Low-level signal-based features include MFCCs, LPCs, and six spectral features; mid-level statistical features model the low-level features; and high-level semantic features are based on selected phonemes in addition to mid-level features. Our data set consists of approximately 76.4 hours of annotated audio with 2786 unique speaker segments used for classification. Quantitative evaluation of our method yields accuracy rates of up to 98.6% on our test data for male/female classification using mid-level features and a linear-kernel support vector machine. We find that mid- and high-level features are optimal for identifying speaker characteristics.
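The abstract names MFCCs, mid-level statistical features, and a linear-kernel SVM, but it does not say which statistics are used. The sketch below rests on two assumptions:

- Each speaker segment is summarised by per-coefficient means and standard deviations of its frame-level features.
- A linear SVM is trained with Pegasos-style stochastic subgradient descent. This is a swapped-in training method, not necessarily the authors' solver.

Feature extraction itself is out of scope: frames are plain lists of floats. All function names are hypothetical.

```python
import math
import random

def segment_stats(frames):
    """Summarise one segment's frame-level feature vectors (e.g. MFCCs) as a
    fixed-length vector: per-dimension means followed by standard deviations."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(d)]
    stds = [math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n) for k in range(d)]
    return means + stds

def _augment(x):
    # Append a constant 1.0 so the last weight acts as a (regularised) bias term.
    return list(x) + [1.0]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the primal linear SVM
    objective. Labels must be +1 or -1. Returns the weight vector (bias last)."""
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    idx = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wk * xk for wk, xk in zip(w, data[i]))
            # The regulariser shrinks w every step; the hinge loss only adds a
            # correction when the sample violates the margin.
            w = [(1.0 - eta * lam) * wk for wk in w]
            if margin < 1.0:
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, data[i])]
    return w

def predict(w, x):
    """Return +1 or -1 for one segment-level feature vector."""
    score = sum(wk * xk for wk, xk in zip(w, _augment(x)))
    return 1 if score >= 0 else -1
```

Summarising variable-length segments as fixed-length statistics is what makes a plain linear classifier applicable. Richer summaries, such as covariances or deltas, would slot into `segment_stats` without changing the classifier.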


“…The third category of audio pattern retrieval techniques uses speech recognizers to transcribe audio data into sub-word units or phonetic lattices instead of the conventional top-1 sequence. To address the OOV problems inherent in LVCSR systems, the vocabulary-independent approach [24,25,63,161-163] to speech indexing has been proposed. The speech data are first transcribed into either phonetic lattices or sub-word sequences for subsequent processing.…”

Section: Introduction (mentioning)

confidence: 99%
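The passage above describes transcribing speech into sub-word units so that out-of-vocabulary queries remain searchable. The sketch below assumes each utterance has already been decoded into a 1-best phone sequence; real systems often index lattices instead. It builds an inverted index of phone trigrams and scores each utterance by the fraction of distinct query trigrams it contains. The phone symbols, utterance ids, and function names are illustrative.

```python
from collections import defaultdict

def ngrams(phones, n=3):
    """All contiguous phone n-grams of a sequence, as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def build_index(utterances, n=3):
    """Map each phone n-gram to the set of utterance ids that contain it."""
    index = defaultdict(set)
    for uid, phones in utterances.items():
        for g in ngrams(phones, n):
            index[g].add(uid)
    return index

def search(index, query_phones, n=3):
    """Return (utterance id, score) pairs, best first; the score is the fraction
    of distinct query n-grams found in that utterance."""
    q = set(ngrams(query_phones, n))
    if not q:
        return []
    hits = defaultdict(int)
    for g in q:
        for uid in index.get(g, ()):
            hits[uid] += 1
    return sorted(((uid, c / len(q)) for uid, c in hits.items()), key=lambda p: (-p[1], p[0]))
```

Because matching works on phone n-grams rather than words, a query word missing from any recognizer vocabulary can still partially match utterances that share some of its sub-word units.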

Audio pattern discovery and retrieval

Wang


This thesis explores unsupervised algorithms for pattern discovery and retrieval in audio and speech data. In this work, an audio pattern is defined as repeating audio content, such as repeating music segments or words/short phrases in speech recordings. The meaning of "pattern" is defined separately for different types of data. For example, repeating pattern discovery in music extracts segments with similar melodies within a music piece; in human speech, the same words/short phrases spoken by one or multiple speakers are defined as speech patterns; in broadcast audio, repeated commercials/logo music are also considered patterns. Previous work on audio pattern discovery focuses either on symbolizing the audio signal into token sequences followed by text-based search, or on brute-force search techniques such as the self-similarity matrix and Dynamic Time Warping. The symbolization process, which relies on Vector Quantization or other modeling techniques, may suffer from
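The abstract contrasts symbolization with brute-force matching such as Dynamic Time Warping. For reference, below is a textbook DTW distance between two sequences of feature vectors. It uses a Euclidean local cost and an unconstrained warping path. It is a generic sketch, not the algorithm developed in the thesis, and the function name is hypothetical.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping between two sequences of
    equal-dimension feature vectors, using Euclidean distance as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: diagonal match, step in a only, step in b only.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

The quadratic cost table is what makes DTW "brute-force" for search: each query must be aligned against every candidate region of the audio.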

