Interspeech 2013
DOI: 10.21437/interspeech.2013-654
Robust audio-codebooks for large-scale event detection in consumer videos

Abstract: In this paper we present our audio-based system for detecting "events" within consumer videos (e.g., YouTube) and report our experiments on the TRECVID Multimedia Event Detection (MED) task and development data. Codebook or bag-of-words models have been widely used in the text, visual, and audio domains and form the state of the art in MED tasks. The overall effectiveness of these models on such datasets depends critically on the choice of low-level features, clustering approach, sampling method, codebook size, wei…

Cited by 15 publications (6 citation statements)
References 17 publications
“…Exponential Chi-square (χ²) kernels of the form exp(−γ d(x, y)), where d(x, y) is the χ² distance, have been known to work remarkably well with histogram features, including for the detection of acoustic concepts [34] [35].…”
Section: Experiments and Results
confidence: 99%
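The exponential χ² kernel quoted above can be sketched in a few lines of NumPy. This is a minimal illustration, not the cited papers' implementation; the bandwidth `gamma` and the small `eps` smoothing constant are assumptions one would tune or adjust in practice:

```python
import numpy as np

def chi2_distance(x, y, eps=1e-10):
    # Symmetric chi-squared distance between two histograms:
    # d(x, y) = sum_i (x_i - y_i)^2 / (x_i + y_i)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum((x - y) ** 2 / (x + y + eps))

def exp_chi2_kernel(X, Y, gamma=1.0):
    # K(x, y) = exp(-gamma * d(x, y)) between all rows of X and Y,
    # e.g. for use as a precomputed SVM kernel over bag-of-words histograms.
    D = np.array([[chi2_distance(x, y) for y in Y] for x in X])
    return np.exp(-gamma * D)
```

Because d(x, x) = 0, the kernel of a histogram with itself is always 1, and all kernel values lie in (0, 1].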
“…Pancoast and Akbacak used k-means in their original study [16]; however, due to the large number of frames to be clustered, the runtime of this approach is very high. Rawat et al. propose simple random sampling [19]; its runtime is marginally better than that of k-means, and performance is not noticeably affected. Later, Arthur et al. applied k-means++ clustering [1], a cluster-centre initialisation procedure used instead of completely random sampling, so that the distribution of cluster centres became more balanced.…”
Section: Parameters of the BoAW Methods
confidence: 99%
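The k-means++ seeding contrasted with random sampling above can be sketched as follows. This is only the initialisation step (no Lloyd iterations), which is the part the excerpt says was used in place of random sampling; the function name is illustrative:

```python
import numpy as np

def kmeanspp_init(frames, k, seed=None):
    # k-means++ seeding: pick the first centre uniformly at random,
    # then pick each subsequent centre with probability proportional
    # to its squared distance from the nearest centre chosen so far.
    rng = np.random.default_rng(seed)
    centres = [frames[rng.integers(len(frames))]]
    for _ in range(k - 1):
        # Squared distance of every frame to its nearest current centre.
        d2 = np.min(
            [np.sum((frames - c) ** 2, axis=1) for c in centres], axis=0
        )
        probs = d2 / d2.sum()
        centres.append(frames[rng.choice(len(frames), p=probs)])
    return np.stack(centres)
```

Frames already chosen have zero distance (hence zero selection probability), so the centres spread out over the data rather than clumping, which is why the resulting codebook is "more balanced" than one drawn by uniform random sampling.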
“…In the BoAW approach, the numerical LLDs, or alternatively higher-level features derived from the SnS data, first undergo a vector quantisation (VQ) step, which employs a codebook of template LLDs previously learnt from a certain number of training data [74]. For generating the codebook, Schmitt et al. and their followers used the initialisation step of k-means++ clustering [104], which is comparable to an optimised random sampling of LLDs [105], instead of the traditional k-means clustering method [106], [107]; this improves computational speed while guaranteeing comparable performance. To improve the robustness of this approach, each LLD is assigned to the N_a (assignment number) words with the lowest Euclidean distance, instead of only the single most similar word in the codebook.…”
Section: B. Higher Representations
confidence: 99%
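The multi-assignment VQ step described above can be sketched like this. A minimal illustration with hypothetical names; the final normalisation is an assumed post-processing step so that clips of different lengths yield comparable histograms, not something stated in the excerpt:

```python
import numpy as np

def boaw_histogram(llds, codebook, n_assign=4):
    # Multi-assignment vector quantisation: each LLD frame votes for the
    # n_assign codebook words with the smallest Euclidean distance,
    # instead of only its single nearest word (n_assign=1 would recover
    # plain hard assignment).
    hist = np.zeros(len(codebook))
    for frame in llds:
        dists = np.linalg.norm(codebook - frame, axis=1)
        for idx in np.argsort(dists)[:n_assign]:
            hist[idx] += 1
    # Normalise to a distribution (assumed step for length invariance).
    return hist / hist.sum()
```

Spreading each frame over its N_a nearest words softens quantisation errors near codeword boundaries, which is the robustness gain the excerpt refers to.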