2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI)
DOI: 10.1109/cbmi.2013.6576546

Audio event detection in movies using multiple audio words and contextual Bayesian networks

Abstract: This article investigates a novel use of the well-known audio words representation to detect specific audio events, namely gunshots and explosions, in order to gain more robustness towards soundtrack variability in Hollywood movies. An audio stream is processed as a sequence of stationary segments. Each segment is described by one or several audio words obtained by applying product quantization to standard features. Such a representation using multiple audio words constructed via product quantization …
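The abstract's core idea (split each segment's feature vector into sub-vectors and quantize each against its own small codebook, so a segment becomes a few discrete audio words) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the feature dimensionality, the number of splits, the codebook sizes, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: one 8-dimensional feature vector per stationary segment.
features = rng.normal(size=(100, 8))

# Illustrative product-quantization settings (not the paper's).
n_splits, codebook_size = 2, 4
sub_dim = features.shape[1] // n_splits

def build_codebook(data, k, iters=10):
    """Plain k-means (Lloyd's algorithm) over the sub-vectors."""
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers

# Product quantization: each sub-vector gets its own codebook, and a
# segment is replaced by the indices of its nearest sub-centroids.
words = np.empty((len(features), n_splits), dtype=int)
for s in range(n_splits):
    sub = features[:, s * sub_dim:(s + 1) * sub_dim]
    codebook = build_codebook(sub, codebook_size)
    dists = np.linalg.norm(sub[:, None] - codebook[None], axis=2)
    words[:, s] = dists.argmin(axis=1)

# Each segment is now described by n_splits discrete audio words.
print(words.shape)
```

The point of the product structure is that two 4-word codebooks jointly index 16 effective regions of the 8-dimensional space while only ever clustering in 4 dimensions at a time.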

Cited by 6 publications (4 citation statements)
References 11 publications
“…Other methods focus on using only the audio features [10]. Furthermore, Penet et al. [18] propose a novel use of the well-known audio words representation, describing each segment by one or several audio words obtained by applying product quantization to standard features.…”
Section: Related Work
confidence: 99%
“…To determine the design of the variables, we compared [4], [6], [7], [24], [25], and [35] and analyzed 38 videos in order to extract the words most frequently used by assailants at the time of an assault. After counting, we found about seven words that are frequently used and present in most cases.…”
Section: Words Variable
confidence: 99%
“…The most frequently used audio representation is the Mel-Frequency Cepstral Coefficients (MFCCs) [156], which are widely used in the field of speech recognition and are designed to be robust to noise. MFCCs are used in many AED systems [10,59,39,211,210,149]. Other audio features used in AED frameworks, in both the time and the frequency domain, include Zero-Crossing Rate (ZCR) [156], which was used in [8,59,211], Linear Prediction Coefficients (LPCs) and Linear Predictive Cepstral Coefficients (LPCCs) in [8,211], Log-Frequency Cepstral Coefficients (LFCCs) in [8,120] and Perceptual Linear Prediction (PLP) in [59,120].…”
Section: Features
confidence: 99%
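Of the features listed in the excerpt above, the Zero-Crossing Rate is the simplest to state concretely: the fraction of adjacent sample pairs whose signs differ within a frame. A minimal NumPy sketch (the test signal and frame length are illustrative assumptions):

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in the frame whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return float(np.mean(signs[:-1] != signs[1:]))

# A pure sine crosses zero twice per cycle, so a low-frequency tone
# yields a low ZCR while broadband noise yields a ZCR near 0.5.
t = np.arange(1000)
tone = np.sin(2 * np.pi * 5 * t / 1000)   # 5 cycles over 1000 samples
zcr = zero_crossing_rate(tone)
print(zcr)
```

Percussive, noise-like events such as gunshots tend to produce markedly higher ZCR values than voiced speech or music, which is why ZCR shows up in several of the AED systems cited above.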
“…Another promising approach for building audio event detectors combines dictionary learning techniques with Bayesian Networks. For instance, in [149], Penet et al. used a dictionary learning and segment quantization approach in which the low-level audio features extracted for each segment were replaced with one or several symbols corresponding to audio words. The quantization dictionary learning phase was implemented with a k-means algorithm using product quantization [92].…”
Section: Event Inference
confidence: 99%
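The symbol-replacement step the excerpt describes (a segment's low-level feature vector is replaced by the index of its nearest learned centroid) reduces, once the dictionary exists, to a nearest-centroid lookup. A minimal sketch, with a hypothetical hand-written codebook standing in for the k-means-learned one:

```python
import numpy as np

# Hypothetical learned codebook: 4 centroids in a 3-dimensional
# feature space (in practice these come from k-means training).
codebook = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
])

def quantize(segment_features):
    """Replace a segment's feature vector with the index (audio word)
    of its nearest codebook centroid."""
    dists = np.linalg.norm(codebook - segment_features, axis=1)
    return int(dists.argmin())

word = quantize(np.array([0.9, 1.1, 0.95]))  # nearest to centroid 1
print(word)
```

Downstream, a Bayesian network can then operate on these discrete symbols rather than on continuous feature vectors, which is the design choice the excerpt attributes to [149].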