2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI)
DOI: 10.1109/cbmi.2013.6576546

Audio event detection in movies using multiple audio words and contextual Bayesian networks

Abstract: This article investigates a novel use of the well-known audio words representation to detect specific audio events, namely gunshots and explosions, in order to gain more robustness towards soundtrack variability in Hollywood movies. An audio stream is processed as a sequence of stationary segments. Each segment is described by one or several audio words obtained by applying product quantization to standard features. Such a representation using multiple audio words constructed via product quantization …
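The abstract's core idea (split each segment's feature vector into sub-vectors and quantize each against its own small codebook, so a segment becomes a few discrete audio words) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the feature dimensionality, the number of splits, the codebook sizes, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: one 8-dimensional feature vector per stationary segment.
features = rng.normal(size=(100, 8))

# Illustrative product-quantization settings (not the paper's).
n_splits, codebook_size = 2, 4
sub_dim = features.shape[1] // n_splits

def build_codebook(data, k, iters=10):
    """Plain k-means (Lloyd's algorithm) over the sub-vectors."""
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers

# Product quantization: each sub-vector gets its own codebook, and a
# segment is replaced by the indices of its nearest sub-centroids.
words = np.empty((len(features), n_splits), dtype=int)
for s in range(n_splits):
    sub = features[:, s * sub_dim:(s + 1) * sub_dim]
    codebook = build_codebook(sub, codebook_size)
    dists = np.linalg.norm(sub[:, None] - codebook[None], axis=2)
    words[:, s] = dists.argmin(axis=1)

# Each segment is now described by n_splits discrete audio words.
print(words.shape)
```

The point of the product structure is that two 4-word codebooks jointly index 16 effective regions of the 8-dimensional space while only ever clustering in 4 dimensions at a time.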

Cited by 6 publications (4 citation statements)
References 11 publications
“…Other methods focus on using only the audio features [10]. Furthermore, Penet et al. [18] propose a novel use of the well-known audio words representation, describing each segment by one or several audio words obtained by applying product quantization to standard features.…”
Section: Related Work
confidence: 99%
“…To determine the design of the variables, we compared [4], [6], [7], [24], [25], and [35] and analyzed 38 videos in order to extract the words most frequently used by assailants at the time of an assault. After counting, we found about seven words that are frequently used and present in most cases.…”
Section: Words Variable
confidence: 99%
“…The most frequently used audio representation is the Mel-Frequency Cepstral Coefficients (MFCCs) [156], which are widely used in the field of speech recognition and are designed to be robust to noise. MFCCs are used in many AED systems [10,59,39,211,210,149]. Other audio features used in AED frameworks, in both the time and the frequency domain, include Zero-Crossing Rate (ZCR) [156], which was used in [8,59,211], Linear Prediction Coefficients (LPCs) and Linear Predictive Cepstral Coefficients (LPCCs) in [8,211], Log-Frequency Cepstral Coefficients (LFCCs) in [8,120] and Perceptual Linear Prediction (PLP) in [59,120].…”
Section: Features
confidence: 99%
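Of the features listed in the excerpt above, the Zero-Crossing Rate is the simplest to state concretely: the fraction of adjacent sample pairs whose signs differ within a frame. A minimal NumPy sketch (the test signal and frame length are illustrative assumptions):

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in the frame whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return float(np.mean(signs[:-1] != signs[1:]))

# A pure sine crosses zero twice per cycle, so a low-frequency tone
# yields a low ZCR while broadband noise yields a ZCR near 0.5.
t = np.arange(1000)
tone = np.sin(2 * np.pi * 5 * t / 1000)   # 5 cycles over 1000 samples
zcr = zero_crossing_rate(tone)
print(zcr)
```

Percussive, noise-like events such as gunshots tend to produce markedly higher ZCR values than voiced speech or music, which is why ZCR shows up in several of the AED systems cited above.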
“…Another promising approach for building audio event detectors combines dictionary learning techniques with Bayesian Networks. For instance, in [149], Penet et al. used a dictionary learning and segment quantization approach in which the low-level audio features extracted for each segment were replaced with one or several symbols corresponding to audio words. The quantization dictionary learning phase was implemented with a k-means algorithm using product quantization [92].…”
Section: Event Inference
confidence: 99%
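The symbol-replacement step the excerpt describes (a segment's low-level feature vector is replaced by the index of its nearest learned centroid) reduces, once the dictionary exists, to a nearest-centroid lookup. A minimal sketch, with a hypothetical hand-written codebook standing in for the k-means-learned one:

```python
import numpy as np

# Hypothetical learned codebook: 4 centroids in a 3-dimensional
# feature space (in practice these come from k-means training).
codebook = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
])

def quantize(segment_features):
    """Replace a segment's feature vector with the index (audio word)
    of its nearest codebook centroid."""
    dists = np.linalg.norm(codebook - segment_features, axis=1)
    return int(dists.argmin())

word = quantize(np.array([0.9, 1.1, 0.95]))  # nearest to centroid 1
print(word)
```

Downstream, a Bayesian network can then operate on these discrete symbols rather than on continuous feature vectors, which is the design choice the excerpt attributes to [149].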