2018
DOI: 10.1007/978-3-030-05716-9_9

Large Scale Audio-Visual Video Analytics Platform for Forensic Investigations of Terroristic Attacks

Abstract: The forensic investigation of a terrorist attack poses a huge challenge to the investigative authorities, as several thousand hours of video footage need to be screened. To assist law enforcement agencies (LEA) in identifying suspects and securing evidence, we present a platform which fuses information from surveillance cameras and video uploads from eyewitnesses. The platform integrates analytical modules for different input modalities on a scalable architecture. Videos are analyzed according to their acoustic and…

Cited by 5 publications (6 citation statements)
References 22 publications
“…For the evaluation we are using a Convolutional Recurrent Neural Network (CRNN) [3,22]. A CRNN is a combination of a Convolutional Neural Network (CNN) stack and a Recurrent Neural Network (RNN).…”
Section: Model Architecture
Citation type: mentioning
confidence: 99%
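The quoted statement describes a CRNN only as a CNN stack followed by an RNN. As a minimal sketch of that composition, assuming a single 1-D convolution over time and a plain tanh (Elman) recurrence, the forward pass could look like the following. The function names, layer sizes, and random weights are hypothetical and chosen for illustration; this is not the architecture of the citing paper or of the platform.

```python
import math
import random


def conv1d_relu(frames, kernels, bias):
    """Valid 1-D convolution over the time axis, followed by ReLU.

    frames:  list of T feature vectors, each of length F
    kernels: list of C kernels, each a K x F weight matrix
    returns: list of T - K + 1 vectors, each of length C
    """
    k = len(kernels[0])
    out = []
    for t in range(len(frames) - k + 1):
        row = []
        for c, kernel in enumerate(kernels):
            acc = bias[c]
            for dt in range(k):
                for f, w in enumerate(kernel[dt]):
                    acc += w * frames[t + dt][f]
            row.append(max(0.0, acc))
        out.append(row)
    return out


def rnn_last_state(seq, w_in, w_rec):
    """Simple tanh recurrence; returns the hidden state after the last step."""
    hidden = [0.0] * len(w_rec)
    for x in seq:
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[h], x))
                + sum(wr * hj for wr, hj in zip(w_rec[h], hidden))
            )
            for h in range(len(w_rec))
        ]
    return hidden


def random_params(n_features, n_channels, kernel_size, n_hidden, seed=0):
    """Hypothetical small random weights, only to make the sketch runnable."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernels": [mat(kernel_size, n_features) for _ in range(n_channels)],
        "bias": [0.0] * n_channels,
        "w_in": mat(n_hidden, n_channels),
        "w_rec": mat(n_hidden, n_hidden),
    }


def crnn_forward(frames, params):
    """CNN stage extracts local features; RNN stage summarizes them over time."""
    feats = conv1d_relu(frames, params["kernels"], params["bias"])
    return rnn_last_state(feats, params["w_in"], params["w_rec"])
```

Calling `crnn_forward(frames, random_params(4, 3, 3, 5))` on ten 4-dimensional frames returns a 5-dimensional summary vector. A real model would learn the weights and typically stack several convolutional layers before the recurrent part.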
“…Audio representations aim to capture intrinsic properties and characteristics of the audio content to facilitate complex tasks such as classification (acoustic scenes [6,16], music genres [15]), regression (emotion recognition [31]) or similarity estimation (music [13], general audio [22]). In the context of this paper we focus on their application in audio similarity estimation and retrieval.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Implementation: The implemented approach, detailed in [14], is a combination of the models developed in the Detection and Classification of Acoustic Scenes and Events (DCASE) [11] international evaluation campaign [9,17,18] and the approach presented in [21]. The model applies a Convolutional Recurrent Neural Network (CRNN) [14] with an attention layer on log-scaled Mel-spectrogram inputs (9.92 seconds of audio, 44.1 kHz sample rate, 80 Mel bands, STFT window size of 2048 samples with 50% overlap). It was trained on a pre-processed subset of the Audioset dataset [4].…”
Section: Sound Event Detection (SED)
Citation type: mentioning
confidence: 99%
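The input parameters quoted in the statement above fix the size of the spectrogram fed to the model. The arithmetic can be sketched as follows; the constants come from the quoted text, while the function name and the two framing conventions (no padding versus centered frames with padding at both ends) are assumptions, since the statement does not say which one was used.

```python
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, from the quoted statement
CLIP_SECONDS = 9.92       # clip length, from the quoted statement
WINDOW_SAMPLES = 2048     # STFT window size, from the quoted statement
OVERLAP = 0.5             # 50% overlap, from the quoted statement
N_MELS = 80               # Mel bands, from the quoted statement


def spectrogram_shape(centered):
    """Return (n_mels, n_frames) for one clip.

    centered=False: frames lie entirely inside the signal (no padding).
    centered=True:  frames are centered on multiples of the hop length and
                    the signal is padded at both ends (an assumed convention).
    """
    hop = int(WINDOW_SAMPLES * (1 - OVERLAP))         # samples between frames
    n_samples = round(CLIP_SECONDS * SAMPLE_RATE_HZ)  # samples per clip
    if centered:
        n_frames = 1 + n_samples // hop
    else:
        n_frames = 1 + (n_samples - WINDOW_SAMPLES) // hop
    return N_MELS, n_frames
```

With these values the hop length is 1024 samples and a clip holds 437,472 samples, so the input is an 80-band spectrogram with 426 frames without padding or 428 frames with centered framing.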
“…Implementation: The developed approach to a multi-class multi-target tracking method, also detailed in [14], was trained and optimized on the specific scenario-relevant object categories. It is based on an appearance-based tracker as in [19] and aims to add additional features such as target motion and mutual interaction [20], as well as learning temporal dependencies as in [12,20].…”
Section: Sound Event Detection (SED)
Citation type: mentioning
confidence: 99%
“…In [3] the authors used lip reading for speech recognition. The authors in [4] presented a platform for audio-visual video analysis to assist agencies in analyzing and identifying suspects from large-scale videos recorded after a terrorist attack.…”
Section: Introduction
Citation type: mentioning
confidence: 99%