Exploitation of Semantic Keywords for Malicious Event Classification

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

2020

Self Cite

We present a novel event recognition approach called Spatially-preserved Doubly-injected Object Detection CNN (S-DOD-CNN), which incorporates spatially preserved object detection information both directly and indirectly. Indirect injection is carried out by sharing weights between the object detection modules and the event recognition module. Our novelty lies in preserving spatial information for the direct injection: once multiple regions-of-interest (RoIs) are acquired, their feature maps are computed and then projected onto a spatially-preserving combined feature map using one of the four RoI Projection approaches we present. These combined feature maps, generated by the object detection branch, are directly injected into the event recognition module. Our method achieves state-of-the-art accuracy for malicious event recognition.
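The RoI projection step above can be illustrated with a minimal sketch. It assumes single-channel, non-negative (post-ReLU) RoI feature maps that have already been resized to their target footprint, and it combines overlapping regions with an element-wise maximum. The function name `project_rois` and the max-combination rule are hypothetical; this is one plausible variant, not necessarily any of the four RoI Projection approaches the paper presents.

```python
from typing import List, Tuple

Grid = List[List[float]]


def project_rois(
    roi_maps: List[Grid],
    offsets: List[Tuple[int, int]],  # (top, left) of each RoI in combined-map coordinates
    height: int,
    width: int,
) -> Grid:
    """Paste each RoI feature map back at its spatial location (hypothetical variant).

    Cells covered by several RoIs keep the maximum activation; cells covered by
    no RoI stay at 0.0, which assumes non-negative (post-ReLU) features.
    """
    combined = [[0.0] * width for _ in range(height)]
    for fmap, (top, left) in zip(roi_maps, offsets):
        for i, row in enumerate(fmap):
            for j, value in enumerate(row):
                y, x = top + i, left + j
                # Ignore any part of an RoI that falls outside the combined map.
                if 0 <= y < height and 0 <= x < width:
                    combined[y][x] = max(combined[y][x], value)
    return combined


roi_a = [[1.0, 2.0], [3.0, 4.0]]
roi_b = [[5.0, 0.5], [0.5, 0.5]]
print(project_rois([roi_a, roi_b], [(0, 0), (1, 1)], height=3, width=3))
# [[1.0, 2.0, 0.0], [3.0, 5.0, 0.5], [0.0, 0.5, 0.5]]
```

Here two 2x2 RoIs overlap at one cell, where the stronger activation (5.0) is kept; every activation stays at its original spatial position, which is the property a spatially-preserving combined map is meant to retain.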

Section: Dataset (mentioning)

confidence: 99%

S-DOD-CNN: Doubly Injecting Spatially-Preserved Object Information for Event Recognition

Lee

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

2020

Self Cite

“…There are two approaches to exploiting object detection. The first approach is to use a separately constructed object detection module and its output to boost event recognition. In this approach, the object detection results can either be fed directly into the event recognition module [1,2,3] or be integrated with the event recognition output via late fusion [4,5,6,7,8,9,10,11]. The second approach is to transfer object information by sharing network weights between object detection and event recognition and co-learning them in a unified architecture.…”

Section: Introduction (mentioning)

confidence: 99%
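Late fusion, mentioned in the first excerpt above, combines two sets of class scores only at the output stage. The following is a minimal sketch, assuming both modules emit per-class scores in [0, 1] and using a hypothetical fixed weight `alpha`; the function name `late_fusion` and the class labels are illustrative, and the cited works may fuse their outputs differently.

```python
from typing import Dict


def late_fusion(
    event_scores: Dict[str, float],
    object_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Fuse two per-class score dictionaries by a weighted average.

    alpha weights the event recognition output and (1 - alpha) weights the
    object-derived scores. Classes missing from object_scores count as 0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        label: alpha * score + (1.0 - alpha) * object_scores.get(label, 0.0)
        for label, score in event_scores.items()
    }


# Hypothetical labels and scores, for illustration only.
fused = late_fusion({"malicious": 0.6, "benign": 0.4}, {"malicious": 0.9, "benign": 0.1})
print(max(fused, key=fused.get))  # prints malicious
```

Because the two modules never share intermediate features, the detector can be trained separately; the trade-off is that object evidence influences only the final decision, which is what the shared-weight and direct-injection approaches described in these entries aim to go beyond.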

“…We evaluated the proposed approach on the Malicious Crowd Dataset [11]. The experiments demonstrate that utilizing the object detection information in both direct (injecting the feature maps) and indirect (transferring the information via shared weights) ways is effective in enhancing malicious event recognition performance.…”

Section: Introduction (mentioning)

confidence: 99%

DOD-CNN: Doubly-injecting Object Information for Event Recognition

Lee

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

2019

Self Cite

Recognizing an event in an image can be enhanced by detecting relevant objects in two ways: 1) indirectly utilizing object detection information within the unified architecture or 2) directly making use of the object detection output results. We introduce a novel approach, referred to as Doubly-injected Object Detection CNN (DOD-CNN), exploiting the object information in both ways for the task of event recognition. The structure of this network is inspired by the Integrated Object Detection CNN (IOD-CNN) where object information is indirectly exploited by the event recognition module through the shared portion of the network. In the DOD-CNN architecture, the intermediate object detection outputs are directly injected into the event recognition network while keeping the indirect sharing structure inherited from the IOD-CNN, thus being 'doubly-injected'. We also introduce a batch pooling layer which constructs one representative feature map from multiple object hypotheses. We have demonstrated the effectiveness of injecting the object detection information in two different ways in the task of malicious event recognition.
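A batch pooling layer of the kind described above can be sketched as a reduction over the hypothesis (batch) axis. The sketch assumes equally sized single-channel feature maps, one per object hypothesis, and offers element-wise max or mean as the reduction; the function name `batch_pool` and the choice of reductions are illustrative assumptions, not the DOD-CNN definition.

```python
from typing import List

Grid = List[List[float]]


def batch_pool(hypothesis_maps: List[Grid], mode: str = "max") -> Grid:
    """Reduce N same-sized feature maps (one per object hypothesis) to one map.

    mode="max" keeps the strongest response at each position;
    mode="mean" averages the responses across hypotheses.
    """
    if not hypothesis_maps:
        raise ValueError("need at least one hypothesis feature map")
    h, w = len(hypothesis_maps[0]), len(hypothesis_maps[0][0])
    if mode == "max":
        return [[max(m[i][j] for m in hypothesis_maps) for j in range(w)] for i in range(h)]
    if mode == "mean":
        n = len(hypothesis_maps)
        return [[sum(m[i][j] for m in hypothesis_maps) / n for j in range(w)] for i in range(h)]
    raise ValueError(f"unknown mode: {mode}")


maps = [[[1.0, 4.0], [2.0, 0.0]], [[3.0, 1.0], [0.0, 5.0]]]
print(batch_pool(maps))          # [[3.0, 4.0], [2.0, 5.0]]
print(batch_pool(maps, "mean"))  # [[2.0, 2.5], [1.0, 2.5]]
```

Either reduction yields a single map whose size does not depend on how many hypotheses were detected, which is what lets a variable number of object proposals feed a fixed-size input of the event recognition network.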

Object and Text-guided Semantics for CNN-based Activity Recognition

Reale et al.

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

2019

Self Cite

Many previous methods have demonstrated the importance of considering semantically relevant objects for video-based human activity recognition, yet none of them has harnessed large text corpora to relate objects to activities and transfer that knowledge into the learning of a unified deep convolutional neural network. We present a novel activity recognition CNN that co-learns the object recognition task in an end-to-end multitask learning scheme to improve upon baseline activity recognition performance. We further improve upon the multitask learning approach by exploiting a text-guided semantic space to select the objects most relevant to the target activities. To the best of our knowledge, we are the first to investigate this approach.
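One simple way to realize text-guided object selection is to rank candidate objects by the cosine similarity between their word embeddings and the embedding of a target activity. The sketch below assumes such embeddings are already available (in practice they would be learned from a large text corpus); the two-dimensional vectors, the object names, and the functions `cosine` and `select_objects` are hypothetical toy values, not the paper's semantic space.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_objects(
    activity_vec: Sequence[float],
    object_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Return the k object names whose embeddings are closest to the activity."""
    ranked = sorted(
        object_vecs,
        key=lambda name: cosine(activity_vec, object_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (hypothetical): "cooking" is represented as [1.0, 0.0].
objects = {"pan": [0.9, 0.1], "knife": [0.7, 0.7], "ball": [0.0, 1.0]}
print(select_objects([1.0, 0.0], objects, k=2))  # ['pan', 'knife']
```

The selected objects would then define the auxiliary object recognition targets in the multitask scheme, so that the network co-learns only objects the text corpus links to the activities of interest.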