Konstantinos Gkountakos scite author profile

Ioannidis

Tsikrika

et al. 2020

This work examines violence detection in video scenes of crowds and proposes a crowd violence detection framework based on a 3D convolutional deep learning architecture, the 3D-ResNet model with 50 layers. The proposed framework is evaluated on the Violent Flows dataset against several state-of-the-art approaches and achieves higher accuracy in almost all cases, while also performing violence detection in (near) real time.
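A 3D CNN such as 3D-ResNet consumes fixed-length stacks of frames rather than single images. The abstract does not say how clips are formed or how clip scores are combined, so the sketch below is only an illustration under stated assumptions: the 16-frame clip length, the stride of 8, mean-score aggregation, and the helper names `make_clips` and `video_is_violent` are all hypothetical, and `score_clip` stands in for a trained model.

```python
from typing import Callable, List, Sequence


def make_clips(frames: Sequence, clip_len: int = 16, stride: int = 8) -> List[Sequence]:
    """Split a frame sequence into overlapping fixed-length clips (hypothetical helper)."""
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    # Only full-length clips are kept, since a 3D CNN expects a fixed temporal depth.
    return [frames[s:s + clip_len] for s in range(0, len(frames) - clip_len + 1, stride)]


def video_is_violent(
    frames: Sequence,
    score_clip: Callable[[Sequence], float],  # stand-in for a trained 3D CNN (assumption)
    threshold: float = 0.5,
    clip_len: int = 16,
    stride: int = 8,
) -> bool:
    """Average per-clip violence scores and compare against a threshold (assumed aggregation)."""
    clips = make_clips(frames, clip_len, stride)
    if not clips:
        return False  # video shorter than one clip: nothing to classify
    scores = [score_clip(c) for c in clips]
    return sum(scores) / len(scores) >= threshold
```

Overlapping clips let a streaming system emit a decision every `stride` frames, which is one plausible way to approach the (near) real-time operation the abstract mentions.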

VERGE in VBS 2021

Andreadis

Moumtzidou

et al. 2021

Crowd Violence Detection from Video Footage

Ioannidis

Tsikrika

et al. 2021

VERGE in VBS 2020

Andreadis

Moumtzidou

Apostolidis

et al. 2019

This paper demonstrates VERGE, an interactive video retrieval engine for browsing a collection of images or videos and searching for specific content. The engine integrates a multitude of retrieval methodologies, including visual and textual search, together with further capabilities such as fusion and reranking. All search options and results are presented in a web application designed for a user-friendly experience.
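The abstract mentions fusing visual and textual search results but does not describe how VERGE does it. As a generic illustration only, the sketch below shows one common approach, weighted late fusion of min-max normalized scores; the weighting scheme, the shot identifiers, and the function names are assumptions, not the engine's actual method.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's scores to [0, 1] so they are comparable across modalities."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def late_fusion(visual: Dict[str, float], textual: Dict[str, float], w_visual: float = 0.5) -> List[str]:
    """Rank shots by a weighted sum of normalized visual and textual scores (hypothetical scheme)."""
    v, t = min_max(visual), min_max(textual)
    shots = set(v) | set(t)
    # A shot missing from one result list contributes 0 for that modality.
    fused = {s: w_visual * v.get(s, 0.0) + (1 - w_visual) * t.get(s, 0.0) for s in shots}
    # Sort by descending fused score; break ties by shot id for a deterministic order.
    return sorted(fused, key=lambda s: (-fused[s], s))
```

For example, `late_fusion({"a": 0.9, "b": 0.1, "c": 0.5}, {"b": 10, "c": 8, "d": 2})` ranks shot `c` first because it scores reasonably well in both modalities. Normalizing first matters because raw scores from different search back ends rarely share a scale.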

Incorporating Textual Similarity in Video Captioning Schemes

Dimou

Papadopoulos

et al. 2019

The problem of video captioning has been investigated heavily by the research community in recent years, especially since the introduction of Recurrent Neural Networks (RNNs). Existing video captioning approaches are usually based on sequence-to-sequence models that aim to exploit the visual information by detecting events or objects, or by matching entities to words. However, the contextual information that can be extracted from the vocabulary has not yet been exploited, apart from approaches that make use of parts of speech such as verbs, nouns, and adjectives. The proposed approach is based on the assumption that textually similar captions should represent similar visual content. Specifically, we propose a novel loss function that penalizes wrongly predicted words and rewards correctly predicted words based on the semantic cluster to which they belong. The proposed method is evaluated on two widely known video captioning datasets, Microsoft Research-Video to Text (MSR-VTT) and Microsoft Research Video Description Corpus (MSVD). Experimental analysis shows that the proposed method outperforms the baseline approach in most cases.
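The abstract describes a loss that treats word errors differently depending on semantic clusters, but it does not give the formula. The sketch below is one hypothetical reading, not the paper's loss: standard cross-entropy for the target word, scaled up by an assumed `penalty` factor when the model's top prediction falls in a different cluster from the target. The `word_cluster` mapping (word index to cluster id) is assumed to come from some prior clustering of word embeddings.

```python
import math
from typing import Dict, List


def cluster_aware_loss(
    probs: List[float],
    target: int,
    word_cluster: Dict[int, int],  # word index -> semantic cluster id (assumed precomputed)
    penalty: float = 0.5,          # extra weight for cross-cluster mistakes (hypothetical)
) -> float:
    """Cross-entropy scaled by whether the argmax word shares the target's semantic cluster."""
    ce = -math.log(max(probs[target], 1e-12))  # clamp to avoid log(0)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    # Correct words and same-cluster substitutes keep the plain loss;
    # predictions from a different cluster are penalized more heavily.
    scale = 1.0 if word_cluster[predicted] == word_cluster[target] else 1.0 + penalty
    return scale * ce
```

Under this scheme, predicting "automobile" for "car" (same cluster) costs less than predicting "banana", which matches the stated intuition that textually similar captions should describe similar visual content.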

VERGE in VBS 2022

Andreadis

Moumtzidou

Galanopoulos

et al. 2022

Knowledge Engineering for Crime Investigation

Müller

Mühlenberg

Pallmer

et al. 2022

Incorporation of semantic segmentation information in deep hashing techniques for image retrieval

Gkountakos

Semertzidis

Papadopoulos

et al. 2017

Extracting discriminative image features for similarity search has become an issue of paramount importance in today's large-scale databases. To address the so-called task of Approximate Nearest Neighbor (ANN) search in large visual datasets, deep hashing methods (i.e., approaches that apply the recent deep learning paradigm in computer vision) have recently been introduced. In this paper, a novel approach to deep hashing is proposed that incorporates local-level information, in the form of image semantic segmentation masks, during the hash code learning step. The proposed framework makes use of pixel-level classification labels, i.e., it follows a point-wise supervised learning methodology. Experimental evaluation in the highly challenging domain of online terrorist propaganda video analysis, a diverse and heterogeneous application case, demonstrates the efficiency of the proposed approach.
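The abstract focuses on how the hash codes are learned; the retrieval side of deep hashing is generic and is what makes ANN search cheap. The sketch below illustrates that generic stage only, assuming the network already outputs real-valued embeddings: each embedding is binarized by sign into a compact code, and database items are ranked by Hamming distance to the query. The function names are hypothetical, and nothing here reproduces the paper's segmentation-aware training.

```python
from typing import List, Sequence


def to_hash_code(features: Sequence[float]) -> int:
    """Binarize an embedding by sign (bit i set if feature i > 0) and pack the bits into an int."""
    code = 0
    for i, x in enumerate(features):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")


def search(query: Sequence[float], database: List[Sequence[float]], k: int = 2) -> List[int]:
    """Return indices of the k database items whose codes are closest to the query's code."""
    q = to_hash_code(query)
    # Sort by (distance, index) so ties resolve deterministically by index.
    ranked = sorted((hamming(q, to_hash_code(f)), idx) for idx, f in enumerate(database))
    return [idx for _, idx in ranked[:k]]
```

XOR plus a popcount is far cheaper than computing floating-point distances, which is why compact binary codes scale to large databases. In practice, the database codes would be computed once and stored rather than recomputed per query as in this minimal version.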