Multimodal Event Processing: A Neural-Symbolic Paradigm for the Internet of Multimedia Things

Curry, Edward; Salwala, Dhaval; Dhingra, Praneet; Pontes, Felipe Arruda; Yadav, Piyush

doi:10.1109/jiot.2022.3143171

Cited by 12 publications

(4 citation statements)

References 47 publications

“…NeSy visual semantic models have found useful applications in the representation of multimedia streams for realtime multimodal event processing in the Internet of Multimedia Things (IoMT) [19,54]. These models blend DNNs for object and attribute detection with symbolic rules to understand spatiotemporal relations among objects.…”

Section: Other Tasks (mentioning)

confidence: 99%

A survey of neurosymbolic visual reasoning with scene graphs and common sense knowledge

Khan, Ilievski, Breslin et al. 2024, NAI (self-citation)

Combining deep learning and common sense knowledge via neurosymbolic integration is essential for semantically rich scene representation and intuitive visual reasoning. This survey paper delves into data- and knowledge-driven scene representation and visual reasoning approaches based on deep learning, common sense knowledge and neurosymbolic integration. It explores how scene graph generation, a process that detects and analyses objects, visual relationships and attributes in scenes, serves as a symbolic scene representation. This representation forms the basis for higher-level visual reasoning tasks such as visual question answering, image captioning, image retrieval, image generation, and multimodal event processing. Infusing common sense knowledge, particularly through the use of heterogeneous knowledge graphs, improves the accuracy, expressiveness and reasoning ability of the representation and allows for intuitive downstream reasoning. Neurosymbolic integration in these approaches ranges from loose to tight coupling of neural and symbolic components. The paper reviews and categorises the state-of-the-art knowledge-based neurosymbolic approaches for scene representation based on the types of deep learning architecture, common sense knowledge source and neurosymbolic integration used. The paper also discusses the visual reasoning tasks, datasets, evaluation metrics, key challenges and future directions, providing a comprehensive review of this research area and motivating further research into knowledge-enhanced and data-driven neurosymbolic scene representation and visual reasoning.

“…In this discipline, methods treat people who post about occurrences on social media as sensors. Events are also discovered using space-time scan statistics (STSS) without the aid of the text, using only space and time [15]. STSS places the text in a space-time cube and moves a cylindrical window, with a height (time) and a variable radius (space), over all possible space-time locations.…”

Section: International Journal On Recent and Innovation Trends In Com... (mentioning)

confidence: 99%

Optimized Ensemble Approach for Multi-model Event Detection in Big data

Swapnika, Vasumathi 2023, IJRITCC

Event detection plays an important role in modern society; it is a widely used computational process for detecting events automatically. Big data benefits event detection because of the large volume of data available. Multimodal event detection detects events from heterogeneous types of data. This work aims to classify diverse events using an optimized ensemble learning approach. Multimodal event data, including text, image and audio, are sent to the user devices from a cloud or server, where three models are generated for processing audio, text and image. First, the text, image and audio data are processed separately. Creating the text model involves pre-processing by imputation of missing values and data normalization, textual feature extraction using an integrated N-gram approach, and generation of the text model using a convolutional two-directional LSTM (2DCon_LSTM). Creating the image model involves pre-processing using Min-Max Gaussian filtering (MMGF), image feature extraction using the VGG-16 network model, and generation of the image model using a tweaked autoencoder (TAE). Creating the audio model involves pre-processing using the discrete wavelet transform (DWT), audio feature extraction using the Hilbert-Huang transform (HHT), and generation of the audio model using an attention-based convolutional capsule network (Attn_CCNet). The features obtained from the text, image and audio models are fused by a feature ensemble approach. From the fused feature vector, the optimal features are trained through the improved battle royal optimization (IBRO) algorithm. A deep learning model called the convolutional duo gated recurrent unit with autoencoder (C-Duo GRU_AE) is used as the classifier. Finally, the different types of events are classified, and the global model is then sent to the user devices with high security, supporting better decision making.
The proposed methodology achieves an accuracy of 99.93%, an F1-score of 99.91%, a precision of 99.93%, a recall of 99.93%, a processing time of 17 seconds and a training time of 0.05 seconds. It outperforms several comparable methodologies in precision, recall, accuracy, F1-score, training time and processing time, which indicates improved performance over the compared schemes. In addition, the proposed scheme detects multimodal events accurately.
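
The citation statement for this entry describes a space-time scan: a cylindrical window with a spatial radius and a temporal height is moved over candidate locations. The following is a minimal sketch of that idea only, not the method of the cited papers. It ranks cylinders by raw point density, whereas a full scan statistic would compare observed against expected counts; all names (`Cylinder`, `count_inside`, `densest_cylinder`) and the sample data are hypothetical.

```python
from dataclasses import dataclass
from math import hypot, pi


@dataclass(frozen=True)
class Cylinder:
    """Scanning window: circular base in space, height along the time axis."""
    cx: float
    cy: float
    radius: float
    t_start: float
    t_end: float

    def volume(self) -> float:
        return pi * self.radius ** 2 * (self.t_end - self.t_start)


def count_inside(points, cyl):
    """Count (x, y, t) points that fall inside the cylinder."""
    return sum(
        1
        for x, y, t in points
        if hypot(x - cyl.cx, y - cyl.cy) <= cyl.radius
        and cyl.t_start <= t <= cyl.t_end
    )


def densest_cylinder(points, centers, radii, windows):
    """Try every (center, radius, time window) and keep the densest one.

    Simplification (assumption): density = count / volume stands in for
    the likelihood-ratio score used by real scan statistics.
    """
    best, best_density = None, -1.0
    for cx, cy in centers:
        for radius in radii:
            for t_start, t_end in windows:
                cyl = Cylinder(cx, cy, radius, t_start, t_end)
                density = count_inside(points, cyl) / cyl.volume()
                if density > best_density:
                    best, best_density = cyl, density
    return best


# Hypothetical geotagged posts: three close together, one far away.
posts = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.5), (0.0, 0.5, 2.0), (10.0, 10.0, 1.0)]
best = densest_cylinder(
    posts,
    centers=[(0.0, 0.0), (10.0, 10.0)],
    radii=[1.0, 20.0],
    windows=[(0.0, 3.0)],
)
```

On this sample data the small cylinder around the origin wins, because it holds three of the four points in a far smaller volume than the wide window does.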

“…Multimedia event processing (MEP) uses graph-based approaches for representing multimedia streams for real-time event processing in the middleware for the Internet of Multimedia Things [10]. MEP approaches use graph-based semantic models for representing video streams; deep learning models are used to detect objects and symbolic rules are employed to identify relationships between objects, which are required for matching high-level video events queried by users.…”

Section: Applications (mentioning)

confidence: 99%

Common Sense Knowledge Infusion for Visual Understanding and Reasoning: Approaches, Challenges, and Applications

Khan, Breslin, Curry 2022, IEEE Internet Comput. (self-citation)
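
The citation statement for this entry describes a two-stage pattern: a neural model detects objects, and symbolic rules derive relationships between them that are then matched against a user query. The sketch below illustrates only the symbolic half under loud assumptions: the detections are hard-coded stand-ins for detector output, and the `Detection` type, the `left_of` rule and the triple format are hypothetical, not the paper's actual model or query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a label plus an axis-aligned bounding box."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def left_of(a: Detection, b: Detection) -> bool:
    """Symbolic spatial rule: a lies entirely to the left of b."""
    return a.x_max <= b.x_min


def relations(detections):
    """Apply the rule to every ordered pair and collect matching triples."""
    triples = set()
    for a in detections:
        for b in detections:
            if a is not b and left_of(a, b):
                triples.add((a.label, "left_of", b.label))
    return triples


def matches(query, detections) -> bool:
    """A query is a (subject, relation, object) triple to look for in a frame."""
    return query in relations(detections)


# Hypothetical detector output for a single video frame.
frame = [
    Detection("person", 0, 0, 10, 20),
    Detection("car", 15, 0, 40, 20),
]
frame_relations = relations(frame)
```

A query such as `matches(("person", "left_of", "car"), frame)` then succeeds for this frame, while the reversed triple does not.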
