Generalized Zero-Shot Learning Using Multimodal Variational Auto-Encoder With Semantic Concepts

Bendre, Nihar; Desai, Kevin; Najafirad, Peyman

doi:10.1109/icip42928.2021.9506108

Cited by 16 publications

(8 citation statements)

References 13 publications

“…Each of the three articles in the biology domain pertained to recognition [397], prediction [398] and identification [399] accordingly. In total, 4 bird species domain relevant papers were identified, where 3 were in classification [45], [400], [401] and 1 was in integration [402]. There were 2 articles each in the signal processing and gender domains.…”

Section: Inclusion Criteria (mentioning)

confidence: 99%

“…Of eight articles in parallel, seven pertained to co-training [66], [192], [209], [210], [329], [376], [379] and one related to transfer learning [201]. On the other side, from 25 non-parallel articles, 11 pertained to transfer learning [83], [107], [108], [146], [208], [215], [223], [246], [285], [286], [333], four related to concept grounding [82], [219], [332], [335] and 10 related to zeroshot learning [84], [85], [175], [191], [224], [247], [287], [377], [400], [401]. In hybrid co-learning, two articles were related to bridging [195], [196].…”

Section: G Co-learning (mentioning)

confidence: 99%

“…Conceptual grounding shares the semantic concept with modalities where most of them are related to linguistics, such as [219], [332], [335]. Zero-shot learning classifies data without having any labels, and approaches such as cross-models [191], [224] and auto-encoders [400], [401] are used for the solution. Bridging is hybrid co-learning where two non-parallel modalities share information.…”

Section: B Inspection Of RQ2 (mentioning)

confidence: 99%

A Systematic Literature Review on Multimodal Machine Learning: Applications, Challenges, Gaps and Future Directions

2023

Multimodal machine learning (MML) is a tempting multidisciplinary research area where heterogeneous data from multiple modalities and machine learning (ML) are combined to solve critical problems. Usually, research works use data from a single modality, such as images, audio, text, and signals. However, real-world issues have become critical now, and handling them using multiple modalities of data instead of a single modality can significantly impact finding solutions. ML algorithms play an essential role by tuning parameters in developing MML models. This paper reviews recent advancements in the challenges of MML, namely: representation, translation, alignment, fusion and co-learning, and presents the gaps and challenges. A systematic literature review (SLR) was applied to define the progress and trends on those challenges in the MML domain. In total, 1032 articles were examined in this review to extract features like source, domain, application, modality, etc. This research article will help researchers understand the constant state of MML and navigate the selection of future research directions.

“…A novel multimodal variational auto‐encoder (M‐VAE) algorithm for GZSL was proposed in Bendre et al (2021), which uses two modalities, images, and semantic concepts, as the input of the encoder. The encoder maps the input of these two modalities to a unified latent space, and the decoder reconstructs the latent embedding vector into the corresponding visual features.…”

Section: Representative Algorithms (mentioning)

confidence: 99%
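The M-VAE description quoted above (two modalities encoded into one latent space, with the decoder reconstructing visual features) can be illustrated with a minimal sketch. This is not the authors' implementation: the single linear layers, the toy dimensions, the concatenation-based fusion, the squared-error reconstruction term, and every name (`mvae_forward`, `init`, `VIS_DIM`, and so on) are assumptions made for illustration only.

```python
import math
import random

def linear(x, w, b):
    """Dense layer: w is a list of rows, one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def init(n_out, n_in, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def mvae_forward(visual, semantic, params, rng):
    # Assumption: the two modalities are fused by simple concatenation.
    x = visual + semantic
    mu = linear(x, *params["enc_mu"])
    logvar = linear(x, *params["enc_logvar"])
    # Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    # The decoder reconstructs only the visual features from z.
    recon = linear(z, *params["dec"])
    recon_loss = sum((r - v) ** 2 for r, v in zip(recon, visual))
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    return recon, recon_loss, kl

rng = random.Random(0)
VIS_DIM, SEM_DIM, LATENT_DIM = 6, 4, 3  # toy sizes, chosen for illustration
params = {
    "enc_mu": init(LATENT_DIM, VIS_DIM + SEM_DIM, rng),
    "enc_logvar": init(LATENT_DIM, VIS_DIM + SEM_DIM, rng),
    "dec": init(VIS_DIM, LATENT_DIM, rng),
}
visual = [rng.random() for _ in range(VIS_DIM)]
semantic = [rng.random() for _ in range(SEM_DIM)]
recon, recon_loss, kl = mvae_forward(visual, semantic, params, rng)
print(len(recon), recon_loss, kl)
```

In a trained model, the sum of `recon_loss` and `kl` would typically be minimised over seen classes; how the published method weights these terms or handles unseen classes is not stated in the citation statement and is not modelled here.
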

A review on multimodal zero‐shot learning

Cao, Sun, et al. 2023

WIREs Data Min & Knowl

Multimodal learning provides a path to fully utilize all types of information related to the modeling target to provide the model with a global vision. Zero-shot learning (ZSL) is a general solution for incorporating prior knowledge into data-driven models and achieving accurate class identification. The combination of the two, known as multimodal ZSL (MZSL), can fully exploit the advantages of both technologies and is expected to produce models with greater generalization ability. However, the MZSL algorithms and applications have not yet been thoroughly investigated and summarized. This study fills this gap by providing an objective overview of MZSL's definition, typical algorithms, representative applications, and critical issues. This article will not only provide researchers in this field with a comprehensive perspective, but it will also highlight several promising research directions.

“…They achieve state-of-the-art results for solving particular machine learning problems. It is practically impossible to analyze all of them, but a significant number of them, at one step or another, use the classical concatenation of multimodal vectors (Chen et al, 2020; Xie et al, 2020; Bendre et al, 2021), without a deep examination of unique dependencies between them. Nevertheless, there are other models proposing smarter modality aggregation, such as the Contrastive Multimodal Fusion method (Liu et al, 2021), showing there is growing interest in the ML community for nontrivial multimodal fusion.…”

Section: State Of the Art (mentioning)

confidence: 99%

A Unified Software/Hardware Scalable Architecture for Brain-Inspired Computing Based on Self-Organizing Neural Models

et al. 2022

The field of artificial intelligence has significantly advanced over the past decades, inspired by discoveries from the fields of biology and neuroscience. The idea of this work is inspired by the process of self-organization of cortical areas in the human brain from both afferent and lateral/internal connections. In this work, we develop a brain-inspired neural model associating Self-Organizing Maps (SOM) and Hebbian learning in the Reentrant SOM (ReSOM) model. The framework is applied to multimodal classification problems. Compared to existing methods based on unsupervised learning with post-labeling, the model enhances the state-of-the-art results. This work also demonstrates the distributed and scalable nature of the model through both simulation results and hardware execution on a dedicated FPGA-based platform named SCALP (Self-configurable 3D Cellular Adaptive Platform). SCALP boards can be interconnected in a modular way to support the structure of the neural model. Such a unified software and hardware approach enables the processing to be scaled and allows information from several modalities to be merged dynamically. The deployment on hardware boards provides performance results of parallel execution on several devices, with the communication between each board through dedicated serial links. The proposed unified architecture, composed of the ReSOM model and the SCALP hardware platform, demonstrates a significant increase in accuracy thanks to multimodal association, and a good trade-off between latency and power consumption compared to a centralized GPU implementation.
