How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents

Vasco, Miguel; Yin, H.; Melo, Francisco S.; Paiva, Ana

doi:10.48550/arxiv.2110.03608

Cited by 2 publications

(5 citation statements)

References 0 publications


“…Another approach, the Multimodal Factorization Model (MFM) [43], proposes the factorization of the multimodal representation into separate, independent representations. Vasco et al. [44] proposed a hierarchical design, called MUSE, to learn a hierarchical multimodal representation, beginning with low-level modality-specific representations from raw observation data and ending with a high-level multimodal representation encoding joint-modality information.…”

Section: Multi-modal Perception Learning (mentioning)

confidence: 99%

Optimizing Learning Across Multimodal Transfer Features for Modelling Olfactory Perception

Shin, Pei, Kumari et al. 2024

Preprint


For humans and other animals, the sense of smell provides crucial information in many situations of everyday life. Still, the study of olfactory perception has received only limited attention outside of the biological sciences. From an AI perspective, the complexity of the interactions between olfactory receptors and volatile molecules and the scarcity of comprehensive olfactory datasets present unique challenges in this sensory domain. Previous works have explored the relationship between molecular structure and odor descriptors using fully supervised training approaches. However, these methods are data-intensive and poorly generalized due to labeled data scarcity, particularly for rare-class samples. Our study partially tackles the challenges of data scarcity and label skewness through multimodal transfer learning. We investigate the potential of large molecular foundation models trained on extensive unlabeled molecular data to effectively model olfactory perception. Additionally, we explore the integration of different molecular representations, including molecular graphs and text-based SMILES encodings, to achieve data efficiency and generalization of the learned model, particularly on sparsely represented classes. By leveraging complementary representations, we aim to learn robust perceptual features of odorants. However, we observe that traditional methods of combining modalities do not yield substantial gains in high-dimensional skewed label spaces. To address this challenge, we introduce a novel label-balancer technique specifically designed for high-dimensional multi-label and multi-modal training. The label-balancer technique distributes learning objectives across modalities to optimize collaboratively for distinct subsets of labels. Our results suggest that multi-modal transfer features learned using the label-balancer technique are more effective and robust, surpassing the capabilities of traditional uni- or multi-modal approaches, particularly on rare-class samples.



“…Prior work has demonstrated success in using complete representations z_{1:M} in a diverse set of applications, such as image generation (Wu & Goodman, 2018; Shi et al, 2019) and control of Atari games (Silva et al, 2019; Vasco et al, 2021). Intuitively, if complete representations z_{1:M} are sufficient to perform a downstream task then learning modality-specific representations z_m that are geometrically aligned with z_{1:M} in the same representation space should ensure that {z_m} contain necessary information to perform the task even when z_{1:M} cannot be provided.…”

Section: The Problem Of Geometric Misalignment In Multimodal Represen... (mentioning)

confidence: 99%

“…Recently, hierarchical multimodal VAEs have been proposed to facilitate the learning of aligned multimodal representations such as Nexus (Vasco et al, 2022) and Multimodal Sensing (MUSE) (Vasco et al, 2021). Nexus considers a two-level hierarchy of modality-specific and multimodal representation spaces employing a dropout-based training scheme.…”

Section: Related Work (mentioning)

confidence: 99%

“…Models. We consider the MVAE (Wu & Goodman, 2018) and the MUSE (Vasco et al, 2021) models which are two commonly used approaches for the perception of multimodal RL agents. For GMC, we employ the same modality-specific encoders f_1(·), f_2(·) as the baselines in addition to a joint-modality encoder f_{1:2}(·).…”

Section: Experiments 3: Reinforcement Learning (mentioning)

confidence: 99%

“…Naturally, the performance of machine learning models can be enhanced by leveraging the redundant and complementary information provided by multiple modalities (Baltrušaitis et al, 2018). In particular, exploiting such multimodal information has been shown to be successful in tasks such as classification (Tsai et al, 2018;, generation (Wu & Goodman, 2018; Shi et al, 2019) and control (Silva et al, 2019; Vasco et al, 2021). The advances of many of these methods can be attributed to the efficient learning of multimodal data representations, which reduces the inherent complexity of raw multimodal data and enables the extraction of the underlying semantic correlations among the different modalities (Baltrušaitis et al, 2018; Guo et al, 2019).…”

Section: Introduction (mentioning)

confidence: 99%


Geometric Multimodal Contrastive Representation Learning

Poklukar, Vasco, Yin et al. 2022

Preprint

Self Cite


Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method comprised of two main components: i) a two-level architecture consisting of modality-specific base encoder, allowing to process an arbitrary number of modalities to an intermediate representation of fixed dimensionality, and a shared projection head, mapping the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems including prediction and reinforcement learning tasks.
