2021
DOI: 10.1007/978-3-030-71278-5_33

Self-supervised Disentanglement of Modality-Specific and Shared Factors Improves Multimodal Generative Models

Abstract: Multimodal generative models learn a joint distribution over multiple modalities and thus have the potential to learn richer representations than unimodal models. However, current approaches are either inefficient in dealing with more than two modalities or fail to capture both modality-specific and shared variations. We introduce a new multimodal generative model that integrates both modality-specific and shared factors and aggregates shared information across any subset of modalities efficiently. Our method …
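The abstract describes aggregating shared information across any subset of modalities while keeping modality-specific factors separate. Below is a minimal, hypothetical sketch of that general pattern, assuming a product-of-Gaussian-experts aggregation over the shared latent (one common way such subset-wise fusion is implemented); the module names, dimensions, and the standard-normal prior expert are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch only: a multimodal VAE encoder with a shared latent that is
# aggregated across modalities via a product of Gaussian experts, plus a private
# (modality-specific) latent per modality. Names and sizes are assumptions.
import torch
import torch.nn as nn


def product_of_experts(mus, logvars):
    """Combine Gaussian posteriors q(z_shared | x_m) from any subset of modalities.

    Precisions add, so a missing modality is handled by simply omitting its expert.
    """
    precisions = [torch.exp(-lv) for lv in logvars]
    total_prec = torch.stack(precisions).sum(0) + 1.0  # +1.0 for a N(0, I) prior expert
    mu = torch.stack([m * p for m, p in zip(mus, precisions)]).sum(0) / total_prec
    return mu, -torch.log(total_prec)  # joint mean and log-variance


class ModalityEncoder(nn.Module):
    """Encodes one modality into a shared posterior and a private posterior."""

    def __init__(self, x_dim, z_shared=16, z_private=8, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.shared_head = nn.Linear(hidden, 2 * z_shared)    # mu, logvar
        self.private_head = nn.Linear(hidden, 2 * z_private)  # mu, logvar

    def forward(self, x):
        h = self.backbone(x)
        return self.shared_head(h).chunk(2, -1), self.private_head(h).chunk(2, -1)


def reparameterize(mu, logvar):
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)


# Toy usage: two modalities; only the shared factor is aggregated across them.
enc_a, enc_b = ModalityEncoder(32), ModalityEncoder(64)
x_a, x_b = torch.randn(4, 32), torch.randn(4, 64)
(s_mu_a, s_lv_a), (p_mu_a, p_lv_a) = enc_a(x_a)
(s_mu_b, s_lv_b), (p_mu_b, p_lv_b) = enc_b(x_b)
joint_mu, joint_lv = product_of_experts([s_mu_a, s_mu_b], [s_lv_a, s_lv_b])
z_shared = reparameterize(joint_mu, joint_lv)   # common input to all decoders
z_private_a = reparameterize(p_mu_a, p_lv_a)    # feeds only modality A's decoder
```

Because precisions simply add, the same aggregation function works for any subset of available modalities, which is the efficiency property the abstract refers to.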

Cited by 14 publications (23 citation statements)
References 13 publications (5 reference statements)
“…Furthermore, the stability of their learning remains a challenge; for instance, mode collapse is confirmed to be more severe in a multimodal setting [113]. Therefore, VAEs have become the mainstream in multimodal deep generative models, where GANs are sometimes used to improve the generation quality of multimodal VAEs [92,113] or to implement the divergence between distributions in VAEs [15]. Table 1…”
Section: Advantages of VAEs as Multimodal Generative Models
confidence: 99%
“…[Flattened table excerpt: a comparison of multimodal VAE models (MVAE [112], MMVAE [79], mmJSD [88], mmJSD (MS) [88], AVAE [118], MoPoE-VAE [89], PVAE [33], DMVAE [48], DMVAE [15], MFM [99], [78]) annotated with properties such as memory cost, the need for sub-sampling, mixture-of-experts (MoE) aggregation, and computational cost; the per-model check/cross entries are not recoverable.]…”
Section: Models
confidence: 99%
“…Similar to previous work, we have only considered models with simple priors, such as Gaussian and Laplace distributions with independent dimensions. Further, we have not considered models with modality-specific latent spaces, which seem to yield better empirical results (Hsu and Glass, 2018; Sutter et al., 2020; Daunhawer et al., 2020), but currently lack theoretical grounding. Modality-specific latent spaces offer a potential solution to the problem of cross-modal prediction by providing modality-specific context from the target modalities to each decoder.…”
Section: Discussion
confidence: 99%
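The statement above argues that modality-specific latent spaces help cross-modal prediction by giving each decoder modality-specific context. As a purely illustrative continuation of the earlier sketch, under the same assumption of separate shared and private latents, cross-modal generation could look like the following; the decoder layout and dimensions are hypothetical, not taken from any of the cited models.

```python
# Hypothetical cross-modal prediction: infer the shared factor from modality A
# alone, then sample modality B's private factor from its prior so that B's
# decoder still receives modality-specific context. The decoder is assumed to
# take the concatenation [z_shared, z_private_b]; this is an illustrative
# pattern, not a specific published architecture.
import torch
import torch.nn as nn

z_shared_dim, z_private_dim, x_b_dim = 16, 8, 64
decoder_b = nn.Sequential(
    nn.Linear(z_shared_dim + z_private_dim, 128), nn.ReLU(),
    nn.Linear(128, x_b_dim),
)

z_shared = torch.randn(4, z_shared_dim)        # inferred from modality A only
z_private_b = torch.randn(4, z_private_dim)    # drawn from B's prior N(0, I)
x_b_pred = decoder_b(torch.cat([z_shared, z_private_b], dim=-1))
```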