2021
DOI: 10.1609/aaai.v35i10.17076
Deep Mutual Information Maximin for Cross-Modal Clustering

Abstract: Cross-modal clustering (CMC) aims to enhance clustering performance by exploring complementary information from multiple modalities. However, the performance of existing CMC algorithms remains unsatisfactory due to the conflict between heterogeneous modalities and the high-dimensional, non-linear nature of each individual modality. In this paper, a novel deep mutual information maximin (DMIM) method for cross-modal clustering is proposed to maximally preserve the shared information of multiple modalities while e…

Cited by 33 publications (16 citation statements)
References 25 publications
“…In recent years, deep learning architectures have seen widespread adoption in MVC, resulting in the deep MVC subfield. Methods developed within this subfield have shown state-of-the-art clustering performance on several multi-view datasets [1][2][3][4][5][6], largely outperforming traditional, non-deep-learning-based methods [1]. Despite these promising developments, we identify significant drawbacks with the current state of the field.…”
Section: Introduction
confidence: 91%
“…Despite these promising developments, we identify significant drawbacks with the current state of the field. Self-supervised learning (SSL) is a crucial component in many recent methods for deep MVC [1][2][3][4][5][6]. However, the large number of methods, all with unique components and arguments about how they work, makes it challenging to identify clear directions and trends in the development of new components and methods.…”
Section: Introduction
confidence: 99%
“…MI is useful in cross-modality data processing tasks, as the statistical features are assumed to be identical. It has been applied to tackle many unsupervised learning problems, such as cross-modality data retrieval [25], data representations [22,74,17], domain adaptation [42], and cross-modal clustering [40], etc. A special case of MI is the MI of a random variable with itself, MI(X, X), which equals its entropy.…”
Section: Mutual Information
confidence: 99%
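The identity noted in the statement above, MI(X, X) = H(X), can be checked directly for discrete distributions. The sketch below (a minimal illustration; the `entropy` and `mutual_information` helpers are hypothetical names, not from the cited papers) computes MI from a joint probability table and verifies that the MI of a variable with itself equals its Shannon entropy:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) in nats of a discrete distribution p."""
    p = p[p > 0]  # 0 * log 0 is taken as 0
    return -np.sum(p * np.log(p))

def mutual_information(joint):
    """MI(X, Y) in nats from a joint probability table joint[x, y]."""
    px = joint.sum(axis=1)  # marginal of X
    py = joint.sum(axis=0)  # marginal of Y
    mi = 0.0
    for i in range(joint.shape[0]):
        for j in range(joint.shape[1]):
            if joint[i, j] > 0:
                mi += joint[i, j] * np.log(joint[i, j] / (px[i] * py[j]))
    return mi

# Marginal distribution of X
px = np.array([0.5, 0.25, 0.25])
# Joint of X with itself: all mass sits on the diagonal
joint_xx = np.diag(px)

print(mutual_information(joint_xx))  # matches entropy(px)
print(entropy(px))
```

Because the joint of (X, X) is diagonal, every term reduces to p(x) log(p(x) / p(x)^2) = -p(x) log p(x), recovering the entropy sum.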
“…Wang et al. [30] utilized the subgraph-level summary to build an effective mutual information estimator, which was optimized to strengthen the robustness of graph representation. Mao et al. [31] explored the shared information across modalities by maximizing the mutual information between them. Schnapp et al. [32] selected important features with minimum mutual information with labels.…”
Section: B. Representation Learning Based On Mutual Information
confidence: 99%