Disentangling Identifiable Features from Noisy Data with Structured Nonlinear ICA
Preprint, 2021
DOI: 10.48550/arxiv.2106.09620

Abstract: We introduce a new general identifiable framework for principled disentanglement referred to as Structured Nonlinear Independent Component Analysis (SNICA). Our contribution is to extend the identifiability theory of deep generative models for a very broad class of structured models. While previous works have shown identifiability for specific classes of time-series models, our theorems extend this to more general temporal structures as well as to models with more complex structures such as spatial dependencies […]
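To make the abstract's setting concrete, the following is a minimal illustrative sketch (not the paper's implementation) of the kind of generative process structured nonlinear ICA considers: independent latent sources with temporal dependence, a nonlinear mixing function, and additive observation noise. The AR(1) dynamics, the two-layer random mixing network, and all dimensions are assumptions chosen for illustration.

```python
# Illustrative sketch of a structured nonlinear ICA generative process:
# temporally dependent independent sources s_t, nonlinear mixing f,
# and noisy observations x_t = f(s_t) + eps_t.
# All specifics (AR(1) sources, random MLP mixing) are assumptions
# for illustration, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
T, n_sources, n_obs = 500, 3, 5

# Independent sources with temporal structure: AR(1) processes.
s = np.zeros((T, n_sources))
for t in range(1, T):
    s[t] = 0.9 * s[t - 1] + rng.normal(scale=0.1, size=n_sources)

# Nonlinear mixing f: a random two-layer network with leaky-ReLU.
W1 = rng.normal(size=(n_sources, 8))
W2 = rng.normal(size=(8, n_obs))
h = s @ W1
h = np.where(h > 0.0, h, 0.1 * h)  # leaky ReLU nonlinearity
x_clean = h @ W2

# Noisy observations, as in the "noisy data" setting of the title.
x = x_clean + rng.normal(scale=0.05, size=x_clean.shape)

print(x.shape)  # (500, 5)
```

The identifiability question the paper addresses is the inverse problem: under what structural assumptions on the sources can s be recovered from x, up to benign indeterminacies, without observing s itself.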

Cited by 3 publications (4 citation statements). References 24 publications.
“…More recently, Khemakhem et al (2020a) proved a major breakthrough by showing that given side information u, identifiability of the entire generative model is possible up to certain (nonlinear) equivalences. Since this pathbreaking work, many generalizations have been proposed (Hälvä and Hyvarinen, 2020; Hälvä et al, 2021; Khemakhem et al, 2020b; Li et al, 2019; Mita et al, 2021; Sorrenson et al, 2019; Yang et al, 2021; Klindt et al, 2020; Brehmer et al, 2022), all of which require some form of auxiliary information. Other approaches to identifiability include various forms of weak supervision such as contrastive learning (Zimmermann et al, 2021), group-based disentanglement (Locatello et al, 2020), and independent mechanisms (Gresele et al, 2021).…”
Section: Related Work | Citation type: mentioning | Confidence: 99%
“…This contrasts with a recent line of work that has established fundamental new results regarding the identifiability of VAEs, which requires conditioning on an auxiliary variable u that renders each latent dimension conditionally independent (Khemakhem et al, 2020a). While this result has been generalized and relaxed in several directions (Hälvä and Hyvarinen, 2020; Hälvä et al, 2021; Khemakhem et al, 2020b; Li et al, 2019; Mita et al, 2021; Sorrenson et al, 2019; Yang et al, 2021; Klindt et al, 2020; Brehmer et al, 2022), fundamentally these results still crucially rely on the side information u. We show that this is in fact unnecessary, confirming existing empirical studies (e.g., Willetts and Paige, 2021; Falck et al, 2021), and do so without sacrificing any representational capacity.…”
Section: Introduction | Citation type: mentioning | Confidence: 99%
“…For time-series data, history information is widely used as side information for nonlinear ICA. However, most existing work that establishes identifiability results considers either stationary independent sources, as in PCL (Hyvarinen & Morioka, 2017) and SlowVAE (Klindt et al, 2020); or linear transition assumptions, as in SlowVAE (Klindt et al, 2020) and SNICA (Hälvä et al, 2021); or particular structures, such as the Markov properties in HM-NLICA (Hälvä & Hyvarinen, 2020). LEAP (Yao et al, 2021), which is the closest work to ours, has established the identifiability of nonparametric latent temporal processes in certain nonstationary cases, under the condition that the distribution of the noise terms of the latent processes varies across segments.…”
Section: Introduction | Citation type: mentioning | Confidence: 99%
“…These methods typically require auxiliary information or weak supervision for identifiability. Examples of weak supervision include the following strategies: using auxiliary information (Khemakhem et al, 2020; Locatello et al, 2019b); using paired samples (Locatello et al, 2020); using data augmentation (von Kügelgen et al, 2021); and leveraging temporal or spatial dependencies among the samples (Hälvä et al, 2021). In contrast, the Sparse VAE does not need auxiliary information or weak supervision for identifiability; instead, the anchor feature assumption is sufficient.…”
Section: Introduction | Citation type: mentioning | Confidence: 99%