Proceedings of the 3rd Workshop on Neural Generation and Translation 2019
DOI: 10.18653/v1/d19-5612

On the Importance of the Kullback-Leibler Divergence Term in Variational Autoencoders for Text Generation

Abstract: Variational Autoencoders (VAEs) are known to suffer from learning uninformative latent representation of the input due to issues such as approximated posterior collapse, or entanglement of the latent space. We impose an explicit constraint on the Kullback-Leibler (KL) divergence term inside the VAE objective function. While the explicit constraint naturally avoids posterior collapse, we use it to further understand the significance of the KL term in controlling the information transmitted through the VAE chann…
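
The constraint the abstract refers to can be pictured with a short sketch: one common way to pin the KL term to a target value C is a penalty of the form β·|KL − C|; the exact formulation and hyperparameter values below are illustrative, not taken from the paper.

```python
import torch

def vae_loss_with_kl_constraint(recon_log_prob, mu, logvar, target_c=15.0, beta=1.0):
    """Negative ELBO with an explicit constraint on the KL term.

    Rather than letting KL(q(z|x) || p(z)) shrink towards zero (posterior
    collapse), the KL is pushed towards a target value C in nats.
    `target_c` and `beta` are illustrative hyperparameters.
    """
    # KL between a diagonal Gaussian posterior and a standard normal prior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
    nll = -recon_log_prob                      # reconstruction term per example
    return (nll + beta * (kl - target_c).abs()).mean()
```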

Cited by 22 publications (14 citation statements, 2020–2024) · References 12 publications
“…The decoding part then strove to reconstruct the (A_i, T_i) from the latent coordinate z_i using two parallel networks. The VAE would be trained to minimize the VAE loss, including a reconstruction term and a Kullback-Leibler divergence term (Prokhorov et al., 2019). After that, a Gaussian process regression (GPR) surrogate model was used to predict the fitness function f_i of all unsimulated sequences depending on their local positions in the VAE latent space.…”
Section: Application of ML for Understanding and Design of Polymer Chains (mentioning)
confidence: 99%
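
A minimal sketch of the pipeline this statement describes, assuming a standard Gaussian VAE loss and scikit-learn's GaussianProcessRegressor for the surrogate; all variable names (recon_a, z_simulated, etc.) are hypothetical.

```python
import torch
import torch.nn.functional as F
from sklearn.gaussian_process import GaussianProcessRegressor

def vae_loss(recon_a, recon_t, a_true, t_true, mu, logvar, beta=1.0):
    # Reconstruction of the (A_i, T_i) pair by the two parallel decoder heads
    recon = F.mse_loss(recon_a, a_true) + F.mse_loss(recon_t, t_true)
    # Kullback-Leibler divergence between q(z|x) and a standard normal prior
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
    return recon + beta * kl

# After training, a GPR surrogate maps latent coordinates to fitness values,
# so unsimulated sequences can be scored from their position in latent space.
gpr = GaussianProcessRegressor()
# gpr.fit(z_simulated, f_simulated)      # z_simulated: (n, d), f_simulated: (n,)
# f_pred = gpr.predict(z_unsimulated)
```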
“…[footnote 10: https://huggingface.co/transformers/model_doc/bert.html] Coupling Encoder with Decoder. To connect the encoder with the decoder we concatenate the latent variable, sampled from the posterior distribution, to the word embeddings of the decoder at each time step (Prokhorov et al., 2019). Also, for GRU encoders we take the last hidden state to parameterise the posterior distribution.…”
Section: KL-collapse (mentioning)
confidence: 99%
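
The coupling described above is easy to sketch. The modules below are a hypothetical PyTorch version in which the encoder's last GRU hidden state parameterises the posterior and the sampled latent is concatenated to every decoder word embedding; dimensions are illustrative.

```python
import torch
import torch.nn as nn

class GRUEncoder(nn.Module):
    """Hypothetical encoder: last GRU hidden state parameterises q(z|x)."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, z_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)
        self.to_logvar = nn.Linear(hid_dim, z_dim)

    def forward(self, tokens):
        _, h_last = self.gru(self.emb(tokens))           # h_last: (1, B, hid)
        h_last = h_last.squeeze(0)
        return self.to_mu(h_last), self.to_logvar(h_last)

class GRUDecoder(nn.Module):
    """Decoder that sees z at every step: z is concatenated to each word embedding."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, z_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim + z_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens, z):
        e = self.emb(tokens)                              # (B, T, emb)
        z_rep = z.unsqueeze(1).expand(-1, e.size(1), -1)  # (B, T, z)
        h, _ = self.gru(torch.cat([e, z_rep], dim=-1))
        return self.out(h)                                # logits per time step
```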
“…In parallel, Variational Autoencoders (VAEs) (Kingma and Welling, 2014) have been effective in capturing semantic closeness of sentences in the learned representation space (Bowman et al., 2016; Prokhorov et al., 2019; Balasubramanian et al., 2020). Furthermore, methods have been developed… [footnote 2: This, for example, may allow us to cluster sentences' representations not only based on similarity of their active features (as is the case for dense vectors) but also on active/inactive dimensions.]”
Section: Introduction (mentioning)
confidence: 99%
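
The footnote's idea of clustering on active/inactive dimensions can be made concrete with a small sketch; the activity threshold and clustering choice below are illustrative assumptions, not taken from the cited work.

```python
import numpy as np
from sklearn.cluster import KMeans

def active_dimensions(z_means, threshold=0.01):
    """Mark a latent dimension as 'active' if its posterior mean varies
    across inputs (a common active-units heuristic; threshold is illustrative)."""
    return z_means.var(axis=0) > threshold

# z_means: (num_sentences, z_dim) posterior means for a corpus
# patterns = (np.abs(z_means) > 0.1).astype(float)   # per-sentence on/off pattern
# labels = KMeans(n_clusters=5, n_init=10).fit_predict(patterns)
```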
“…However, this is not so simple, because increasing capacity leads to a worse model fit, as was noted by Alemi et al. (2018). More specifically, on text data, Prokhorov et al. (2019) noted that the coherence of samples decreases as the target rate increases. Pelsmaeker and Aziz (2019) reported similar findings, and also that more complex priors or posteriors do not help.…”
Section: The Problem with Memorization (mentioning)
confidence: 99%
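
For concreteness, a target (or minimum) rate can be enforced during training with a Lagrange multiplier, in the spirit of the constrained objectives these papers discuss; the update scheme and values below are an illustrative sketch, not the exact method of any of the cited works.

```python
import torch
import torch.nn.functional as F

# Illustrative minimum-rate constraint: keep KL >= min_rate via a Lagrange
# multiplier that is updated by gradient ascent while the model descends.
u = torch.zeros(1, requires_grad=True)               # unconstrained multiplier param
multiplier_opt = torch.optim.SGD([u], lr=1e-2)

def constrained_step(model_opt, nll, kl, min_rate=10.0):
    lam = F.softplus(u)                               # lambda >= 0
    loss = nll + kl + lam * (min_rate - kl)           # ELBO plus rate constraint
    model_opt.zero_grad()
    multiplier_opt.zero_grad()
    loss.backward()
    model_opt.step()
    u.grad.neg_()                                     # flip sign: ascent on lambda
    multiplier_opt.step()
    return loss.item()
```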