Bidisha Samanta scite author profile

NeVAE: A Deep Generative Model for Molecular Graphs

Deep generative models have been praised for their ability to learn smooth latent representations of images, text, and audio, which can then be used to generate new, plausible data. However, current generative models are unable to work with molecular graphs because of the unique characteristics of such graphs: their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the node labels, and they come with varying numbers of nodes and edges. In this paper, we first propose a novel variational autoencoder for molecular graphs, whose encoder and decoder are specially designed to account for the above properties by means of several technical innovations. Moreover, in contrast with the state of the art, our decoder is able to provide the spatial coordinates of the atoms of the molecules it generates. Then, we develop a gradient-based algorithm to optimize the decoder of our model so that it learns to generate molecules that maximize the value of a certain property of interest and, given a molecule of interest, it is able to optimize the spatial configuration of its atoms for greater stability. Experiments reveal that our variational autoencoder can discover plausible, diverse, and novel molecules more effectively than several state-of-the-art models. Moreover, for several properties of interest, our optimized decoder is able to identify molecules with property values 121% higher than those identified by several state-of-the-art methods based on Bayesian optimization and reinforcement learning.
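
The permutation property mentioned in this abstract can be illustrated with a small sketch. This is not the NeVAE encoder or decoder; it only shows, on a hypothetical 4-node toy graph, that relabeling nodes changes the adjacency matrix while a permutation-invariant summary (here, the sorted degree list, chosen purely for illustration) stays the same. The function names are assumptions made for this example.

```python
def permute_graph(adj, perm):
    """Relabel nodes: node i of the new graph is node perm[i] of the old one."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def degree_multiset(adj):
    """A permutation-invariant summary: the sorted list of node degrees."""
    return sorted(sum(row) for row in adj)


# Toy path graph 0-1-2-3 (hypothetical example, not taken from the paper).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
perm = [2, 0, 3, 1]  # an arbitrary relabeling of the four nodes
permuted = permute_graph(adj, perm)

# The matrices differ, but they describe the same graph.
print(permuted != adj)
print(degree_multiset(adj) == degree_multiset(permuted))
```

A model that consumed the raw adjacency matrix would treat `adj` and `permuted` as different inputs, which is why the abstract lists node-label permutation as a property the encoder and decoder must account for.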


A Deep Generative Model for Code Switched Text

Samanta, Reddy, Jagirdar, et al. 2019

Code-switching, the interleaving of two or more languages within a sentence or discourse, is pervasive in multilingual societies. Accurate language models for code-switched text are critical for NLP tasks. State-of-the-art data-intensive neural language models are difficult to train well from scarce language-labeled code-switched text. A potential solution is to use deep generative models to synthesize large volumes of realistic code-switched text. Although generative adversarial networks and variational autoencoders can synthesize plausible monolingual text from a continuous latent space, they cannot adequately address code-switched text, owing to its informal style and the complex interplay between the constituent languages. We introduce VACS, a novel variational autoencoder architecture specifically tailored to code-switching phenomena. VACS encodes to and decodes from a two-level hierarchical representation, which models syntactic contextual signals in the lower level and language-switching signals in the upper level. Sampling representations from the prior and decoding them produces well-formed, diverse code-switched sentences. Extensive experiments show that using synthetic code-switched text with natural monolingual data results in a significant (33.06%) drop in perplexity.
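
As a rough illustration of how a percentage drop in perplexity is computed, the sketch below uses the standard definition of perplexity as the exponential of the average negative log-probability per token. The helper names and the numeric values are hypothetical and are not the paper's data or evaluation code.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(average negative natural-log probability per token)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_drop(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical model that assigns probability 0.25 to each of four tokens.
uniform_ppl = perplexity([math.log(0.25)] * 4)

# Hypothetical perplexities chosen only to show the arithmetic.
drop = relative_drop(100.0, 66.94)
```

Under these made-up numbers, a baseline perplexity of 100.0 falling to 66.94 corresponds to a drop of about 33.06%.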


CrysXPP: An explainable property predictor for crystalline materials

et al. 2022


We present a deep-learning framework, CrysXPP, to allow rapid and accurate prediction of electronic, magnetic, and elastic properties of a wide range of materials. CrysXPP lowers the need for large property-tagged datasets by means of a carefully designed autoencoder, CrysAE. The important structural and chemical properties captured by CrysAE from a large amount of available crystal graph data help in achieving low prediction errors. Moreover, we design a feature selector that helps to interpret the model’s prediction. Most notably, when given a small amount of experimental data, CrysXPP is consistently able to outperform conventional DFT. A detailed ablation study establishes the importance of different design steps. We release the large pre-trained model CrysAE. We believe that by fine-tuning the model with a small amount of property-tagged data, researchers can achieve superior performance on various applications with a restricted data source.


All that is English may be Hindi: Enhancing language identification through automatic ranking of the likeliness of word borrowing in social media

Patro, Samanta, Singh, et al. 2017

In this paper, we present a set of computational methods to identify the likeliness of a word being borrowed, based on signals from social media. In terms of Spearman's correlation values, our methods perform more than two times better (∼0.62) in predicting the borrowing likeliness than the best-performing baseline (∼0.26) reported in the literature. Based on this likeliness estimate, we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts. In 88% of cases, the annotators felt that the foreign language tag should be replaced by the native language tag, indicating substantial scope for improving automatic language identification systems.
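
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks. It is a generic textbook formulation, not the authors' evaluation script, and the scores in the example are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation between the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical predicted likeliness scores versus a hypothetical reference ranking.
predicted = [1, 2, 3, 4, 5]
reference = [2, 1, 4, 3, 5]
rho = spearman(predicted, reference)
```

Because only the ordering of the scores matters, any monotonic rescaling of the predicted likeliness values leaves the correlation unchanged.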


STRM: A sister tweet reinforcement process for modeling hashtag popularity

Samanta, Ganguly 2017



NeVAE: A Deep Generative Model for Molecular Graphs
