Generative models are becoming a tool of choice for exploring the molecular space. These models learn from a large training dataset and produce novel molecular structures with similar properties. Generated structures can be used for virtual screening or for training semi-supervised predictive models in downstream tasks. While there are plenty of generative models, it is unclear how to compare and rank them. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to standardize training and comparison of molecular generative models. MOSES provides training and testing datasets, and a set of metrics to evaluate the quality and diversity of generated structures. We have implemented and compared several molecular generation models and suggest using our results as reference points for further advancements in generative chemistry research. The platform and source code are available at https://github.com/molecularsets/moses.
Modern computational approaches and machine learning techniques accelerate the invention of new drugs. Generative models can discover novel molecular structures within hours, while conventional drug discovery pipelines require months of work. In this article, we propose a new generative architecture, entangled conditional adversarial autoencoder, that generates molecular structures based on various properties, such as activity against a specific protein, solubility, or ease of synthesis. We apply the proposed model to generate a novel inhibitor of Janus kinase 3, implicated in rheumatoid arthritis, psoriasis, and vitiligo. The discovered molecule was tested in vitro and showed good activity and selectivity.
Convolutional neural networks (CNNs) have been successfully used to handle three-dimensional data and are a natural match for data with spatial structure, such as 3D molecular structures. However, a direct 3D representation of a molecule with atoms localized at voxels is too sparse, which leads to poor performance of the CNNs. In this work, we present a novel approach where atoms are extended to fill other nearby voxels with a transformation based on the wave transform. Experimenting on 4.5 million molecules from the ZINC database, we show that our proposed representation leads to better performance of CNN-based autoencoders than either the voxel-based representation or the previously used Gaussian blur of atoms. We then successfully apply the new representation to classification tasks such as MACCS fingerprint prediction.
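The sparsity problem described above can be illustrated with a minimal sketch. It compares only the two baseline representations the abstract mentions (one atom per voxel, and a Gaussian blur of atoms); it does not implement the wave transform itself. The grid size, atom coordinates, `sigma`, `cutoff`, and the function names `voxelize` and `occupancy` are all hypothetical choices made for illustration.

```python
import math


def voxelize(atoms, size, sigma=None, cutoff=1e-3):
    """Build a size x size x size grid (nested lists) from integer atom positions.

    sigma=None puts each atom into exactly one voxel (the sparse case).
    Otherwise each atom is spread with a Gaussian of width sigma, and
    contributions smaller than `cutoff` are dropped.
    """
    grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for ax, ay, az in atoms:
        if sigma is None:
            grid[ax][ay][az] += 1.0
            continue
        for x in range(size):
            for y in range(size):
                for z in range(size):
                    d2 = (x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2
                    weight = math.exp(-d2 / (2 * sigma ** 2))
                    if weight >= cutoff:
                        grid[x][y][z] += weight
    return grid


def occupancy(grid):
    """Fraction of voxels that hold a non-zero value."""
    values = [v for plane in grid for row in plane for v in row]
    return sum(1 for v in values if v != 0.0) / len(values)


# Hypothetical toy "molecule": three atoms on an 8 x 8 x 8 grid.
atoms = [(2, 2, 2), (5, 5, 5), (2, 5, 3)]
point_grid = voxelize(atoms, size=8)
blur_grid = voxelize(atoms, size=8, sigma=1.0)

print("one atom per voxel:", occupancy(point_grid))
print("Gaussian blur:     ", occupancy(blur_grid))
```

With one atom per voxel, only 3 of the 512 voxels are non-zero, which is the kind of sparsity the abstract describes; spreading each atom over its neighbourhood fills a much larger share of the grid.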
<div>
<div>
<p>The emergence of the 2019 novel coronavirus (COVID-19), for which there is no vaccine or known effective treatment, created a sense of urgency for novel drug discovery approaches. One of the most important COVID-19 protein targets is the 3C-like protease, for which the crystal structure is known. Most of the immediate efforts are focused on repurposing known clinically approved drugs and on virtual screening of molecules available from chemical libraries, approaches that may not work well. For example, the IC50 of lopinavir, an HIV protease inhibitor, against the 3C-like protease is approximately 50 micromolar, which is far from ideal. In an attempt to address this challenge, on January 28th, 2020, Insilico Medicine decided to use a part of its generative chemistry pipeline to design novel drug-like inhibitors of COVID-19 and started generation on January 30th. It used three of its previously validated generative chemistry approaches: a crystal-derived pocket-based generator, homology modelling-based generation, and ligand-based generation. Novel drug-like compounds generated using these approaches were published at <a href="http://www.insilico.com/ncov-sprint/">www.insilico.com/ncov-sprint/</a>. Several molecules will be synthesized and tested using internal resources; however, the team is seeking collaborations to synthesize, test, and, if needed, optimize the published molecules. <br></p>
</div>
</div>
Deep generative models such as generative adversarial networks, variational autoencoders, and autoregressive models are rapidly growing in popularity for the discovery of new molecules and materials. In this work, we introduce MOlecular SEtS (MOSES), a benchmarking platform to support research on machine learning for drug discovery. MOSES implements several popular molecular generation models and includes a set of metrics that evaluate the diversity and quality of generated molecules. MOSES is meant to standardize research on molecular generation and facilitate the sharing and comparison of new models. Additionally, we provide a large-scale comparison of existing state-of-the-art models and elaborate on current challenges for generative models that might prove fertile ground for new research. Our platform and source code are freely available at https://github.com/molecularsets/moses.
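As a rough illustration of what a diversity-style metric over generated molecules can look like, the sketch below computes two simple fractions over lists of SMILES strings. This is not the MOSES implementation or its API: the function names `uniqueness` and `novelty`, the exact definitions, and the toy data are assumptions made here, and plain string comparison is only meaningful if all SMILES have already been canonicalized.

```python
def uniqueness(generated):
    """Fraction of generated strings that are distinct (assumed definition)."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated strings absent from the training set
    (assumed definition; relies on canonical SMILES for exact matching)."""
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - set(training)) / len(unique)


# Hypothetical toy data, not taken from any benchmark dataset.
training = ["CCO", "CCN", "c1ccccc1"]
generated = ["CCO", "CCO", "CCC", "CC(=O)O"]

print("uniqueness:", uniqueness(generated))
print("novelty:   ", novelty(generated, training))
```

Fixing such definitions and the reference datasets in one place is what lets results from different generative models be compared on equal terms.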
The application of artificial intelligence (AI) has been considered a revolutionary change in drug discovery and development. In 2020, the AlphaFold computer program predicted protein structures for the whole human...