Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics 2019
DOI: 10.1145/3307339.3342186
SMILES-BERT

Cited by 234 publications (137 citation statements)
References 22 publications
“…There is a trend towards very large networks that (perhaps unexpectedly [135]) do not overtrain [55]. The biggest and most successful deep networks, presently GPT-3 [55], use transformer [136] architectures, including in drug discovery [137,138]. The largest flavour of GPT-3 has 96 layers with 12 299 nodes in each.…”

Section: Methods To Improve Generalisation
Mentioning, confidence: 99%
“…Following the great success of transformers in the computer vision and natural language processing domains, several transformer-based models have been proposed for efficient chemical representations. Leveraging the capability of the transformer as an encoder, it is usually pre-trained on massive unlabeled chemical compounds, either in the form of SMILES or molecular graphs, which leads to outstanding performance in downstream tasks such as absorption, distribution, and toxicity prediction [101], [123], [138], [139], [140]. The crucial point of the chemical transformer is to fully exploit atom interactions and chemical structure information through self-attention mechanisms.…”

Section: Deep Learning Technologies: How Well Can We Accomplish the T...
Mentioning, confidence: 99%
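The pre-training recipe this statement describes (a transformer encoder trained with a masked-token objective over unlabeled SMILES, later fine-tuned for ADMET-style property prediction) can be sketched roughly as follows. This is a minimal illustrative sketch only: the character-level tokenizer, the toy three-molecule corpus, the 15% masking rate, and all model sizes are assumptions made here, not the configuration of SMILES-BERT or any of the cited models.

```python
# Illustrative sketch (not the SMILES-BERT reference implementation):
# masked-token pretraining of a small transformer encoder on SMILES strings.
import torch
import torch.nn as nn

SMILES = ["CCO", "c1ccccc1", "CC(=O)O"]           # toy unlabeled corpus
vocab = {ch: i + 3 for i, ch in enumerate(sorted(set("".join(SMILES))))}
PAD, MASK, CLS = 0, 1, 2                           # special token ids

def encode(s, max_len=16):
    ids = [CLS] + [vocab[ch] for ch in s]
    return torch.tensor(ids[:max_len] + [PAD] * (max_len - len(ids)))

class SmilesEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)  # predicts masked tokens

    def forward(self, ids):
        h = self.encoder(self.emb(ids), src_key_padding_mask=(ids == PAD))
        return self.lm_head(h)

model = SmilesEncoder(vocab_size=len(vocab) + 3)
batch = torch.stack([encode(s) for s in SMILES])

# Randomly mask ~15% of non-special tokens and train to recover them.
labels = batch.clone()
maskable = (batch != PAD) & (batch != CLS)
mask = maskable & (torch.rand_like(batch, dtype=torch.float) < 0.15)
mask[0, 1] = True                                  # ensure at least one masked position
inputs = batch.masked_fill(mask, MASK)
labels[~mask] = -100                               # ignore unmasked positions in the loss

logits = model(inputs)
loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   labels.reshape(-1), ignore_index=-100)
loss.backward()                                    # one self-supervised pretraining step
```

After pretraining of this kind, the encoder (here `SmilesEncoder` without `lm_head`) would typically be fine-tuned with a small task head on labeled property data, which is the downstream-task setup the citing papers refer to.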
“…[29,139].) Quickly, the basic idea of Word2Vec [140], i.e., learning generic embeddings from large corpora to facilitate downstream prediction tasks, was borrowed by Winter et al. [141] and later by SMILES-BERT [142] and SMILES Transformer [143]. Without the notion of pretraining, related ideas on obtaining better SMILES embeddings were explored in SMILES2Vec [144] and SMILES-X [145].…”

Section: Molecular Representations
Mentioning, confidence: 99%
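The Word2Vec-style idea mentioned in this statement (learning generic embeddings from a large unlabeled corpus and reusing them for downstream prediction) can be illustrated by treating each SMILES string as a sentence of character tokens. This is a sketch under stated assumptions: the toy corpus, character-level tokenization, mean-pooled molecule embedding, and the use of gensim's Word2Vec are choices made here for illustration; the cited models (SMILES-BERT, SMILES Transformer, SMILES2Vec, SMILES-X) each use their own neural encoders rather than plain Word2Vec.

```python
# Illustrative sketch of Word2Vec-style SMILES embeddings (gensim 4.x API).
import numpy as np
from gensim.models import Word2Vec

smiles_corpus = ["CCO", "c1ccccc1", "CC(=O)O", "CCN(CC)CC"]  # toy unlabeled set
tokenized = [list(s) for s in smiles_corpus]                 # character-level tokens

# Skip-gram Word2Vec over SMILES "sentences"; all hyperparameters are arbitrary.
model = Word2Vec(sentences=tokenized, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=50)

def embed(smiles):
    """Molecule-level embedding as the mean of its token vectors."""
    return np.mean([model.wv[ch] for ch in smiles], axis=0)

print(embed("CCO").shape)  # (32,) vector usable as input to a property predictor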