2020
DOI: 10.1101/2020.06.26.172908
Preprint

VAE-Sim: a novel molecular similarity measure based on a variational autoencoder

Abstract: Molecular similarity is an elusive but core ‘unsupervised’ cheminformatics concept, yet different ‘fingerprint’ encodings of molecular structures return very different similarity values even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none is ‘better’ than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similari…
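
The abstract's point that one similarity metric can give very different values under different encodings can be illustrated with a minimal sketch. It assumes the Tanimoto (Jaccard) coefficient as the metric and uses two made-up sets of "on" bits per molecule; the names `encoding_a`, `encoding_b`, `mol1` and `mol2` are hypothetical and do not come from the preprint, and no real fingerprint algorithm is used.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' bit positions."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical on-bit sets for the SAME two molecules under two toy encodings.
# These values are invented for illustration only.
encoding_a = {"mol1": {1, 4, 7, 9}, "mol2": {1, 4, 8, 9}}
encoding_b = {
    "mol1": {2, 3, 5, 11, 13, 17},
    "mol2": {2, 5, 19, 23, 29, 31},
}

# Same metric, same pair of molecules, different encodings.
sim_a = tanimoto(encoding_a["mol1"], encoding_a["mol2"])  # 3 shared of 5 total bits
sim_b = tanimoto(encoding_b["mol1"], encoding_b["mol2"])  # 2 shared of 10 total bits

print(f"encoding A: {sim_a:.2f}")
print(f"encoding B: {sim_b:.2f}")
```

With these invented bit sets the same pair scores 0.60 under one encoding and 0.20 under the other, which is the kind of disagreement the abstract describes.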

Citations: cited by 13 publications (6 citation statements)
References: 88 publications (60 reference statements)
“…The causal relation has the potential for generalizability. It was first defined by Kingma and Welling in 2013 and since then has found wide applications in many research fields. In the field of new material discovery, the VAE also came into use in some studies. For example, by combining the VAE and nondominated sorting genetic algorithm, Lee et al. pinpointed several novel thermomechanically controlled and processed steel alloys, and the predictions agree with the rule-based thermodynamic calculation tool.…”
Section: Methods (mentioning)
confidence: 99%
“…Thus, the purpose of the present article is to describe our own implementation of a simple VAE and its use in molecular similarity measurements as applied, in particular, to the set of drugs, metabolites and natural products that we have been using previously [25,[99][100][101][102][103][104] as our benchmark for similarity metrics. A preprint was deposited at bioRxiv [105].…”
Section: Drug (mentioning)
confidence: 99%
“…This is largely seen to contain the ∼150 000 natural products, ∼150 fluorophores, ∼1100 endogenous human metabolites (Recon2) and 1387 marketed drugs studied previously [25]. Molecules were extracted by the present authors [26] to a latent space of 100 dimensions using methods described in [27] and their vector values in the latent space used as the input to the UMAP algorithm.…”
Section: A Brief History Of Virtual Screening and The Multilayer Perc (mentioning)
confidence: 99%
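
The last statement describes molecules encoded as vectors in a 100-dimensional latent space. As a rough sketch of how such vectors could be compared, the code below computes cosine similarity and Euclidean distance between two of them. The vectors are random stand-ins rather than outputs of the authors' encoder, the names `mol_a` and `mol_b` are hypothetical, and the choice of these two measures is an assumption for illustration, not a claim about the metric used in the preprint.

```python
import math
import random

LATENT_DIM = 100  # dimensionality quoted in the citation statement above


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Assumption: random Gaussian vectors stand in for real latent encodings.
rng = random.Random(0)
latent = {
    name: [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    for name in ("mol_a", "mol_b")
}

print("cosine similarity:", cosine_similarity(latent["mol_a"], latent["mol_b"]))
print("euclidean distance:", euclidean_distance(latent["mol_a"], latent["mol_b"]))
```

The same per-molecule vectors could equally be passed to a dimensionality-reduction step such as UMAP, as the citing authors report doing; that step is not shown here.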