2017
DOI: 10.26434/chemrxiv.5513581.v1
Preprint

Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

Abstract: Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up vectors o…
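The encoding step described in the abstract (a compound vector obtained by summing substructure vectors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the identifiers `id_a`, `id_b`, `id_c`, the three-dimensional toy vectors, and the helper names are hypothetical and are not taken from the Mol2vec implementation. Skipping unknown identifiers is a simplification of this sketch only.

```python
import math

# Hypothetical toy embeddings (not trained values). The keys stand in for
# substructure identifiers; real embeddings would be learned from a corpus.
SUBSTRUCTURE_VECTORS = {
    "id_a": [0.9, 0.1, 0.0],
    "id_b": [0.8, 0.2, 0.1],
    "id_c": [0.0, 0.1, 0.9],
}


def compound_vector(substructure_ids, vectors):
    """Encode a compound as the element-wise sum of its substructure vectors."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for sid in substructure_ids:
        vec = vectors.get(sid)
        if vec is None:
            # Assumption of this sketch: unseen identifiers are ignored.
            continue
        total = [t + v for t, v in zip(total, vec)]
    return total


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # A hypothetical compound made of two substructures.
    print(compound_vector(["id_a", "id_c"], SUBSTRUCTURE_VECTORS))
    # Vectors pointing in similar directions score close to 1.
    print(cosine_similarity(SUBSTRUCTURE_VECTORS["id_a"], SUBSTRUCTURE_VECTORS["id_b"]))
```

In this toy setup, `id_a` and `id_b` point in similar directions and so have a high cosine similarity, which mirrors the abstract's statement about chemically related substructures.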

Citations: Cited by 27 publications (38 citation statements)
References: References 8 publications
“…In this work, a novel approach for QSAR modeling is proposed, in which we adopt the word embedding approach to predict the activity or the propriety of compounds by using their SMILES strings. Compounds are represented by learning features from a large SMILES corpus via Word2vec (Jaeger, Fulle, & Turk, 2018). Then, we describe each chemical using the average of its interacting ligand vectors that are built by word2vec.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Neural graph fingerprints [69] mimicked Morgan algorithms [70], bringing atom features in radius, while neural graph fingerprints yielded continuous features of hidden layers to solve sparsity of Morgan fingerprints. Mol2vec [71] generates compound features that can be used to predict bioactivities by applying the Word2vec algorithm on a compound's molecular graph. Simplified molecular-input line-entry system (SMILES) is a well-defined representation of chemical compounds, converting the molecular graph to a sequence of atoms and bonds.…”
Section: Ligand-based Approaches
Citation type: mentioning
confidence: 99%
“…Each compound is fed into Multi-Layer Perceptron (MLP) and cosine similarity between their latent representation is translated to a Connectivity Map (CMap) score. Therefore, ReSimNet gives a better performance than the conventional machine learning model of ECFP and Mol2Vec [71]. Conversely, the ensemble model of the hierarchical evolutionary chemical binding similarity (ECBS) tree builds more reliable screening results [80].…”
Section: Ligand-based Approaches
Citation type: mentioning
confidence: 99%
“…As it is sequential and composed of text, methods inspired by NLP such as word embedding 16 , RNN 17,18 have been proposed. Mol2Vec 19 is a molecular representation inspired by word2vec. It overcomes the drawbacks of fingerprint such as bit collisions.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
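The "bit collisions" mentioned in the statement above can be illustrated with a generic sketch of folding integer substructure identifiers into a fixed-length bit vector. This is an assumption-laden illustration, not the method of the citing paper or of Mol2vec: the function name `fold_to_bits`, the modulo folding scheme, and the identifier values are all hypothetical.

```python
def fold_to_bits(identifiers, n_bits):
    """Fold integer identifiers into a fixed-length bit vector via modulo."""
    bits = [0] * n_bits
    for ident in identifiers:
        bits[ident % n_bits] = 1
    return bits


if __name__ == "__main__":
    n_bits = 8
    # Two hypothetical compounds that differ in one substructure identifier.
    ids_x = [3, 10]
    ids_y = [3 + n_bits, 10]  # 11 folds to the same position as 3
    # Different identifier sets, identical folded fingerprints: a collision.
    print(fold_to_bits(ids_x, n_bits) == fold_to_bits(ids_y, n_bits))
```

A dense embedding keyed directly by identifier, as in the sketch of summed substructure vectors, has no fixed number of bit positions to share, so two distinct identifiers are not forced onto the same slot.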