2019
DOI: 10.3390/molecules24091698

Analysis and Comparison of Vector Space and Metric Space Representations in QSAR Modeling

Abstract: The performance of quantitative structure–activity relationship (QSAR) models largely depends on the relevance of the selected molecular representation used as input data matrices. This work presents a thorough comparative analysis of two main categories of molecular representations (vector space and metric space) for fitting robust machine learning models in QSAR problems. For the assessment of these methods, seven different molecular representations that included RDKit descriptors, five different fingerprint…

Cited by 21 publications (18 citation statements)
References 105 publications
“…Fingerprints are bit-wise strings with zeroes and ones which are folded to a fixed length [ 38 ]. Even though they work well in building QSAR models [ 39 , 40 ], the folding procedure can introduce bit collision [ 40 , 41 ] meaning that different sub-structural fragments can be assigned to the same position in the vector. As we observe this in our own work we followed recommendations to keep a longer vector and shorter radius [ 41 ].…”
Section: Results (mentioning)
confidence: 99%
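The bit collision described in the statement above can be illustrated with a small sketch. It assumes folding by a simple modulo over integer fragment identifiers; the identifiers below are made up for illustration, and real fingerprint implementations hash and fold substructures in their own ways.

```python
def fold(feature_ids, n_bits):
    """Fold integer feature identifiers into a fixed-length bit vector.

    Assumption: folding is a plain modulo, used here only for illustration.
    """
    bits = [0] * n_bits
    for fid in feature_ids:
        bits[fid % n_bits] = 1
    return bits


def collisions(feature_ids, n_bits):
    """Return {position: [ids]} for positions hit by more than one distinct id."""
    positions = {}
    for fid in set(feature_ids):
        positions.setdefault(fid % n_bits, []).append(fid)
    return {pos: sorted(ids) for pos, ids in positions.items() if len(ids) > 1}


# Hypothetical fragment identifiers (not real fingerprint hashes).
fragment_ids = [3, 19, 42, 1027, 1500]

for n_bits in (16, 2048):
    vector = fold(fragment_ids, n_bits)
    print(n_bits, "bits:", sum(vector), "bits set, collisions:",
          collisions(fragment_ids, n_bits))
```

With the short vector, several distinct fragments share one position and the vector sets fewer bits than there are fragments; with the longer vector each fragment keeps its own position, which is consistent with the quoted recommendation to keep a longer vector.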
“…To reach this goal, the molecular spaces of four data sets, captured as similarity matrices that were computed using NAMS, a graph matching algorithm. In a previous study NAMS-based molecular metric space representation was found a reliable approach to establish molecular similarity-activity relationship in QSAR modeling [38]. Accordingly NAMS-based molecular spaces for the selected data sets were reduced into a new reference space in 2D using four different algorithms.…”
Section: Results (mentioning)
confidence: 99%
“…We have thus integrated the advantages of the following different methods in the proposed molecular space mapping approach: Choice of molecular space representation: Molecular pairwise similarity was quantified using a graph matching algorithm: The Non-contiguous Atom Matching Structural similarity (NAMS) [17], which is a robust metric space representation method. This algorithm has a higher discriminative power for very similar molecules over other structural or graph matching approaches [17, 38]. However, any other similarity computation method can be used. DR methods: We applied four non-linear DR methods including Principal Coordinates Analysis (PCooA) [27], Kruskal Multidimensional Scaling (KMDS) [28], Sammon mapping (SM), [29] and t-Distributed Stochastic Neighbor Embedding (t-SNE) [39].…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, QSAR analysis using the deep neural network (DNN) has shown superior prediction performance compared with other conventional machine learning (ML) methods [38][39][40][41][42]. Such high-performance prediction methods may rely on the clear definition of feature representation or selection as it depends on the chemical space [43,44]. For appropriate feature selection or representation, some exclusive procedures based on chemical intuition and observed properties or filtering methods that evaluate features according to a given criterion have been employed [43,45].…”
Section: Introduction (mentioning)
confidence: 99%