2016
DOI: 10.1186/s13321-016-0173-z

Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability

Abstract: Background: Even though circular fingerprints were first introduced more than 50 years ago, they are still widely used for building highly predictive, state-of-the-art (Q)SAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a compact representation, circular fingerprint fragments are often folded to comparatively short bit-strings. However, folding fingerprints introduces bit collisions, and therefore adds noise to the encoded structural in…
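The folding step described in the abstract can be illustrated with a minimal sketch. It assumes that each structural fragment is already represented by an integer identifier and that folding is done with a simple modulo operation. The function names and the fragment identifiers are hypothetical, chosen for illustration only, and this is not the paper's implementation.

```python
def fold_fragments(fragment_ids, n_bits):
    """Fold integer fragment identifiers into a fixed-length bit vector.

    Returns the bit vector and a dict mapping each set bit position to
    the fragment identifiers that landed on it.
    """
    bits = [0] * n_bits
    bit_to_fragments = {}
    for frag in fragment_ids:
        pos = frag % n_bits  # folding: many identifiers can share one position
        bits[pos] = 1
        bit_to_fragments.setdefault(pos, []).append(frag)
    return bits, bit_to_fragments


def collisions(bit_to_fragments):
    """Keep only the bit positions that encode more than one fragment."""
    return {pos: frags for pos, frags in bit_to_fragments.items() if len(frags) > 1}


# Hypothetical fragment identifiers, made up for illustration.
fragment_ids = [3, 11, 19, 42, 1027]

# A very short bit-string: four of the five fragments share position 3.
bits_short, mapping_short = fold_fragments(fragment_ids, n_bits=8)
print(collisions(mapping_short))

# A longer bit-string: fewer fragments collide, but 3 and 1027 still do.
bits_long, mapping_long = fold_fragments(fragment_ids, n_bits=1024)
print(collisions(mapping_long))
```

Once two fragments share a bit, a model can no longer tell which of them was present. This is the noise the abstract refers to, and it is also why a set bit becomes harder to interpret.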

Citations: cited by 28 publications (19 citation statements)
References: 47 publications (6 reference statements)

“…For the predictive tasks, Morgan fingerprints (FPR) were calculated for the 8314 structures by means of the RDKit library [ 44 ]. Owing to the possibility of colliding bits in fingerprints [ 45 , 46 ], we set the fingerprint vector length to 5120 bits and the radius to 2. In order to foster reproducibility, we made the scripts that are used for data preprocessing and feature engineering available already in our recent work [ 16 ].…”
Section: Methods | Citation type: mentioning
confidence: 99%
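A back-of-the-envelope estimate suggests why a longer bit vector, as in the statement above, reduces collisions. The sketch assumes that fragment identifiers are spread uniformly at random over the bit positions, which is an idealization. The count of 50 distinct fragments is a hypothetical value, not a figure from the cited work.

```python
def expected_occupied_bits(n_fragments, n_bits):
    """Expected number of set bits when n_fragments distinct fragments are
    mapped uniformly at random onto n_bits positions (idealized assumption)."""
    return n_bits * (1.0 - (1.0 - 1.0 / n_bits) ** n_fragments)


def expected_lost_fragments(n_fragments, n_bits):
    """Expected number of fragments that cannot be told apart because they
    share a bit with another fragment (fragments minus occupied bits)."""
    return n_fragments - expected_occupied_bits(n_fragments, n_bits)


# Hypothetical fragment count, compared across three vector lengths.
for n_bits in (1024, 2048, 5120):
    lost = expected_lost_fragments(50, n_bits)
    print(f"{n_bits} bits: about {lost:.3f} fragments lost to collisions")
```

Under this assumption the expected loss shrinks as the vector grows, at the cost of a wider feature matrix.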
“…Feature selection before model building can improve ML models, as shown in a study by Kramer and Gutlein [51]. They were also able to detect improvements in random forest models against other ML methods such as SVMs and naive Bayes, with faster performance and fewer features used while training models.…”
Section: Applications In Drug Discovery | Citation type: mentioning
confidence: 99%
“…Of the t-SNE combinations we attempted on our structure and teratology encodings, the t-SNE plot generated with 1,024-dimensional Morgan fingerprints [56] and a binary classification of teratological risk showed the strongest clustering relationships. Clusters were identified by visual inspection, with each point within a cluster representing a drug.…”
Section: Methods | Citation type: mentioning
confidence: 99%