2020
DOI: 10.1021/acs.jcim.0c00232

Tautomer Standardization in Chemical Databases: Deriving Business Rules from Quantum Chemistry

Abstract: Databases of small, potentially bioactive molecules are ubiquitous across industry and academia. Although each unique compound should appear only once in such a database, many compounds can be represented in multiple ways, so these databases require methods for standardizing how chemistry is represented. This is commonly achieved through "Chemistry Business Rules": sets of predefined rules that describe the "house style" of the database in question. At Syngenta, the historical …
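To make the "each compound appears once" requirement concrete, the sketch below groups incoming structures under a standardized key, so alternative representations of the same compound collapse into one database entry. It is a toy under stated assumptions: the `HOUSE_STYLE` lookup table and `toy_standardize` function are hypothetical stand-ins for real business rules, and the single entry simply maps acetone's enol form onto its keto form. It is not the method derived in the paper.

```python
from typing import Callable, Dict, List

def register(records: List[str], standardize: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group incoming SMILES by their standardized form so each compound is stored once."""
    registry: Dict[str, List[str]] = {}
    for smiles in records:
        key = standardize(smiles)
        # All inputs that standardize to the same key share one registry entry.
        registry.setdefault(key, []).append(smiles)
    return registry

# Hypothetical house-style rule: write acetone's enol tautomer as the keto form.
HOUSE_STYLE = {"CC(O)=C": "CC(=O)C"}

def toy_standardize(smiles: str) -> str:
    # Anything without a rule is passed through unchanged.
    return HOUSE_STYLE.get(smiles, smiles)

reg = register(["CC(=O)C", "CC(O)=C", "c1ccccc1"], toy_standardize)
print(reg)  # {'CC(=O)C': ['CC(=O)C', 'CC(O)=C'], 'c1ccccc1': ['c1ccccc1']}
```

Without the standardization step, the keto and enol inputs would be registered as two separate compounds.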

Cited by 7 publications (8 citation statements); references 78 publications.
“…In general, a standardization rule is a two-step sequence that checks for the presence of a specific feature and then prescribes one or more graph transformations to remedy the potential issue. The standardization process involves an ordered specification of these rules, bearing in mind that some steps are order-dependent and lead to a different outcome if commuted (Additional file 1: S1) [37].…”
Section: Methods; mentioning; confidence: 99%
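The check-then-transform structure of a rule, and the order dependence of a rule sequence, can be sketched as follows. This is a minimal illustration under loud assumptions: a "molecule" is a hypothetical list of SMILES-like fragment strings, string length is a crude stand-in for fragment size, and both rules (`keep_largest`, `neutralize`) are invented for the example rather than taken from the cited work.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical toy representation: a molecule is a list of SMILES-like fragments.
Molecule = List[str]

@dataclass
class Rule:
    """A standardization rule: a check for a feature plus a transformation."""
    name: str
    applies: Callable[[Molecule], bool]
    transform: Callable[[Molecule], Molecule]

def standardize(mol: Molecule, rules: List[Rule]) -> Molecule:
    # Rules run in the given order; each transforms only if its check passes.
    for rule in rules:
        if rule.applies(mol):
            mol = rule.transform(mol)
    return mol

# Hypothetical rule 1: keep the largest fragment (string length as a crude size proxy).
keep_largest = Rule(
    name="keep_largest_fragment",
    applies=lambda m: len(m) > 1,
    transform=lambda m: [max(m, key=len)],
)

# Hypothetical rule 2: neutralize oxygens written as "[O-]".
neutralize = Rule(
    name="neutralize_O_minus",
    applies=lambda m: any("[O-]" in f for f in m),
    transform=lambda m: [f.replace("[O-]", "O") for f in m],
)

mol = ["C[O-]", "CCCC"]
print(standardize(mol, [keep_largest, neutralize]))  # ['CO']
print(standardize(mol, [neutralize, keep_largest]))  # ['CCCC']
```

Commuting the two rules changes which fragment survives, because neutralization shortens the charged fragment before the size comparison runs.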
“…In some cases, a minor tautomer of a ligand binds to a biological target and triggers the biological response [151]. It was demonstrated [39] that accounting for tautomerism may significantly affect the performance of machine learning models for anxiolytic activity [40], logP, and pKa prediction [152], as well as retrieval of information on structure–activity relationships [153]. So far, there are no applications of MIL to model molecular properties using a set of tautomeric forms, but this seems an attractive way to improve the performance of modeling molecular properties dependent on the underlying tautomeric form.…”
Section: Perspectives; mentioning; confidence: 99%
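A multi-instance learning (MIL) view of tautomerism treats each compound as a bag whose instances are its tautomeric forms. The sketch below shows only the bag-level prediction step, assuming a hypothetical linear instance model and precomputed descriptor vectors per tautomer; it is not a published MIL method. Max pooling encodes the idea that a single, possibly minor, tautomer can be responsible for the activity, while mean pooling treats all forms as contributing equally.

```python
from typing import List, Sequence

def instance_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    # Hypothetical linear model applied to one tautomer's descriptor vector.
    return sum(f * w for f, w in zip(features, weights)) + bias

def bag_score(bag: List[Sequence[float]], weights: Sequence[float],
              bias: float, pooling: str = "max") -> float:
    """Score a compound represented as a bag of tautomer descriptor vectors."""
    scores = [instance_score(x, weights, bias) for x in bag]
    if pooling == "max":
        # The best-scoring tautomer determines the bag prediction.
        return max(scores)
    if pooling == "mean":
        # Every tautomer contributes equally.
        return sum(scores) / len(scores)
    raise ValueError(f"unknown pooling: {pooling}")

tautomers = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical descriptors for two forms
print(bag_score(tautomers, [2.0, 0.5], 0.0, "max"))   # 2.0
print(bag_score(tautomers, [2.0, 0.5], 0.0, "mean"))  # 1.25
```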
“…Specifically, all compounds considered in this study come from a pool where the following filters were applied: their molecular weight was between 200 and 1000 g mol⁻¹, their drug likeness (QED) [16] between 0.2 and 0.9, and up to 2 rule-of-five violations [17]. Additionally, all retrieved compounds were checked so that they could successfully be read by the RDKit package [18], and subsequently standardized, which included removal of salts, tautomer normalization [19], and atom neutralization via O'Boyle's nocharge code [20]. For the second preliminary study round and subsequent production rounds (see Section 2.2), the NIBR substructure filters were also applied [21], and compounds with more than 10 rotatable bonds or 3 fused rings were removed, which resulted in a final pool of 1,831,052 molecules.…”
Section: Data Retrieval, Cleaning and Pair Generation; mentioning; confidence: 99%
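The property filters quoted above can be expressed as simple predicates over precomputed descriptors. This sketch assumes the descriptors were already calculated elsewhere (for example with a cheminformatics toolkit), treats the stated ranges as inclusive, and reads "more than 10 rotatable bonds or 3 fused rings" as "more than 3 fused rings"; those readings, and the `CompoundProps` record type, are assumptions. The NIBR substructure filters are not modelled.

```python
from dataclasses import dataclass

@dataclass
class CompoundProps:
    """Precomputed properties for one compound (hypothetical record type)."""
    mol_weight: float     # g/mol
    qed: float            # drug-likeness score in [0, 1]
    ro5_violations: int   # number of rule-of-five violations
    rotatable_bonds: int
    fused_rings: int

def passes_base_filters(p: CompoundProps) -> bool:
    # Filters applied to the whole pool (bounds assumed inclusive).
    return (200 <= p.mol_weight <= 1000
            and 0.2 <= p.qed <= 0.9
            and p.ro5_violations <= 2)

def passes_later_round_filters(p: CompoundProps) -> bool:
    # Later rounds also drop compounds with >10 rotatable bonds or
    # (assumed) >3 fused rings; substructure filters are not included.
    return (passes_base_filters(p)
            and p.rotatable_bonds <= 10
            and p.fused_rings <= 3)

pool = [
    CompoundProps(350.4, 0.65, 0, 5, 1),
    CompoundProps(1200.0, 0.50, 3, 20, 2),  # fails molecular weight
    CompoundProps(450.0, 0.70, 1, 12, 2),   # fails rotatable bonds in later rounds
]
print(sum(passes_base_filters(p) for p in pool))         # 2
print(sum(passes_later_round_filters(p) for p in pool))  # 1
```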