2023
DOI: 10.26434/chemrxiv-2022-dct7l-v2
Preprint

MASSA Algorithm: automated rational sampling of training and test subsets for QSAR modelling

Abstract: QSAR models capable of predicting biological, toxicity, and pharmacokinetic properties have been widely used to search for lead bioactive molecules in chemical databases. Dataset preparation strongly influences the quality of the resulting models, and sampling requires dividing the original dataset into a training set (for model training) and a test set (for statistical evaluation). This sampling can be done randomly or rationally, but the rational division is superior. In this paper, …
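To make the distinction between random and rational sampling concrete, here is a minimal Python sketch. It is not the MASSA algorithm, whose steps are not described in the truncated abstract above; it only contrasts a plain random split with a simple group-stratified split, one common form of rational division. The function names, the group labels, and the 100-compound toy dataset are all hypothetical.

```python
import random
from collections import defaultdict


def random_split(n_items, test_fraction=0.2, seed=0):
    """Random sampling: shuffle all indices, then cut off the test share."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_test = round(n_items * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def stratified_split(group_labels, test_fraction=0.2, seed=0):
    """Illustrative rational sampling: draw the test share from every group,
    so each group is represented in both subsets (not the MASSA procedure)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for index, label in enumerate(group_labels):
        groups[label].append(index)

    train, test = [], []
    for label in sorted(groups):
        members = groups[label]
        rng.shuffle(members)
        # Very small groups may round down to zero test members.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical dataset: 100 compounds assigned to three groups
# (for example, structural clusters or activity bins).
group_labels = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
train_idx, test_idx = stratified_split(group_labels)
print(len(train_idx), len(test_idx))
```

With a random split, the proportion of each group in the test set varies from seed to seed; the stratified variant fixes it at the requested fraction within every group, which is the basic motivation for dividing a dataset rationally.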

Cited by 2 publications (2 citation statements)
References 17 publications (31 reference statements)
“…The lowest energy conformer for each compound was generated using OMEGA 3.1.1.2 (OpenEye Scientific Software, USA, 2019) followed by ionization state adjustment at pH 7.4 with FixpKa (QUACPAC 2.0.1.2, OpenEye Scientific Software, USA, 2019), selecting a single favorable ionization state. Finally, the remaining compounds (n = 97) were split into training and test sets (80% and 20%, respectively) using a preliminary version of the MASSA algorithm implemented on the KNIME platform. The most active (lowest half-maximum inhibitory concentration; IC50) compound of each study, herein referred to as 62 (IC50 of 18 nM) and 106 (IC50 of 21 nM), was removed from the test set.…”
Section: Methods
mentioning
confidence: 99%
“…[54] The minimum energy conformer of each compound was identified by conformational analysis in the software OMEGA [55] using the MMFF94 force field. The data set was divided into training and test sets with 80 and 20%, respectively, using a preliminary version of the MASSA algorithm [56] implemented in the KNIME Analytics Platform [57] and its RDKit nodes [58] according to previous works following the protocol described in [59−61]. The experimental enzymatic reactivity was codified as 0 (unreactive) and 1 (reactive).…”
Section: ■ Materials and Methods
mentioning
confidence: 99%