2022
DOI: 10.1038/s41598-022-09309-3

Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data

Abstract: Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated into a conformal prediction (CP) framework, which adds a calibration step to estimate the confidence of the predictions. CP models offer the advantage of ensuring a predefined error rate under the assumption that the test and calibration sets are exchangeable. In cases where the test data have drifted away from the descriptor space of the training da…
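The calibration step and the exchangeability assumption described in the abstract can be illustrated with a minimal sketch of inductive, class-conditional (Mondrian) conformal classification. Everything concrete here is an assumption for illustration, not the authors' setup: a one-dimensional toy descriptor drawn from N(-1, 1) for class 0 and N(+1, 1) for class 1, a hypothetical fixed classifier `model_proba`, the nonconformity measure 1 - P(true class), a significance level `EPS` of 0.2, and a drift simulated by shifting every descriptor by 1.5. Recalibrating on data from the drifted distribution is shown only to illustrate why exchangeability matters, not as a description of the mitigation studied in the paper.

```python
import bisect
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def model_proba(x: float) -> List[float]:
    """Hypothetical fixed classifier: P(y=1|x) = sigmoid(2x).
    The model is never trained here, so no proper training set is needed."""
    p1 = 1.0 / (1.0 + math.exp(-2.0 * x))
    return [1.0 - p1, p1]


def sample(n: int, shift: float, rng: random.Random) -> List[Tuple[float, int]]:
    """Draw n labelled toy points; `shift` moves every descriptor to simulate drift."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(2.0 * y - 1.0, 1.0) + shift
        data.append((x, y))
    return data


def calibrate(cal: Sequence[Tuple[float, int]]) -> Dict[int, List[float]]:
    """Mondrian calibration: sorted nonconformity scores grouped by true class,
    with nonconformity = 1 - predicted probability of the true class."""
    scores: Dict[int, List[float]] = {0: [], 1: []}
    for x, y in cal:
        scores[y].append(1.0 - model_proba(x)[y])
    for s in scores.values():
        s.sort()
    return scores


def p_value(sorted_scores: List[float], score: float) -> float:
    """Fraction of calibration scores >= the test score, counting the test point itself."""
    n_ge = len(sorted_scores) - bisect.bisect_left(sorted_scores, score)
    return (n_ge + 1) / (len(sorted_scores) + 1)


def prediction_set(x: float, scores: Dict[int, List[float]], epsilon: float) -> Set[int]:
    """All labels whose conformal p-value exceeds the significance level epsilon."""
    probs = model_proba(x)
    return {y for y, s in scores.items() if p_value(s, 1.0 - probs[y]) > epsilon}


def error_rate(test: Sequence[Tuple[float, int]], scores: Dict[int, List[float]],
               epsilon: float) -> float:
    """Fraction of test points whose true label is missing from the prediction set."""
    return sum(y not in prediction_set(x, scores, epsilon) for x, y in test) / len(test)


rng = random.Random(0)
EPS = 0.2
cal_old = calibrate(sample(2000, shift=0.0, rng=rng))   # calibration before drift
test_same = sample(4000, shift=0.0, rng=rng)            # exchangeable with cal_old
test_drift = sample(4000, shift=1.5, rng=rng)           # drifted descriptor space
cal_new = calibrate(sample(2000, shift=1.5, rng=rng))   # calibration from drifted data

err_same = error_rate(test_same, cal_old, EPS)
err_drift = error_rate(test_drift, cal_old, EPS)
err_recal = error_rate(test_drift, cal_new, EPS)
print(f"{err_same:.3f} {err_drift:.3f} {err_recal:.3f}")
```

With exchangeable calibration and test data the observed error rate should land near `EPS`; after the drift it rises well above `EPS`, and recalibrating on drifted data brings it back near `EPS`. Exact values depend on the random seed.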

Cited by 7 publications (3 citation statements)
References 51 publications

“…One of these BM scaffolds was exclusively found in the GRML library (Figure 13b), while the other was found in both the RML and GRML libraries (Figure 13c). This evidence not only confirms Molpher's efficacy in generating active GR scaffolds but also, considering the challenging benchmark imposed by the temporal split [74], the rediscovery of two active compounds is a noteworthy achievement. Five distinct agonist/GR complexes were used to construct the model: dexamethasone (PDB ID: 1P93 [66]), pyrazole-based agonist (PDB ID: 3E7C [67]), indazole-based agonist (PDB ID: 4CSJ [68]), pyrimidine-based agonist (PDB ID: 6EL7 [69]), and next-gen indazole-based agonist (PDB ID: 7PRX [70]).…”
Section: Prospective Validation (supporting)
confidence: 62%
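The temporal split mentioned in the excerpt above orders compounds by when they were reported, trains on the older ones and evaluates on the newer ones, so the evaluation resembles prospective use and can expose drift of the kind studied in this paper. A minimal sketch, assuming a hypothetical `Record` type with a `year` field and an arbitrary 80/20 cutoff by count; real benchmarks may use exact dates or a fixed cutoff date instead.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    smiles: str   # structure of the compound
    year: int     # hypothetical field: year the activity was first reported
    active: bool  # measured label


def temporal_split(records: List[Record],
                   train_fraction: float = 0.8) -> Tuple[List[Record], List[Record]]:
    """Train on roughly the oldest `train_fraction` of records, test on the newest.
    All records sharing the cutoff year go to the test side, so no training
    record is newer than any test record (train may be empty if all years match)."""
    ordered = sorted(records, key=lambda r: r.year)
    if not ordered:
        return [], []
    cutoff_index = min(int(len(ordered) * train_fraction), len(ordered) - 1)
    cutoff_year = ordered[cutoff_index].year
    train = [r for r in ordered if r.year < cutoff_year]
    test = [r for r in ordered if r.year >= cutoff_year]
    return train, test


# Usage with ten toy alkanes reported one per year from 2000 to 2009.
data = [Record("C" * (i + 1), 2000 + i, i % 2 == 0) for i in range(10)]
train, test = temporal_split(data)
print([r.year for r in train], [r.year for r in test])
```

Splitting on the cutoff year rather than on the exact index keeps compounds from the same year together, which avoids leaking contemporaneous chemistry into the training set.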
“…Recently, AL [active learning] has played an essential role in predicting the biological and physical activities of small molecules in the fields of biology and chemistry. This includes predicting the structure of proteins, as well as the toxicity of compounds. However, how does AL deal with data sets that have a small number of labeled elements? Numerous works have been proposed to address this issue, which are outlined below.…”
Section: Methods for Small Molecular Data Challenges (mentioning)
confidence: 99%
“…This includes predicting the structure of proteins [436–438] as well as the toxicity of compounds [439, 440]. However, how does AL deal with data sets that have a small number of labeled elements? Numerous works have been proposed to address this issue, which are outlined below.…”
Section: Active Learning (mentioning)
confidence: 99%