2021
DOI: 10.1038/s41557-021-00716-z

Best practices in machine learning for chemistry

Abstract: Statistical tools based on machine learning are becoming integrated into chemistry research workflows. We discuss the elements necessary to train reliable, repeatable and reproducible models, and recommend a set of guidelines for machine learning reports.

Cited by 317 publications (262 citation statements)
References 33 publications
“…Despite most well-performing methods for computing log P N in the SAMPL7 blind challenge belonged to empirical methodologies [ 40 ], it must be kept in mind that it presents important disadvantages regarding strategies based on molecular mechanics and/or quantum chemistry. For instance, have a high dependence on the training set as this limits the coverage of molecules that can be predicted [ 41 ] (e.g., our approach was trained for predicting partition coefficients for drug-like sulfonamides compounds) and to the best of our knowledge, empirical methods are not able to assign a partition coefficient to a specific conformation of the molecule under analysis, these facts limit subsequent applications, e.g., the study of bioactive conformations, that MM and/or QM approaches can face.…”
Section: Results (mentioning)
confidence: 99%
“…and diversity of available data determines the accuracy and generality of trained model. [22] The working of OSCs is very complex and multiple types of materials are used. Therefore, data is scattered and heterogeneous.…”
Section: Chemistry-a European Journal (mentioning)
confidence: 99%
“…While ML techniques have been gaining popularity, the materials science and chemistry communities have not yet established rigorous quality measures for the publication of ML-based research. We believe that the key to robust and impactful ML work lies in the sharing of models and data as well as in systematic and transparent model validation (Artrith et al, 2021).…”
Section: Figure | (A) (mentioning)
confidence: 99%