Best practices in machine learning for chemistry

Artrith, Nongnuch; Butler, Keith T.; Coudert, François‐Xavier; Han, Seungwu; Isayev, Olexandr; Jain, Anubhav; Walsh, Aron

doi:10.1038/s41557-021-00716-z

Cited by 317 publications

(262 citation statements)

References 33 publications

Mentioning: 229


“…Despite most well-performing methods for computing log P N in the SAMPL7 blind challenge belonged to empirical methodologies [40], it must be kept in mind that it presents important disadvantages regarding strategies based on molecular mechanics and/or quantum chemistry. For instance, have a high dependence on the training set as this limits the coverage of molecules that can be predicted [41] (e.g., our approach was trained for predicting partition coefficients for drug-like sulfonamides compounds) and to the best of our knowledge, empirical methods are not able to assign a partition coefficient to a specific conformation of the molecule under analysis, these facts limit subsequent applications, e.g., the study of bioactive conformations, that MM and/or QM approaches can face.…”

Section: Results (mentioning)

confidence: 99%

Multiple linear regression models for predicting the n‑octanol/water partition coefficients in the SAMPL7 blind challenge

Lopez, Pinheiro, Zamora (2021), J Comput Aided Mol Des


A multiple linear regression model called MLR-3 is used for predicting the experimental n-octanol/water partition coefficient (log P N ) of 22 N-sulfonamides proposed by the organizers of the SAMPL7 blind challenge. The MLR-3 method was trained with 82 molecules, including drug-like sulfonamides and small organic molecules, which resembled the main functional groups present in the challenge dataset. Our model, submitted as "TFE-MLR", presented a root-mean-square error of 0.58 and a mean absolute error of 0.41 in log P units, achieving the highest accuracy both among empirical methods and among all ranked submissions. Overall, the results support the appropriateness of the multiple linear regression approach MLR-3 for computing the n-octanol/water partition coefficient of sulfonamide-bearing compounds. In this context, the outstanding performance of empirical methodologies, where 75% of the ranked submissions achieved root-mean-square errors < 1 log P unit, supports the suitability of these strategies for obtaining accurate and fast predictions of physicochemical properties such as partition coefficients of bioorganic compounds.
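The two error metrics quoted in this abstract can be illustrated with a minimal sketch of a multiple linear regression fit. This is not the MLR-3 model: the three descriptors, the coefficients, and all data below are synthetic placeholders, and the helper names (`fit_mlr`, `predict_mlr`, `rmse`, `mae`) are hypothetical. Only the set sizes (82 training molecules, 22 test molecules) are taken from the abstract.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of y ~ b0 + X @ b; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply the fitted intercept and slopes to a descriptor matrix."""
    return coef[0] + X @ coef[1:]

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic placeholder data (assumption): 3 made-up descriptors per molecule.
rng = np.random.default_rng(0)
true_coef = np.array([1.0, 0.5, -0.3, 0.8])  # hypothetical intercept + slopes

X_train = rng.normal(size=(82, 3))
y_train = true_coef[0] + X_train @ true_coef[1:] + rng.normal(scale=0.1, size=82)

X_test = rng.normal(size=(22, 3))
y_test = true_coef[0] + X_test @ true_coef[1:] + rng.normal(scale=0.1, size=22)

coef = fit_mlr(X_train, y_train)
y_pred = predict_mlr(coef, X_test)

test_rmse = rmse(y_test, y_pred)
test_mae = mae(y_test, y_pred)
print(f"RMSE = {test_rmse:.3f}, MAE = {test_mae:.3f}")
```

For any set of predictions the MAE cannot exceed the RMSE, which is consistent with the reported pair of values (0.41 and 0.58 log P units).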


“…and diversity of available data determines the accuracy and generality of trained model. [22] The working of OSCs is very complex and multiple types of materials are used. Therefore, data is scattered and heterogeneous.…”

Section: Chemistry-a European Journal (mentioning)

confidence: 99%

Developing Efficient Small Molecule Acceptors with sp²‐Hybridized Nitrogen at Different Positions by Density Functional Theory Calculations, Molecular Dynamics Simulations and Machine Learning

Mahmood, Irfan, Wang (2021), Chemistry A European J

115


The chemical structure of small molecule acceptors determines their performance in organic solar cells. Multiscale simulations are necessary to avoid trial-and-error-based design and ultimately to save time and resources. In the current study, the effect of sp²-hybridized nitrogen substitution at the inner or the outermost position of the central core, side chain, and terminal group of small molecule acceptors is investigated using multiscale computational modelling. Quantum chemical analysis is used to study the electronic behavior. Nitrogen substitution at the end-capping group significantly decreases the electron reorganization energy. No large change is observed in the transfer integral or excited-state behavior. However, nitrogen substitution at the terminal group position is a good way to improve electron mobility. The power conversion efficiency (PCE) of the newly designed acceptors is predicted using machine learning. Molecular dynamics simulations are also performed to explore the dynamics of the acceptors and their blends with the PBDB-T polymer donor. The Flory-Huggins parameter is calculated to study the mixing of the designed small molecule acceptors with PBDB-T. The radial distribution function indicates that PBDB-T packs more closely with N3 and N4. Overall, nitrogen substitution at the end-capping group is found to be a better strategy for designing efficient small molecule acceptors.


“…While ML techniques have been gaining popularity, the materials science and chemistry communities have not yet established rigorous quality measures for the publication of ML-based research. We believe that the key to robust and impactful ML work lies in the sharing of models and data as well as in systematic and transparent model validation (Artrith et al, 2021).…”

Section: Figure | (A) (mentioning)

confidence: 99%

Accelerated Atomistic Modeling of Solid-State Battery Materials With Machine Learning

et al. 2021

Self Cite


Materials for solid-state batteries often exhibit complex chemical compositions, defects, and disorder, making both experimental characterization and direct modeling with first-principles methods challenging. Machine learning (ML) has proven versatile for accelerating or circumventing first-principles calculations, thereby facilitating the modeling of materials properties that are otherwise hard to access. ML potentials trained on accurate first-principles data enable computationally efficient linear-scaling atomistic simulations with an accuracy close to the reference method. ML-based property-prediction and inverse design techniques are powerful for the computational search for new materials. Here, we give an overview of recent methodological advancements of ML techniques for atomic-scale modeling and materials design. We review applications to materials for solid-state batteries, including electrodes, solid electrolytes, coatings, and the complex interfaces involved.
