Gabriel A. Pinheiro scite author profile

Machine learning (ML) models can potentially accelerate the discovery of tailored materials by learning a function that maps chemical compounds to their respective target properties. In this realm, a crucial step is encoding the molecular systems for the ML model, in which the molecular representation plays a central role. Most representations are based on atomic coordinates (structure); however, this can increase the computational cost of ML training and prediction. Herein, we investigate the impact of choosing coordinate-free descriptors based on the Simplified Molecular Input Line Entry System (SMILES) representation, which can substantially reduce the computational cost of ML predictions. To that end, we evaluate the prediction performance of a feed-forward neural network (FNN) model over five feature selection methods and nine ground-state properties (including energetic, electronic, and thermodynamic properties) from a public data set composed of ∼130k organic molecules. Our best results reached a mean absolute error of ∼0.05 eV, close to chemical accuracy, for the atomization energies (internal energy at 0 K, internal energy at 298.15 K, enthalpy at 298.15 K, and free energy at 298.15 K). Moreover, for the atomization energies, the out-of-sample error was nine times lower than that of the same FNN model trained with the Coulomb matrix, a traditional coordinate-based descriptor. Furthermore, our results show how the model's accuracy is limited by employing such a low-computational-cost representation, which carries less information about the molecular structure than state-of-the-art methods.
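To make the idea of a coordinate-free descriptor concrete, the following is a minimal sketch, not the feature set used in the paper. It counts symbols in a SMILES string and computes a mean absolute error. The vocabulary `VOCAB` and the function names are hypothetical, and the single-character counting assumes molecules built only from C, H, N, O, and F (it would miscount two-letter elements such as Cl).

```python
from collections import Counter

# Hypothetical symbol vocabulary chosen for illustration only;
# it is NOT the descriptor set evaluated in the paper.
VOCAB = ["C", "N", "O", "F", "c", "n", "o", "=", "#", "(", ")", "1", "2"]


def smiles_counts(smiles: str) -> list[int]:
    """Return how often each vocabulary symbol occurs in a SMILES string.

    No atomic coordinates are needed: the feature vector is derived
    from the line notation alone.
    """
    counts = Counter(smiles)
    return [counts[symbol] for symbol in VOCAB]


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between reference and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Acetic acid written as SMILES.
    print(smiles_counts("CC(=O)O"))
    print(mean_absolute_error([1.0, 2.0], [1.5, 1.5]))
```

A vector like this could serve as input to a feed-forward network; how much structural information such counts retain is exactly the trade-off the abstract discusses.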

SMICLR: Contrastive Learning on Multiple Molecular Representations for Semisupervised and Unsupervised Representation Learning

Pinheiro

Silva

Quiles

2022

J. Chem. Inf. Model.

Machine learning as a tool for chemical space exploration broadens horizons to work with known and unknown molecules. At its core lies molecular representation, an essential key to improving learning about structure–property relationships. Recently, contrastive frameworks have shown impressive results for representation learning in diverse domains. Therefore, this paper proposes a contrastive framework that embraces multimodal molecular data. Specifically, our approach jointly trains a graph encoder and an encoder for the simplified molecular-input line-entry system (SMILES) string to perform the contrastive learning objective. Since SMILES is the basis of our method, i.e., we build the molecular graph from the SMILES, we call our framework SMILES Contrastive Learning (SMICLR). When stacking a nonlinear regressor on SMICLR’s pretrained encoder and fine-tuning the entire model, we reduced the prediction error by, on average, 44% and 25% for the energetic and electronic properties of the QM9 data set, respectively, over the supervised baseline. We further improved our framework’s performance by applying data augmentations to each molecular-input representation. Moreover, SMICLR demonstrated competitive representation learning results in an unsupervised setting.
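The contrastive objective between two encoders can be sketched as follows. This is an illustrative InfoNCE-style loss under stated assumptions, not the exact loss from the paper: the abstract does not specify the loss form, the temperature value is arbitrary, and the function names are hypothetical. Row `i` of each embedding list is assumed to come from the same molecule, once through the graph encoder and once through the SMILES encoder.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _one_direction(anchors, candidates, temperature):
    """Cross-entropy of picking candidate i for anchor i, averaged over i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def cross_modal_contrastive_loss(graph_emb, smiles_emb, temperature=0.1):
    """Symmetric loss: graph-to-SMILES and SMILES-to-graph, averaged.

    Matching rows are positives; every other row in the batch acts as
    a negative. The temperature is an assumed hyperparameter.
    """
    forward = _one_direction(graph_emb, smiles_emb, temperature)
    backward = _one_direction(smiles_emb, graph_emb, temperature)
    return 0.5 * (forward + backward)


if __name__ == "__main__":
    graph = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(cross_modal_contrastive_loss(graph, aligned))
    print(cross_modal_contrastive_loss(graph, swapped))
```

The loss is small when the two views of each molecule map to nearby embeddings and large when they are mismatched, which is the signal that drives both encoders during pretraining.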

Systematic Investigation of Error Distribution in Machine Learning Algorithms Applied to the Quantum-Chemistry QM9 Data Set Using the Bias and Variance Decomposition

Azevedo

Pinheiro

Quiles

et al. 2021

J. Chem. Inf. Model.

Most machine learning applications in quantum-chemistry (QC) data sets rely on a single statistical error parameter such as the mean square error (MSE) to evaluate their performance. However, this approach has limitations or can even yield incorrect interpretations. Here, we report a systematic investigation of the two components of the MSE, i.e., the bias and variance, using the QM9 data set. To this end, we experiment with three descriptors, namely (i) symmetry functions (SF, with two-body and three-body functions), (ii) many-body tensor representation (MBTR, with two- and three-body terms), and (iii) smooth overlap of atomic positions (SOAP), to evaluate the performance of the prediction process using different numbers of molecules in training samples and the effect of bias and variance on the final MSE. Overall, low sample sizes are related to higher MSE. Moreover, the bias component strongly influences the larger MSEs. Furthermore, there is little agreement among molecules with higher errors (outliers) across different descriptors. However, there is a high prevalence among the outliers intersection set and the convex hull volume of geometric coordinates (VCH). Based on the distribution of the MSE (and its components, bias and variance) and the appearance of outliers, we suggest using ensembles of low-bias models to minimize the MSE, particularly when the training set contains a small number of molecules.
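The decomposition referred to above can be checked numerically. The sketch below is illustrative and assumes a noise-free reference value (for instance, a computed property treated as exact), so that for one molecule the MSE over repeated models equals squared bias plus variance. The function name and the sample numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def bias_variance(predictions: list[float], target: float):
    """Decompose the MSE of repeated predictions for a single molecule.

    `predictions` holds the outputs of models trained on different
    training samples; `target` is the reference value, assumed noise-free.
    Returns (mse, squared_bias, variance), where mse == squared_bias + variance
    up to floating-point rounding.
    """
    mean_prediction = fmean(predictions)
    squared_bias = (mean_prediction - target) ** 2
    # Population variance of the predictions around their own mean.
    variance = fmean([(p - mean_prediction) ** 2 for p in predictions])
    mse = fmean([(p - target) ** 2 for p in predictions])
    return mse, squared_bias, variance


if __name__ == "__main__":
    mse, squared_bias, variance = bias_variance([1.0, 2.0, 3.0], target=1.0)
    print(mse, squared_bias, variance)
```

Averaging the predictions of several low-bias models leaves the bias term unchanged while shrinking the variance term, which is consistent with the ensemble recommendation in the abstract.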

Screening of the Role of the Chemical Structure in the Electrochemical Stability Window of Ionic Liquids: DFT Calculations Combined with Data Mining

Moraes

Pinheiro

Lourenço

et al. 2022

J. Chem. Inf. Model.

Ionic liquids have attracted the attention of researchers as possible electrolytes for electrochemical energy storage devices. However, their properties, such as the electrochemical stability window (ESW), ionic conductivity, and diffusivity, are influenced both by the chemical structures of cations and anions and by their combinations. Most studies in the literature focus on the understanding of common ionic liquids, and little effort has been made to find ways to improve our atomistic understanding of those systems. The goal of this paper is to explore the structural characteristics of cations and anions that form ionic liquids that can expand the HOMO/LUMO gap, a property directly linked to the ESW of the electrolyte. For that, we design a framework for randomly generating new ions by combining their fragments. Within this framework, we generate about 10⁴ cations and 10⁴ anions and fully optimize their structures using density functional theory. Our calculations show that aromatic cations are less stable ionic liquids than aliphatic ones, an expected result if chemical rationale is used. More importantly, we can improve the gap by adding electron-donating and electron-withdrawing functional groups to the cations and anions, respectively. The increase can be about 2 V, depending on the case. This improvement is reflected in a wider ESW.

Modeling Social and Geopolitical Disasters as Extreme Events: A Case Study Considering the Complex Dynamics of International Armed Conflicts

Rosa

Neelakshi

Pinheiro

et al. 2019
