A systematic comparison is presented for predictions of the frontier orbital energies, namely the highest occupied molecular orbital (HOMO) energy (E_H), the lowest unoccupied molecular orbital (LUMO) energy (E_L), and the energy gap (ΔE_HL), of the molecules in the QM9 dataset, which contains more than 120k three-dimensional organic molecule structures determined by first-principles simulations. The target molecular properties (E_H, E_L, and ΔE_HL) are predicted using linear regression (LR), machine learning (random forest, RF), and continuous-filter convolutional neural network (SchNET) approaches. LR and RF models built upon various knowledge-based descriptors derived from the SMILES of the molecules predict the target properties with mean absolute errors (MAEs) of 4 to 6 times the chemical accuracy (0.043 eV). The best approach, SchNET, using the graph representation derived from molecular Cartesian coordinates, is confirmed to provide MAEs of E_H, E_L, and ΔE_HL of 0.051, 0.041, and 0.076 eV, respectively. When the bond-step matrix representation is introduced into the SchNET model, the computational cost of dataset preparation is substantially reduced, and the corresponding MAEs increase moderately to 2 to 3 times the chemical accuracy. The chemical interpretation of the important descriptors identified in the LR and RF models appears to align with chemical knowledge of these molecular electronic properties, although these models carry tolerable prediction errors. The combination of the bond-step representation and the SchNET model can provide an assessable and balanced option for the high-throughput screening of organic molecules and the development of data science approaches.
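The error figures above are easier to compare on a common scale. The following is a minimal sketch, not code from the paper, that expresses the reported SchNET MAEs as multiples of the quoted chemical accuracy and converts the quoted "times chemical accuracy" ranges back to eV. The variable and function names are illustrative assumptions; the numbers are taken from the abstract.

```python
# Chemical accuracy as quoted in the abstract (roughly 1 kcal/mol).
CHEMICAL_ACCURACY_EV = 0.043

# SchNET MAEs reported in the abstract, in eV (key names are illustrative).
schnet_mae_ev = {"E_H": 0.051, "E_L": 0.041, "dE_HL": 0.076}


def ratio_to_chemical_accuracy(mae_ev):
    """Express an MAE in eV as a multiple of chemical accuracy."""
    return mae_ev / CHEMICAL_ACCURACY_EV


def multiple_to_ev(multiple):
    """Convert a multiple of chemical accuracy back to eV."""
    return multiple * CHEMICAL_ACCURACY_EV


# MAE of each target as a multiple of chemical accuracy, rounded to 2 places.
schnet_ratios = {
    name: round(ratio_to_chemical_accuracy(mae), 2)
    for name, mae in schnet_mae_ev.items()
}

# Quoted ranges converted to eV: 4 to 6 times (LR/RF), 2 to 3 times (bond-step).
lr_rf_range_ev = (multiple_to_ev(4), multiple_to_ev(6))
bond_step_range_ev = (multiple_to_ev(2), multiple_to_ev(3))

print(schnet_ratios)
print(lr_rf_range_ev, bond_step_range_ev)
```

On this scale the coordinate-based SchNET errors sit at roughly 1 to 2 times chemical accuracy, the LR and RF range corresponds to about 0.17 to 0.26 eV, and the bond-step range to about 0.09 to 0.13 eV.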
An assessment is reported of modifying the SchNET model to predict experimental molecular photophysical properties, including absorption energy (ΔE_abs), emission energy (ΔE_emi), and photoluminescence quantum yield (PLQY). The solution environment was introduced outside the interaction layers of SchNET so as not to overly amplify the solute–solvent interactions, a choice supported by the changes in prediction errors between models with and without the solvent effect. Two featurization schemes under the framework of the Schnet-bondstep approach, based on the concepts of reduced-atomic-number and reduced-atomic-neighbor, were demonstrated. These featurized models provide predictions of ΔE_abs and ΔE_emi with errors of less than 0.1 eV. The corresponding predictions of PLQY were shown to be comparable to those of the previous graph convolution network model.
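The abstracts do not define the bond-step representation. As an illustration only, the sketch below assumes that the bond step between two atoms is the number of bonds on the shortest path connecting them in the molecular graph, so that the matrix can be built from connectivity alone without three-dimensional coordinates. The function name `bond_step_matrix` and the input format are hypothetical and are not taken from the papers.

```python
from collections import deque


def bond_step_matrix(n_atoms, bonds):
    """Shortest-path bond counts between all atom pairs (assumed definition).

    n_atoms: number of atoms, indexed 0..n_atoms-1
    bonds:   iterable of (i, j) index pairs, one per chemical bond
    Pairs in disconnected fragments are marked with -1.
    """
    # Build an undirected adjacency list from the bond list.
    adjacency = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)

    matrix = [[-1] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        # Breadth-first search from each atom gives bond counts to the rest.
        matrix[start][start] = 0
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if matrix[start][neighbor] == -1:
                    matrix[start][neighbor] = matrix[start][atom] + 1
                    queue.append(neighbor)
    return matrix


# Heavy atoms of ethanol (C-C-O) as a toy example: atoms 0, 1, 2.
ethanol_steps = bond_step_matrix(3, [(0, 1), (1, 2)])
print(ethanol_steps)
```

Under this assumption the matrix depends only on the bond list, which is consistent with the reduced dataset-preparation cost described in the text, but how the published models actually consume such a matrix is not specified in these abstracts.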