The development of functional organic fluorescent materials calls for fast and accurate predictions of photophysical parameters in workflows such as high-throughput virtual screening, yet the task is hampered by the limitations of quantum mechanical calculations. We establish a database covering >4,300 solvated organic fluorescent dyes and develop a new machine learning (ML) approach aimed at efficient and accurate predictions of emission wavelength and photoluminescence quantum yield (PLQY). Our feature engineering has given rise to the Functionalized Structure Descriptor (FSD) and the Comprehensive General Solvent Descriptor (CGSD), with which a largely black-box computational framework is realized, offering consistently good accuracy across different dye families, the ability to describe substitution and solvent effects, efficiency in large-scale predictions, and compatibility with on-the-fly learning. Evaluations on unseen molecules give a mean absolute error (MAE) of 0.13 for PLQY and 0.080 eV for emission energy, the latter comparable to time-dependent density functional theory (TD-DFT) calculations. An online prediction platform was built on the ensemble model to make predictions in various solvents (https://www.chemfluor.top/). Our statistical learning methodology will complement quantum mechanical calculations as an efficient alternative approach for the prediction of these parameters.
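To put the reported emission-energy error in wavelength terms, the following is a minimal sketch of the standard photon energy relation E = hc/λ together with a plain MAE calculation. The function names are hypothetical and chosen for illustration; nothing here is taken from the authors' code. Under this relation, an error of 0.080 eV at an emission wavelength of 500 nm corresponds to a shift of roughly 17 nm.

```python
from typing import Sequence

# Product h*c expressed in eV*nm (rounded standard value).
HC_EV_NM = 1239.841984


def wavelength_nm_to_ev(wavelength_nm: float) -> float:
    """Convert a photon wavelength in nm to an energy in eV via E = hc / lambda."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


def ev_to_wavelength_nm(energy_ev: float) -> float:
    """Convert a photon energy in eV to a wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("energy must be positive")
    return HC_EV_NM / energy_ev


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |true - predicted| over paired values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def red_shift_nm(wavelength_nm: float, error_ev: float) -> float:
    """Wavelength increase when the emission energy is lowered by error_ev."""
    energy = wavelength_nm_to_ev(wavelength_nm)
    return ev_to_wavelength_nm(energy - error_ev) - wavelength_nm


if __name__ == "__main__":
    # Illustrative only: the 500 nm reference wavelength is an assumption.
    print(f"{red_shift_nm(500.0, 0.080):.1f} nm")
```

Because energy and wavelength are inversely related, the same error in eV maps to a larger wavelength shift for red emitters than for blue ones, which is one reason to report the error in energy units.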
The prediction of photophysical parameters is of crucial practical importance for the development of functional organic fluorescent materials, but the task has been hampered by the expense of quantum mechanical calculations and the relatively low universality of QSAR models. Taking advantage of the new avenues opened up by machine learning (ML), we establish a database of solvated organic fluorescent dyes and develop highly efficient ML models for the prediction of maximum emission/absorption wavelength and photoluminescence quantum yield (PLQY), providing a reliable and efficient approach with potential for high-throughput screening. Various combinations of ML algorithms and molecular fingerprints were investigated. For emission wavelengths, TD-DFT accuracy was achieved under real-world conditions. Reliable identification of strongly fluorescent materials was also demonstrated. We show that easily obtainable fingerprint inputs combined with suitable ML algorithms enable efficient re-training on additional datapoints, so that the ML models can be improved systematically using experimental feedback.
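The idea of re-training on additional datapoints can be illustrated with a deliberately simple stand-in model. The sketch below is not the authors' method: the abstract does not name the algorithms or fingerprints used, so this example substitutes a similarity-weighted k-nearest-neighbour regressor over toy fingerprints, each represented as a set of "on" bit indices. The class name, method names, and all data values are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Toy fingerprint: the set of bit indices that are switched on (assumption).
Fingerprint = FrozenSet[int]


def tanimoto(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto similarity: |A and B| / |A or B| for two bit-index sets."""
    a, b = frozenset(a), frozenset(b)
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


class TanimotoKNNRegressor:
    """Similarity-weighted k-nearest-neighbour regressor (illustrative only)."""

    def __init__(self, k: int = 3) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self._data: List[Tuple[Fingerprint, float]] = []

    def add_datapoints(self, fingerprints, targets) -> None:
        """'Re-train' by appending new measurements to the stored dataset."""
        fingerprints, targets = list(fingerprints), list(targets)
        if len(fingerprints) != len(targets):
            raise ValueError("fingerprints and targets differ in length")
        self._data.extend((frozenset(f), float(y)) for f, y in zip(fingerprints, targets))

    def predict(self, fingerprint: Iterable[int]) -> float:
        """Weighted mean of the targets of the k most similar stored entries."""
        if not self._data:
            raise ValueError("no training data available")
        query = frozenset(fingerprint)
        scored = sorted(
            ((tanimoto(query, f), y) for f, y in self._data),
            key=lambda pair: pair[0],
            reverse=True,
        )[: self.k]
        total = sum(s for s, _ in scored)
        if total == 0:
            # No structural overlap at all: fall back to an unweighted mean.
            return sum(y for _, y in scored) / len(scored)
        return sum(s * y for s, y in scored) / total


if __name__ == "__main__":
    model = TanimotoKNNRegressor(k=1)
    model.add_datapoints([{1, 2, 3}], [2.5])   # made-up emission energy in eV
    print(model.predict({7, 8}))               # unrelated dye, falls back to 2.5
    model.add_datapoints([{7, 8}], [3.1])      # new experimental feedback
    print(model.predict({7, 8}))               # now matches the added datapoint
```

A nearest-neighbour model makes the re-training step trivial because adding data is just an append. Models that require a fitting step would instead be refitted on the enlarged dataset, which stays cheap when the inputs are precomputed fingerprints rather than quantum mechanical results.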