Kai Zhang scite author profile

Kai Zhang

4 Publications

30 Citation Statements Received

406 Citation Statements Given

Affiliations

Case Western Reserve University

Publications

Predicting Solute Descriptors for Organic Chemicals by a Deep Neural Network (DNN) Using Basic Chemical Structures and a Surrogate Metric

Zhang

2022

Environ. Sci. Technol.

Solute descriptors have been widely used to model chemical transfer processes through poly-parameter linear free energy relationships (pp-LFERs); however, obtaining these descriptors accurately and quickly for new organic chemicals remains substantially difficult. In this research, models (PaDEL-DNN) that require only the SMILES of a chemical were built to estimate pp-LFER descriptors satisfactorily, using deep neural networks (DNN) and the PaDEL chemical representation. The PaDEL-DNN-estimated pp-LFER descriptors performed well in modeling the storage-lipid/water partition coefficient (log K_storage-lipid/water), bioconcentration factor (BCF), aqueous solubility (ESOL), and hydration free energy (FreeSolv). Then, assuming that the accuracy of estimated values for widely available properties, e.g., logP (octanol-water partition coefficient), can calibrate estimates for less available but related properties, we proposed logP as a surrogate metric for evaluating the overall accuracy of the estimated pp-LFER descriptors. When the pp-LFER descriptors were used to model log K_storage-lipid/water, BCF, ESOL, and FreeSolv, errors were around 0.1 log unit lower for chemicals whose estimated descriptors the surrogate metric deemed "accurate". Interpretation of the PaDEL-DNN models revealed that, for a given test chemical, having several (around 5) "similar" chemicals in the training data set was crucial for accurate estimation, while the remaining, less similar training chemicals provided reasonable baseline estimates. Lastly, pp-LFER descriptors for over 2800 persistent, bioaccumulative, and toxic chemicals were reasonably estimated by combining PaDEL-DNN with the surrogate metric. Overall, the PaDEL-DNN/surrogate metric approach and the newly estimated descriptors will greatly benefit chemical transfer modeling.
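The pp-LFER relationship and the logP surrogate check described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' PaDEL-DNN code: it assumes the common Abraham form log K = c + eE + sS + aA + bB + vV, uses water/octanol system coefficients commonly quoted in the literature (verify them before relying on them), and treats the 0.3 log-unit tolerance and the names `pp_lfer` and `passes_surrogate_check` as hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Abraham-type solute descriptors used in pp-LFERs."""
    E: float  # excess molar refraction
    S: float  # dipolarity/polarizability
    A: float  # overall hydrogen-bond acidity
    B: float  # overall hydrogen-bond basicity
    V: float  # McGowan characteristic volume


# System coefficients (c, e, s, a, b, v) commonly quoted for water/octanol
# partitioning; illustrative only, check against the primary literature.
OCTANOL_WATER = (0.088, 0.562, -1.054, 0.034, -3.460, 3.814)


def pp_lfer(d: Descriptors, coeffs) -> float:
    """Evaluate log K = c + eE + sS + aA + bB + vV for one solute."""
    c, e, s, a, b, v = coeffs
    return c + e * d.E + s * d.S + a * d.A + b * d.B + v * d.V


def passes_surrogate_check(d: Descriptors, reference_logp: float,
                           tolerance: float = 0.3) -> bool:
    """Deem estimated descriptors 'accurate' if the logP they imply is close
    to a reference logP. The default tolerance is a hypothetical cutoff."""
    return abs(pp_lfer(d, OCTANOL_WATER) - reference_logp) <= tolerance


# Descriptor values commonly tabulated for benzene (used here as a toy input).
benzene = Descriptors(E=0.61, S=0.52, A=0.0, B=0.14, V=0.7164)
print(round(pp_lfer(benzene, OCTANOL_WATER), 2))  # 2.13
print(passes_surrogate_check(benzene, reference_logp=2.13))  # True
```

The check only uses logP because it is the widely available property; in the approach summarized above, passing it is taken as evidence that the same descriptors will also transfer well to less-measured properties such as BCF.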

Machine Learning Modeling of Environmentally Relevant Chemical Reactions for Organic Compounds

Zhang

2022

ACS EST Water

Environmental chemical reactions have been frequently investigated for various purposes; however, it remains challenging to accurately model either reaction kinetics or reaction pathways. Existing studies mostly model reaction kinetics with traditional quantitative structure-activity relationships (QSARs) and reaction pathways with reaction template methods; however, these approaches generally require extensive feature engineering or manual extraction of reaction templates. Recently, machine learning (ML) has become a promising tool for modeling chemical reactions because ML models can perform well and can make effective use of diverse chemical representations. This Review starts with a concise comparison of traditional and ML modeling approaches for chemical reactions, followed by a brief discussion of the current status of, and future needs in, modeling environmental organic reactions. Data collection and data cleaning techniques for reaction kinetics and pathways are then discussed. We then summarize the advantages and limitations of commonly used chemical representations and feature selection techniques. Next, we critically review general ML model evaluation and interpretation processes and propose a three-step evaluation process: comparison with general metrics, with baseline models, and with existing models. Lastly, we explore ML modeling approaches for small data sets, including transfer learning and active learning, which have been successfully employed in many other fields, for future modeling of environmental chemical reactions.
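The three-step evaluation proposed in this Review can be illustrated with a short sketch. The specific metrics (RMSE and R²), the mean-predictor baseline, the function names, and the toy numbers are all assumptions chosen for illustration; the Review itself may recommend other metrics or baselines.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r2(y_true, y_pred):
    """Coefficient of determination relative to the test-set mean."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def three_step_evaluation(y_test, y_model, y_train, y_existing):
    # Step 1: general metrics for the new model.
    report = {"rmse": rmse(y_test, y_model), "r2": r2(y_test, y_model)}
    # Step 2: compare with a baseline that always predicts the training mean.
    baseline = [mean(y_train)] * len(y_test)
    report["rmse_baseline"] = rmse(y_test, baseline)
    report["beats_baseline"] = report["rmse"] < report["rmse_baseline"]
    # Step 3: compare with an existing model on the same test set.
    report["rmse_existing"] = rmse(y_test, y_existing)
    report["beats_existing"] = report["rmse"] < report["rmse_existing"]
    return report


# Toy numbers (hypothetical): observed values, new-model predictions,
# training targets, and predictions from an existing model.
report = three_step_evaluation(
    y_test=[1.0, 2.0, 3.0, 4.0],
    y_model=[1.1, 1.9, 3.2, 3.8],
    y_train=[0.0, 5.0],
    y_existing=[1.5, 2.5, 2.5, 3.5],
)
print(report)  # beats_baseline and beats_existing are both True here
```

Reporting all three comparisons together guards against a model that looks good on raw metrics but adds little over a trivial predictor or a published model.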

Short-term Lake Erie algal bloom prediction by classification and regression models

Zhang

Sun

et al. 2023

Water Research

Abiotic Reduction of Organic and Inorganic Compounds by Fe(II)-Associated Reductants: Comprehensive Data Sets and Machine Learning Modeling

Gao

Zhong

Zhang

et al. 2023

Environ. Sci. Technol.

Iron-associated reductants play a crucial role in providing electrons for various reductive transformations. However, the complexity of these systems has impeded the development of reliable predictive tools for estimating abiotic reduction rate constants (logk). Our recent study developed a machine learning (ML) model based on the reactivity of 60 organic compounds toward one soluble Fe(II)-reductant. In this study, we built a comprehensive kinetic data set covering the reactivity of 117 organic and 10 inorganic compounds toward four major types of Fe(II)-associated reductants. Separate ML models were developed for organic and inorganic compounds, and feature importance analysis demonstrated the significance of resonance structures, reducible functional groups, reductant descriptors, and pH in logk prediction. Mechanistic interpretation validated that the models accurately learned the impact of factors such as aromatic substituents, complexation, bond dissociation energy, reduction potential, LUMO energy, and dominant reductant species. Finally, we found that 38% of the 850,000 compounds in the Distributed Structure-Searchable Toxicity (DSSTox) database contain at least one reducible functional group, and that the logk of 285,184 compounds could be reasonably predicted using our model. Overall, the study is a significant step toward reliable tools for predicting abiotic reduction rate constants in iron-associated reductant systems.
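This abstract does not state which feature importance method was used. As one common, model-agnostic option (swapped in here for illustration), permutation importance measures how much a model's error grows when a single feature column is shuffled. The sketch assumes any `predict` callable that maps a feature row to a logk estimate; the stand-in linear model and the toy data are hypothetical.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling that
    feature's column (rows of X are lists of floats)."""
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - base)
        importances.append(mean(increases))
    return importances


# Toy example: feature 0 drives the (hypothetical) logk, feature 1 is ignored.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
print(permutation_importance(lambda row: 2.0 * row[0], X, y))
# the second importance is 0.0 because the model ignores feature 1
```

Because it only needs predictions, the same routine works for any fitted regressor, which is why permutation importance is often used alongside model-specific methods such as SHAP.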

