Kirill Karpov scite author profile

The use of machine learning in chemistry has become common practice. However, despite the success of modern machine learning methods, their application is limited by a lack of data. A transfer learning methodology can help solve this problem. It assumes that a model built on a sufficient amount of data captures general features of the structures of the chemical compounds it was trained on, and that reusing these features on a data set with scarce data will greatly improve the quality of the new model. In this paper, we develop this approach for small organic molecules, implementing transfer learning with graph convolutional neural networks. The paper shows a significant improvement in model performance for target properties with scarce data. The effects of data set composition on model quality and the applicability domain of the resulting models are also considered.
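
The abstract describes the idea only at a high level. As a minimal sketch under stated assumptions (not the authors' implementation), the Python below shows the freeze-and-reuse pattern: a simplified sum-aggregation graph convolution with sum pooling turns a molecule into an embedding using a weight matrix `PRETRAINED_W`, which is a hypothetical placeholder for weights learned on a large source data set, and only a linear ridge-regression head is fitted on the small target set. All function names, the toy molecules, and the target values are illustrative.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def graph_conv(features: Sequence[Vector], neighbors: Sequence[Sequence[int]],
               weights: Matrix) -> List[Vector]:
    """Simplified graph convolution: add each atom's neighbour features to its own,
    project with `weights` (one row per output channel) and apply ReLU."""
    out = []
    for v, h_v in enumerate(features):
        agg = list(h_v)
        for u in neighbors[v]:
            agg = [a + b for a, b in zip(agg, features[u])]
        out.append([max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weights])
    return out


def embed(features: Sequence[Vector], neighbors: Sequence[Sequence[int]],
          weights: Matrix) -> Vector:
    """Molecule-level embedding: graph convolution followed by sum pooling over atoms."""
    atoms = graph_conv(features, neighbors, weights)
    return [sum(column) for column in zip(*atoms)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a small square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_head(embeddings: List[Vector], targets: Vector, l2: float = 1e-3) -> Vector:
    """Fit only a linear ridge-regression head on frozen embeddings.
    A bias column is appended; for brevity the penalty also applies to the bias."""
    x = [list(e) + [1.0] for e in embeddings]
    d = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (l2 if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(x, targets)) for i in range(d)]
    return solve(xtx, xty)


def predict(head: Vector, embedding: Vector) -> float:
    """Apply the fitted linear head (weights plus bias) to one embedding."""
    return sum(w * e for w, e in zip(head, list(embedding) + [1.0]))


# Hypothetical stand-in for weights learned on a large source data set;
# in transfer learning these stay frozen while only the head is refitted.
PRETRAINED_W: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.2, 0.0, 1.0]]

if __name__ == "__main__":
    C, N, O = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]  # one-hot heavy-atom types
    small_set = [([C, C, O], [[1], [0, 2], [1]]), ([C, N], [[1], [0]]),
                 ([C, C, C], [[1], [0, 2], [1]]), ([N, C, O], [[1], [0, 2], [1]]),
                 ([O], [[]])]
    embeddings = [embed(f, nb, PRETRAINED_W) for f, nb in small_set]
    toy_targets = [1.7, 0.42, 1.62, 0.98, 0.7]  # made-up property values
    head = fit_head(embeddings, toy_targets)
    print([round(predict(head, e), 3) for e in embeddings])
```

Keeping the convolution weights frozen means the small target set only has to determine the few head coefficients (here four, including the bias), which is the intuition behind reusing learned features when data are scarce.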

Predictive Models for HOMO and LUMO Energies of N‐Donor Heterocycles as Ligands for Lanthanides Separation

Solov'ev, Ustynyuk, Zhokhova, et al. 2018

Molecular Informatics

Quantum chemical calculations combined with QSPR methodology open up promising prospects for solving a number of fundamental and applied problems. In this work, we performed PM7 and DFT calculations and QSPR modeling of the HOMO and LUMO energies of polydentate N-heterocyclic ligands that are promising for the extraction separation of lanthanides, because these values are related to the ligands' selectivity with respect to the target cations. The data for QSPR modeling comprised the PM7-calculated HOMO and LUMO energies of N-donor heterocycles, including several types of both known and virtual, previously undescribed polydentate ligands. Ensemble modeling used various molecular fragments as descriptors and different variable selection techniques to build consensus models (CMs) on a training set of 388 ligands using external cross-validation. The CMs were then validated by making predictions for two external test sets: 45 ligands (T1) similar to the ligands of the training set, and 1546 structures (T2) substantially different from them. The consensus models perform well in 5-fold cross-validation (RMSE(HOMO) = 0.097 eV, RMSE(LUMO) = 0.064 eV) and on the external test sets (T1: RMSE(HOMO) = 0.26 eV, RMSE(LUMO) = 0.24 eV; T2: RMSE(HOMO) = 0.26 eV, RMSE(LUMO) = 0.17 eV). An analysis of the results shows that substituents in the heteroaromatic rings of the ligands and at the amide nitrogens can strongly influence their metal-binding properties.
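
To make the evaluation protocol concrete, here is a hedged Python sketch of a consensus model scored by k-fold cross-validated RMSE, where RMSE = sqrt((1/n) * sum_i (predicted_i - observed_i)^2). Assumptions: each member is a one-descriptor least-squares model standing in for models built with different descriptor sets and variable selection techniques, the consensus prediction is the plain mean of the member predictions, and the descriptor indices are fixed rather than reselected inside each fold as external cross-validation would require. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def fit_one_descriptor(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b on a single descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def fit_consensus(rows: List[Row], ys: List[float],
                  descriptor_ids: List[int]) -> Callable[[Row], float]:
    """Train one member model per descriptor; the consensus prediction is their mean."""
    members = [(j, fit_one_descriptor([r[j] for r in rows], ys)) for j in descriptor_ids]

    def predict(row: Row) -> float:
        return sum(a * row[j] + b for j, (a, b) in members) / len(members)

    return predict


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def kfold_rmse(rows: List[Row], ys: List[float], descriptor_ids: List[int],
               k: int = 5) -> float:
    """k-fold cross-validated RMSE: each object is predicted by a consensus model
    trained without the fold that contains it."""
    n = len(rows)
    folds = [list(range(start, n, k)) for start in range(k)]  # deterministic interleaved folds
    predictions = [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit_consensus([rows[i] for i in train], [ys[i] for i in train], descriptor_ids)
        for i in fold:
            predictions[i] = model(rows[i])
    return rmse(predictions, ys)
```

For example, with `rows = [[float(i), 2.0 * i, float(i % 3)] for i in range(10)]` and `ys = [0.5 * i + 1.0 for i in range(10)]`, the consensus over descriptors `[0, 1]` has a cross-validated RMSE of essentially zero, while adding the uninformative descriptor 2 as a third member raises it.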

Tree Parzen estimator for global geometry optimization: A benchmark and database of experimental gas‐phase structures of organic molecules

Andreadi, Zankov, Karpov, et al. 2022

J Comput Chem

Finding global and local minima on the potential energy surface is a key task in most computational chemistry studies. Given a set of possible conformations of a chemical structure and their corresponding energies, one can judge its chemical activity, understand reaction mechanisms, describe the formation of metal‐ligand and ligand‐protein complexes, and so forth. Although interest in minimum-search algorithms arose long ago, during the formative years of computational chemistry, new methods are still emerging. These methods make it possible to perform conformational analysis and geometry optimization faster, more accurately, or for more specific tasks. This article presents the application of a novel global geometry optimization approach based on the Tree Parzen Estimator method. For benchmarking, a database of small organic molecule geometries in their global minimum conformations was created, along with a software package for running the tests.
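
For readers unfamiliar with the Tree Parzen Estimator, the following is a minimal one-dimensional sketch in Python, not the paper's software package. TPE splits past evaluations at the gamma-quantile of the energy into "good" and "bad" sets, fits a kernel density l(x) to the good points and g(x) to the rest, and evaluates next the candidate that maximizes l(x)/g(x). Assumptions: a single torsion angle as the only variable, a made-up potential `torsion_energy`, Gaussian kernels with a fixed bandwidth that ignore the periodicity of the angle, and illustrative default parameters.

```python
import math
import random
from typing import Callable, List, Tuple


def parzen_log_density(x: float, centers: List[float], bandwidth: float) -> float:
    """Log of an equal-weight Gaussian kernel density estimate evaluated at x."""
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    density = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm
    return math.log(density + 1e-300)  # guard against log(0) far from all centers


def tpe_minimize(f: Callable[[float], float], low: float, high: float,
                 n_init: int = 10, n_iter: int = 40, n_candidates: int = 24,
                 gamma: float = 0.25, seed: int = 0) -> Tuple[float, float]:
    """Minimal 1-D Tree Parzen Estimator: returns the best (x, f(x)) found."""
    rng = random.Random(seed)
    history = [(x, f(x)) for x in (rng.uniform(low, high) for _ in range(n_init))]
    bandwidth = (high - low) / 10.0  # fixed bandwidth, a simplifying assumption
    for _ in range(n_iter):
        ranked = sorted(history, key=lambda p: p[1])
        n_good = max(1, int(gamma * len(ranked)))
        good = [x for x, _ in ranked[:n_good]]  # observations below the gamma-quantile
        bad = [x for x, _ in ranked[n_good:]]   # all remaining observations
        # Draw candidates from l(x), the density fitted to the good points, clipped to bounds.
        candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                      for _ in range(n_candidates)]
        # Evaluate the candidate that maximizes l(x) / g(x), i.e. log l - log g.
        x_next = max(candidates, key=lambda c: parzen_log_density(c, good, bandwidth)
                     - parzen_log_density(c, bad, bandwidth))
        history.append((x_next, f(x_next)))
    return min(history, key=lambda p: p[1])


def torsion_energy(phi: float) -> float:
    """Hypothetical torsional potential (arbitrary units) with global minima
    near phi = +/- pi/3 and a higher local minimum at phi = pi."""
    return 0.5 * (1.0 + math.cos(3.0 * phi)) + 0.3 * (1.0 - math.cos(phi))


if __name__ == "__main__":
    phi_best, e_best = tpe_minimize(torsion_energy, -math.pi, math.pi)
    print(f"best phi = {phi_best:.3f} rad, E = {e_best:.3f}")
```

Running the script prints the best torsion angle and energy found. With the default budget of 50 evaluations the search is expected to land in one of the two global-minimum wells near ±π/3 rather than in the higher local minimum at π, although, as with any stochastic optimizer, this is not guaranteed for every seed.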

Fast Neural Network Engine for Natural Science Language Processing: A Drug-Search Case.

Korolev, Mitrofanov, Karpov, et al. 2020

Preprint

The main advantage of modern natural language processing methods is the ability to turn an amorphous, human-readable task into a strict mathematical form. This makes it possible to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use cases and applied it to the search for a therapeutic agent against COVID-19 by analyzing the PubMed archive.
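
The abstract does not describe the engine's internals. One classic way to turn text into a "strict mathematical form" in which semantic relations can be measured is to represent each term as a vector of co-occurrence counts and compare terms by cosine similarity. The Python below is a hedged, minimal sketch of that generic baseline, not the neural network engine from the paper; the toy corpus and the placeholder names `drug_a` and `drug_b` are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def cooccurrence_vectors(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Represent each term by counts of the other terms it shares a sentence with."""
    vectors: Dict[str, Counter] = {}
    for tokens in sentences:
        unique = set(tokens)
        for term in unique:
            vec = vectors.setdefault(term, Counter())
            for other in unique:
                if other != term:
                    vec[other] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_relatedness(query: str, candidates: List[str],
                        vectors: Dict[str, Counter]) -> List[str]:
    """Order candidate terms by cosine similarity to the query term."""
    q = vectors.get(query, Counter())
    return sorted(candidates, key=lambda c: cosine(q, vectors.get(c, Counter())), reverse=True)


if __name__ == "__main__":
    # Toy, made-up "abstracts", already tokenized; drug_a and drug_b are placeholders.
    corpus = [["covid", "protease", "inhibitor"],
              ["drug_a", "protease", "inhibitor"],
              ["drug_b", "kinase", "receptor"],
              ["covid", "spike", "receptor"]]
    vecs = cooccurrence_vectors(corpus)
    print(rank_by_relatedness("covid", ["drug_b", "drug_a"], vecs))  # ['drug_a', 'drug_b']
```

In this toy corpus `drug_a` shares two context terms with `covid` and `drug_b` only one, so `drug_a` ranks first; a neural engine would learn denser representations, but the ranking-by-similarity step has the same shape.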

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.

Kirill Karpov

Benzoazacrown compound: a highly effective chelator for therapeutic bismuth radioisotopes

Size Doesn’t Matter: Predicting Physico- or Biochemical Properties Based on Dozens of Molecules

Predictive Models for HOMO and LUMO Energies of N‐Donor Heterocycles as Ligands for Lanthanides Separation

Tree Parzen estimator for global geometry optimization: A benchmark and database of experimental gas‐phase structures of organic molecules

Fast Neural Network Engine for Natural Science Language Processing: A Drug-Search Case.
