The use of machine learning in chemistry has become common practice.
However, despite the success of modern machine learning methods,
a lack of data limits their use. Transfer learning
can help solve this problem. This methodology assumes that a model
trained on a sufficient amount of data captures general features of
the chemical structures on which it was trained, and that
reusing these features on a small data set
will greatly improve the quality of the new model. In this paper,
we develop this approach for small organic molecules, implementing
transfer learning with graph convolutional neural networks. The paper
shows a significant improvement in model performance for
target properties with scarce data. The effects of data set
composition on model quality and the applicability domain
of the resulting models are also considered.
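The transfer idea described above can be illustrated with a deliberately tiny sketch. It is not the paper's graph convolutional network: a single linear "feature layer" stands in for the pretrained convolutional layers, the data are synthetic and noise-free, and all names (`pretrain_feature_layer`, `fit_head`, `true_u`) are hypothetical. The feature layer is trained on a data-rich source task, then frozen, and only a two-parameter head is fitted on four target samples.

```python
import random

def dot(u, x):
    return sum(ui * xi for ui, xi in zip(u, x))

def pretrain_feature_layer(xs, ys, dim, lr=0.3, epochs=200):
    """Fit a linear feature layer on the data-rich source task
    by batch gradient descent on the mean squared error."""
    u = [0.0] * dim
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            err = dot(u, x) - y
            for j in range(dim):
                grad[j] += 2.0 * err * x[j] / n
        u = [uj - lr * gj for uj, gj in zip(u, grad)]
    return u

def fit_head(features, ys):
    """Closed-form least squares for y ~ b * feature + c on frozen features."""
    n = len(features)
    mean_f = sum(features) / n
    mean_y = sum(ys) / n
    var = sum((f - mean_f) ** 2 for f in features)
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(features, ys))
    b = cov / var
    return b, mean_y - b * mean_f

rng = random.Random(0)
DIM = 5
true_u = [1.0, -2.0, 0.5, 0.0, 3.0]  # assumed "general feature" shared by both tasks

def random_x():
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

# Source task: plenty of (synthetic) data.
src_x = [random_x() for _ in range(500)]
src_y = [dot(true_u, x) for x in src_x]
u = pretrain_feature_layer(src_x, src_y, DIM)

# Target task: only 4 samples, fewer than the 6 parameters
# a from-scratch linear model with intercept would need.
tgt_x = [random_x() for _ in range(4)]
tgt_y = [2.0 * dot(true_u, x) + 1.0 for x in tgt_x]

# Transfer: keep u frozen and fit only the 2-parameter head.
b, c = fit_head([dot(u, x) for x in tgt_x], tgt_y)

def predict_target(x):
    return b * dot(u, x) + c
```

Because the target property here depends on the same underlying feature as the source property, four samples suffice to recover the head. With real molecular data the benefit depends on how related the two properties are, which is presumably why the abstract also discusses data set composition and applicability domain.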
Quantum chemical calculations combined with QSPR methodology open up promising prospects for solving a number of fundamental and applied problems. In this work, we performed PM7 and DFT calculations and QSPR modeling of HOMO and LUMO energies for polydentate N-heterocyclic ligands that are promising for the extraction separation of lanthanides, because these values are related to the ligands' selectivity with respect to the target cations. The data for QSPR modeling comprised the PM7-calculated HOMO and LUMO energies of N-donor heterocycles, including several types of both known and virtual, previously undescribed polydentate ligands. Ensemble modeling used various molecular fragments as descriptors and different variable selection techniques to build consensus models (CMs) on a training set of 388 ligands using external cross-validation. The CMs were then validated on two external test sets: 45 ligands (T1) that were similar to the ligands of the training set, and 1546 structures (T2) that were substantially different from them. The consensus models predict well in 5-fold cross-validation (RMSE = 0.097 eV, RMSE = 0.064 eV) and on the external test sets (T1: RMSE = 0.26 eV, RMSE = 0.24 eV; T2: RMSE = 0.26 eV, RMSE = 0.17 eV). An analysis of the results reveals that substituents in the heteroaromatic rings of the ligands and at the amide nitrogens can strongly influence their metal binding properties.
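For readers unfamiliar with the two quantities reported above, the following minimal sketch shows how a consensus prediction and its RMSE could be computed. It assumes the consensus is a plain arithmetic mean of the individual models, which may differ from the weighting used in the paper. The three models, the descriptor values, and the reference energies are hypothetical and are not data from the study.

```python
import math

def consensus_predict(models, x):
    """Average the predictions of the individual models (simple consensus)."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs (eV here)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical individual models: each maps one descriptor value to an energy in eV.
models = [
    lambda x: -9.0 + 0.10 * x,
    lambda x: -9.2 + 0.12 * x,
    lambda x: -8.8 + 0.08 * x,
]

# Hypothetical descriptor values and reference energies (eV).
xs = [0.0, 1.0, 2.0]
y_ref = [-9.0, -8.9, -8.8]

y_cm = [consensus_predict(models, x) for x in xs]
error_cm = rmse(y_ref, y_cm)
error_single = rmse(y_ref, [models[1](x) for x in xs])
```

In this constructed example the individual models err in opposite directions, so their average matches the reference values far better than any biased single model does. That cancellation of uncorrelated errors is the usual motivation for consensus modeling.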
Finding global and local minima on the potential energy surface is a key task for most studies in computational chemistry. Given a set of possible conformations of chemical structures and their corresponding energies, one can judge their chemical activity, understand the mechanisms of reactions, describe the formation of metal-ligand and ligand-protein complexes, and so forth. Although interest in minima search algorithms in computational chemistry arose long ago (during the formation of the field), new methods are still emerging. These methods make it possible to perform conformational analysis and geometry optimization faster, more accurately, or for more specific tasks. This article presents the application of a novel global geometry optimization approach based on the Tree Parzen Estimator method. For benchmarking, a database of small organic molecule geometries in the global minimum conformation was created, as well as a software package to perform the tests.
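The core loop of a Parzen-estimator-based search can be sketched on a one-dimensional toy problem. This is a simplified illustration of the general idea, not the authors' implementation. The torsional potential `torsion_energy`, the fixed kernel bandwidth, and all parameter defaults are assumptions chosen for the example. Past trials are split into a "good" and a "bad" group by energy, each group is modeled with a kernel density, and the next dihedral angle is the candidate with the highest good-to-bad density ratio.

```python
import math
import random

def torsion_energy(phi):
    """Hypothetical 1-D torsional potential (arbitrary units), phi in degrees.
    Global minimum E = 0 at phi = 60; higher local minima near 180 and 300."""
    r = math.radians(phi)
    return 1.0 + math.cos(3.0 * r) + 0.5 * (1.0 - math.cos(r - math.pi / 3.0))

def parzen_density(x, points, bandwidth):
    """Gaussian-kernel (Parzen) density on the circle, mixed with a uniform prior."""
    total = 1.0 / 360.0  # uniform prior keeps the density strictly positive
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    for p in points:
        d = abs(x - p) % 360.0
        d = min(d, 360.0 - d)  # periodic distance between two angles
        total += math.exp(-0.5 * (d / bandwidth) ** 2) / norm
    return total / (len(points) + 1)

def tpe_minimize(objective, n_trials=80, n_startup=20, gamma=0.25,
                 n_candidates=24, bandwidth=20.0, seed=0):
    rng = random.Random(seed)
    trials = []  # list of (angle, energy) pairs
    for t in range(n_trials):
        if t < n_startup:
            # Random exploration before the density models are used.
            x = rng.uniform(0.0, 360.0)
        else:
            ordered = sorted(trials, key=lambda trial: trial[1])
            n_good = max(1, int(gamma * len(ordered)))
            good = [a for a, _ in ordered[:n_good]]
            bad = [a for a, _ in ordered[n_good:]]
            # Draw candidates from the kernels centered on the good trials.
            candidates = [(rng.choice(good) + rng.gauss(0.0, bandwidth)) % 360.0
                          for _ in range(n_candidates)]
            # Keep the candidate that looks most "good" relative to "bad".
            x = max(candidates,
                    key=lambda c: parzen_density(c, good, bandwidth)
                    / parzen_density(c, bad, bandwidth))
        trials.append((x, objective(x)))
    best = min(trials, key=lambda trial: trial[1])
    return best, trials

(best_phi, best_energy), history = tpe_minimize(torsion_energy)
print(f"best angle: {best_phi:.1f} deg, energy: {best_energy:.3f}")
```

For a real molecule the search space would be the full set of rotatable dihedral angles and the objective an energy evaluation from a force field or quantum chemical method. A production tool would more likely rely on an existing TPE implementation than on a hand-written one like this.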
The main advantage of modern natural language processing methods is the ability to turn an amorphous
human-readable task into a strict mathematical form. This makes it possible to extract chemical data and insights from
articles and to find new semantic relations. We propose a universal engine for processing chemical and
biological texts. We successfully tested it on various use cases and applied it to the search for a
therapeutic agent for COVID-19 by analyzing the PubMed archive.