CGRtools is an open-source Python library for handling molecular and reaction information. It is the only library developed so far that can process condensed graphs of reaction (CGRs). The CGR representation enables advanced operations on reaction information and can be used for reaction descriptor calculation, structure-reactivity modeling, atom-to-atom mapping comparison and correction, reaction center extraction, reaction balancing, and related tasks. Unlike other popular libraries, CGRtools is written entirely in Python, has only minor dependencies on other libraries, and is cross-platform. Reaction, molecule, and CGR objects in CGRtools support native Python methods and can be compared with the "equal to", "less than", and "greater than" operators. CGRtools supports common structural formats. It is distributed under the LGPL license and is available at https://github.com/cimmkzn/CGRtools.
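To make the CGR idea concrete, the following minimal sketch merges the bonds of mapped reactants and products into one graph whose edges carry a (before, after) bond-order pair. It is an illustration only and does not use the CGRtools API: the names `bonds`, `condense`, and `reaction_center`, the dictionary-based data layout, and the example reaction are all assumptions introduced here.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Bond = FrozenSet[int]        # unordered pair of atom-map numbers
BondTable = Dict[Bond, int]  # bond -> bond order
CGR = Dict[Bond, Tuple[Optional[int], Optional[int]]]


def bonds(*triples: Tuple[int, int, int]) -> BondTable:
    """Build a bond table from (atom_a, atom_b, order) triples."""
    return {frozenset((a, b)): order for a, b, order in triples}


def condense(reactants: BondTable, products: BondTable) -> CGR:
    """Superimpose both sides of a mapped reaction.

    Each edge is labeled (order_before, order_after); None marks a bond
    that does not exist on that side of the reaction.
    """
    return {
        bond: (reactants.get(bond), products.get(bond))
        for bond in reactants.keys() | products.keys()
    }


def reaction_center(cgr: CGR) -> Set[int]:
    """Atoms touched by at least one bond that is formed, broken, or changed."""
    center: Set[int] = set()
    for bond, (before, after) in cgr.items():
        if before != after:
            center |= bond
    return center


# Hypothetical mapped example: ethyl bromide + hydroxide -> ethanol + bromide.
# Atom-map numbers: 1 = C (reacting), 2 = Br, 3 = O, 4 = C (methyl).
reactant_bonds = bonds((1, 2, 1), (1, 4, 1))
product_bonds = bonds((1, 3, 1), (1, 4, 1))

cgr = condense(reactant_bonds, product_bonds)
center = reaction_center(cgr)
```

In this toy encoding the C-Br edge is labeled `(1, None)` (broken), the C-O edge `(None, 1)` (formed), and the C-C edge `(1, 1)` (unchanged), so the reaction center consists of atoms 1, 2, and 3.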
The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions, and endpoints. Its implementation in Python 3 using the CGRtools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO, and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).
In this article, we consider cross-validation of quantitative structure-property relationship models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two strategies of model cross-validation: 'transformation-out' CV and 'solvent-out' CV. Unlike the conventional k-fold cross-validation approach, which does not consider the nature of the objects, the proposed procedures provide an unbiased estimation of the predictive performance of the models for novel types of structural transformations in chemical reactions and for reactions proceeding under new conditions. Both suggested strategies have been applied to predict the rate constants of bimolecular elimination and nucleophilic substitution reactions, and of Diels-Alder cycloaddition. All suggested cross-validation methodologies and a tutorial are implemented in the open-source software package CIMtools (https://github.com/cimmkzn/CIMtools).
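The general idea behind 'transformation-out' and 'solvent-out' CV can be sketched as a grouped split: all records that share a label (a transformation type or a solvent) are placed in the same fold, so no label appears in both the training and the test part. The sketch below is an assumption-laden illustration in plain Python, not the CIMtools implementation; the function name `group_out_folds`, the greedy fold-balancing rule, and the example labels are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def group_out_folds(
    groups: Sequence[Hashable], n_folds: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) so that no group is split.

    `groups[i]` is the label of record i, e.g. its transformation type
    ('transformation-out') or its solvent ('solvent-out').
    """
    members = defaultdict(list)
    for index, group in enumerate(groups):
        members[group].append(index)

    if n_folds < 2 or n_folds > len(members):
        raise ValueError("n_folds must be between 2 and the number of groups")

    # Greedy balancing (an assumption of this sketch): place the largest
    # groups first, always into the fold that is currently the smallest.
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for group in sorted(members, key=lambda g: -len(members[g])):
        min(folds, key=len).extend(members[group])

    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield train, sorted(test)


# Hypothetical labels: one transformation type per reaction record.
labels = ["SN2_a", "SN2_a", "E2_b", "E2_b", "DA_c", "SN2_a"]
splits = list(group_out_folds(labels, n_folds=3))
```

Passing solvent names instead of transformation labels gives the 'solvent-out' variant under the same assumptions. For readers who use scikit-learn, its `GroupKFold` splitter follows the same principle of keeping groups intact.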
In the context of cybersecurity, this article examines current issues of geodynamic risk using the territory of Cyprus and the adjacent waters of the Mediterranean Sea as an example. The aim of the article is to analyze geodynamic risks in this region, to build adequate models for their assessment, and to substantiate indicators of geodynamic hazard. A probabilistic model is constructed that describes the sequence of geodynamic states of the geological environment of the study area and satisfies the conditions of independence, homogeneity, and ordinariness of the event flows. The model is described in the form of Kolmogorov differential equations. Based on the model, an equipotential distribution of geodynamic risk is obtained, which makes it possible to substantiate zones for the safe placement of info-telecommunication infrastructure on the island of Cyprus and its coast. An important applied aspect of the model studies is the search for geodynamic indicators of seismic risk. One such indicator is the vortex structures formed by the vectors of horizontal stresses in the Earth's lithosphere. The highest concentration of epicenters of past earthquakes and the highest probabilistic geodynamic risk fall on the area where four vortex structures intersect, near the southwestern coast of the island. Another important aspect of geodynamic risk assessment is the search for indicators of oil and gas fields. Using a digital model of the Earth's lithosphere created by the authors, it is established that oil and gas fields are located on the boundaries of a left-rotating vortex structure formed by specific physical characteristics of the lithosphere. This is confirmed by empirical data for the territory of Cyprus and the adjacent waters of the Mediterranean Sea.
It is concluded that refining the geodynamic risks will require cartographic data on more detailed distributions of crustal movements, tectonic faults, and the anomalous gravity field. Keywords: cybersecurity, seismic hazard, Cyprus, probabilistic model, geodynamic risk, indicator, Earth's lithosphere, horizontal stresses, oil and gas bearing zones.
Here, we discuss a reaction standardization protocol followed by a comparison of popular atom-to-atom mapping (AAM) tools (ChemAxon, Indigo, RDTool, NextMove, and RXNMapper) as well as some consensus AAM strategies. For this purpose, a dataset of 1851 manually curated and mapped reactions (the Golden dataset) was prepared and used as a reference set. RXNMapper was found to have the highest accuracy, although it has some clear disadvantages. It was therefore selected as the best tool and applied to map the USPTO dataset. The standardization protocol used to prepare the data, as well as the data itself, are available in the GitHub repository https://github.com/Laboratoire-de-Chemoinformatique.
Graph-based architectures are becoming increasingly popular as a tool for structure generation. Here, we introduce a novel open-source architecture, HyFactor, in which, similar to the InChI linear notation, the number of hydrogens attached to the heavy atoms is considered instead of the bond types. HyFactor was benchmarked on the ZINC 250K, MOSES, and ChEMBL data sets against the conventional graph-based architecture ReFactor, our implementation of the DEFactor architecture reported in the literature. On average, HyFactor models contain about 20% fewer fitting parameters than ReFactor models. The two architectures display similar validity, uniqueness, and reconstruction rates. HyFactor generates structures that are more similar to the training set compounds than those generated by ReFactor. This could be explained by the fact that the latter generates many open-chain analogues of cyclic structures in the training set. It has been demonstrated that the reconstruction error for heavy molecules can be significantly reduced using data augmentation.
Graph-based architectures are becoming increasingly popular as a tool for structure generation. Here, we introduce a novel open-source architecture, HyFactor, which is inspired by the previously reported DEFactor architecture and based on hydrogen-labeled graphs. Since the original DEFactor code was not available, a new implementation (ReFactor) was prepared in this work for benchmarking purposes. HyFactor demonstrates high performance on the ZINC 250K, MOSES, and ChEMBL data sets, and in molecular generation tasks it is considerably more effective than ReFactor. The code of HyFactor and all models obtained in this study are publicly available from our GitHub repository: https://github.com/Laboratoire-de-Chemoinformatique/hyfactor