For a number of years MDL products have exposed both 166 bit and 960 bit keysets based on 2D descriptors. These keysets were originally constructed and optimized for substructure searching. We report on improvements in the performance of MDL keysets which are reoptimized for use in molecular similarity. Classification performance for a test data set of 957 compounds was increased from 0.65 for the 166 bit keyset and 0.67 for the 960 bit keyset to 0.71 for a surprisal S/N pruned keyset containing 208 bits and 0.71 for a genetic algorithm optimized keyset containing 548 bits. We present an overview of the underlying technology supporting the definition of descriptors and the encoding of these descriptors into keysets. This technology allows definition of descriptors as combinations of atom properties, bond properties, and atomic neighborhoods at various topological separations as well as supporting a number of custom descriptors. These descriptors can then be used to set one or more bits in a keyset. We constructed various keysets and optimized their performance in clustering bioactive substances. Performance was measured using methodology developed by Briem and Lessel. "Directed pruning" was carried out by eliminating bits from the keysets on the basis of random selection, values of the surprisal of the bit, or values of the surprisal S/N ratio of the bit. The random pruning experiment highlighted the insensitivity of keyset performance for keyset lengths of more than 1000 bits. Contrary to initial expectations, pruning on the basis of the surprisal values of the various bits resulted in keysets which underperformed those resulting from random pruning. In contrast, pruning on the basis of the surprisal S/N ratio was found to yield keysets which performed better than those resulting from random pruning. We also explored the use of genetic algorithms in the selection of optimal keysets. 
Once more the performance was only a weak function of keyset size, and the optimizations failed to identify a single globally optimal keyset. Instead multiple, equally optimal keysets could be produced which had relatively low overlap of the descriptors they encoded.
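The pruning experiments described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard information-theoretic definition of surprisal, -log2(p), where p is the fraction of compounds that set a bit, and it assumes fingerprints are stored as sets of on-bit indices. The function names, the `keep_high` switch, and the toy data are hypothetical. The abstract does not say which end of the surprisal ranking was pruned, and it does not define the surprisal S/N ratio, so that criterion is not reproduced here.

```python
import math
import random

def bit_frequencies(fingerprints, n_bits):
    """Fraction of fingerprints in which each bit is set."""
    counts = [0] * n_bits
    for fp in fingerprints:
        for bit in fp:
            counts[bit] += 1
    return [c / len(fingerprints) for c in counts]

def surprisal(p):
    """Assumed definition: -log2(p); infinite for a bit that is never set."""
    return math.inf if p == 0 else -math.log2(p)

def prune_random(n_bits, keep, seed=0):
    """Baseline: keep a random subset of bit positions."""
    rng = random.Random(seed)
    return frozenset(rng.sample(range(n_bits), keep))

def prune_by_surprisal(freqs, keep, keep_high=True):
    """Keep `keep` bits ranked by surprisal (direction is an assumption).

    Bits that are never set carry no information about the data set
    and are dropped before ranking.
    """
    candidates = [i for i, p in enumerate(freqs) if p > 0]
    sign = -1.0 if keep_high else 1.0
    ranked = sorted(candidates, key=lambda i: (sign * surprisal(freqs[i]), i))
    return frozenset(ranked[:keep])

def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Toy data: four hypothetical compounds over a 5-bit keyset.
fps = [{0, 1, 2}, {0, 1}, {0, 3}, {0, 1, 2, 3}]
freqs = bit_frequencies(fps, n_bits=5)
kept = prune_by_surprisal(freqs, keep=2)
similarity = tanimoto(fps[0] & kept, fps[3] & kept)
```

Comparing a clustering or classification score for `prune_by_surprisal` against the same score for `prune_random` at equal keyset length mirrors the kind of comparison the abstract reports.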
In the early 1950s, Harris, Mittwoch, Robson, and Warren (1, 2) investigated the mode of inheritance of cystinuria in 27 families by using quantitative determinations of cystine and dibasic amino acids as the genetic marker. Homozygotes were identified by the formation of urinary tract calculi composed of cystine and by gross hyperexcretion of cystine, lysine, arginine, and ornithine. Investigation of known heterozygotes (parents and children of affected subjects) revealed distinct phenotypic heterogeneity and identified two types of families. In one, comprising about two-thirds of the pedigrees studied, heterozygotes uniformly excreted normal quantities of cystine and dibasic amino acids, and genetic analysis was compatible with autosomal recessive inheritance. In the second, smaller group of pedigrees, an intermediate phenotype was found. All heterozygotes tested excreted moderate excesses of cystine and lysine.
The availability of the easily implemented Gaussian-2 (G2) methodology has made it possible for the nonspecialist to calculate accurate heats of formation for many molecules on workstations. In order to quantify its performance for transition state structures, we have used G2 and a modified G2 on several transition states whose structures and energies have been well characterized either by experiment or multireference configuration interaction studies. The G2 method performs well in predicting energies of transition states (even for nonisogyric reactions), with an absolute average deviation of 1.5 kcal/mole in the classical barrier height for the cases studied, while it is less successful in predicting geometries and frequencies. We investigated modifying the G2 method for use with transition states by using QCISD/6-311G(d,p) geometries and frequencies instead of MP2/6-31G(d) geometries and scaled HF/6-31G(d) frequencies. The QCISD geometries and frequencies agree well with values from the literature, and this modified G2 procedure offers improved performance in predicting transition state energies.
We have undertaken a comprehensive study of the reaction NH2(X 2B1) + NO → N2 + H2O (1a) and NH2(X 2B1) + NO → N2H + OH → N2 + H + OH (1b). Experimental measurements of the reaction rate coefficient and product branching fraction are combined with accurate ab initio calculations to give a detailed picture of this important reaction. The rate constant of reaction 1 was investigated in the temperature range 203 K ≤ T ≤ 813 K using the laser photolysis/CW laser-induced fluorescence technique for production and detection of NH2. The rate coefficient was found to be pressure independent between 10 and 100 Torr and is well described by k1(T) = 1.65 × 10^−7 T^−1.54 exp(−93 K/T) cm^3/(molecule·s). The deuterium kinetic isotope effect for the reactions of NH2 and ND2 with NO was investigated at temperatures between 210 and 481 K. A small, temperature-independent isotope effect of kH/kD = 1.05 ± 0.03 was found. Additional experimental work focused on measuring the product branching fraction for production of OH, Φ1b, and its deuterium isotope effect at room temperature. Measurements were performed using the discharge-flow technique with mass spectrometric detection of products. OH from channel 1b was reacted with excess CO and measured as CO2. The room temperature branching fraction was measured as Φ1b = 9.0 ± 2.5% (NH2 + NO; T = 298 K) and Φ1b = 5.5 ± 0.7% (ND2 + NO; T = 298 K). Theoretical calculations have characterized the stationary points on the potential energy surface connecting reactants with products using G2 and G2Q levels of theory. These calculations support the experimentally observed temperature dependences and kinetic isotope effects.
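As a worked illustration of the fitted rate expression quoted above, the following short sketch evaluates k1(T) and combines it with the reported room-temperature branching fraction. The function and variable names are hypothetical. Multiplying the total rate coefficient by Φ1b to estimate a channel-specific coefficient is an assumption made here for illustration, not a result stated in the abstract.

```python
import math

def k1(temperature_k):
    """Fitted total rate coefficient for NH2 + NO in cm^3/(molecule*s).

    Uses k1(T) = 1.65e-7 * T**-1.54 * exp(-93 K / T), which the text
    reports for 203 K <= T <= 813 K; values outside that range are
    extrapolations.
    """
    return 1.65e-7 * temperature_k ** -1.54 * math.exp(-93.0 / temperature_k)

# Room-temperature value of the fit.
k1_298 = k1(298.0)

# Reported OH branching fraction at 298 K (NH2 + NO): 9.0%.
phi_1b = 0.090

# Assumed here: channel 1b coefficient = branching fraction * total coefficient.
k1b_298 = phi_1b * k1_298

for t in (203.0, 298.0, 813.0):
    print(f"T = {t:5.0f} K  k1 = {k1(t):.3e} cm^3/(molecule*s)")
```

The fit gives a rate coefficient that falls with increasing temperature over the measured range, which is the negative temperature dependence the abstract says the calculations support.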
We have carried out a combined experimental and theoretical study of the reaction of NH (ND) (³Σ⁻) with NO aimed at understanding the product distribution from that reaction. The reaction was studied at room temperature using the discharge flow technique with mass spectrometric detection of the reaction products. Measured product branching fractions at room temperature for production of N2O + H (D) were 0.8 ± 0.4 for NH(³Σ⁻) + NO and 0.87 ± 0.17 for ND(³Σ⁻) + NO (1σ statistical errors). Stationary points on the HNNO ²A' potential energy surface were characterized using the Gaussian 2 ab initio method. The initial addition of NH(³Σ⁻) to NO on the ²A' surface is predicted to proceed without a barrier to form trans-HNNO; reaction to produce the cis isomer is predicted to have a barrier of 2.9 kcal/mol. The cis- and trans-HNNO are predicted to be at -48.9 and -56.0 kcal/mol relative to the separated reactants. Transition states with energies of -25.3 and -17.9 kcal/mol were located for dissociation of the cis isomer into H + N2O and OH + N2, respectively. The transition state for interconversion of the isomers was calculated to be at approximately -30.8 kcal/mol. The trans-HNNO was found to isomerize to the cis form before decomposing. The potential energy surface calculated explains the major features of the reaction.
Quantitative combustion diagnostics using laser-induced fluorescence require a knowledge of energy transfer and quenching rates at elevated temperatures. Such information is critical both for experimental design and for subsequent reduction of measured signals to measurements of temperature and species concentrations. We present the results of a study of electronic energy transfer in NO A ²Σ⁺. These results are cast in the form of empirical correlations which have been developed to facilitate the practical applications of quenching corrections. The choice of particular functional forms for these correlations is based on a classical collisional model of the process. This model has been calibrated against an extensive set of measured cross sections. Results are presented for a number of species of interest in combustion and aerothermodynamic applications.