Joseph L. Durant scite author profile

J. Chem. Inf. Comput. Sci.

Durant¹, Leland², Henry³ et al. 2002

For a number of years MDL products have exposed both 166 bit and 960 bit keysets based on 2D descriptors. These keysets were originally constructed and optimized for substructure searching. We report on improvements in the performance of MDL keysets which are reoptimized for use in molecular similarity. Classification performance for a test data set of 957 compounds was increased from 0.65 for the 166 bit keyset and 0.67 for the 960 bit keyset to 0.71 for a surprisal S/N pruned keyset containing 208 bits and 0.71 for a genetic algorithm optimized keyset containing 548 bits. We present an overview of the underlying technology supporting the definition of descriptors and the encoding of these descriptors into keysets. This technology allows definition of descriptors as combinations of atom properties, bond properties, and atomic neighborhoods at various topological separations as well as supporting a number of custom descriptors. These descriptors can then be used to set one or more bits in a keyset. We constructed various keysets and optimized their performance in clustering bioactive substances. Performance was measured using methodology developed by Briem and Lessel. "Directed pruning" was carried out by eliminating bits from the keysets on the basis of random selection, values of the surprisal of the bit, or values of the surprisal S/N ratio of the bit. The random pruning experiment highlighted the insensitivity of keyset performance for keyset lengths of more than 1000 bits. Contrary to initial expectations, pruning on the basis of the surprisal values of the various bits resulted in keysets which underperformed those resulting from random pruning. In contrast, pruning on the basis of the surprisal S/N ratio was found to yield keysets which performed better than those resulting from random pruning. We also explored the use of genetic algorithms in the selection of optimal keysets. 
Once more the performance was only a weak function of keyset size, and the optimizations failed to identify a single globally optimal keyset. Instead multiple, equally optimal keysets could be produced which had relatively low overlap of the descriptors they encoded.
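The surprisal-based pruning described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the surprisal of a bit is the usual information-theoretic quantity, log2(1/p), where p is the fraction of compounds that set the bit. It also assumes Tanimoto similarity as the comparison measure, and that the highest-surprisal bits are the ones kept. The abstract specifies none of these details. The surprisal S/N ratio is not defined in the abstract, so it is not implemented here. The function names `bit_surprisal`, `keep_top_bits`, and `tanimoto` are hypothetical.

```python
import math


def bit_surprisal(fingerprints):
    """Return log2(1/p_i) for each bit i, where p_i is the fraction of
    fingerprints with bit i set. Bits that are never set get math.inf."""
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    result = []
    for i in range(n_bits):
        p = sum(fp[i] for fp in fingerprints) / n
        result.append(math.log2(1 / p) if p > 0 else math.inf)
    return result


def keep_top_bits(scores, n_keep):
    """Indices of the n_keep highest-scoring bits, in ascending index order.
    Keeping the highest scores is an assumption made for illustration."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def tanimoto(fp_a, fp_b, bits=None):
    """Tanimoto coefficient, optionally restricted to the given bit indices."""
    idx = range(len(fp_a)) if bits is None else bits
    both = sum(1 for i in idx if fp_a[i] and fp_b[i])
    either = sum(1 for i in idx if fp_a[i] or fp_b[i])
    return both / either if either else 0.0


# Toy 4-bit keysets for four hypothetical compounds (made-up data).
fps = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
]

scores = bit_surprisal(fps)          # one surprisal value per bit
pruned = keep_top_bits(scores, 2)    # indices of the bits retained after pruning
similarity_full = tanimoto(fps[0], fps[2])
similarity_pruned = tanimoto(fps[0], fps[2], bits=pruned)
```

In this toy data every compound sets bit 0, so that bit carries no information and has surprisal 0. Swapping a different per-bit score into `keep_top_bits` gives other pruning schemes of the same shape.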

Evaluation of transition state properties by density functional theory

Chemical Physics Letters

Durant¹ 1996

Cystinuria: biochemical evidence for three genetically distinct diseases.

Rosenberg¹, Downing², Durant³ et al. 1966

In the early 1950s, Harris, Mittwoch, Robson, and Warren (1, 2) investigated the mode of inheritance of cystinuria in 27 families by using quantitative determinations of cystine and dibasic amino acids as the genetic marker. Homozygotes were identified by the formation of urinary tract calculi composed of cystine and by gross hyperexcretion of cystine, lysine, arginine, and ornithine. Investigation of known heterozygotes (parents and children of affected subjects) revealed distinct phenotypic heterogeneity and identified two types of families. In one, comprising about two-thirds of the pedigrees studied, heterozygotes uniformly excreted normal quantities of cystine and dibasic amino acids, and genetic analysis was compatible with autosomal recessive inheritance. In the second, smaller group of pedigrees, an intermediate phenotype was found. All heterozygotes tested excreted moderate excesses of cystine and lysine.

Reoptimization of MDL Keys for Use in Drug Discovery.

Durant¹, Leland², Henry³ et al. 2003


Transition state structures and energetics using Gaussian-2 theory

Durant¹, Rohlfing² 1993

The availability of the easily implemented Gaussian-2 (G2) methodology has made it possible for the nonspecialist to calculate accurate heats of formation for many molecules on workstations. In order to quantify its performance for transition state structures, we have used G2 and a modified G2 on several transition states whose structures and energies have been well characterized either by experiment or multireference configuration interaction studies. The G2 method performs well in predicting energies of transition states (even for nonisogyric reactions), with an absolute average deviation of 1.5 kcal/mole in the classical barrier height for the cases studied, while it is less successful in predicting geometries and frequencies. We investigated modifying the G2 method for use with transition states by using QCISD/6-311G(d,p) geometries and frequencies instead of MP2/6-31G(d) geometries and scaled HF/6-31G(d) frequencies. The QCISD geometries and frequencies agree well with values from the literature, and this modified G2 procedure offers improved performance in predicting transition state energies.
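For reference, the "absolute average deviation" quoted above is presumably the mean of the absolute errors in the classical barrier height over the transition states studied. The abstract does not spell out the formula, so the expression below is an assumed standard definition. The symbols are introduced here for illustration: $N$ is the number of transition states, $E^{\mathrm{G2}}_{i}$ the G2 barrier height for case $i$, and $E^{\mathrm{ref}}_{i}$ the experimental or multireference CI reference value.

```latex
\[
  \mathrm{AAD} \;=\; \frac{1}{N} \sum_{i=1}^{N}
  \left| E^{\mathrm{G2}}_{i} - E^{\mathrm{ref}}_{i} \right|
\]
```

Under this definition, the reported value corresponds to $\mathrm{AAD} = 1.5$ kcal/mole for the cases studied.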