Combinatorial chemistry and high-throughput screening are revolutionizing the process of lead discovery in the pharmaceutical industry. Large numbers of structures and vast quantities of biological assay data are quickly being accumulated, overwhelming traditional structure/activity relationship (SAR) analysis technologies. Recursive partitioning is a method for statistically determining rules that classify objects into similar categories or, in this case, structures into groups of molecules with similar potencies. SCAM is a computer program implemented to make extremely efficient use of this methodology. Depending on the size of the data set, rules explaining biological data can be determined interactively. An example data set of 1650 monoamine oxidase inhibitors exemplifies the method, yielding substructural rules and leading to general classifications of these inhibitors. The method scales linearly with the number of descriptors, so hundreds of thousands of structures can be analyzed utilizing thousands to millions of molecular descriptors. There are currently no methods to deal with statistical analysis problems of this size. An important aspect of this analysis is the ability to deal with mixtures, i.e., identify SAR rules for classes of compounds in the same data set that might be binding in different ways. Most current quantitative structure/activity relationship methods require that the compounds follow a single mechanism. Advantages and limitations of this methodology are presented.
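The idea of recursive partitioning on substructure descriptors can be illustrated with a minimal Python sketch. It is not the SCAM implementation. The split criterion (reduction in the sum of squared errors of potency), the names `best_split` and `grow`, and the parameters `min_leaf`, `max_depth`, and `min_gain` are all assumptions made for illustration. Each row of `X` is a structure, each column is a binary descriptor (substructure present or absent), and `y` holds the measured potencies. Each split scans every descriptor once, so the cost of a split grows linearly with the number of descriptors.

```python
import numpy as np


def best_split(X, y, min_leaf=5):
    """Find the binary descriptor whose presence/absence split gives the
    largest reduction in the sum of squared errors (SSE) of potency.

    Returns (column index, gain), or (None, 0.0) if no split is admissible.
    """
    n = len(y)
    total_sse = np.sum((y - y.mean()) ** 2)
    best_j, best_gain = None, 0.0
    for j in range(X.shape[1]):  # one pass over the descriptors
        present = X[:, j] == 1
        n_present = int(present.sum())
        # Skip descriptors that would leave a group that is too small.
        if n_present < min_leaf or n - n_present < min_leaf:
            continue
        y_in, y_out = y[present], y[~present]
        sse = np.sum((y_in - y_in.mean()) ** 2) + np.sum((y_out - y_out.mean()) ** 2)
        gain = total_sse - sse
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain


def grow(X, y, depth=0, max_depth=3, min_leaf=5, min_gain=1e-9):
    """Recursively partition the structures into groups of similar potency.

    Each node is a dict with the group size and mean potency; internal nodes
    also record the splitting descriptor and the two child nodes.
    """
    node = {"n": len(y), "mean": float(y.mean())}
    if depth < max_depth:
        j, gain = best_split(X, y, min_leaf)
        if j is not None and gain > min_gain:
            present = X[:, j] == 1
            node["feature"] = j
            node["present"] = grow(X[present], y[present], depth + 1,
                                   max_depth, min_leaf, min_gain)
            node["absent"] = grow(X[~present], y[~present], depth + 1,
                                  max_depth, min_leaf, min_gain)
    return node
```

Reading the tree from the root to a leaf gives a rule of the form "descriptor A present, descriptor B absent". Because different branches can end in different high-potency leaves, one tree can describe several classes of active compounds in the same data set.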
Male reproductive tract abnormalities associated with testicular dysgenesis in humans also occur in male rats exposed gestationally to some phthalate esters. We examined global gene expression in the fetal testis of the rat following in utero exposure to a panel of phthalate esters. Pregnant Sprague-Dawley rats were treated by gavage daily from Gestational Days 12 through 19 with corn oil vehicle (1 ml/kg) or diethyl phthalate (DEP), dimethyl phthalate (DMP), dioctyl tere-phthalate (DOTP), dibutyl phthalate (DBP), diethylhexyl phthalate (DEHP), dipentyl phthalate (DPP), or benzyl butyl phthalate (BBP) at 500 mg/kg per day. Testes were isolated on Gestational Day 19, and global changes in gene expression were determined. Of the approximately 30 000 genes queried, expression of 391 genes was significantly altered following exposure to the developmentally toxic phthalates (DBP, BBP, DPP, and DEHP) relative to the control. The developmentally toxic phthalates were indistinguishable in their effects on global gene expression. No significant changes in gene expression were detected in the nondevelopmentally toxic phthalate group (DMP, DEP, and DOTP). Gene pathways disrupted include those previously identified as targets for DBP, including cholesterol transport and steroidogenesis, as well as newly identified pathways involved in intracellular lipid and cholesterol homeostasis, insulin signaling, transcriptional regulation, and oxidative stress. Additional gene targets include alpha inhibin, which is essential for normal Sertoli cell development, and genes involved with communication between Sertoli cells and gonocytes. The common targeting of these genes by a select group of phthalates indicates a role for their associated molecular pathways in testicular development and offers new insight into the molecular mechanisms of testicular dysgenesis.
In microarray data there are a number of biological samples, each assessed for the level of gene expression for a typically large number of genes. There is a need to examine these data with statistical techniques to help discern possible patterns in the data. Our technique applies a combination of mathematical and statistical methods to progressively take the data set apart so that different aspects can be examined for both general patterns and very specific effects. Unfortunately, these data tables are often corrupted with extreme values (outliers), missing values, and non-normal distributions that preclude standard analysis. We develop a robust analysis method to address these problems. The benefits of this robust analysis will be both the understanding of large-scale shifts in gene effects and the isolation of particular sample-by-gene effects that might be either unusual interactions or the result of experimental flaws. Our method requires a single pass and does not resort to complex "cleaning" or imputation of the data table before analysis. We illustrate the method with a commercial data set.
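To make the notion of robustly taking a sample-by-gene table apart more concrete, the sketch below uses Tukey's median polish. This is a stand-in technique chosen for illustration, not necessarily the method developed in the paper, and the function name `median_polish` and its parameters are assumptions. It splits the table into an overall level, row effects, column effects, and residuals. Medians keep outliers from distorting the fitted effects, and `np.nanmedian` skips missing values, so the table needs no prior cleaning or imputation.

```python
import numpy as np


def median_polish(table, n_iter=10, tol=1e-6):
    """Decompose a two-way table as overall + row[i] + col[j] + resid[i, j].

    Missing cells are given as NaN and stay NaN in the residuals.
    Assumes every row and every column has at least one observed value.
    """
    resid = np.array(table, dtype=float)  # copy, so the input is not modified
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_delta = np.nanmedian(resid, axis=1)
        resid -= row_delta[:, None]
        row += row_delta
        # Move the median of the column effects into the overall level.
        shift = np.median(col)
        col -= shift
        overall += shift
        # Sweep column medians out of the residuals into the column effects.
        col_delta = np.nanmedian(resid, axis=0)
        resid -= col_delta[None, :]
        col += col_delta
        # Move the median of the row effects into the overall level.
        shift = np.median(row)
        row -= shift
        overall += shift
        # Stop once a full sweep no longer changes the fit.
        if max(np.abs(row_delta).max(), np.abs(col_delta).max()) < tol:
            break
    return overall, row, col, resid
```

With this decomposition, the row and column effects summarize the large-scale shifts, while cells with large absolute residuals flag individual sample-by-gene values that may be unusual interactions or experimental flaws.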