Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
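The held-out testing described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the nearest-centroid classifier is only one of many possible methods, and the function names are hypothetical. It is not the procedure of any MAQC-II team. It shows only the general pattern of fitting on one set of samples and scoring on samples never used for training.

```python
import math
import random


def nearest_centroid_fit(samples, labels):
    """Return one mean expression profile (centroid) per class label."""
    sums, counts = {}, {}
    for profile, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, profile):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], profile))


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(nearest_centroid_predict(centroids, s) == y
               for s, y in zip(samples, labels))
    return hits / len(labels)


def make_toy_data(n_per_class, n_genes, shift, rng):
    """Synthetic two-class data (assumption): class 1 is shifted up in every gene."""
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            samples.append([rng.gauss(label * shift, 1.0) for _ in range(n_genes)])
            labels.append(label)
    return samples, labels


rng = random.Random(0)
train_x, train_y = make_toy_data(30, 20, 1.0, rng)
valid_x, valid_y = make_toy_data(30, 20, 1.0, rng)  # never seen during fitting

model = nearest_centroid_fit(train_x, train_y)
print("training accuracy:  ", accuracy(model, train_x, train_y))
print("validation accuracy:", accuracy(model, valid_x, valid_y))
```

The validation figure is the one that stands in for performance on new samples. The training figure is printed only for comparison.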
We have used a supervised classification approach to systematically mine a large microarray database derived from livers of compound-treated rats. Thirty-four distinct signatures (classifiers) for pharmacological and toxicological end points can be identified. Just 200 genes are sufficient to classify these end points. Signatures were enriched in xenobiotic and immune response genes and contain un-annotated genes, indicating that not all key genes in the liver xenobiotic responses have been characterized. Many signatures with equal classification capabilities but with no gene in common can be derived for the same phenotypic end point. The analysis of the union of all genes present in these signatures can reveal the underlying biology of that end point as illustrated here using liver fibrosis signatures. Our approach using the whole genome and a diverse set of compounds allows a comprehensive view of most pharmacological and toxicological questions and is applicable to other situations such as disease and development.
The Critical Path Institute recently established the Predictive Safety Testing Consortium, a collaboration among several companies and the U.S. Food and Drug Administration, aimed at evaluating and qualifying biomarkers for a variety of toxicological endpoints. The Carcinogenicity Working Group of the Predictive Safety Testing Consortium has concentrated on sharing data to test the predictivity of two published hepatic gene expression signatures: the signature by Fielden et al. (2007, Toxicol. Sci. 99, 90-100) for predicting nongenotoxic hepatocarcinogens, and the signature by Nie et al. (2006, Mol. Carcinog. 45, 914-933) for predicting nongenotoxic carcinogens. Although not a rigorous prospective validation exercise, the consortium approach created an opportunity to perform a meta-analysis to evaluate microarray data from short-term rat studies on over 150 compounds. Despite significant differences in study designs and microarray platforms between laboratories, the signatures proved to be relatively robust and more accurate than expected by chance. The accuracy of the Fielden et al. signature was between 63 and 69%, whereas the accuracy of the Nie et al. signature was between 55 and 64%. As expected, the predictivity was reduced relative to internal validation estimates reported under identical test conditions. Although the signatures were not deemed suitable for use in regulatory decision making, they were deemed worthwhile in the early assessment of drugs to aid decision making in drug development. These results have prompted additional efforts to rederive and evaluate a QPCR-based signature using these samples. Combined with a standardized test procedure and prospective interlaboratory validation, these efforts should allow the accuracy and potential utility of the signature in preclinical applications to be ascertained.
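One way to read "more accurate than expected by chance" is as a one-sided binomial test. The sketch below is illustrative only and is not the analysis used in the study. The test-set size of 100 compounds and the 50% chance rate are hypothetical assumptions (the abstract gives neither a per-signature test-set size nor the class balance), and `binomial_upper_tail` is a name introduced here.

```python
from math import comb


def binomial_upper_tail(n_correct, n_total, chance_rate):
    """P(X >= n_correct) for X ~ Binomial(n_total, chance_rate)."""
    return sum(
        comb(n_total, k) * chance_rate**k * (1.0 - chance_rate) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Hypothetical inputs (assumptions, not values from the study):
N_TOTAL = 100       # round number chosen so each accuracy maps to a whole count
CHANCE_RATE = 0.5   # assumes balanced classes and a coin-flip baseline

# Accuracy bounds quoted in the abstract, expressed as correct calls out of 100.
for n_correct in (55, 63, 64, 69):
    p_value = binomial_upper_tail(n_correct, N_TOTAL, CHANCE_RATE)
    print(f"{n_correct}/{N_TOTAL} correct: one-sided p = {p_value:.4g}")
```

With an unbalanced test set, the chance rate should instead be the frequency of the majority class, which raises the bar a signature must clear.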
Intratumoral heterogeneity of cancer cells remains largely unexplored. Here we investigated the clonal composition of ovarian cancer and its biological relevance. A whole-genome single nucleotide polymorphism array was applied to detect the clonal composition of 24 formalin-fixed, paraffin-embedded samples of human ovarian cancer. Genome-wide segmentation data consisting of the log2 ratio (log2R) and B allele frequency (BAF) were used to calculate an estimate of the clonal composition number (CC number) for each tumor. Somatic mutation profiles of cancer-related genes were also determined for the same 24 samples by next-generation sequencing. The CC number was estimated successfully for 23 of the 24 cancer samples. The mean ± SD value for the CC number was 1.7 ± 1.1 (range, 0 to 4). A somatic mutation in at least one gene was identified in 22 of the 24 ovarian cancer samples, with the mutations including those in the oncogenes KRAS (29.2%), PIK3CA (12.5%), BRAF (8.3%), FGFR2 (4.2%), and JAK2 (4.2%) as well as those in the tumor suppressor genes TP53 (54.2%), FBXW7 (8.3%), PTEN (4.2%), and RB1 (4.2%). Tumors with one or more oncogenic mutations had a significantly lower CC number than did those without such a mutation (1.0 ± 0.8 versus 2.3 ± 0.9, P = 0.0027), suggesting that cancers with driver oncogene mutations are less heterogeneous than those with other mutations. Our results, obtained with an established method for estimating the CC number, thus reveal an inverse relation between oncogenic mutation status and clonal composition in ovarian cancer.
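The mutation frequencies above can be sanity-checked against the cohort size of 24. The sketch below back-calculates the whole number of samples implied by each reported percentage and recomputes the percentage from that count. The counts are inferred here, not stated in the abstract, and `implied_count` is a hypothetical helper name.

```python
N_SAMPLES = 24  # cohort size stated in the abstract

# Percentages as reported in the abstract.
reported_percent = {
    "KRAS": 29.2, "PIK3CA": 12.5, "BRAF": 8.3, "FGFR2": 4.2, "JAK2": 4.2,
    "TP53": 54.2, "FBXW7": 8.3, "PTEN": 4.2, "RB1": 4.2,
}


def implied_count(percent, n_samples):
    """Nearest whole number of samples consistent with a reported percentage."""
    return round(percent * n_samples / 100)


# Inferred counts (an assumption: each percentage is taken over all 24 samples).
implied_counts = {
    gene: implied_count(percent, N_SAMPLES)
    for gene, percent in reported_percent.items()
}

for gene, count in implied_counts.items():
    recomputed = round(100 * count / N_SAMPLES, 1)
    print(f"{gene}: {count}/{N_SAMPLES} = {recomputed}% "
          f"(reported {reported_percent[gene]}%)")
```

Each recomputed value matches the reported one to one decimal place, so the percentages are consistent with whole-sample counts out of 24.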