We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for these platforms as well as for qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10 Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
We have developed GoMiner, a program package that organizes lists of 'interesting' genes (for example, under- and overexpressed genes from a microarray experiment) for biological interpretation in the context of the Gene Ontology. GoMiner provides quantitative and statistical output files and two useful visualizations. The first is a tree-like structure analogous to that in the AmiGO browser and the second is a compact, dynamically interactive 'directed acyclic graph'. Genes displayed in GoMiner are linked to major public bioinformatics resources.
Rationale: Gene-expression profiling and other forms of high-throughput genomic and proteomic studies are revolutionizing biology. That much is universally agreed. But the new technologies pose new challenges. The first is the experiment itself, the second is statistical analysis of results, and the third is biological interpretation. That third challenge is often the most vexing and time-consuming. In gene-expression microarray studies, for example, one generally obtains a list of dozens or hundreds of genes that differ in expression between samples and then asks: 'What does all of this mean biologically?' The work of the Gene Ontology (GO) Consortium [1] provides a way to address that question. GO organizes genes into hierarchical categories based on biological process, molecular function and subcellular localization. In the past, this GO information was queried one gene at a time. Recently, batch processing has been introduced [2], but with a flat-format output that does not communicate the richness of GO's hierarchical structure.
We have developed, and present here, the program package GoMiner as a freely available computer resource that fully incorporates the hierarchical structure of the Gene Ontology to automate the functional categorization of gene lists of any length. GoMiner is downloadable free of charge from [3] or [4].
GoMiner was developed particularly for biological interpretation of microarray data; one can input a list of under- and overexpressed genes and a list of all genes on the array, and then calculate enrichment or depletion of categories with genes that have changed expression. GoMiner thus facilitates analysis and organization of the results for rapid interpretation of 'omic' [5,6] data. For concreteness, the descriptions in
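The enrichment calculation described above can be illustrated with a small sketch. It assumes a hypergeometric upper-tail probability as the test statistic, which is a common choice for this kind of analysis; the abstract does not say which statistic GoMiner itself uses, and the function name and arguments here are hypothetical.

```python
from math import comb


def category_enrichment(n_total, n_in_category, n_changed, n_changed_in_category):
    """Return (fold_enrichment, p_value) for one GO category.

    n_total               -- all genes on the array
    n_in_category         -- genes on the array annotated to the category
    n_changed             -- genes flagged as under- or overexpressed
    n_changed_in_category -- flagged genes that fall in the category

    Assumption: the p-value is the hypergeometric probability of seeing at
    least n_changed_in_category flagged genes in the category by chance.
    """
    expected_fraction = n_in_category / n_total
    observed_fraction = n_changed_in_category / n_changed
    fold = observed_fraction / expected_fraction

    # Sum the upper tail of the hypergeometric distribution.
    max_overlap = min(n_in_category, n_changed)
    denominator = comb(n_total, n_changed)
    tail = sum(
        comb(n_in_category, k) * comb(n_total - n_in_category, n_changed - k)
        for k in range(n_changed_in_category, max_overlap + 1)
    )
    return fold, tail / denominator
```

A fold value above 1 indicates enrichment of the category among the changed genes and a value below 1 indicates depletion. A depletion test would sum the lower tail instead.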
Background: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model.
Results: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models.
Conclusions: We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0694-1) contains supplementary material, which is available to authorized users.
For analysis of multidrug resistance, a major barrier to effective cancer chemotherapy, we profiled mRNA expression of the 48 known human ABC transporters in 60 diverse cancer cell lines (the NCI-60) used by the National Cancer Institute to screen for anticancer activity. The use of real-time RT-PCR avoided artifacts commonly encountered with microarray technologies. By correlating the results with the growth inhibitory profiles of 1,429 candidate anticancer drugs tested against the cells, we identified which transporters are more likely than others to confer resistance to which agents. Unexpectedly, we also found and validated compounds whose activity is potentiated, rather than antagonized, by the MDR1 multidrug transporter. Such compounds may serve as leads for development.
Standardized benchmarking methods and tools are essential to robust accuracy assessment of NGS variant calling. Benchmarking variant calls requires careful attention to definitions of performance metrics, sophisticated comparison approaches, and stratification by variant type and genome context. To address these needs, the Global Alliance for Genomics and Health (GA4GH) Benchmarking Team convened representatives from sequencing technology developers, government agencies, academic bioinformatics researchers, clinical laboratories, and commercial technology and bioinformatics developers for whom benchmarking variant calls is essential to their work. This team addressed challenges in (1) matching variant calls with different representations, (2) defining standard performance metrics, (3) enabling stratification of performance by variant type and genome context, and (4) developing and describing limitations of high-confidence calls and regions that can be used as “truth”. Our methods are publicly available on GitHub (https://github.com/ga4gh/benchmarking-tools) and in a web-based app on precisionFDA, which allow users to compare their variant calls against truth sets and to obtain a standardized report on their variant calling performance. Our methods have been piloted in the precisionFDA variant calling challenges to identify the best-in-class variant calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and critically evaluating the results.
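As a rough illustration of points (2) and (3) above, the sketch that follows computes precision and recall stratified by variant type. It is not the GA4GH tooling: it assumes variants are already normalized to identical (chrom, pos, ref, alt) tuples, so it sidesteps the representation-matching problem in point (1), and all names are hypothetical.

```python
def variant_type(ref, alt):
    """Classify a variant as 'SNV' or 'INDEL' from its allele lengths."""
    return "SNV" if len(ref) == 1 and len(alt) == 1 else "INDEL"


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def benchmark(truth, query):
    """Compare query calls against truth calls, stratified by variant type.

    truth, query -- sets of (chrom, pos, ref, alt) tuples.
    Assumption: exact tuple matching stands in for real variant comparison.
    """
    report = {}
    for vtype in ("SNV", "INDEL"):
        t = {v for v in truth if variant_type(v[2], v[3]) == vtype}
        q = {v for v in query if variant_type(v[2], v[3]) == vtype}
        tp = len(t & q)   # called and in truth
        fp = len(q - t)   # called but not in truth
        fn = len(t - q)   # in truth but missed
        report[vtype] = {
            "tp": tp,
            "fp": fp,
            "fn": fn,
            "precision": ratio(tp, tp + fp),
            "recall": ratio(tp, tp + fn),
        }
    return report
```

In practice both call sets would also be restricted to the high-confidence regions from point (4) before counting, since calls outside those regions cannot be judged against the truth set.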
Chromosome rearrangement, a hallmark of cancer, has profound effects on carcinogenesis and tumor phenotype. We used a panel of 60 human cancer cell lines (the NCI-60) as a model system to identify relationships among DNA copy number, mRNA expression level, and drug sensitivity. For each of 64 cancer-relevant genes, we calculated all 4,096 possible Pearson's correlation coefficients relating DNA copy number (assessed by comparative genomic hybridization using bacterial artificial chromosome microarrays) and mRNA expression level (determined using both cDNA and Affymetrix oligonucleotide microarrays). The analysis identified an association of ERBB2 overexpression with 3p copy number, a finding supported by data from human tumors and a mouse model of ERBB2-induced carcinogenesis. When we examined the correlation between DNA copy number for all 353 unique loci on the bacterial artificial chromosome microarray and drug sensitivity for 118 drugs with putatively known mechanisms of action, we found a striking negative correlation (−0.983; 95% bootstrap confidence interval, −0.999 to −0.899) between activity of the enzyme drug L-asparaginase and DNA copy number of genes near asparagine synthetase in the ovarian cancer cells. Previous analysis of drug sensitivity and mRNA expression had suggested an inverse relationship between mRNA levels of asparagine synthetase and L-asparaginase sensitivity in the NCI-60. The concordance of pharmacogenomic findings at the DNA and mRNA levels strongly suggests further study of L-asparaginase for possible treatment of a low-synthetase subset of clinical ovarian cancers. The DNA copy number database presented here will enable other investigators to explore DNA transcript-drug relationships in their own domains of research focus. [Mol Cancer Ther 2006;5(4):853-67]
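The figure of 4,096 coefficients follows from pairing the copy number profile of each of the 64 genes with the expression profile of each of the 64 genes (64 × 64). A minimal sketch of that all-pairs computation is given here, assuming each profile is a list of values across the same ordered set of cell lines; the function names and data layout are hypothetical and not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences.

    Returns None when either sequence is constant (zero variance).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def cross_correlations(copy_number, expression):
    """Correlate every copy number profile with every expression profile.

    copy_number, expression -- dicts mapping gene name to a list of values,
    one value per cell line, in the same cell line order.
    Returns a dict keyed by (copy_number_gene, expression_gene).
    """
    return {
        (cn_gene, ex_gene): pearson(cn_values, ex_values)
        for cn_gene, cn_values in copy_number.items()
        for ex_gene, ex_values in expression.items()
    }
```

With 64 genes in each dictionary the result holds 64 × 64 = 4,096 entries. The diagonal entries (same gene on both sides) measure how well a gene's own copy number tracks its expression, while off-diagonal entries capture cross-gene associations such as the one reported between ERBB2 expression and 3p copy number.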
Bone marrow angiogenesis is associated with multiple myeloma (MM) progression. Here, we report high constitutive hypoxia-inducible factor-1α (Hif-1α) expression in MM cells, which is associated with oncogenic c-Myc. A drug screen for anti-MM agents that decrease Hif-1α and c-Myc levels identified a variety of compounds, including bortezomib, lenalidomide, enzastaurin, and adaphostin. Functionally, based on transient knockdowns and overexpression, our data delineate a c-Myc/Hif-1α-dependent pathway mediating vascular endothelial growth factor production and secretion. The antiangiogenic activity of our tool compound, adaphostin, was subsequently shown in a zebrafish model and translated into a preclinical in vitro and in vivo model of MM in the bone marrow milieu. Our data, therefore, identify Hif-1α as a novel molecular target in MM and add another facet to anti-MM drug activity. [Cancer Res 2009;69(12):5082-90]