We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for both of these platforms as well as for qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10 Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
RNA-seq facilitates unbiased genome-wide gene-expression profiling. However, its concordance with the well-established microarray platform must be rigorously assessed before it can be used with confidence in clinical and regulatory applications. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same set of liver samples of rats under varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOA). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is highly correlated with treatment effect size, gene-expression abundance and the biological complexity of the MOA. RNA-seq outperforms microarray (90% versus 76%) in DEG verification by quantitative PCR, and the main gain is its improved accuracy for genes expressed at low levels. Nonetheless, predictive classifiers derived from both platforms performed similarly. Therefore, the endpoint studied and its biological complexity, transcript abundance, and intended application are important factors in transcriptomic research and for decision-making.
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
Background Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. Results We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. Conclusions We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction.
Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0694-1) contains supplementary material, which is available to authorized users.
Batch effects are the systematic non-biological differences between batches (groups) of samples in microarray experiments due to various causes such as differences in sample preparation and hybridization protocols. Previous work focused mainly on the development of methods for effective batch-effect removal. However, the impact of these methods on cross-batch prediction performance, which is one of the most important goals in microarray-based applications, has not been addressed. This paper uses a broad selection of data sets from the Microarray Quality Control Phase II (MAQC-II) effort, generated on three microarray platforms with different causes of batch effects, to assess the efficacy of their removal. Two data sets from cross-tissue and cross-platform experiments are also included. Of the 120 cases studied using support vector machines (SVM) and k-nearest neighbors (KNN) as classifiers and the Matthews correlation coefficient (MCC) as the performance metric, we find that the Ratio-G, Ratio-A, EJLR, mean-centering and standardization methods perform better than or equivalent to no batch-effect removal in 89, 85, 83, 79 and 75% of the cases, respectively, suggesting that the application of these methods is generally advisable and that ratio-based methods are preferred.
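To make two of the named methods and the performance metric concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, each function handles a single gene, and standardization is assumed to use the per-batch population standard deviation. Ratio-G, Ratio-A and EJLR are not defined in the abstract, so they are not sketched.

```python
from math import sqrt
from statistics import mean, pstdev


def _group_by_batch(values, batches):
    """Collect the expression values of one gene by batch label."""
    groups = {}
    for value, batch in zip(values, batches):
        groups.setdefault(batch, []).append(value)
    return groups


def mean_center_by_batch(values, batches):
    """Subtract each batch's mean from the values in that batch."""
    centers = {b: mean(vs) for b, vs in _group_by_batch(values, batches).items()}
    return [v - centers[b] for v, b in zip(values, batches)]


def standardize_by_batch(values, batches):
    """Mean-center and scale to unit variance within each batch.

    Assumption: the population standard deviation is used; a batch with
    zero variance is mapped to 0.0 to avoid division by zero.
    """
    stats = {b: (mean(vs), pstdev(vs))
             for b, vs in _group_by_batch(values, batches).items()}
    return [(v - stats[b][0]) / stats[b][1] if stats[b][1] > 0 else 0.0
            for v, b in zip(values, batches)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (made up for illustration): one gene measured in two batches.
expression = [5.0, 7.0, 9.0, 11.0]
batch_labels = ["A", "A", "B", "B"]
centered = mean_center_by_batch(expression, batch_labels)
scaled = standardize_by_batch(expression, batch_labels)
```

In a cross-batch prediction setting, a classifier would be trained on samples from one batch after such a correction, applied to another batch, and scored with `mcc`; the comparison to the uncorrected data is what the percentages above summarize.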
Background With evidence of sustained transmission in more than 190 countries, coronavirus disease 2019 (COVID-19) has been declared a global pandemic. Data are urgently needed about risk factors associated with clinical outcomes. Methods A retrospective review of 323 hospitalized patients with COVID-19 in Wuhan was conducted. Patients were classified into three disease severity groups (non-severe, severe, and critical), based on initial clinical presentation. Clinical outcomes were designated as favorable and unfavorable, based on disease progression and response to treatments. Logistic regression models were fitted to identify risk factors associated with clinical outcomes, and a log-rank test was used to assess the association with clinical progression. Results Current standard treatments did not show significant improvement in patient outcomes. By univariate logistic regression analysis, 27 risk factors were significantly associated with clinical outcomes. Multivariate regression indicated that age over 65 years (p<0.001), smoking (p=0.001), critical disease status (p=0.002), diabetes (p=0.025), high hypersensitive troponin I (>0.04 pg/mL, p=0.02), leukocytosis (>10 × 10^9/L, p<0.001) and neutrophilia (>75 × 10^9/L, p<0.001) predicted unfavorable clinical outcomes. By contrast, the administration of hypnotics was significantly associated with favorable outcomes (p<0.001), which was confirmed by survival analysis. Conclusions Hypnotics may be an effective ancillary treatment for COVID-19. We also found that novel risk factors, such as higher hypersensitive troponin I, predicted poor clinical outcomes. Overall, our study provides useful data to guide early clinical decision making to reduce mortality and improve clinical outcomes of COVID-19.
Background With evidence of sustained transmission in more than 190 countries, coronavirus disease 2019 (COVID-19) has been declared a global pandemic. As such, data are urgently needed about risk factors associated with clinical outcomes. Methods A retrospective chart review of 323 hospitalized patients with COVID-19 in Wuhan was conducted. Patients were classified into three disease severity groups (non-severe, severe, and critical), based on their initial clinical presentation. Clinical outcomes were designated as favorable and unfavorable, based on disease progression and response to treatments. Logistic regression models were fitted to identify factors associated with clinical outcomes, and a log-rank test was used to assess the association with clinical progression. Results Current standard treatments did not show significant improvement in patient outcomes in the study. In univariate logistic regression, 27 risk factors were significantly associated with clinical outcomes. Further, multivariate regression indicated that age over 65 years, smoking, critical disease status, diabetes, high hypersensitive troponin I (>0.04 pg/mL), leukocytosis (>10 × 10^9/L) and neutrophilia (>75 × 10^9/L) predicted unfavorable clinical outcomes. By contrast, the use of hypnotics was significantly associated with favorable outcomes. Survival analysis also confirmed that patients receiving hypnotics had significantly better survival. Conclusions To our knowledge, this is the first indication that hypnotics could be an effective ancillary treatment for COVID-19. We also found that novel risk factors, such as higher hypersensitive troponin I, predicted poor clinical outcomes. Overall, our study provides useful data to guide early clinical decision making to reduce mortality and improve clinical outcomes of COVID-19.
Highlights
• Cities possess a consistent "core" set of non-human microbes
• Urban microbiomes echo important features of cities and city-life
• Antimicrobial resistance genes are widespread in cities
• Cities contain many novel bacterial and viral species