Shanrong Zhao scite author profile

Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and analysis of digital pathology data in clinical trials. Applications have ranged in context and methodology, with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development.

Comparison of RNA-Seq and Microarray in Transcriptome Profiling of Activated T Cells

Zhao et al. 2014

To demonstrate the benefits of RNA-Seq over microarray in transcriptome profiling, both RNA-Seq and microarray analyses were performed on RNA samples from a human T cell activation experiment. In contrast to other reports, our analyses focused on the difference, rather than similarity, between RNA-Seq and microarray technologies in transcriptome profiling. A comparison of data sets derived from RNA-Seq and Affymetrix platforms using the same set of samples showed a high correlation between gene expression profiles generated by the two platforms. However, it also demonstrated that RNA-Seq was superior in detecting low abundance transcripts, differentiating biologically critical isoforms, and allowing the identification of genetic variants. RNA-Seq also demonstrated a broader dynamic range than microarray, which allowed for the detection of more differentially expressed genes with higher fold-change. Analysis of the two datasets also showed the benefit derived from avoidance of technical issues inherent to microarray probe performance such as cross-hybridization, non-specific hybridization and limited detection range of individual probes. Because RNA-Seq does not rely on a pre-designed complement sequence detection probe, it is devoid of issues associated with probe redundancy and annotation, which simplified interpretation of the data. Despite the superior benefits of RNA-Seq, microarrays are still the more common choice of researchers when conducting transcriptional profiling experiments. This is likely because RNA-Seq technology is new to most researchers and more expensive than microarray, its data storage is more challenging, and its analysis is more complex. We expect that once these barriers are overcome, the RNA-Seq platform will become the predominant tool for transcriptome analysis.

Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols

Zhao, Stanton 2020, RNA

In recent years, RNA-sequencing (RNA-seq) has emerged as a powerful technology for transcriptome profiling. For a given gene, the number of mapped reads is not only dependent on its expression level and gene length, but also the sequencing depth. To normalize these dependencies, RPKM (reads per kilobase of transcript per million reads mapped) and TPM (transcripts per million) are used to measure gene or transcript expression levels. A common misconception is that RPKM and TPM values are already normalized, and thus should be comparable across samples or RNA-seq projects. However, RPKM and TPM represent the relative abundance of a transcript among a population of sequenced transcripts, and therefore depend on the composition of the RNA population in a sample. Quite often, it is reasonable to assume that total RNA concentration and distributions are very close across compared samples. Nevertheless, the sequenced RNA repertoires may differ significantly under different experimental conditions and/or across sequencing protocols; thus, the proportion of gene expression is not directly comparable in such cases. In this review, we illustrate typical scenarios in which RPKM and TPM are misused, unintentionally, and hope to raise scientists’ awareness of this issue when comparing them across samples or different sequencing protocols.
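The composition dependence described in this abstract can be illustrated with a minimal sketch. The gene counts, gene lengths, and sample names below are hypothetical toy values, not data from the paper, and the helper functions `rpkm` and `tpm` are written here only for illustration using the standard definitions of the two measures.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    return [
        c / (length / 1_000) / (total_reads / 1_000_000)
        for c, length in zip(counts, lengths_bp)
    ]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / (length / 1_000) for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1_000_000 for r in rates]


# Hypothetical toy data: three genes (A, B, C), each 1,000 bp long.
lengths = [1_000, 1_000, 1_000]
sample_a = [100, 100, 800]  # gene C dominates the sequenced RNA pool
sample_b = [100, 100, 0]    # gene C absent, e.g. under another protocol

tpm_a = tpm(sample_a, lengths)
tpm_b = tpm(sample_b, lengths)

# Gene A has the same raw read count in both samples,
# yet its TPM differs because the rest of the pool changed.
print("Gene A TPM in sample A:", tpm_a[0])
print("Gene A TPM in sample B:", tpm_b[0])
print("Gene A RPKM in sample A:", rpkm(sample_a, lengths)[0])
print("Gene A RPKM in sample B:", rpkm(sample_b, lengths)[0])
```

In this toy case gene A receives 100 reads in both samples, but its TPM is roughly 100,000 in sample A and roughly 500,000 in sample B, purely because gene C is missing from the second pool. Both measures report a share of the sequenced population, so a change in one abundant transcript shifts the values of all others.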

Subsets of ILC3−ILC1-like cells generate a diversity spectrum of innate lymphoid cells in human mucosal tissues

et al. 2019

Author contributions. C.S., P.L.C. and S.Z. contributed equally to this work. M. Cella designed, performed and interpreted experiments. R.G. and S.Z. analyzed scRNA-seq data and wrote methods for scRNA-seq analysis. C.S. generated Aiolos- and T-bet-transduced MNK3 cells. M.L.R. and V.P. analyzed the microarray data and RNA-seq data. K.Z. and M.N.A. provided bioinformatic support. J.K.B., K.Y. and V.C. helped in flow cytometry data presentation and analysis. C.F. and R.F. generated libraries for scRNA-seq. J.S. provided critical advice for CyTOF analysis. W.G., L.-L.L. and M.B. provided critical insights to the study. S.G., R.A.F. and L.S. provided key reagents. P.L.C. performed the cut and run experiment and interpreted data under the supervision of E.M.O. S.A.J. and M. Colonna supervised the study. M. Cella, S.A.J. and M. Colonna wrote the manuscript, and all the authors contributed edits and suggestions.

Evaluation and comparison of computational tools for RNA-seq isoform quantification

et al. 2017

Background: Alternatively spliced transcript isoforms are commonly observed in higher eukaryotes. The expression levels of these isoforms are key for understanding normal functions in healthy tissues and the progression of disease states. However, accurate quantification of expression at the transcript level is limited with current RNA-seq technologies because of, for example, limited read length and the cost of deep sequencing.

Results: A large number of tools have been developed to tackle this problem, and we performed a comprehensive evaluation of these tools using both experimental and simulated RNA-seq datasets. We found that recently developed alignment-free tools are both fast and accurate. The accuracy of all methods was mainly influenced by the complexity of gene structures, and caution must be taken when interpreting quantification results for short transcripts. Using TP53 gene simulation, we discovered that both sequencing depth and the relative abundance of different isoforms affect quantification accuracy.

Conclusions: Our comprehensive evaluation helps data analysts make an informed choice when selecting computational tools for isoform quantification.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-4002-1) contains supplementary material, which is available to authorized users.

Applications of machine learning in drug discovery and development