Background: High-throughput RNA sequencing (RNA-seq) has become an important analytical tool in molecular biology. Although the utility and importance of this technique have grown, uncertainties regarding the proper analysis of RNA-seq data remain. Of primary concern, there is no consensus on which normalization and statistical methods are most appropriate for analyzing these data. The lack of standardized analytical methods leads to uncertainty in data interpretation and study reproducibility, especially for studies reporting high false discovery rates. In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives, RLE (relative log expression), TMM (trimmed mean of M-values) and UQ (upper quartile normalization), in the analysis of RNA-seq data. We evaluated the performance of these methods for gene-level differential expression analysis by considering the following factors: 1) normalization combined with the choice of a Wald test from DESeq2 or an exact test/QL (quasi-likelihood) F-test from edgeR; 2) sample sizes in balanced two-group comparisons; and 3) sequencing read depths. Results: Using the MAQC RNA-seq datasets with small numbers of sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis. However, using an intra-group analysis of false positives from real and simulated data, we found that a Wald test performs better than an exact test when the number of sample replicates is large and that a QL F-test performs best at sample sizes of 5, 10 and 15 for any normalization. The RLE, TMM and UQ methods performed similarly at a given sample size. Conclusion: We found that the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice for controlling false positives when the sample size is small. When the sample size is large, UQ-pgQ2 with a QL F-test is a better choice for type I error control in an intra-group analysis. We observed that read depth has minimal impact on differential gene expression analysis based on the simulated data.
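To make the normalization step concrete, here is a minimal sketch of two of the scaling approaches named above: a median-of-ratios (RLE-style) size factor and an upper-quartile (UQ) size factor. It is an illustration under simplifying assumptions, not the DESeq2 or edgeR implementation and not the UQ-pgQ2 method itself; the function names are hypothetical, and `counts` is assumed to be a genes-by-samples matrix of raw read counts.

```python
import numpy as np

def rle_size_factors(counts):
    """Median-of-ratios (RLE-style) size factors for a genes x samples matrix."""
    counts = np.asarray(counts, dtype=float)
    # Keep genes with a positive count in every sample so the logs are finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean per gene acts as a pseudo-reference sample.
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median ratio of each sample to the reference.
    return np.exp(np.median(log_counts - log_ref, axis=0))

def uq_size_factors(counts):
    """Upper-quartile size factors, rescaled to have geometric mean 1."""
    counts = np.asarray(counts, dtype=float)
    # Drop genes with zero counts in all samples before taking the quantile.
    kept = counts[counts.sum(axis=1) > 0]
    uq = np.percentile(kept, 75, axis=0)
    return uq / np.exp(np.mean(np.log(uq)))

# Toy data (assumed): three samples sequenced at relative depths 1, 2 and 4.
base = np.array([10, 20, 30, 40, 50])
counts = np.outer(base, [1, 2, 4])
normalized = counts / rle_size_factors(counts)  # columns become comparable
```

On this toy matrix the samples differ only by sequencing depth, so both functions recover the same relative factors and dividing by them makes the columns identical.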
High-throughput RNA sequencing (RNA-seq) has become a popular method of choice for transcriptomics and the detection of differentially expressed genes. Sample size calculation for RNA-seq experimental design is an important consideration in biological research and clinical trials. Sample size formulas derived from the Wald and likelihood ratio tests, with a Poisson distribution used to model RNA-seq data, have already been developed. However, since the mean read count in real RNA-seq data is not equal to the variance, an extended method that calculates sample sizes based on a negative binomial distribution using an exact test statistic was subsequently proposed. In this study, we instead derive five sample size calculation methods based on the negative binomial distribution using the Wald test, the log-transformed Wald test and the log-likelihood ratio test statistics. We compared our five methods with an existing method by calculating the sample sizes and the simulated power in different scenarios. We first calculated the sample sizes for testing a single gene using the six methods given a nominal significance level α of 0.05 and 80% power. Then, we calculated the sample sizes for testing multiple genes given a false discovery rate (FDR) of 0.05 and 0.10. The empirical power and the number of true prognostic genes in differential gene expression analysis corresponding to the estimated sample sizes from the six methods were also estimated via simulation studies. Using the sample size formulas derived from the log-transformed and Wald-based tests, we observed smaller sample sizes while maintaining power close to or higher than the nominal 80% in all settings compared to the other methods. Moreover, the Wald test-based sample size calculation method is easier and faster to compute in RNA-seq experimental design.
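As a rough illustration of how a Wald-type sample size can be computed under a negative binomial model, the sketch below uses the textbook delta-method approximation Var(log of the estimated mean) ≈ (1/μ + φ)/n per group. This is an assumed, generic formula and not necessarily any of the five methods derived in the paper; the function name and its parameters are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist

def nb_wald_sample_size(mu1, mu2, dispersion, alpha=0.05, power=0.80):
    """Per-group n for a two-sided Wald test on the log fold change.

    Assumes equal group sizes, a common dispersion phi, and
    Var(count) = mu + phi * mu**2 (negative binomial).
    """
    if mu1 <= 0 or mu2 <= 0 or mu1 == mu2:
        raise ValueError("means must be positive and different")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    # Delta-method variance of the log fold change, times n.
    variance_term = 1 / mu1 + 1 / mu2 + 2 * dispersion
    effect = log(mu2 / mu1)
    return ceil((z_alpha + z_beta) ** 2 * variance_term / effect ** 2)

# Example call with assumed inputs: a two-fold change at moderate dispersion.
n_per_group = nb_wald_sample_size(mu1=10, mu2=20, dispersion=0.1)
```

The required sample size grows with the dispersion and shrinks as the fold change moves away from 1, which is the qualitative behavior any such formula should show.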
Later, several sample size calculation methods derived from the score statistic and the log-likelihood ratio test (LRT) statistic using the Poisson distribution were proposed [16]. However, the Poisson assumption that the mean and variance are equal usually does not hold for RNA-seq studies, where the variance of the read counts is typically greater than the mean [17]. Therefore, existing software packages such as DESeq [17] and edgeR [18] model RNA-seq data with a negative binomial distribution that includes a dispersion parameter, and use an exact test to detect DEGs between conditions. Subsequently, a sample size calculation method based on an exact test statistic with the aid of the edgeR package [18] was proposed [19]. However, sample size methods derived from other test statistics, such as the Wald test, the LRT and an extension of the Wald test via log-transformation, using a negative binomial distribution to model the RNA-seq data have not yet been explored.
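The overdispersion argument above can be checked numerically. The sketch below assumes the usual negative binomial parameterization Var = μ + φμ² and simulates counts as a gamma-Poisson mixture; the function names, the seed and the example values are illustrative assumptions.

```python
import numpy as np

def nb_variance(mean, dispersion):
    """Negative binomial variance; reduces to the Poisson case at dispersion 0."""
    return mean + dispersion * mean ** 2

def simulate_nb_counts(mean, dispersion, size, seed=0):
    """Draw negative binomial counts as a gamma-Poisson mixture."""
    rng = np.random.default_rng(seed)
    # Gamma rates with mean `mean` and variance dispersion * mean**2.
    rates = rng.gamma(shape=1.0 / dispersion, scale=mean * dispersion, size=size)
    # Poisson sampling around each rate adds variance equal to the mean.
    return rng.poisson(rates)

# Assumed example: mean count 100 with dispersion 0.2.
sim_counts = simulate_nb_counts(mean=100, dispersion=0.2, size=20000)
print(sim_counts.mean(), sim_counts.var(), nb_variance(100, 0.2))
```

With these inputs the theoretical variance is 2100, about 21 times the mean, so a Poisson model would badly understate the variability of the counts.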
Engineered cardiac tissues (ECTs) are platforms to investigate cardiomyocyte maturation and functional integration, to assess the feasibility of generating tissues for cardiac repair, and to serve as models for pharmacology and toxicology bioassays. ECTs rapidly mature in vitro to acquire the features of functional cardiac muscle and respond to mechanical load with increased proliferation and maturation. ECTs are now being investigated as in vitro models of human diseases and as platforms for pharmacologic screening for drug toxicities. We tested the hypothesis that global ECT gene expression patterns are complex and sensitive to mechanical loading and tyrosine kinase inhibitors, similar to the maturing myocardium. We generated ECTs from day 14.5 rat embryo ventricular cells, as previously published, and then conditioned constructs after 5 days in culture for 48 h with mechanical stretch (5%, 0.5 Hz) and/or the p38 MAPK (p38 mitogen-activated protein kinase) inhibitor BIRB796. RNA was isolated from individual ECTs and assayed using a standard Agilent rat 4 × 44k V3 microarray and Pathway Analysis software for transcript expression fold changes and changes in regulatory molecules and networks. Changes in expression were confirmed by quantitative polymerase chain reaction (q-PCR) for selected regulatory molecules. At a threshold of a 1.5-fold change in expression, stretch altered 1559 transcripts, versus 1411 for BIRB796 and 1846 for stretch plus BIRB796. As anticipated, the top pathways altered in response to these stimuli include cellular development; cellular growth and proliferation; tissue development; cell death; cell signaling; and small molecule biochemistry, as well as numerous other pathways. Thus, ECTs display a broad spectrum of altered gene expression in response to mechanical load and/or tyrosine kinase inhibition, reflecting a complex regulation of proliferation, differentiation, and architectural alignment of cardiomyocytes and noncardiomyocytes within ECTs.
High-throughput RNA sequencing (RNA-seq) technology is increasingly used in disease-related biomarker studies. The negative binomial distribution has become a popular choice for modeling gene read counts in RNA-seq data because the counts are over-dispersed. In this study, we propose two explicit sample size calculation methods for RNA-seq data using a negative binomial regression model. To derive these new sample size formulas, the common dispersion parameter and the size factor, entered as an offset via a natural logarithm link function, are incorporated. A two-sided Wald test statistic derived from the coefficient parameter is used for testing a single gene at a nominal significance level of 0.05 and multiple genes at a false discovery rate of 0.05. The variance for the Wald test is computed from the variance-covariance matrix, with the parameters estimated by maximum likelihood under the unrestricted and constrained scenarios. The performance of our new formulas is evaluated via simulation studies, including a side-by-side comparison with three existing methods based on a Wald test, a likelihood ratio test or an exact test. Since the other methods are much more computationally intensive, we recommend our M1 method for quick and direct estimation of sample sizes in experimental design. Finally, we illustrate sample size estimation using an existing breast cancer RNA-seq dataset.
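When many genes are tested at a target false discovery rate, a single-gene sample size formula needs a per-gene significance level. One commonly used approximation, shown here as an assumption rather than as the procedure used in the paper, sets the expected false positives over the expected total rejections equal to the target FDR and solves for the per-gene level. The function name and the gene counts in the example are hypothetical.

```python
def fdr_adjusted_alpha(n_genes, n_de_genes, fdr=0.05, power=0.80):
    """Per-gene significance level that approximately achieves the target FDR.

    Solves m0 * alpha / (m0 * alpha + r1) = fdr for alpha, where m0 is the
    number of non-differential genes and r1 = power * n_de_genes is the
    expected number of true discoveries.
    """
    if not 0 < n_de_genes < n_genes:
        raise ValueError("need 0 < n_de_genes < n_genes")
    m0 = n_genes - n_de_genes
    r1 = power * n_de_genes
    return r1 * fdr / (m0 * (1 - fdr))

# Assumed example: 10,000 genes tested, 100 of them truly differential.
alpha_star = fdr_adjusted_alpha(n_genes=10000, n_de_genes=100)
```

The resulting level is far below 0.05, and it can be substituted for the nominal significance level in a single-gene sample size calculation.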