Combining information across genes in the statistical analysis of microarray data is desirable because of the relatively small number of data points obtained for each individual gene. Here we develop an estimator of the error variance that can borrow information across genes using the James-Stein shrinkage concept. A new test statistic (FS) is constructed using this estimator. The new statistic is compared with other statistics used to test for differential expression: the gene-specific F test (F1), the pooled-variance F statistic (F3), a hybrid statistic (F2) that uses the average of the individual and pooled variances, the regularized t-statistic, the posterior odds statistic B, and the SAM t-test. The FS-test shows best or nearly best power for detecting differentially expressed genes over a wide range of simulated data in which the variance components associated with individual genes are either homogeneous or heterogeneous. Thus FS provides a powerful and robust approach to test differential expression of genes that utilizes information not available in individual gene testing approaches and does not suffer from biases of the pooled variance approach.
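The variance choices behind F1, F2, and F3, and the general idea of shrinking gene-specific variances toward a common value, can be sketched as follows. This is a minimal illustration and not the authors' FS estimator: it assumes equal degrees of freedom per gene (so the pooled variance is a simple mean), it applies a generic positive-part James-Stein shrinkage to log variances without any bias correction, and the function names and the `sampling_var` argument (the assumed sampling variance of a log variance estimate) are hypothetical.

```python
import math


def variance_estimates(gene_variances):
    """Return the variance used in the F1, F2, and F3 denominators per gene.

    Assumption: every gene has the same error degrees of freedom, so the
    pooled variance is the plain average of the gene-specific variances.
    """
    pooled = sum(gene_variances) / len(gene_variances)
    return [
        {
            "F1": s2,                   # gene-specific variance
            "F2": (s2 + pooled) / 2.0,  # average of individual and pooled
            "F3": pooled,               # pooled variance across genes
        }
        for s2 in gene_variances
    ]


def shrink_log_variances(gene_variances, sampling_var):
    """Generic positive-part James-Stein shrinkage on the log scale.

    Each log variance is pulled toward the mean log variance. The factor
    1 - (G - 3) * V / SS is truncated at zero, where V is the assumed
    sampling variance of one log variance and SS is the sum of squared
    deviations of the log variances from their mean. Requires G > 3.
    """
    logs = [math.log(s2) for s2 in gene_variances]
    n_genes = len(logs)
    mean_log = sum(logs) / n_genes
    ss = sum((x - mean_log) ** 2 for x in logs)
    if ss == 0.0:
        factor = 0.0  # all variances identical, nothing to shrink
    else:
        factor = max(0.0, 1.0 - (n_genes - 3) * sampling_var / ss)
    return [math.exp(mean_log + factor * (x - mean_log)) for x in logs]
```

With `sampling_var` near zero the shrunken values stay close to the gene-specific variances (as in F1), and with a large `sampling_var` they collapse to a single common value (similar in spirit to F3), which is the trade-off the shrinkage approach is meant to balance.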
The appropriateness of a statistical analysis for evaluating a model depends on the model's purpose. A common purpose for models in agricultural research and environmental management is accurate prediction. In this context, correlation and linear regression are frequently used to test or compare models, including tests of intercept a = 0 and slope b = 1, but unfortunately such results are related only obliquely to the specific matter of predictive success. The mean squared deviation (MSD) between model predictions X and measured values Y has been proposed as a directly relevant measure of predictive success, with MSD partitioned into three components to gain further insight into model performance. This paper proposes a different and better partitioning of MSD: squared bias (SB), nonunity slope (NU), and lack of correlation (LC). These MSD components are distinct and additive, they have straightforward geometric and analysis of variance (ANOVA) interpretations, and they relate transparently to regression parameters. Our MSD components are illustrated using several models for wheat (Triticum aestivum L.) yield. The MSD statistic and its components nicely complement correlation and linear regression in evaluating the predictive accuracy of models.
... predictive accuracy of a model, even when such is the researchers' explicit objective. This confusion persists. For instance, see the 10 papers from a symposium on "Crop Modeling and Genomics" published recently in this journal (Agronomy Journal 95:4-113). That symposium illustrates the frequent use of correlation and regression for model evaluation. However, Kobayashi and Salam (2000) present cogent reasons why the correlation coefficient and linear regression are not entirely satisfactory for model evaluation and suggest that MSD and its components are often more informative. Further developing those findings, a different partitioning of MSD components has the advantage of yielding distinct components with straightforward meanings.
COMPONENTS OF MEAN SQUARED DEVIATION
Model-based and measured values, X and Y, can be compared for the purpose of evaluating a simulation
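The partition of MSD into SB, NU, and LC named in the abstract above can be sketched numerically. The text here gives only the component names, so the formulas in this sketch are assumptions: b is taken as the slope of the least-squares regression of Y on X, r as the correlation coefficient, and all sums are divided by N. The function name `msd_components` is hypothetical.

```python
def msd_components(predicted, measured):
    """Split the mean squared deviation between X and Y into SB, NU, LC.

    Assumed definitions (not stated in the text above):
      SB = (mean(X) - mean(Y))^2
      NU = (1 - b)^2 * sum(x^2) / N,  b = slope of regressing Y on X
      LC = (1 - r^2) * sum(y^2) / N,  r = correlation of X and Y
    where x and y are deviations from the respective means.
    X and Y must each have nonzero variance.
    """
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted) / n
    syy = sum((y - mean_y) ** 2 for y in measured) / n
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(predicted, measured)
    ) / n

    slope = sxy / sxx                 # regression of Y on X
    r_squared = sxy ** 2 / (sxx * syy)

    return {
        "MSD": sum((x - y) ** 2 for x, y in zip(predicted, measured)) / n,
        "SB": (mean_x - mean_y) ** 2,
        "NU": (1.0 - slope) ** 2 * sxx,
        "LC": (1.0 - r_squared) * syy,
    }


# Hypothetical data: four model predictions and four measurements.
parts = msd_components([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 4.0, 6.0])
```

Under these definitions the three components sum to MSD exactly, because (1 - b)^2 Sxx + (1 - r^2) Syy reduces to Sxx - 2 Sxy + Syy, the mean squared deviation after removing the bias term.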
In an earlier article, an intuitively appealing method for estimating the number of true null hypotheses in a multiple test situation was proposed. That article presented an iterative algorithm that relies on a histogram of observed p values to obtain the estimator. We characterize the limit of that iterative algorithm and show that the estimator can be computed directly without iteration. We compare the performance of the histogram-based estimator with other procedures for estimating the number of true null hypotheses from a collection of observed p values and find that the histogram-based estimator performs well in settings similar to those encountered in microarray data analysis. We demonstrate the approach using p values from a large microarray experiment aimed at uncovering molecular mechanisms of barley resistance to a fungal pathogen.
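A non-iterative histogram-based estimate of the number of true nulls might look like the sketch below. The abstract does not state the rule, so the details are assumptions: p values are binned into equal-width bins, the code scans from the bin nearest zero for the first bin whose count does not exceed the average count of that bin and all bins to its right, and that tail average times the number of bins is returned. The function name and the default of 20 bins are hypothetical.

```python
def estimate_true_nulls(p_values, n_bins=20):
    """Histogram-based estimate of the number of true null hypotheses.

    Assumed rule: true nulls give uniform p values, so the flat right-hand
    part of the histogram reflects the per-bin null count. The first bin
    whose count is no larger than the average over itself and all bins to
    its right marks where the histogram is treated as flat.
    """
    counts = [0] * n_bins
    for p in p_values:
        # p == 1.0 is placed in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1

    tail_sum = sum(counts)
    for i, count in enumerate(counts):
        tail_average = tail_sum / (n_bins - i)
        if count <= tail_average:
            return n_bins * tail_average
        tail_sum -= count

    # Not reached: the last bin always equals its own tail average.
    return float(len(p_values))


# Hypothetical example: 10 small p values plus 6 spread over (0.25, 1].
p_example = [0.01] * 10 + [0.3, 0.3, 0.6, 0.6, 0.9, 0.9]
m0_hat = estimate_true_nulls(p_example, n_bins=4)
```

In this example the first bin holds an excess of small p values, the remaining three bins hold two each, and the estimate is 4 × 2 = 8 true nulls out of 16 tests.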