We propose a new statistic for detecting differentially expressed genes when the genes are activated in only a subset of the samples. Statistics designed for this unconventional setting have proved valuable in most cancer studies, where oncogenes are activated in only a small number of disease samples. Previous efforts in this direction include cancer outlier profile analysis (Tomlins and others, 2005), outlier sum (Tibshirani and Hastie, 2007), and outlier robust t-statistics (Wu, 2007). We propose a new statistic, the maximum ordered subset t-statistic (MOST), which seems natural when the number of activated samples is unknown. We compare MOST with other statistics and find that the proposed method often has more power than its competitors.
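A minimal sketch of the idea behind MOST, assuming details the abstract does not state. Each gene is standardized by the median and the scaled median absolute deviation (MAD) of all samples. The disease values are sorted in decreasing order, and for every subset size k the sum of the top k values is centered and scaled by the mean and standard deviation of the sum of the top k order statistics of n independent N(0, 1) draws. Those null moments are estimated here by Monte Carlo. MOST is the maximum over k, so the number of activated samples never has to be specified. The standardization choice and all function names are hypothetical.

```python
import numpy as np


def standardize(x_normal, x_disease):
    # Assumption: center by the pooled median and scale by the pooled MAD,
    # multiplied by 1.4826 so that it estimates the SD under normality.
    pooled = np.concatenate([x_normal, x_disease])
    med = np.median(pooled)
    mad = 1.4826 * np.median(np.abs(pooled - med))
    if mad == 0:
        raise ValueError("MAD is zero; cannot standardize")
    return (x_disease - med) / mad


def null_moments(n, n_sim=20000, seed=0):
    # Monte Carlo mean and SD of the sum of the k largest of n iid N(0, 1)
    # draws, for k = 1..n (index k - 1 holds subset size k).
    rng = np.random.default_rng(seed)
    z = -np.sort(-rng.standard_normal((n_sim, n)), axis=1)  # rows in decreasing order
    sums = np.cumsum(z, axis=1)
    return sums.mean(axis=0), sums.std(axis=0)


def most_statistic(x_normal, x_disease, moments):
    # Maximum over k of the standardized sum of the top k disease values.
    mu, sd = moments
    z = -np.sort(-standardize(x_normal, x_disease))  # decreasing order
    partial = np.cumsum(z)                           # sum of the top k, k = 1..n
    return float(np.max((partial - mu) / sd))


rng = np.random.default_rng(42)
n_normal, n_disease = 20, 20
moments = null_moments(n_disease)

# Gene with no differential expression.
null_gene = most_statistic(rng.standard_normal(n_normal),
                           rng.standard_normal(n_disease), moments)

# Gene activated in only 5 of the 20 disease samples.
disease = rng.standard_normal(n_disease)
disease[:5] += 6.0
outlier_gene = most_statistic(rng.standard_normal(n_normal), disease, moments)
print(null_gene, outlier_gene)
```

Because the maximum is taken over all subset sizes, the same statistic responds whether 2 or 15 disease samples are activated; in practice its null distribution (and hence p-values) would typically be calibrated by permutation.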
In this paper, we consider the problem of variable selection for high-dimensional generalized varying-coefficient models and propose a polynomial-spline-based procedure that simultaneously eliminates irrelevant predictors and estimates the nonzero coefficients. In a "large p, small n" setting, we establish the convergence rates of the estimator under suitable regularity assumptions. In particular, we show that the adaptive group lasso estimator selects the important variables correctly with probability approaching one and that the convergence rates for the nonzero coefficients are the same as those of the oracle estimator (the estimator obtained when the important variables are known before the statistical analysis is carried out). To choose the regularization parameters automatically, we use the extended Bayesian information criterion (eBIC), which effectively controls the number of false positives. Monte Carlo simulations are conducted to examine the finite-sample performance of the proposed procedures.
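The sketch below illustrates adaptive group lasso with eBIC tuning, but only for a varying-coefficient model with Gaussian errors and identity link, a special case of the generalized models above. Several assumptions are ours, not the paper's: a global cubic polynomial basis stands in for the polynomial-spline basis; each coefficient function's block of columns is orthonormalized and fitted by block coordinate descent; adaptive weights come from an unpenalized least-squares pilot fit (possible only because n exceeds the number of columns here); and eBIC takes the Gaussian form n log(RSS/n) + df log n + 2γ log C(p, k), with k selected groups. All function names are hypothetical.

```python
import math

import numpy as np


def cubic_basis(u):
    # Stand-in for a B-spline basis: a global cubic polynomial (assumption).
    return np.column_stack([np.ones_like(u), u, u**2, u**3])


def vc_design(X, u):
    # Block j holds X[:, j] * B_l(u): the basis expansion of beta_j(u).
    B = cubic_basis(u)
    m = B.shape[1]
    D = np.hstack([X[:, [j]] * B for j in range(X.shape[1])])
    groups = [list(range(j * m, (j + 1) * m)) for j in range(X.shape[1])]
    return D, groups


def orthonormalize(D, groups):
    # Rescale each block so that Z_g^T Z_g / n = I.
    n = D.shape[0]
    Z = np.empty_like(D)
    for idx in groups:
        q, _ = np.linalg.qr(D[:, idx])
        Z[:, idx] = q * np.sqrt(n)
    return Z


def group_soft_threshold(z, t):
    nrm = np.linalg.norm(z)
    return np.zeros_like(z) if nrm <= t else (1.0 - t / nrm) * z


def adaptive_group_lasso(Z, y, groups, lam, weights, n_iter=200):
    # Block coordinate descent for (1/2n)||y - Z b||^2 + lam * sum_g w_g ||b_g||.
    n = Z.shape[0]
    beta = np.zeros(Z.shape[1])
    r = np.array(y, dtype=float)
    for _ in range(n_iter):
        for g, idx in enumerate(groups):
            Zg = Z[:, idx]
            r += Zg @ beta[idx]  # partial residual without group g
            beta[idx] = group_soft_threshold(Zg.T @ r / n, lam * weights[g])
            r -= Zg @ beta[idx]
    return beta


def ebic(Z, y, beta, groups, gamma=1.0):
    n, p = len(y), len(groups)
    rss = float(np.sum((y - Z @ beta) ** 2))
    k = int(sum(np.any(beta[idx] != 0) for idx in groups))
    df = int(np.count_nonzero(beta))
    log_binom = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    return n * math.log(rss / n) + df * math.log(n) + 2 * gamma * log_binom


rng = np.random.default_rng(1)
n, p = 400, 8
X = rng.standard_normal((n, p))
u = rng.uniform(size=n)
# Only beta_0(u) = 1 + u and beta_1(u) = 2u^2 - 1 are nonzero.
y = X[:, 0] * (1 + u) + X[:, 1] * (2 * u**2 - 1) + 0.5 * rng.standard_normal(n)

D, groups = vc_design(X, u)
Z = orthonormalize(D, groups)
beta_pilot = np.linalg.lstsq(Z, y, rcond=None)[0]
weights = [1.0 / np.linalg.norm(beta_pilot[idx]) for idx in groups]

fits = [adaptive_group_lasso(Z, y, groups, lam, weights)
        for lam in np.geomspace(1e-3, 1.0, 25)]
best = min(fits, key=lambda b: ebic(Z, y, b, groups))
selected = [g for g, idx in enumerate(groups) if np.any(best[idx] != 0)]
print(selected)
```

In a genuine "large p, small n" problem the least-squares pilot fit does not exist; a group lasso fit is a common substitute for computing the adaptive weights.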
A single-index model (SIM) provides parsimonious multidimensional nonlinear regression by combining a parametric (linear) projection with a univariate nonparametric (nonlinear) regression model. We show that a particular Gaussian process (GP) formulation is simple to work with and ideal as an emulator for some types of computer experiments, where it can outperform the canonical separable GP regression model commonly used in this setting. Our contribution focuses on drastically simplifying, reinterpreting, and then generalizing a recently proposed fully Bayesian GP-SIM combination. Favorable performance is illustrated on synthetic data and on a real-data computer experiment. Two R packages, both released on CRAN, have been augmented to facilitate inference under our proposed model(s).
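A minimal sketch of one simple way to combine a GP with a single index: a squared-exponential covariance evaluated on the projections, k(x, x') = exp(-(wᵀx - wᵀx')² / θ), which equals the usual kernel with the rank-one distance (x - x')ᵀ w wᵀ (x - x'). Here the index vector w and the scale θ are assumed known, whereas a fully Bayesian treatment would infer them; the code does not reproduce the interface of either R package, and the function names are hypothetical.

```python
import numpy as np


def sim_kernel(A, B, w, theta=1.0):
    # Squared-exponential covariance on the projections w^T x (single index).
    pa, pb = A @ w, B @ w
    return np.exp(-((pa[:, None] - pb[None, :]) ** 2) / theta)


def gp_sim_predict(X, y, X_new, w, theta=1.0, nugget=1e-6):
    # Posterior mean and variance of a zero-mean GP with unit prior variance.
    K = sim_kernel(X, X, w, theta) + nugget * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_star = sim_kernel(X_new, X, w, theta)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, var


rng = np.random.default_rng(0)
w = np.ones(3) / np.sqrt(3.0)          # assumed known index direction
X = rng.uniform(size=(50, 3))
y = np.sin(3.0 * X @ w)                # response depends on x only through w^T x
X_test = rng.uniform(0.25, 0.75, size=(20, 3))
mean, var = gp_sim_predict(X, y, X_test, w)
truth = np.sin(3.0 * X_test @ w)

# Shifting inputs orthogonally to w leaves the prediction unchanged.
v = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
mean_shifted, _ = gp_sim_predict(X, y, X_test + 0.05 * v, w)
```

The orthogonal-shift check shows the parsimony a SIM buys: the covariance, and hence the emulator, varies along one direction only, whereas a separable GP would estimate a separate length-scale for every input.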
An extension of reproducing kernel Hilbert space (RKHS) theory provides a new framework for functional regression with functional responses. The approach assumes only a general nonlinear regression structure, in contrast to the linear regression models studied previously. Generalized cross-validation (GCV) is proposed for automatic estimation of the smoothing parameter. The new RKHS estimate is illustrated on both simulated and real data.
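A minimal sketch of kernel ridge regression with functional inputs and functional responses, with the smoothing parameter chosen by GCV. It rests on simplifying assumptions that need not match the estimator above: all curves share a common grid, the input kernel is a Gaussian function of the Riemann-approximated L2 distance between input curves, and the same scalar kernel is applied at every response grid point. The fitted curves are then Ŷ = A(λ)Y with A(λ) = K(K + nλI)⁻¹, and GCV(λ) = n‖(I - A)Y‖²_F / [tr(I - A)]². Function names and the simulated data are hypothetical.

```python
import numpy as np


def l2_gaussian_kernel(Xa, Xb, ds, rho):
    # Riemann-sum approximation of the squared L2 distance between input curves.
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=2) * ds
    return np.exp(-d2 / rho)


def fit_gcv(X, Y, ds, rho, lams):
    # Choose lambda by GCV using the eigendecomposition of K.
    n = X.shape[0]
    K = l2_gaussian_kernel(X, X, ds, rho)
    evals, U = np.linalg.eigh(K)
    UtY = U.T @ Y
    best_score, best_lam = np.inf, None
    for lam in lams:
        shrink = evals / (evals + n * lam)         # eigenvalues of A(lam)
        resid = Y - U @ (shrink[:, None] * UtY)    # (I - A) Y
        score = n * np.sum(resid**2) / (n - shrink.sum()) ** 2
        if score < best_score:
            best_score, best_lam = score, lam
    C = np.linalg.solve(K + n * best_lam * np.eye(n), Y)  # representer coefficients
    return best_lam, C


rng = np.random.default_rng(0)
s = np.linspace(0, 1, 50)   # input grid
t = np.linspace(0, 1, 30)   # response grid
ds = s[1] - s[0]


def simulate(n):
    a, b = rng.uniform(-1.5, 1.5, size=(2, n))
    X = a[:, None] * np.sin(np.pi * s) + b[:, None] * np.cos(np.pi * s)
    Y = (a[:, None] ** 2) * t + np.sin(b)[:, None] * (1 - t)  # nonlinear in the input
    return X, Y + 0.1 * rng.standard_normal(Y.shape)


X_train, Y_train = simulate(80)
X_test, Y_test = simulate(200)
lams = np.geomspace(1e-6, 1.0, 25)
lam, C = fit_gcv(X_train, Y_train, ds, rho=1.0, lams=lams)
Y_pred = l2_gaussian_kernel(X_test, X_train, ds, 1.0) @ C
mse = np.mean((Y_pred - Y_test) ** 2)
baseline = np.mean((Y_train.mean(axis=0) - Y_test) ** 2)
```

Working in the eigenbasis of K makes each GCV evaluation cheap, since tr(A) is just the sum of the shrunken eigenvalues.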