Variable selection for recovering sparsity in nonadditive and nonparametric models with high-dimensional variables has been challenging. The problem becomes even more difficult because of complications in modeling unknown interaction terms among the high-dimensional variables. There is currently no variable selection method that overcomes these limitations. Hence, in this article we propose a variable selection approach developed by connecting a kernel machine with the nonparametric regression model. The advantages of our approach are that it can: (i) recover the sparsity; (ii) automatically model unknown and complicated interactions; (iii) connect with several existing approaches, including the linear nonnegative garrote and multiple kernel learning; and (iv) provide flexibility for both additive and nonadditive nonparametric models. Our approach can be viewed as a nonlinear version of the nonnegative garrote method. We model the smoothing function by a Least Squares Kernel Machine (LSKM) and construct the nonnegative garrote objective function as a function of the sparse scale parameters of the kernel machine. This recovers the sparsity of the input variables, whose relevance to the response is measured by the scale parameters. We also provide the asymptotic properties of our approach and show that sparsistency is satisfied with consistent initial kernel function coefficients under certain conditions. An efficient coordinate descent/backfitting algorithm is developed. A resampling procedure for our variable selection methodology is also proposed to improve the power.
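To make the role of the scale parameters concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel with one nonnegative scale per input variable and a plain ridge-type least squares kernel machine fit; the function names, the kernel form, and the `ridge` penalty value are all illustrative assumptions. It shows only that a variable whose scale is zero drops out of the fitted function; it does not implement the garrote objective or the coordinate descent/backfitting algorithm.

```python
import numpy as np

def scaled_gaussian_kernel(X, Z, scales):
    """K[i, j] = exp(-sum_k scales[k] * (X[i, k] - Z[j, k])**2).

    Assumed kernel form: one nonnegative scale per input variable.
    A scale of exactly zero removes that variable from the kernel.
    """
    scales = np.asarray(scales, dtype=float)
    diff = X[:, None, :] - Z[None, :, :]          # shape (n, m, p)
    return np.exp(-np.einsum("ijk,k->ij", diff ** 2, scales))

def fit_lskm(X, y, scales, ridge=1e-2):
    """Kernel coefficients from a ridge-penalized least squares fit (hypothetical helper)."""
    K = scaled_gaussian_kernel(X, X, scales)
    return np.linalg.solve(K + ridge * np.eye(len(y)), y)

def predict_lskm(X_train, alpha, X_new, scales):
    """Evaluate the fitted smoothing function at new inputs."""
    return scaled_gaussian_kernel(X_new, X_train, scales) @ alpha

# Toy data: the response depends on variables 0 and 1 (with an interaction), not on variable 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = np.sin(X[:, 0]) + X[:, 0] * X[:, 1]

scales = [1.0, 0.5, 0.0]                          # variable 2 is "deselected"
alpha = fit_lskm(X, y, scales)
fitted = predict_lskm(X, alpha, X, scales)
```

Under these assumptions, a sparse vector of scales translates directly into a sparse set of selected variables, while the nonzero scales still let the kernel capture interactions among the retained variables without specifying them in advance.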
A biological pathway represents a set of genes that serves a particular cellular or physiological function. The genes within the same pathway are expected to function together and hence may interact with each other. It is also known that many genes, and hence pathways, interact with environmental variables. However, no formal procedure has yet been developed to evaluate the pathway-environment interaction. In this article, we propose a semiparametric method to model the pathway-environment interaction. The method connects a least squares kernel machine and a semiparametric mixed effects model. We model the environmental effect nonparametrically via a natural cubic spline. Both the pathway effect and the interaction between the pathway and the environmental effect are modeled nonparametrically via a kernel machine, and we estimate the variance component representing the interaction effect under a semiparametric mixed effects model. We then employ a restricted likelihood ratio test and a score test to evaluate the main pathway effect and the pathway-environment interaction. The approach was applied to genetic pathway data on type II diabetes, and pathways with a significant main pathway effect, a significant interaction effect, or both were identified. Other previously developed methods determined many as having a significant main pathway effect only. Furthermore, among those significant pathways, we discovered some with a significant pathway-environment interaction effect, a result that other methods would not be able to detect.
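The variance-component view described above can be illustrated with a small sketch. The abstract does not state how the interaction kernel is built, so everything here is an assumption made for illustration: Gaussian kernels for the pathway and the environmental variable, an interaction kernel formed as their elementwise product, and the names `tau`, `nu`, and `sigma2` for the pathway, interaction, and residual variance components. The natural cubic spline for the environmental main effect and the tests themselves are not implemented.

```python
import numpy as np

def gaussian_kernel(A, rho=1.0):
    """Gaussian kernel matrix exp(-||a_i - a_j||^2 / rho) for the rows of A."""
    sq = np.sum(A ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * A @ A.T, 0.0)
    return np.exp(-d2 / rho)

def marginal_covariance(Z, x, tau, nu, sigma2, rho_z=1.0, rho_x=1.0):
    """Covariance of the response under an assumed mixed-model representation.

    Z : (n, p) gene expression values for one pathway
    x : (n,)   environmental variable
    tau, nu, sigma2 : pathway, interaction, and residual variance components
    """
    K_path = gaussian_kernel(Z, rho_z)
    K_env = gaussian_kernel(x.reshape(-1, 1), rho_x)
    K_int = K_path * K_env                        # assumed interaction kernel (elementwise product)
    return tau * K_path + nu * K_int + sigma2 * np.eye(Z.shape[0])

rng = np.random.default_rng(1)
Z = rng.normal(size=(25, 5))
x = rng.normal(size=25)

V_full = marginal_covariance(Z, x, tau=0.8, nu=0.3, sigma2=1.0)
V_null = marginal_covariance(Z, x, tau=0.8, nu=0.0, sigma2=1.0)   # no interaction
```

In this representation, testing for a pathway-environment interaction amounts to testing whether the interaction variance component (`nu` here) is zero, which is the kind of hypothesis a restricted likelihood ratio test or score test on a variance component addresses.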