Feature selection to identify spatially variable genes is a key step during analyses of spatially-resolved transcriptomics data. Here, we propose nnSVG, a scalable approach to identify spatially variable genes based on nearest-neighbor Gaussian processes. Our method (i) identifies genes that vary in expression continuously across the entire tissue or within a priori defined spatial domains, (ii) uses gene-specific estimates of length scale parameters within the Gaussian process models, and (iii) scales linearly with the number of spatial locations. We demonstrate the performance of our method using experimental data from several technological platforms and simulations. A software implementation is available at https://bioconductor.org/packages/nnSVG.
In geostatistics, inference on the spatial covariance parameters of a Gaussian process is often critical to scientists for understanding structural dependence in data. Finite-sample inference customarily proceeds either using posterior distributions from a fully Bayesian approach or via resampling/subsampling techniques in a frequentist setting. Resampling methods, in particular the bootstrap, have become more attractive in the modern age of big data because, unlike Bayesian models that require sequential sampling from Markov chain Monte Carlo, they naturally lend themselves to parallel computing resources. However, a spatial bootstrap involves an expensive Cholesky decomposition to decorrelate the data. In this manuscript, we develop a highly scalable parametric spatial bootstrap that uses sparse Cholesky factors for parameter estimation and decorrelation. The proposed bootstrap for rapid inference on spatial covariances (BRISC) algorithm requires linear memory and computations and is embarrassingly parallel, thereby delivering substantial scalability. Simulation studies highlight the accuracy and computational efficiency of our approach. Analysing large satellite temperature data, BRISC produces inference that closely matches that delivered from a state-of-the-art Bayesian approach, while being several times faster. The R package BRISC is now available for download from GitHub (https://github.com/ArkajyotiSaha/BRISC) and will be available on CRAN soon.
where $\sigma^2$ controls the variance of the spatial component, $\phi$ denotes the decay in spatial correlation, $\nu$ controls the process smoothness and $K_\nu$ denotes the modified Bessel function of the second kind with order $\nu$. If $y$ denotes the vector of observations for all the locations and $X$ is the corresponding covariate matrix, then marginalizing out $w$, the model for the observed data is given by $y \sim N(X\beta, C(\theta) + \tau^2 I)$.
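The decorrelate, resample, recorrelate idea behind a parametric spatial bootstrap under the marginal model above can be sketched as follows. This is a minimal illustration under stated assumptions, not the BRISC implementation: it uses a dense Cholesky factor (cubic cost) where BRISC uses sparse nearest-neighbor factors, it fixes the Matérn smoothness at 1/2 so that the covariance reduces to the exponential form and no Bessel function is needed, and it re-estimates only the regression coefficients while holding the covariance parameters fixed. All function names and the simulated data are hypothetical.

```python
import numpy as np


def exp_covariance(coords, sigma2, phi):
    """Matern covariance with smoothness 1/2: sigma2 * exp(-phi * distance)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma2 * np.exp(-phi * dist)


def marginal_cov(coords, sigma2, phi, tau2):
    """Covariance of y after marginalizing out w: C(theta) + tau2 * I."""
    return exp_covariance(coords, sigma2, phi) + tau2 * np.eye(len(coords))


def gls_beta(X, y, chol):
    """Generalized least squares for beta, given the Cholesky factor of the covariance."""
    Xw = np.linalg.solve(chol, X)  # decorrelated covariates
    yw = np.linalg.solve(chol, y)  # decorrelated response
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta


def bootstrap_datasets(X, y, coords, beta, sigma2, phi, tau2, n_boot, rng):
    """Generate bootstrap responses from plug-in parameter estimates."""
    # Dense Cholesky factor: the assumption that a sparse factor would replace.
    chol = np.linalg.cholesky(marginal_cov(coords, sigma2, phi, tau2))
    mean = X @ beta
    resid = np.linalg.solve(chol, y - mean)  # decorrelate the residuals
    out = np.empty((n_boot, len(y)))
    for b in range(n_boot):
        star = rng.choice(resid, size=len(resid), replace=True)  # resample
        out[b] = mean + chol @ star  # recorrelate
    return out


# Hypothetical demo on simulated data.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(size=(n, 2))
X = np.column_stack([np.ones(n), coords[:, 0]])
true_beta = np.array([1.0, 2.0])
sigma2, phi, tau2 = 1.0, 3.0, 0.1
chol = np.linalg.cholesky(marginal_cov(coords, sigma2, phi, tau2))
y = X @ true_beta + chol @ rng.standard_normal(n)

beta_hat = gls_beta(X, y, chol)
boot = bootstrap_datasets(X, y, coords, beta_hat, sigma2, phi, tau2, 100, rng)
boot_betas = np.array([gls_beta(X, yb, chol) for yb in boot])
# Percentile intervals for each coefficient from the bootstrap replicates.
print(np.percentile(boot_betas, [2.5, 97.5], axis=0))
```

Each bootstrap replicate is independent of the others, so the loop over replicates could be distributed across workers. For inference on the covariance parameters themselves, the per-replicate estimation step would need to maximize the likelihood over those parameters rather than only solve for the regression coefficients.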