Datasets from remote-sensing platforms and sensor networks are often spatial, temporal, and very large. Processing massive amounts of data to provide current estimates of the (hidden) state from current and past data is challenging, even for the Kalman filter. A large number of spatial locations observed through time can quickly lead to an overwhelmingly high-dimensional statistical model. Dimension reduction without sacrificing complexity is our goal in this article. We demonstrate how a Spatio-Temporal Random Effects (STRE) component of a statistical model reduces the problem to one of fixed dimension with a very fast statistical solution, a methodology we call Fixed Rank Filtering (FRF). This is compared in a simulation experiment to successive, spatial-only predictions based on an analogous Spatial Random Effects (SRE) model, and the value of incorporating temporal dependence is quantified. A remote-sensing dataset of aerosol optical depth (AOD), from the Multi-angle Imaging SpectroRadiometer (MISR) instrument on the Terra satellite, is used to compare spatio-temporal FRF with spatial-only prediction. FRF achieves rapid production of optimally filtered AOD predictions, along with their prediction standard errors. In our case, over 100,000 spatio-temporal data were processed: Parameter estimation took 64.4 seconds and optimal predictions and their standard errors took 77.3 seconds to compute. Supplemental materials giving complete details on the design and analysis of a simulation experiment, the simulation code, and the MISR data used are available on-line.
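To illustrate why a fixed-dimensional state keeps filtering fast, here is a minimal sketch of one Kalman predict/update step on a rank-r state. It is not the authors' implementation. It assumes a simplified model that the abstract does not spell out: data `z = S @ eta + noise` with noise variance `sigma2`, state evolution `eta_t = H @ eta_{t-1} + innovation` with innovation covariance `U`, and no separate fine-scale variation term. All names (`frf_step`, `predict_field`, `S`, `H`, `U`, `sigma2`) are hypothetical.

```python
import numpy as np

def frf_step(eta, P, z, S, H, U, sigma2):
    """One predict/update step for an r-dimensional state seen at n locations.

    Assumed (illustrative) model: z = S @ eta_t + e, e ~ N(0, sigma2 * I),
    and eta_t = H @ eta_{t-1} + w, w ~ N(0, U).
    """
    # Predict: propagate the r-dimensional state and its covariance.
    eta_pred = H @ eta
    P_pred = H @ P @ H.T + U
    # Update in information form: only r x r matrices are inverted,
    # so the cost grows linearly with the number of observations n.
    P_pred_inv = np.linalg.inv(P_pred)
    P_post = np.linalg.inv(P_pred_inv + S.T @ S / sigma2)
    eta_post = P_post @ (P_pred_inv @ eta_pred + S.T @ z / sigma2)
    return eta_post, P_post

def predict_field(S_new, eta, P):
    """Predicted smooth field and its standard error at new locations."""
    mean = S_new @ eta
    # Row-wise diagonal of S_new @ P @ S_new.T, without forming the full matrix.
    var = np.einsum("ij,jk,ik->i", S_new, P, S_new)
    return mean, np.sqrt(var)
```

Calling `frf_step` once per time point, feeding each output back in as the next `eta` and `P`, gives filtered estimates that use both past and current data. The standard errors from `predict_field` cover only the smooth, basis-function component under these assumptions.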
This paper focuses on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay. In particular, we provide population analyses to gain insights into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We learn that a fixed number of top eigenvectors might at the same time contain redundant clustering information and miss relevant clustering information. We use this insight to design the data spectroscopic clustering (DaSpec) algorithm that utilizes properly selected eigenvectors to determine the number of clusters automatically and to group the data accordingly. Our findings extend the intuitions underlying existing spectral techniques such as spectral clustering and Kernel Principal Components Analysis, and provide new understanding of their usability and modes of failure. Simulation studies and experiments on real-world data are conducted to show the potential of our algorithm. In particular, DaSpec is found to handle unbalanced groups and recover clusters of different shapes better than the competing methods. Comment: Published at http://dx.doi.org/10.1214/09-AOS700 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
The National Aeronautics and Space Administration (NASA) has a remote‐sensing program with a large array of satellites whose mission is earth‐system science. To carry out this mission, NASA produces data at various levels; level‐2 data have been calibrated to the satellite's footprint at high temporal resolution, although there is often a lot of missing data. Level‐3 data are produced on a regular latitude–longitude grid over the whole globe at a coarser spatial and temporal resolution (such as a day, a month, or a repeat‐cycle of the satellite), and there are still missing data. This article demonstrates that spatio‐temporal statistical models can be made operational and provide a way to estimate level‐3 values over the whole grid and attach to each value a measure of its uncertainty. Specifically, a hierarchical statistical model is presented that includes a spatio‐temporal random effects (STRE) model as a dynamical component and a temporally independent spatial component for the fine‐scale variation. Optimal spatio‐temporal predictions and their mean squared prediction errors are derived in terms of a fixed‐dimensional Kalman filter. The predictions provide estimates of missing values and filter out unwanted noise. The resulting fixed‐rank filter is scalable, in that it can handle very large data sets. Its functionality relies on estimation of the model's parameters, which is presented in detail. It is demonstrated how both past and current remote‐sensing observations on aerosol optical depth (AOD) can be combined, yielding an optimal statistical predictor of AOD on the log scale along with its prediction standard error. The Canadian Journal of Statistics 38: 271–289; 2010 © 2010 Statistical Society of Canada
Background: The formation of an allopolyploid is a two-step process, comprising an initial wide hybridization event, which is later followed by a whole genome doubling. Both processes can affect the transcription of homoeologues. Here, RNA-Seq was used to obtain the genome-wide leaf transcriptome of two independent Triticum turgidum × Aegilops tauschii allotriploids (F1), along with their spontaneous allohexaploids (S1) and their parental lines. The resulting sequence data were then used to characterize variation in homoeologue transcript abundance. Results: The hybridization event strongly down-regulated D-subgenome homoeologues, but this effect was in many cases reversed by whole genome doubling. The suppression of D-subgenome homoeologue transcription resulted in a marked frequency of parental transcription level dominance, especially with respect to genes encoding proteins involved in photosynthesis. Singletons (genes where no homoeologues were present) were frequently transcribed in both the allotriploid and allohexaploid plants. Conclusions: The implication is that whole genome doubling helps to overcome the phenotypic weakness of the allotriploid, restoring a more favourable gene dosage in genes experiencing transcription level dominance in hexaploid wheat. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3558-0) contains supplementary material, which is available to authorized users.