Scott H. Holan scite author profile

Many data sources report related variables of interest that are also referenced over geographic regions and time; however, there are relatively few general statistical methods that one can readily use that incorporate these multivariate spatio-temporal dependencies. Additionally, many multivariate spatio-temporal areal data sets are extremely high dimensional, which leads to practical issues when formulating statistical models. For example, we analyze Quarterly Workforce Indicators (QWI) published by the US Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program. QWIs are available by different variables, regions, and time points, resulting in millions of tabulations. Despite their already expansive coverage, by adopting a fully Bayesian framework, the scope of the QWIs can be extended to provide estimates of missing values along with associated measures of uncertainty. Motivated by the LEHD, and other applications in federal statistics, we introduce the multivariate spatio-temporal mixed effects model (MSTM), which can be used to efficiently model high-dimensional multivariate spatio-temporal areal data sets. The proposed MSTM extends the notion of Moran's I basis functions to the multivariate spatio-temporal setting. This extension leads to several methodological contributions, including extremely effective dimension reduction, a dynamic linear model for multivariate spatio-temporal areal processes, and the reduction of a high-dimensional parameter space using a novel parameter model.

Modeling Complex Phenotypes: Generalized Linear Models Using Spectrogram Predictors of Animal Communication Signals

Holan

Wikle

Sullivan-Beckers

et al. 2009

Biometrics

A major goal of evolutionary biology is to understand the dynamics of natural selection within populations. The strength and direction of selection can be described by regressing relative fitness measurements on organismal traits of ecological significance. However, many important evolutionary characteristics of organisms are complex, and have correspondingly complex relationships to fitness. Secondary sexual characteristics such as mating displays are prime examples of complex traits with important consequences for reproductive success. Typically, researchers atomize sexual traits such as mating signals into a set of measurements including pitch and duration, in order to include them in a statistical analysis. However, these researcher-defined measurements are unlikely to capture all of the relevant phenotypic variation, especially when the sources of selection are incompletely known. In order to accommodate this complexity we propose a Bayesian dimension-reduced spectrogram generalized linear model that directly incorporates representations of the entire phenotype (one-dimensional acoustic signal) into the model as a predictor while accounting for multiple sources of uncertainty. The first stage of dimension reduction is achieved by treating the spectrogram as an "image" and finding its corresponding empirical orthogonal functions. Subsequently, further dimension reduction is accomplished through model selection using stochastic search variable selection. Thus, the model we develop characterizes key aspects of the acoustic signal that influence sexual selection while alleviating the need to extract higher-level signal traits a priori. This facet of our approach is fundamental and has the potential to provide additional biological insight, as is illustrated in our analysis.

Computationally Efficient Multivariate Spatio-Temporal Models for High-Dimensional Count-Valued Data (with Discussion)

Bradley

Holan

Wikle

2018

Bayesian Anal.

Bayesian Hierarchical Models With Conjugate Full-Conditional Distributions for Dependent Data From the Natural Exponential Family

Bradley

Holan

Wikle

2019

Journal of the American Statistical Association

We introduce a Bayesian approach for analyzing (possibly) high-dimensional dependent data that are distributed according to a member of the natural exponential family of distributions. This problem requires extensive methodological advancements, as jointly modeling high-dimensional dependent data leads to the so-called "big n problem." The computational complexity of the "big n problem" is further exacerbated when allowing for non-Gaussian data models, as is the case here. Thus, we develop new computationally efficient distribution theory for this setting. In particular, we introduce the "conjugate multivariate distribution," which is motivated by the univariate distribution introduced in Diaconis and Ylvisaker (1979). Furthermore, we provide substantial theoretical and methodological development including: results regarding conditional distributions, an asymptotic relationship with the multivariate normal distribution, conjugate prior distributions, and full-conditional distributions for a Gibbs sampler. To demonstrate the wide applicability of the proposed methodology, we provide two simulation studies and three applications based on an epidemiology dataset, a federal statistics dataset, and an environmental dataset, respectively.

The soil health assessment protocol and evaluation applied to soil organic carbon

Nunes

Veum

Parker

et al. 2021

Soil Science Soc of Amer J

The concept of soil health has evolved over the past several decades, recognizing that dynamic soil property response to management and land use is highly dependent on site-specific factors that must be considered when interpreting soil health measurements. Initially, the Soil Management Assessment Framework (SMAF) and Comprehensive Assessment of Soil Health (CASH) were developed and used globally for scoring soil health indicators. However, both SMAF and CASH frameworks were developed using a relatively small dataset and their interpretation curves were not validated at the nationwide scale. Expanding upon these concepts, we propose the Soil Health Assessment Protocol and Evaluation (SHAPE) tool. SHAPE was developed using 14,680 soil organic carbon (SOC) observations from across the United States and accounts for edaphic and climate factors at the continental scale. Data were compiled from the literature, the Cornell Soil Health Laboratory, and the Kellogg Soil Survey Laboratory. In this approach, scoring curves are Bayesian model-based estimates of the conditional cumulative distribution function (CDF) for defined soil peer groups reflecting five soil texture and five soil suborder classes adjusted for mean annual temperature and precipitation. Specifically, SHAPE produces scores between 0 and 1 (0 to 100%) for measured SOC values that reflect the quantile or position within the conditional CDF along with measures of uncertainty. Herein, we focus on development of the SHAPE scoring curve for SOC with four case studies. SHAPE is a flexible, quantitative tool that provides a regionally relevant interpretation of this key soil health indicator.

Bayesian Spatial Change of Support for Count-Valued Survey Data With Application to the American Community Survey

Bradley

Wikle

Holan

2016

Journal of the American Statistical Association

We introduce Bayesian spatial change of support methodology for count-valued survey data with known survey variances. Our proposed methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. Specifically, the ACS produces 1-year, 3-year, and 5-year "period-estimates," and corresponding margins of error, for published demographic and socio-economic variables recorded over predefined geographies within the United States. Despite the availability of these predefined geographies, it is often of interest to data users to specify customized user-defined spatial supports. In particular, it is useful to estimate demographic variables defined on "new" spatial supports in "real-time." This problem is known as spatial change of support (COS), which is typically performed under the assumption that the data follow a Gaussian distribution. However, count-valued survey data are naturally non-Gaussian and, hence, we consider modeling these data using a Poisson distribution. Additionally, survey data are often accompanied by estimates of error, which we incorporate into our analysis. We interpret Poisson count-valued data in small areas as an aggregation of events from a spatial point process. This approach provides us with the flexibility necessary to allow ACS users to consider a variety of spatial supports in "real-time." We demonstrate the effectiveness of our approach through a simulated example as well as through an analysis using public-use ACS data.

Regionalization of Multiscale Spatial Processes by Using a Criterion for Spatial Aggregation Error

Bradley

Wikle

Holan

2016

The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial domain of interest. By "regionalization" we mean a specification of geographies that define the spatial support for areal data. This topic has been studied vigorously by geographers, but has been given less attention by spatial statisticians. Thus, we propose a criterion for spatial aggregation error (CAGE), which we minimize to obtain an optimal regionalization. To define CAGE we draw a connection between spatial aggregation error and a new multiscale representation of the Karhunen-Loève (K-L) expansion. This relationship between CAGE and the multiscale K-L expansion leads to illuminating theoretical developments including: connections between spatial aggregation error, squared prediction error, spatial variance, and a novel extension of Obled-Creutin eigenfunctions. The effectiveness of our approach is demonstrated through an analysis of two datasets, one using the American Community Survey and one related to environmental ocean winds.

Scott H. Holan

A National Survey of Assisted Living Facilities

Multivariate spatio-temporal models for high-dimensional areal data with application to Longitudinal Employer-Household Dynamics
