Brian J. Reich
    Assistant Professor
    Department of Statistics
    North Carolina State University

Research on Spatial Statistics

Spatial methods are among the most important tools in environmental applications. I have worked to develop new methodology to accommodate the complex features often seen in modern applications, including huge data sets, non-normality, non-random sampling, and non-stationarity.

Selected papers

Reich, Chang, Foley (2014). A spectral method for spatial downscaling. Biometrics.

Complex computer models play a crucial role in air quality research. These models are used to evaluate potential regulatory impacts of emission control strategies and to estimate air quality in areas without monitoring data. For both of these purposes, it is important to calibrate model output with monitoring data to adjust for model biases and improve spatial prediction. In this paper, we propose a new spectral method to study and exploit complex relationships between model output and monitoring data. Spectral methods allow us to estimate the relationship between model output and monitoring data separately at different spatial scales, and to use model output for prediction only at the appropriate scales. The proposed method is computationally efficient and can be implemented using standard software. We apply the method to compare Community Multiscale Air Quality (CMAQ) model output with ozone measurements in the United States in July, 2005. We find that CMAQ captures large-scale spatial trends, but has low correlation with the monitoring data at small spatial scales.
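As a rough illustration of the scale-by-scale idea, the sketch below splits two gridded fields into low- and high-frequency parts with a 2-D FFT and compares their correlation at each scale. It is not the method of the paper: the regular grid for the monitoring data, the hard frequency cutoff, the simulated fields, and the name split_scales are all assumptions made for the example.

```python
import numpy as np

def split_scales(field, cutoff):
    """Split a 2-D gridded field into low- and high-frequency parts.

    cutoff is a radial frequency in cycles per grid cell (hypothetical choice).
    """
    ny, nx = field.shape
    fy = np.fft.fftfreq(ny)[:, None]
    fx = np.fft.fftfreq(nx)[None, :]
    radius = np.sqrt(fy**2 + fx**2)
    spectrum = np.fft.fft2(field)
    # Keep only the frequencies inside the cutoff, then transform back.
    low = np.fft.ifft2(np.where(radius <= cutoff, spectrum, 0)).real
    return low, field - low

# Simulated stand-ins: both fields share a smooth trend but have independent noise.
rng = np.random.default_rng(0)
y, x = np.mgrid[0:32, 0:32]
trend = np.sin(2 * np.pi * x / 32) + np.cos(2 * np.pi * y / 32)
model_output = trend + 0.5 * rng.standard_normal((32, 32))
monitor = trend + 0.5 * rng.standard_normal((32, 32))

m_low, m_high = split_scales(model_output, cutoff=0.1)
o_low, o_high = split_scales(monitor, cutoff=0.1)

# Correlation between the two fields, computed separately at each scale.
corr_low = np.corrcoef(m_low.ravel(), o_low.ravel())[0, 1]
corr_high = np.corrcoef(m_high.ravel(), o_high.ravel())[0, 1]
print(f"large-scale correlation: {corr_low:.2f}")
print(f"small-scale correlation: {corr_high:.2f}")
```

In this toy setting the two fields agree at large scales and are nearly uncorrelated at small scales, which is the qualitative pattern the abstract reports for CMAQ and ozone.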

Sun, Reich, Cai, Guindani, Schwartzman (2014). False Discovery Control in Large-Scale Spatial Multiple Testing. JRSS-B.

This article develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both point-wise and cluster-wise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, false discovery exceedance and false cluster rate, respectively. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for analysis of large spatial data sets. Numerical results show that the proposed procedures lead to more accurate error control and better power performance than conventional methods. We demonstrate our methods for analyzing the time trends in tropospheric ozone in the eastern US.
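One common way to turn posterior null probabilities into a point-wise false discovery rate decision is a step-up rule: sort the locations by posterior probability of the null and reject as many as possible while the running average of those probabilities stays at or below the target level. The sketch below shows that rule only; whether it matches the paper's oracle procedures in detail is an assumption, and the function name and input values are hypothetical.

```python
import numpy as np

def reject_by_posterior_null_prob(null_prob, alpha):
    """Step-up rule on posterior null probabilities (illustrative sketch).

    null_prob[i] is the posterior probability that location i is null.
    Returns a boolean array marking the rejected locations.
    """
    null_prob = np.asarray(null_prob, dtype=float)
    order = np.argsort(null_prob)
    # Running mean of the sorted probabilities estimates the FDR of rejecting the first k.
    running_mean = np.cumsum(null_prob[order]) / np.arange(1, null_prob.size + 1)
    # The running mean is nondecreasing, so this counts the largest admissible k.
    n_reject = int(np.sum(running_mean <= alpha))
    reject = np.zeros(null_prob.size, dtype=bool)
    reject[order[:n_reject]] = True
    return reject

# Hypothetical posterior null probabilities at four locations.
decisions = reject_by_posterior_null_prob([0.9, 0.01, 0.5, 0.02], alpha=0.1)
print(decisions)
```

In practice the posterior probabilities would come from MCMC output for the spatial model rather than being supplied by hand.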

Reich, Bandyopadhyay, Bondell (2013). A nonparametric spatial model for periodontal data with non-random missingness. JASA.

Periodontal disease progression is often quantified by clinical attachment level (CAL), defined as the distance down a tooth's root that is detached from the surrounding bone. Measured at 6 locations per tooth throughout the mouth (excluding the molars), it gives rise to a dependent data set-up. These data are often reduced to a one-number summary, such as the whole mouth average or the number of observations greater than a threshold, to be used as the response in a regression to identify important covariates related to the current state of a subject's periodontal health. Rather than a simple one-number summary, we set out to analyze all available CAL data for each subject, exploiting the presence of spatial dependence, non-stationarity, and non-normality. Also, many subjects have a considerable proportion of missing teeth, which cannot be considered missing at random because periodontal disease is the leading cause of adult tooth loss. Under a Bayesian paradigm, we propose a flexible nonparametric spatial (joint) model of observed CAL and the locations of missing teeth via kernel convolution methods, incorporating the aforementioned features of CAL data under a unified framework. Application of this methodology to a data set recording the periodontal health of an African-American population, as well as simulation studies, reveals the gain in model fit and inference and provides a new perspective on unraveling covariate-response relationships in the presence of the complexities posed by these data.

Eidsvik, Shaby, Reich, Wheeler, Niemi (2013). Estimation and prediction in spatial models with block composite likelihoods. JCGS.

A block composite likelihood is developed for estimation and prediction in large spatial datasets. The composite likelihood is constructed from the joint densities of pairs of adjacent spatial blocks. This allows large datasets to be split into many smaller datasets, each of which can be evaluated separately, and combined through a simple summation. Estimates for unknown parameters are obtained by maximizing the block composite likelihood function. In addition, a new method for optimal spatial prediction under the block composite likelihood is presented. Asymptotic variances for both parameter estimates and predictions are computed using Godambe sandwich matrices. The approach gives considerable improvements in computational efficiency, and the composite structure obviates the need to load entire datasets into memory at once, completely avoiding memory limitations imposed by massive datasets. Moreover, computing time can be reduced even further by distributing the operations using parallel computing. A simulation study shows that composite likelihood estimates and predictions, as well as their corresponding asymptotic confidence intervals, are competitive with those based on the full likelihood. The procedure is demonstrated on one dataset from the mining industry and one dataset of satellite retrievals. The real-data examples show that the block composite results tend to outperform two competitors: the predictive process model and fixed rank Kriging. Supplemental material for this article is available online.
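The construction can be sketched in a few lines for a toy case. The example below assumes a zero-mean Gaussian process on a line with an exponential covariance, blocks formed by splitting the sorted locations, and placeholder data; these choices and the function names are assumptions for illustration, not the setup used in the paper.

```python
import numpy as np

def gaussian_loglik(y, locs, sigma2, phi):
    """Zero-mean Gaussian log-likelihood with exponential covariance (1-D locations)."""
    dist = np.abs(locs[:, None] - locs[None, :])
    cov = sigma2 * np.exp(-dist / phi)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (logdet + quad + y.size * np.log(2 * np.pi))

def block_composite_loglik(y, locs, blocks, pairs, sigma2, phi):
    """Sum the joint log-densities over pairs of adjacent blocks."""
    total = 0.0
    for a, b in pairs:
        idx = np.concatenate([blocks[a], blocks[b]])
        # Each term only needs the data in two blocks, so terms can be
        # evaluated separately (or in parallel) and then added.
        total += gaussian_loglik(y[idx], locs[idx], sigma2, phi)
    return total

rng = np.random.default_rng(1)
locs = np.sort(rng.uniform(0, 10, size=60))
y = rng.standard_normal(60)  # placeholder data, not simulated from the model

blocks = np.array_split(np.arange(60), 4)  # four contiguous blocks
pairs = [(0, 1), (1, 2), (2, 3)]           # neighbouring blocks on the line
cl = block_composite_loglik(y, locs, blocks, pairs, sigma2=1.0, phi=2.0)
print(cl)
```

Parameter estimates would then come from maximizing this function over sigma2 and phi, for example with a general-purpose optimizer. With only two blocks and a single pair, the composite likelihood reduces to the full likelihood.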

Boehm, Reich, Bandyopadhyay (2013). Bridging conditional and marginal inference for spatially-referenced binary data. Biometrics.

Spatially-referenced binary data are common in epidemiology and public health. Owing to its elegant log-odds interpretation of the regression coefficients, a natural model for these data is logistic regression. To account for missing confounding variables that might exhibit a spatial pattern (say, socioeconomic, biological or environmental conditions), it is customary to include a Gaussian spatial random effect. Conditioned on the spatial random effect, the coefficients may be interpreted as log odds ratios. However, marginally over the random effects, the coefficients no longer preserve the log-odds interpretation, and the estimates are hard to interpret and generalize to other spatial regions. To resolve this issue, we propose a new spatial random effect distribution through a copula framework which ensures that the regression coefficients maintain the log-odds interpretation both conditional on and marginally over the spatial random effects. We present simulations to assess the robustness of our approach to various random effects, and apply it to an interesting dataset assessing periodontal health of Gullah-speaking African Americans. The proposed methodology is flexible enough to handle areal or geo-statistical datasets, and hierarchical models with multiple random intercepts.
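The loss of the log-odds interpretation under a Gaussian random effect is easy to see numerically. The sketch below integrates a single Gaussian random intercept out of a logistic model with Gauss-Hermite quadrature and compares the marginal log odds ratio with the conditional coefficient. It illustrates the problem only, not the copula-based solution proposed in the paper; the parameter values and function names are assumptions.

```python
import numpy as np

def marginal_prob(linear_predictor, sigma, n_nodes=40):
    """E[expit(linear_predictor + sigma * Z)] for Z ~ N(0, 1), by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    z = np.sqrt(2.0) * sigma * nodes
    p = 1.0 / (1.0 + np.exp(-(linear_predictor + z)))
    return np.sum(weights * p) / np.sqrt(np.pi)

def logit(p):
    return np.log(p / (1.0 - p))

beta, sigma = 1.0, 2.0  # conditional log odds ratio and random-effect SD (hypothetical)

# Marginal log odds ratio for a one-unit change in the covariate (intercept 0).
marginal_slope = logit(marginal_prob(beta, sigma)) - logit(marginal_prob(0.0, sigma))
print(f"conditional log odds ratio: {beta:.3f}")
print(f"marginal log odds ratio:    {marginal_slope:.3f}")
```

The marginal value is attenuated toward zero relative to the conditional coefficient, and the marginal relationship is in general no longer exactly logistic, which is why the two interpretations diverge.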

Reich, Eidsvik, Guindani, Nail, Schmidt (2011). A class of covariate-dependent spatiotemporal covariance functions for the analysis of daily ozone concentration. AOAS.

In geostatistics, it is common to model spatially distributed phenomena through an underlying stationary and isotropic spatial process. However, these assumptions are often untenable in practice because of the influence of local effects in the correlation structure. Therefore, it has been of prolonged interest in the literature to provide flexible and effective ways to model nonstationarity in the spatial effects. Arguably, due to the local nature of the problem, we might envision that the correlation structure would be highly dependent on local characteristics of the domain of study, namely, the latitude, longitude and altitude of the observation sites, as well as other locally defined covariate information. In this work, we provide a flexible and computationally feasible way for allowing the correlation structure of the underlying processes to depend on local covariate information. We discuss the properties of the induced covariance functions and methods to assess their dependence on local covariate information. The proposed method is used to analyze daily ozone in the southeast United States.

Pati, Reich, Dunson (2010). Bayesian geostatistical modeling with informative sampling locations. Biometrika.

We consider geostatistical models that allow the locations at which data are collected to be informative about the outcomes. A Bayesian approach is proposed, which models the locations using a log Gaussian Cox process, while modeling the outcomes conditionally on the locations as Gaussian with a Gaussian process spatial random effect and adjustment for the location intensity process. We prove posterior propriety under an improper prior on the parameter controlling the degree of informative sampling, demonstrating that the data are informative. In addition, we show that the density of the locations and mean function of the outcome process can be estimated consistently under mild assumptions. The methods show significant evidence of informative sampling when applied to ozone data over the Eastern United States.

Reich, Bandyopadhyay (2010). A latent factor model for spatial data with informative missingness. AOAS.

A large amount of data is typically collected during a periodontal exam. Analyzing these data poses several challenges. Several types of measurements are taken at many locations throughout the mouth. These spatially-referenced data are a mix of binary and continuous responses, making joint modeling difficult. Also, most patients have missing teeth. Periodontal disease is a leading cause of tooth loss, so it is likely that the number and location of missing teeth informs about the patient's periodontal health. In this paper we develop a multivariate spatial framework for these data which jointly models the binary and continuous responses as a function of a single latent spatial process representing general periodontal health. We also use the latent spatial process to model the location of missing teeth. We show using simulated and real data that exploiting spatial associations and jointly modeling the responses and locations of missing teeth mitigates the problems presented by these data.

Hodges, Reich (2010). Adding Spatially-Correlated Errors Can Mess Up the Fixed Effect You Love. TAS.

Many statisticians have had the experience of fitting a linear model with uncorrelated errors, then adding a spatially-correlated error term (random effect) and finding that the estimates of the fixed-effect coefficients have changed substantially. We show that adding a spatially-correlated error term to a linear model is equivalent to adding a saturated collection of canonical regressors, the coefficients of which are shrunk toward zero, where the spatial map determines both the canonical regressors and the relative extent of the coefficients' shrinkage. Adding a spatially-correlated error term can also be seen as inflating the error variances associated with specific contrasts of the data, where the spatial map determines the contrasts and the extent of error-variance inflation. We show how to avoid this spatial confounding by restricting the spatial random effect to the orthogonal complement (residual space) of the fixed effects, which we call restricted spatial regression. We consider five proposed interpretations of spatial confounding and draw implications about what, if anything, one should do about it. In doing so, we debunk the common belief that adding a spatially-correlated random effect adjusts fixed-effect estimates for spatially-structured missing covariates.
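The projection behind restricted spatial regression can be sketched with ordinary least squares. In the example below a random matrix Z stands in for a spatial basis, and its coefficients are left unshrunk, so this is a simplification of a random-effects model; the simulated data and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # fixed-effect design
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)
Z = rng.standard_normal((n, 5))  # stand-in for a spatial basis (hypothetical)

# Projection onto the orthogonal complement (residual space) of the fixed effects.
P_perp = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
Z_restricted = P_perp @ Z

def leading_coefs(design):
    """Least-squares fit; return the coefficients on the two columns of X."""
    return np.linalg.lstsq(design, y, rcond=None)[0][:2]

beta_ols = leading_coefs(X)
beta_unrestricted = leading_coefs(np.hstack([X, Z]))
beta_restricted = leading_coefs(np.hstack([X, Z_restricted]))

print("no spatial term:      ", beta_ols)
print("unrestricted Z added: ", beta_unrestricted)
print("restricted Z added:   ", beta_restricted)
```

Because the restricted columns are orthogonal to X, adding them leaves the fixed-effect point estimates equal to the non-spatial fit, whereas the unrestricted columns generally shift them.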

Reich, Fuentes (2007). A multivariate semiparametric Bayesian spatial modeling framework for hurricane surface wind fields. AOAS.

Storm surge, the onshore rush of sea water caused by the high winds and low pressure associated with a hurricane, can compound the effects of inland flooding caused by rainfall, leading to loss of property and loss of life for residents of coastal areas. Numerical ocean models are essential for creating storm surge forecasts for coastal areas. These models are driven primarily by the surface wind forcings. Currently, the gridded wind fields used by ocean models are specified by deterministic formulas that are based on the central pressure and location of the storm center. While these equations incorporate important physical knowledge about the structure of hurricane surface wind fields, they cannot always capture the asymmetric and dynamic nature of a hurricane. A new Bayesian multivariate spatial statistical modeling framework is introduced combining data with physical knowledge about the wind fields to improve the estimation of the wind vectors. Many spatial models assume the data follow a Gaussian distribution. However, this may be overly restrictive for wind field data, which often display erratic behavior, such as sudden changes in time or space. In this paper we develop a semiparametric multivariate spatial model for these data. Our model builds on the stick-breaking prior, which is frequently used in Bayesian modeling to capture uncertainty in the parametric form of an outcome. The stick-breaking prior is extended to the spatial setting by assigning each location a different, unknown distribution, and smoothing the distributions in space with a series of kernel functions. This semiparametric spatial model is shown to improve prediction compared to usual Bayesian Kriging methods for the wind field of Hurricane Ivan.
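A minimal sketch of spatially varying stick-breaking weights follows. It assumes Gaussian kernels centred at random knots, a common bandwidth, and Beta(1, 1) stick proportions; these choices, the truncation at K components, and the function name are assumptions for illustration rather than the specification used in the paper.

```python
import numpy as np

def kernel_stick_breaking_weights(sites, knots, v, bandwidth):
    """Mixture weights at each site from kernel-weighted stick-breaking.

    sites: (n, d) locations, knots: (K, d) kernel centres, v: (K,) stick proportions.
    Returns an (n, K) array of weights.
    """
    d2 = ((sites[:, None, :] - knots[None, :, :]) ** 2).sum(axis=2)
    kern = np.exp(-0.5 * d2 / bandwidth**2)  # Gaussian kernel, values in (0, 1]
    piece = kern * v[None, :]                # fraction of the remaining stick broken off
    remaining = np.cumprod(1.0 - piece, axis=1)
    # Stick length available before each break: 1 for the first component.
    remaining = np.hstack([np.ones((sites.shape[0], 1)), remaining[:, :-1]])
    return piece * remaining

rng = np.random.default_rng(3)
K = 20
knots = rng.uniform(0, 1, size=(K, 2))
v = rng.beta(1.0, 1.0, size=K)
sites = rng.uniform(0, 1, size=(5, 2))

w = kernel_stick_breaking_weights(sites, knots, v, bandwidth=0.3)
print(w.sum(axis=1))  # total mass assigned at each site, at most 1
```

Nearby sites have similar kernel values and therefore similar weights, which is how the distributions are smoothed across space. With a finite K some mass is left unassigned unless the final proportion is set to one.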

Reich, Hodges, Zadnik (2006). Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics.

Disease-mapping models for areal data often have fixed effects to measure the effect of spatially-varying covariates and random effects with a conditionally autoregressive (CAR) prior to account for spatial clustering. In such spatial regressions, the objective may be to estimate the fixed effects while accounting for the spatial correlation. But adding the CAR random effects can cause large changes in the posterior mean and variances of fixed effects compared to the non-spatial regression model. This paper explores the impact of adding spatial random effects on fixed-effect estimates and posterior variance. Diagnostics are proposed to measure posterior variance inflation from collinearity between the fixed effect covariates and the CAR random effects and to measure each region's influence on the change in the fixed effect's estimates from adding the CAR random effects. A new model that alleviates the collinearity between the fixed-effect covariates and the CAR random effects is developed and extensions of these methods to point-referenced data models are discussed.