Brian Reich

Selected Papers on Spatial Statistics

Spatial methods are among the most important tools in environmental applications. I have worked to develop new methodolgy to accomodate the complex features often seen in modern applications, including huge data sets, non-normality, non-random sampling, and non-stationarity.

Sun W, Reich BJ, Cai T, Guindani M, Schwartzman A (2015). False Discovery Control in Large-Scale Spatial Multiple Testing. JRSS-B.

This article develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both point-wise and cluster-wise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, false discovery exceedance and false cluster rate, respectively. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for analysis of large spatial data sets. Numerical results show that the proposed procedures lead to more accurate error control and better power performance than conventional methods. We demonstrate our methods for analyzing the time trends in tropospheric ozone in eastern US.

Reich BJ, Chang HH, Foley KM (2014). A spectral method for spatial downscaling. Biometrics.

Complex computer models play a crucial role in air quality research. These models are used to evaluate potential regulatory impacts of emission control strategies and to estimate air quality in areas without monitoring data. For both of these purposes, it is important to calibrate model output with monitoring data to adjust for model biases and improve spatial prediction. In this paper, we propose a new spectral method to study and exploit complex relationships between model output and monitoring data. Spectral methods allow us to estimate the relationship between model output and monitoring data separately at different spatial scales, and to use model output for prediction only at the appropriate scales. The proposed method is computationally efficient and can be implemented using standard software. We apply the method to compare Community Multiscale Air Quality (CMAQ) model output with ozone measurements in the United States in July, 2005. We find that CMAQ captures large-scale spatial trends, but has low correlation with the monitoring data at small spatial scales.

  • Reich BJ, Bandyopadhyay D, Bondell HD (2013). A nonparametric spatial model for periodontal data with non-random missingness. JASA.

    Periodontal disease progression is often quantified by clinical attachment level (CAL) defined as the distance down a tooth's root that is detached from the surrounding bone. Measured at 6 locations per tooth throughout the mouth (excluding the molars), it gives rise to a dependent data set-up. These data are often reduced to a one-number summary, such as the whole mouth average or the number of observations greater than a threshold, to be used as the response in a regression to identify important covariates related to the current state of a subject's periodontal health. Rather than a simple one-number summary, we set forward to analyze all available CAL data for each subject, exploiting the presence of spatial dependence, non-stationarity, and non-normality. Also, many subjects have a considerable proportion of missing teeth which cannot be considered missing at random because periodontal disease is the leading cause of adult tooth loss. Under a Bayesian paradigm, we propose a nonparametric flexible spatial (joint) model of observed CAL and the location of missing tooth via kernel convolution methods, incorporating the aforementioned features of CAL data under a unified framework. Application of this methodology to a data set recording the periodontal health of an African-American population, as well as simulation studies reveal the gain in model fit and inference, and provides a new perspective into unraveling covariate-response relationships in presence of complexities posed by these data.