Optimal seed deployment under climate change using spatial models: Application to loblolly pine in the Southeastern US
Farjat, Reich, Guinness, Whetten, McKeand, Isik (2017) JASA (bibtex)

Provenance tests are a common tool in forestry designed to identify superior genotypes for planting at specific locations. The trials are replicated experiments established with seed from parent trees collected from different regions and grown at several locations. In this work, a Bayesian spatial approach is developed for modeling the expected relative performance of seed sources using climate variables as predictors associated with the origin of the seed source and the planting site. The proposed modeling technique accounts for the spatial dependence in the data and introduces a separable Matérn covariance structure that provides a flexible means to estimate effects associated with the origin and planting-site locations. The statistical model was used to develop a quantitative tool for seed deployment aimed at identifying the locations of superior-performing seed sources that could be suitable for a specific planting site under a given climate scenario. Cross-validation results indicate that the proposed spatial models provide superior predictive ability at unobserved locations compared to multiple linear regression methods. The general trend of performance predictions based on future climate scenarios suggests an optimal assisted migration of loblolly pine seed sources from southern and warmer regions to northern and colder areas in the southern USA.
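As a toy illustration of the separable covariance idea (a sketch, not the paper's actual model or data), the snippet below forms the product of two Matérn covariances with smoothness 3/2, one over seed-source origins and one over planting sites; the pairing of origin i with site i and all ranges are illustrative assumptions:

```python
import numpy as np

def matern32(d, range_):
    """Matérn covariance with smoothness 3/2 (closed form)."""
    u = np.sqrt(3.0) * d / range_
    return (1.0 + u) * np.exp(-u)

def separable_cov(origins, sites, range_o, range_s):
    """Separable covariance: a Matérn term over seed-source origins
    multiplied elementwise by a Matérn term over planting sites."""
    d_o = np.linalg.norm(origins[:, None, :] - origins[None, :, :], axis=-1)
    d_s = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    return matern32(d_o, range_o) * matern32(d_s, range_s)

rng = np.random.default_rng(0)
origins = rng.uniform(size=(5, 2))  # toy coordinates of seed sources
sites = rng.uniform(size=(5, 2))    # toy coordinates of planting sites
K = separable_cov(origins, sites, range_o=1.0, range_s=2.0)
```

Because the elementwise (Schur) product of two positive semidefinite matrices is positive semidefinite, the separable structure yields a valid covariance while keeping the origin and planting-site ranges identifiable.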

Integrating multiple data sources in species distribution modeling: A framework for data fusion
Pacifici, Reich, et al (2016+) Ecology (bibtex)

The last decade has seen a dramatic increase in the use of species distribution models (SDMs) to characterize patterns of species’ occurrence and abundance. Efforts to parameterize SDMs often create a tension between the quality and quantity of data available to fit models. Estimation methods that integrate both standardized and nonstandardized data types offer a potential solution to the trade-off between data quality and quantity. Recently several authors have developed approaches for jointly modeling two sources of data (one of high quality and one of lesser quality). We extend their work by allowing for explicit spatial autocorrelation in occurrence and detection error using a Multivariate Conditional Autoregressive (MVCAR) model, and we develop three models that share information in a less direct manner, resulting in more robust performance when the auxiliary data are of lesser quality. We describe these three new approaches (‘Shared’, ‘Correlation’, ‘Covariates’) for combining data sources and show their use in a case study of the Brown-headed Nuthatch in the Southeastern U.S. and through simulations. All three of the approaches that used the second data source improved out-of-sample predictions relative to a single data source (‘Single’). When the information in the second data source is of high quality, the Shared model performs best, but the Correlation and Covariates models also perform well. When the second data source is of lesser quality, the Correlation and Covariates models perform better, suggesting they are robust alternatives when little is known about auxiliary data collected opportunistically or through citizen scientists. Methods that allow for both data types to be used will maximize the useful information available for estimating species distributions.
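To make the spatial-autocorrelation ingredient concrete, here is a minimal sketch of a proper conditional autoregressive (CAR) precision matrix on a toy adjacency graph, with a Kronecker-product coupling of two outcomes as a simple stand-in for the multivariate extension; the parameter values and graph are invented for illustration, not taken from the paper:

```python
import numpy as np

def car_precision(W, rho, tau):
    """Precision matrix of a proper CAR model: Q = tau * (D - rho * W),
    where W is a symmetric 0/1 adjacency matrix and D holds its row sums."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

# Toy chain graph of 4 sites: 1-2-3-4
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Q = car_precision(W, rho=0.9, tau=2.0)

# A simple multivariate coupling of two outcomes (e.g., occurrence and
# detection) via a 2x2 between-variable covariance and a Kronecker product.
Lam = np.array([[1.0, 0.4],
                [0.4, 1.0]])
Q_mv = np.kron(np.linalg.inv(Lam), np.diag(W.sum(axis=1)) - 0.9 * W)
```

For |rho| < 1 the matrix D - rho*W is strictly diagonally dominant on this graph, so Q is a valid (positive definite) precision matrix.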

A space-time skew-t model for threshold exceedances
Morris SA, Reich BJ, Thibaud E, Cooley D (2016+) Biometrics (bibtex)

To assess the compliance of air quality regulations, the Environmental Protection Agency (EPA) must know if a site exceeds a pre-specified level. In the case of ozone, the level for compliance is fixed at 75 parts per billion, which is high, but not extreme at all locations. We present a new space-time model for threshold exceedances based on the skew-t process. Our method incorporates a random partition to permit long-distance asymptotic independence while allowing for sites that are near one another to be asymptotically dependent, and we incorporate thresholding to allow the tails of the data to speak for themselves. We also introduce a transformed AR(1) time-series to allow for temporal dependence. Finally, our model allows for high-dimensional Bayesian inference that is comparable in computation time to traditional geostatistical methods for large datasets. We apply our method to an ozone analysis for July 2005, and find that our model improves over both Gaussian and max-stable methods in terms of predicting exceedances of a high level.
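The thresholding idea can be sketched with a censored likelihood: observations above the level contribute their full density, while observations at or below it contribute only the probability of falling below. The toy below uses a Gaussian for brevity, whereas the paper's model is a skew-t process; the values and parameters are made up:

```python
import numpy as np
from scipy import stats

def censored_loglik(y, u, mu, sigma):
    """Censored log-likelihood: values above threshold u contribute their
    density; values at or below u contribute only P(Y <= u).
    (Gaussian stand-in for illustration; the paper uses a skew-t process.)"""
    above = y > u
    ll = stats.norm.logpdf(y[above], mu, sigma).sum()
    ll += stats.norm.logcdf(u, mu, sigma) * (~above).sum()
    return ll

y = np.array([60.0, 70.0, 80.0, 90.0])  # toy ozone values (ppb)
ll = censored_loglik(y, u=75.0, mu=65.0, sigma=10.0)
```

This is what "letting the tails speak for themselves" amounts to in practice: the bulk of the distribution below the threshold influences the fit only through the censoring probability.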

Circulant embedding of approximate covariances for inference from Gaussian data on large lattices
Guinness J, Fuentes M (2016+). JCGS (bibtex)

Recently proposed computationally efficient Markov chain Monte Carlo and Monte Carlo Expectation-Maximization (EM) methods for estimating covariance parameters from lattice data rely on successive imputations of values on an embedding lattice that is at least two times larger in each dimension. These methods can be considered exact in some sense, but we demonstrate that using such a large number of imputed values leads to slowly converging Markov chains and EM algorithms. We propose instead the use of a discrete spectral approximation to allow for the implementation of these methods on smaller embedding lattices. While our methods are approximate, our examples indicate that the error introduced by this approximation is small compared to the Monte Carlo errors present in long Markov chains or many iterations of Monte Carlo EM algorithms. Our results are demonstrated in simulation studies, as well as in numerical studies that explore both increasing domain and fixed domain asymptotics. We compare the exact methods to our approximate methods on a large satellite dataset, and show that the approximate methods are also faster to compute, especially when the aliased spectral density is modeled directly.
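As background on the embedding step itself, the classic one-dimensional circulant embedding construction can be sketched as follows: the covariance is wrapped onto a circle of twice the lattice size, the FFT diagonalizes the resulting circulant matrix, and complex white noise is colored by the square-root eigenvalues. This illustrates the standard algorithm, not the paper's spectral approximation; the exponential covariance and sizes are arbitrary choices:

```python
import numpy as np

def circulant_embed_sim(n, cov_fn, rng):
    """Simulate a stationary Gaussian process on a 1-D lattice of size n by
    embedding its covariance in a circulant matrix of size m = 2n and
    diagonalizing that matrix with the FFT."""
    m = 2 * n
    # First row of the circulant embedding: distances wrap around the circle.
    d = np.minimum(np.arange(m), m - np.arange(m))
    c = cov_fn(d.astype(float))
    lam = np.fft.fft(c).real      # eigenvalues of the circulant matrix
    lam = np.maximum(lam, 0.0)    # clip tiny negative values (approximation)
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    w = np.fft.fft(np.sqrt(lam) * z) / np.sqrt(m)
    # Real and imaginary parts are two independent realizations; keep one,
    # restricted to the original n-point lattice.
    return w.real[:n]

rng = np.random.default_rng(2)
x = circulant_embed_sim(100, lambda d: np.exp(-d / 5.0), rng)
```

The doubling of the lattice in each dimension is exactly the cost the paper targets: conditional simulation must impute all the extra embedding-lattice values, which is what slows the Markov chains and EM iterations down.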

A fused lasso approach to nonstationary spatial covariance estimation
Parker RJ, Reich BJ, Eidsvik J (2016+). JABES (bibtex)

Spatial data are increasing in size and complexity due to technological advances. For an analysis of a large and diverse spatial domain, simplifying assumptions such as stationarity are questionable and standard computational algorithms are inadequate. In this paper we propose a computationally efficient method to estimate a nonstationary covariance function. We partition the spatial domain into a fine grid of subregions and assign each subregion its own set of spatial covariance parameters. This introduces a large number of parameters, and to stabilize the procedure we impose a penalty to spatially smooth the estimates. By penalizing the absolute difference between parameters for adjacent subregions, the solution can be identical for adjacent subregions and thus the method identifies stationary subdomains. To apply the method to large datasets, we use a block composite likelihood which is natural in this setting because it also operates on a partition of the spatial domain. The method is applied to tropospheric ozone in the US, and we find that the spatial covariance on the west coast differs from the rest of the country.
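The shape of the penalized objective can be illustrated with a toy version: a sum of per-subregion (block) negative log-likelihoods plus an L1 penalty on differences between parameters of adjacent subregions. The quadratic "likelihood" terms below stand in for block composite likelihood contributions and are invented for illustration:

```python
import numpy as np

def fused_penalty(theta, edges, lam):
    """Fused-lasso penalty: lam * sum of |theta_i - theta_j| over adjacent
    subregions, which can shrink neighbors to identical values and hence
    identify stationary subdomains."""
    return lam * sum(abs(theta[i] - theta[j]) for i, j in edges)

def penalized_obj(theta, nll_blocks, edges, lam):
    """Block objective plus fused-lasso smoothing of the per-subregion
    covariance parameters."""
    fit = sum(nll(theta[k]) for k, nll in enumerate(nll_blocks))
    return fit + fused_penalty(theta, edges, lam)

# Toy example: 3 subregions in a row, each with a quadratic "negative
# log-likelihood" centered at its own best-fitting parameter value.
nll_blocks = [lambda t, c=c: (t - c) ** 2 for c in (1.0, 1.1, 3.0)]
edges = [(0, 1), (1, 2)]
theta = np.array([1.0, 1.1, 3.0])
obj = penalized_obj(theta, nll_blocks, edges, lam=0.5)
```

At these (unpenalized-optimal) parameter values the fit terms vanish and the objective reduces to the penalty, 0.5 * (0.1 + 1.9) = 1.0; increasing lam would pull the first two subregions to a common value, merging them into one stationary subdomain.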

A multiresolution approach to estimating value added by regional climate models
Parker RJ, Reich BJ, Sain SR (2016+). Journal of Climate (bibtex)

Climate models have emerged as an essential tool for studying the Earth’s climate. Global models are computationally expensive, and so a relatively coarse spatial resolution must be used within the model. This hinders direct application for many impacts studies that require regional and local climate information. A regional model with boundary conditions taken from the global model achieves a finer spatial scale for local analysis. In this paper we propose a new method for assessing the value added by these higher resolution models, and we demonstrate the method within the context of Regional Climate Models (RCMs) from the North American Regional Climate Change Assessment Program (NARCCAP) project. Our spectral approach using the discrete cosine transformation (DCT) is based on characterizing the joint relationship between observations, coarser scale models, and higher resolution models to identify how the finer scales add value over the coarser output. The joint relationship is computed by estimating the covariance of our data sources at different spatial scales with a Bayesian hierarchical model. Using this model we can then estimate the value added by each data source over the others. For the NARCCAP data, we find that the higher resolution models add value starting with low wavenumbers corresponding to features 550 km apart (11 grid boxes of 50 km per cycle) all the way down to higher wavenumbers at 150 km apart (3 grid boxes per cycle).
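The basic DCT decomposition by spatial scale can be sketched as follows: transform a gridded field, then sum squared coefficients within radial wavenumber bands as a proxy for the variance carried by each scale. This is only the transform-and-band step, not the paper's Bayesian hierarchical covariance model; the field and band edges are arbitrary:

```python
import numpy as np
from scipy.fft import dctn

def variance_by_band(field, edges):
    """Decompose a 2-D field with an orthonormal DCT and sum squared
    coefficients within radial wavenumber bands [lo, hi)."""
    coef = dctn(field, norm="ortho")
    ky, kx = np.meshgrid(np.arange(field.shape[0]),
                         np.arange(field.shape[1]), indexing="ij")
    k = np.hypot(ky, kx)
    return [np.sum(coef[(k >= lo) & (k < hi)] ** 2) for lo, hi in edges]

rng = np.random.default_rng(1)
field = rng.standard_normal((32, 32))  # toy gridded field
bands = variance_by_band(field, [(0, 4), (4, 16), (16, 64)])
```

With the orthonormal DCT, the banded sums partition the total sum of squares of the field (Parseval), so comparing these band totals between observations, coarse model output, and fine model output is one way to localize where in wavenumber space a data source contributes information.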