**Abstracts for the Symposium Speakers**

**Keynote Speaker: Emanuel Parzen, Texas A&M University**
**Title: United Applicable Statistics: Mid-Distribution, Mid-Quantile, Mid-P Confidence Intervals for Proportion p**
**Abstract:** We review the seminal influence of Ben Kedem on statistical time series analysis. We motivate our research on the United Applicable Statistics ("analogies between analogies") approach to a learning framework for almost all of the Science of Statistics, which we distinguish from the Statistics of Science. We describe the exciting probability and statistical theory of mid-distributions and mid-quantiles, a new way to calculate (for data with ties) sample quantiles and the sample median (mid), and the asymptotic normality of mid-distributions of the Binomial, Poisson, and hypergeometric distributions. We advocate statistical inference by the mid-P-value function of a parameter, whose inverse (under a stochastic order condition) is defined to be the confidence quantile (of a confidence distribution). We show that mid-P frequentist confidence intervals for discrete data have an endpoint function equal to the confidence quantile, which is algorithmically analogous to the Bayesian posterior quantile. One computes as a frequentist (without assuming a prior) but interprets as a Bayesian. We conclude with 0-1 data and quasi-exact (Beta distribution based) confidence quantiles of the parameters p and log-odds(p). We claim quasi-identity of frequentist mid-P confidence intervals and Bayesian posterior credible intervals under the uninformative Jeffreys prior. For parameters of standard probability models, calculating confidence quantiles yields Bayesian posterior quantiles for non-informative conjugate priors.
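As a concrete illustration of the mid-P construction for 0-1 data sketched above, the following minimal Python sketch (function names and the worked example are mine, not Parzen's) finds each interval endpoint by inverting a mid-P-value equation in p by bisection — half weight on the observed count plus the relevant strict tail:

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mid_p_upper_tail(x, n, p):
    # half weight on the observed count plus the strict upper tail
    return 0.5 * binom_pmf(x, n, p) + sum(binom_pmf(k, n, p) for k in range(x + 1, n + 1))

def mid_p_lower_tail(x, n, p):
    # half weight on the observed count plus the strict lower tail
    return 0.5 * binom_pmf(x, n, p) + sum(binom_pmf(k, n, p) for k in range(x))

def invert(f, target, increasing):
    # bisection on p in (0, 1); f is monotone in p by stochastic ordering
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mid_p_interval(x, n, level=0.95):
    a = (1 - level) / 2
    lower = 0.0 if x == 0 else invert(lambda p: mid_p_upper_tail(x, n, p), a, True)
    upper = 1.0 if x == n else invert(lambda p: mid_p_lower_tail(x, n, p), a, False)
    return lower, upper

lower, upper = mid_p_interval(7, 20)   # e.g. 7 successes in 20 trials
```

Per the abstract's claimed quasi-identity, these endpoints track the posterior quantiles of a Beta(x + 1/2, n - x + 1/2) distribution, i.e. the Jeffreys-prior credible interval.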


**Keynote Speaker: Peter Bloomfield, North Carolina State University**
**Title: Modeling the Variance of a Time Series**
**Abstract:** We review the development of models for time-varying variances from two perspectives: as a natural development in the time-domain approach, leading to GARCH models, and via a latent process, leading to stochastic volatility models. The former are statistically tractable by construction; the latter are appealing but typically intractable. We show that, using tools from Bayesian methods, we can find a stochastic volatility model with GARCH structure.
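To make the "tractable by construction" point concrete, here is a minimal GARCH(1,1) simulation sketch (parameter values are illustrative, not from the talk): the conditional variance h_t is a deterministic function of observed past returns, so it is known given the past — unlike a latent stochastic volatility process.

```python
import random

def simulate_garch(n, omega=0.1, a=0.1, b=0.8, seed=42):
    # r_t = sqrt(h_t) * z_t,  h_t = omega + a * r_{t-1}^2 + b * h_{t-1}
    rng = random.Random(seed)
    h = omega / (1 - a - b)   # start at the stationary variance (here 1.0)
    returns, variances = [], []
    for _ in range(n):
        r = (h ** 0.5) * rng.gauss(0, 1)
        returns.append(r)
        variances.append(h)
        # tractable by construction: h_t is observable given past returns
        h = omega + a * r ** 2 + b * h
    return returns, variances

rets, vols = simulate_garch(1000)
```

In a stochastic volatility model the update for h would instead involve its own innovation, which is what makes the likelihood intractable and invites the Bayesian machinery the abstract mentions.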


**Speaker: Victor De Oliveira, The University of Texas at San Antonio**
**Title: Optimal Predictive Inference in Log-Gaussian Random Fields**
**Abstract:** This talk reviews recent work on optimal predictive inference in log-Gaussian random fields. The two problems to be considered are: (a) prediction of process values at unmeasured locations and (b) prediction of process integrals over bounded regions. Optimal predictors, within certain classes, are given for problems (a) and (b) as well as comparisons with other commonly used predictors. Shortest prediction intervals, within certain classes, are also given for problem (a). Finally, a brief discussion is given about some difficulties in computing prediction intervals for problem (b).
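A toy numeric illustration (not the talk's estimators) of why prediction in log-Gaussian fields needs care: if Y = log Z is N(mu, sigma^2), then E[Z] = exp(mu + sigma^2/2), so the naive back-transform exp(mu) systematically underestimates the mean of Z — the starting point for the optimal-predictor comparisons in problems (a) and (b).

```python
import math
import random

rng = random.Random(0)
mu, sigma = 1.0, 0.8   # illustrative log-scale mean and standard deviation

# Monte Carlo estimate of the mean of the lognormal variable Z = exp(Y)
z = [math.exp(rng.gauss(mu, sigma)) for _ in range(200_000)]
mc_mean = sum(z) / len(z)

naive = math.exp(mu)                       # naive back-transform of the log-scale mean
corrected = math.exp(mu + sigma ** 2 / 2)  # the true lognormal mean
```

The Monte Carlo mean lands on the corrected value, not the naive one; in the kriging setting, sigma^2 is replaced by the (location-dependent) prediction variance.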


**Speaker: Kostas Fokianos, University of Cyprus**
**Title: Linear and Loglinear Poisson Autoregression**
**Abstract:** The talk considers geometric ergodicity and likelihood based inference for linear and loglinear Poisson autoregressions. In the linear case the conditional mean is linked linearly to its past values as well as the observed values of the Poisson process. This also applies to the conditional variance, implying an interpretation as an integer-valued GARCH process. In a loglinear conditional Poisson model, the conditional mean is a loglinear function of its past values and a nonlinear function of past observations. Under geometric ergodicity the maximum likelihood estimators of the parameters are shown to be asymptotically Gaussian in the linear model. In addition we provide a consistent estimator of the asymptotic covariance, which is used in the simulations and the analysis of some transaction data. Our approach to verifying geometric ergodicity proceeds via Markov theory and irreducibility. Finding transparent conditions for proving ergodicity turns out to be a delicate problem in the original model formulation. This problem is circumvented by allowing a perturbation of the model. We show that as the perturbations can be chosen to be arbitrarily small, the differences between the perturbed and non-perturbed versions vanish as far as the asymptotic distribution of the parameter estimates is concerned.
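The linear model in the abstract can be simulated in a few lines. This is a sketch with illustrative parameters (a + b < 1 for the stationary regime), not the talk's estimation code: the conditional mean lambda_t is linked linearly to its own past value and the past observed count, the integer-valued GARCH (INGARCH(1,1)) structure.

```python
import math
import random

def poisson_draw(rng, lam):
    # Knuth's Poisson sampler; adequate for the small means used here
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_ingarch(n, d=0.5, a=0.3, b=0.4, seed=1):
    # lambda_t = d + a * lambda_{t-1} + b * X_{t-1};  X_t | past ~ Poisson(lambda_t)
    rng = random.Random(seed)
    lam = d / (1 - a - b)          # start at the stationary mean
    x = poisson_draw(rng, lam)
    xs = []
    for _ in range(n):
        lam = d + a * lam + b * x
        x = poisson_draw(rng, lam)
        xs.append(x)
    return xs

counts = simulate_ingarch(2000)
```

The stationary mean is d / (1 - a - b); the loglinear variant replaces this recursion with one on log(lambda_t).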


**Speaker: Neal Jeffries, National Heart, Lung, and Blood Institute, NIH**
**Title: Odds Ratio Bias in Case-Control Studies Using Robust Genetic Models**
**Abstract:** Selection bias is a general and well-known term describing bias that arises when an initial selection or condition alters an estimator's properties. This paper explores the degree of selection bias arising from robust genetic procedures that first assess the genetic model and then estimate an odds ratio associated with that model. The resulting bias in the point estimate of the odds ratio is assessed for two types of robust methods, and methods for correction are examined.


**Speaker: Ta-Hsin Li, IBM**
**Title: Laplace Periodogram and Beyond**
**Abstract:** The ordinary periodogram and its smoothed variants are important tools for analyzing the serial dependence of time series data in the frequency domain. Although widely used in practice, their lack of robustness to outliers, heavy-tailed noise, and nonlinear distortions limits their applicability. A new type of periodogram is obtained by replacing the least-squares criterion in the regression formulation of the ordinary periodogram with an Lp-norm criterion in general, and the L1-norm criterion in particular. Through large-sample asymptotic analysis, a mathematical relationship is established between the new periodogram and the serial dependence of a random process, leading to the notions of zero-crossing spectra and fractional autocorrelation spectra. The new periodogram has the expected robustness against heavy-tailed noise and nonlinear distortions. Simulation results and real-data examples are provided to demonstrate the performance of the new periodogram for spectral analysis and signal detection.
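The regression formulation is easy to sketch: the ordinary periodogram at frequency omega is (n/4)|beta|^2 for the least-squares fit of a sinusoid, and the Laplace periodogram swaps in a least-absolute-deviations (L1) fit. This is my own minimal implementation, assuming SciPy's Nelder-Mead optimizer for the convex L1 criterion; it is not the speaker's code.

```python
import math
from scipy.optimize import minimize

def laplace_periodogram(x, omega):
    # L_n(omega) = (n/4) * |beta_hat|^2, with beta_hat the L1 (LAD) fit of a
    # sinusoid at frequency omega; the ordinary periodogram uses least squares.
    n = len(x)

    def l1_cost(beta):
        return sum(abs(x[t] - beta[0] * math.cos(omega * t) - beta[1] * math.sin(omega * t))
                   for t in range(n))

    res = minimize(l1_cost, x0=[0.1, 0.1], method="Nelder-Mead",
                   options={"maxiter": 2000})
    b1, b2 = res.x
    return (n / 4.0) * (b1 ** 2 + b2 ** 2)

omega0 = 2 * math.pi * 0.1
series = [math.sin(omega0 * t) for t in range(200)]
peak = laplace_periodogram(series, omega0)             # at the true frequency
off = laplace_periodogram(series, 2 * math.pi * 0.27)  # away from it
```

For a clean sinusoid the L1 and L2 fits agree; the robustness payoff appears when the noise is heavy-tailed or the signal is nonlinearly distorted.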


**Speaker: Guanhua Lu**
**Title: Asymptotic Theories for Multiple-Sample Semiparametric Density Ratio Models**
**Abstract:** A multiple-sample semiparametric density ratio model can be constructed by multiplicative exponential distortions of a reference distribution. The distortion functions are assumed to be nonnegative and of a known finite-dimensional parametric form, while the reference distribution is left nonparametric. The combined data from all samples are used to estimate each distortion and the reference distribution. The large sample behavior of both the parameter estimates and the unknown reference distribution is studied: the estimated reference distribution is shown to converge weakly to a zero-mean Gaussian process, and the corresponding covariance structure is used to construct confidence bands. A Kolmogorov-Smirnov type statistic is also studied for testing the goodness of fit of the density ratio model.


**Speaker: Donald E. K. Martin, North Carolina State University**
**Title: Distribution of Statistics of Hidden State Sequences Through the Sum-Product Algorithm**
**Abstract:** We extend methodology for computing exact distributions of statistics of hidden state sequences to more general settings. Distributions are computed for undirected and directed graphical models represented by factor graphs. The methods apply to graphs whose edge sparseness allows exact computation of the partition function. In cases where the order of evaluating the statistic is a perfect elimination sequence associated with the sum-product algorithm, the distributions are obtained by including matrix operators in messages to sequentially update a vector that indicates the statistic value corresponding to sums of products of evaluated potential functions. For statistics that require a different order (such as sequential computation), we quantify the effect of the evaluation order, relative to that of a perfect elimination sequence, through the increased complexity of clustered factor nodes of a new factor graph corresponding to the marginalization procedure. Applications of this work include discrete hidden state sequences perturbed by noise and/or missing values, and state sequences that serve to classify observations. Examples are given to illustrate the methods.
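To illustrate the core idea of carrying the statistic's value inside a forward-type sum-product message, here is a self-contained toy example (a two-state HMM with made-up numbers, my own construction rather than the talk's): the message alpha[s][k] tracks the joint probability of the observations so far, the current hidden state s, and the statistic value k (here, the number of visits to state 1).

```python
trans = [[0.9, 0.1], [0.2, 0.8]]   # P(S_t = j | S_{t-1} = i)
emit  = [[0.7, 0.3], [0.1, 0.9]]   # P(O_t = o | S_t = s), symbols o in {0, 1}
init  = [0.5, 0.5]
obs   = [0, 1, 1, 0, 1]
n = len(obs)

# message alpha[s][k] = P(O_1..t, S_t = s, visits-to-state-1 = k)
alpha = [[0.0] * (n + 1) for _ in range(2)]
for s in range(2):
    alpha[s][1 if s == 1 else 0] = init[s] * emit[s][obs[0]]

for t in range(1, n):
    new = [[0.0] * (n + 1) for _ in range(2)]
    for s in range(2):
        bump = 1 if s == 1 else 0      # the statistic increments on each visit to state 1
        for sp in range(2):
            for k in range(n):         # visits so far never exceed t <= n - 1 here
                new[s][k + bump] += alpha[sp][k] * trans[sp][s] * emit[s][obs[t]]
    alpha = new

evidence = sum(sum(row) for row in alpha)
# exact posterior distribution of the statistic given the observations
dist = [(alpha[0][k] + alpha[1][k]) / evidence for k in range(n + 1)]
```

Summing the message over states and statistic values recovers the usual forward-algorithm evidence, so the vector update is exactly a sum-product recursion with an extra index for the statistic.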


**Speaker: Gerald North, Texas A&M University**
**Title: A Tale about Hockey Sticks**
**Abstract:** This is a story about an experience I had a few years ago when politicians, statisticians, and climatologists slugged it out over some published inferences about the earth's temperature record over the last thousand years. A committee was formed by the National Academy of Sciences to sort out whether the authors had done their statistics correctly. Congressmen picked up on the controversy and attempted to exploit it for their own purposes. While Kedem is innocent in these proceedings, I will try to come up with a note or two about my interaction with him more than twenty years ago.


**Speaker: Harry Pavlopoulos, Athens University of Economics and Business**
**Title: Introducing a Fractional Integer-valued Autoregressive Model for Time Series of Counts**
**Abstract:** The "Threshold Method" (TM) suggests that an instantaneous spatial average of rain rate (SARR), or more generally of an intermittent non-negative random field, may be adequately predicted via the contemporaneous instantaneous fraction of areal coverage by values exceeding a given threshold level τ. In particular, the predictor of SARR is a certain linear function of the τ-coverage, provided that the threshold level τ is "optimal" in a certain sense. That prediction scheme points to the possibility of modeling time series of SARR (or other intermittent processes) by constructing adequate models for time series of counts, representing the numerator {*X(t)*} of the fraction estimating areal τ-coverage, based on digitized values (of SARR) over a population of pixels of a fixed scale, for an optimal threshold level τ. This key observation was recently addressed by Pavlopoulos and Karlis (Environmetrics 2008, 19, 369-393), motivating the conception of the new model introduced here for the first time: the integer-valued autoregressive model *X(t)* = *p(t)* ∘ *X(t-1)* + *ε(t)*, where the innovations {*ε(t)*} are an i.i.d. sequence of zero-inflated Poisson random variables and {*p(t)* = exp(-*w(t)*²)} is an auto-correlated sequence of (0,1)-valued random variables, with {*w(t)*} a fractional Gaussian noise sequence independent of the innovations. The operation "∘" denotes a suitable generalization of binomial thinning, referred to as randomized binomial thinning (RBT). The model is further specified by independence of both *p(t)* and *ε(t)* from the past history {*X(s)*, *s* < *t*}. Under the assumption that a stationary solution exists, the mean, variance, index of dispersion, and autocorrelation function of a stationary solution are calculated explicitly. Based on these calculations, inference for model parameters by the method of moments is discussed, revealing the need to address in depth the issue of stationarity and pertinent sufficient and necessary conditions.
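The model's ingredients can be simulated in a few lines. Two loudly-flagged simplifications in this sketch of mine: (i) w(t) is generated as a Gaussian AR(1) stand-in for fractional Gaussian noise, and (ii) the randomized binomial thinning is realized as ordinary binomial thinning with the random probability p(t); parameter values are illustrative.

```python
import math
import random

rng = random.Random(7)

def zip_draw(lam=1.5, zero_prob=0.4):
    # zero-inflated Poisson innovation (Knuth's Poisson sampler; fine for small lam)
    if rng.random() < zero_prob:
        return 0
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def thin(x, p):
    # binomial thinning p ∘ x: each of the x units survives with probability p
    return sum(1 for _ in range(x) if rng.random() < p)

n, phi = 1000, 0.8
w, x = 0.0, 0
xs = []
for _ in range(n):
    w = phi * w + math.sqrt(1 - phi ** 2) * rng.gauss(0, 1)  # AR(1) stand-in for fGn
    p = math.exp(-w * w)       # autocorrelated thinning probability in (0, 1)
    x = thin(x, p) + zip_draw()
    xs.append(x)
```

The resulting count series is non-negative, overdispersed, and serially dependent through both the thinning probabilities and the autoregression, mimicking the intermittency that motivates the model.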


**Speaker: Jing Qin, National Institute of Allergy and Infectious Diseases, NIH**
**Title: Statistical Methods for Analyzing Right-Censored Length-Biased Data under Cox Model**
**Abstract:** Length-biased time-to-event data are commonly encountered in applications ranging from epidemiological cohort studies and cancer prevention trials to studies of labor economics. A longstanding statistical problem is how to assess the association of risk factors with survival in the target population given the observed length-biased data. In this talk, we demonstrate how to estimate these effects under the semiparametric Cox proportional hazards model. The structure of the Cox model is changed under length-biased sampling in general. Although the existing partial likelihood approach for left-truncated data can be used to estimate covariate effects, it may not be efficient for analyzing length-biased data. We propose two estimating equation approaches for estimating the covariate coefficients under the Cox model, and use modern stochastic process and martingale theory to develop the asymptotic properties of the estimators. We evaluate the empirical performance and efficiency of the two methods through extensive simulation studies. We use data from a dementia study to illustrate the proposed methodology, and demonstrate the computational algorithms for point estimates, which can be directly linked to existing functions in S-PLUS or R.


**Speaker: Ritaja Sur, University of Maryland**
**Title: Classification of Eye Movement Data**
**Abstract:** In this work, we consider the eye gaze of human subjects as a response to the motion of other body parts. The objective of this study is to determine whether eye movements differ significantly between two cases, denoted "watch" and "imitate". Discrimination between these two cases is important for the development of artificial intelligence. We treat the eye gaze as time series data. For the purpose of discrimination, both parametric and non-parametric distance metrics are used; in particular, we consider a metric based on higher order crossing (HOC) sequences. We compare the performance of these distance measures in cluster analysis, using both hierarchical and non-hierarchical clustering algorithms. The results represent the first application of HOC sequences to eye gaze data.
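The HOC feature itself is simple to compute. A minimal sketch (my own helper names, with a made-up smooth-versus-noisy comparison): count zero-crossings of the mean-centered series and of its successive differences — repeated differencing acts as a high-pass filter, so crossing counts encode spectral shape.

```python
import math
import random

def zero_crossings(x):
    m = sum(x) / len(x)
    c = [v - m for v in x]   # center, so crossings are about the mean
    return sum(1 for a, b in zip(c, c[1:]) if a * b < 0)

def hoc_sequence(x, order):
    # D_1, ..., D_order: crossing counts of the series and its successive differences
    seq, cur = [], list(x)
    for _ in range(order):
        seq.append(zero_crossings(cur))
        cur = [b - a for a, b in zip(cur, cur[1:])]
    return seq

rng = random.Random(3)
smooth = [math.sin(2 * math.pi * t / 50 + 0.3) for t in range(400)]  # slow oscillation
noisy = [rng.gauss(0, 1) for _ in range(400)]                        # white noise
hoc_smooth = hoc_sequence(smooth, 4)
hoc_noise = hoc_sequence(noisy, 4)
```

The two HOC vectors differ sharply (few crossings for the slow oscillation, roughly n/2 and rising for noise), which is what makes them usable as a distance feature between gaze series.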


**Speaker: James Troendle, National Institute of Child Health and Human Development, NIH**
**Title: Empirical Likelihood and the Nonparametric Behrens-Fisher Problem**
**Abstract:** Consider the two-sample problem where nothing whatsoever is assumed about the distribution of either sample. The nonparametric Behrens-Fisher hypothesis states that the probability that an observation drawn at random from the first population exceeds an observation drawn at random from the second population, plus half the probability of equality, is ½. Can a likelihood ratio test be obtained for this problem? Owen, in his classic book on empirical likelihood, appears to give a theoretical solution. However, Owen's solution does not work well in practice. A straightforward Lagrange multiplier approach appears to lead to solving n1+n2+3 score equations in as many unknowns. The approach taken here effectively reduces the number of parameters in the score equations to one by using a recursive formula for the resulting parameters. The remaining one-dimensional problem can be solved numerically. The power of the empirical likelihood ratio test (ELRT) is compared by simulation to that of a generalized Wilcoxon test. The ELRT is also compared to a robust test based on assuming a density ratio model. Finally, an extension to the two-sample right-censored problem is considered, with a comparison to the logrank test.
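The quantity under test can be written down in a few lines. A minimal sketch (helper name mine) of the Mann-Whitney-type estimate of θ = P(X > Y) + ½ P(X = Y), whose null value ½ is the nonparametric Behrens-Fisher hypothesis:

```python
def theta_hat(xs, ys):
    # average over all (x, y) pairs: 1 if x > y, 1/2 if x == y, 0 otherwise
    total = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(xs) * len(ys))

theta = theta_hat([1, 2, 3], [1, 2, 3])   # identical samples give exactly 1/2
```

The empirical likelihood ratio test in the talk constrains the two empirical distributions so that this functional equals ½ and compares the constrained and unconstrained likelihoods.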


**Speaker: Anastasia Voulgaraki, University of Maryland**
**Title: Estimation of Death Rates in U.S. States with Small Subpopulations**
**Abstract:** The National Center for Health Statistics (NCHS) uses observed mortality data to publish race- and gender-specific life tables for individual states decennially. At ages over 85 years, the reliability of death rates based on these data is compromised to some extent by age misreporting. The eight-parameter Heligman-Pollard (HP) parametric model is then used to smooth the data and obtain estimates/extrapolations of mortality rates for advanced ages. In states with small subpopulations the observed mortality rates are often zero, particularly at young ages. The presence of zero death rates makes fitting the HP model difficult and at times outright impossible; in addition, since death rates are reported on a log scale, zero mortality rates are problematic. To overcome observed zero death rates, appropriate probability models are used, and observed zero mortality rates are replaced by the corresponding expected values. This enables the use of logarithmic transformations and the fitting of the Heligman-Pollard model to produce mortality estimates for ages 0-130 years.
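For reference, the eight-parameter Heligman-Pollard curve referred to above models the odds of death at age x as three additive components (childhood mortality, an accident hump, and senescent mortality). The sketch below evaluates the curve; the parameter values are illustrative only, not NCHS estimates.

```python
import math

def heligman_pollard_qx(x, A, B, C, D, E, F, G, H):
    # q_x / (1 - q_x) = A^((x+B)^C) + D*exp(-E*(ln x - ln F)^2) + G*H^x
    #   first term:  declining childhood mortality
    #   second term: "accident hump" centered near age F
    #   third term:  exponentially rising senescent mortality
    odds = (A ** ((x + B) ** C)
            + D * math.exp(-E * (math.log(x) - math.log(F)) ** 2)
            + G * H ** x)
    return odds / (1 + odds)

# illustrative parameters (not fitted values)
params = dict(A=0.0005, B=0.01, C=0.10, D=0.0001, E=10.0, F=20.0, G=0.00005, H=1.10)
qx = [heligman_pollard_qx(x, **params) for x in range(1, 111)]
```

Because the model is fitted on the log scale of these rates, any observed q_x of exactly zero breaks the fit, which is the problem the probability-model replacement in the abstract addresses.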
