R program to compute the forward addition sequence and example use.
R program to compute the backward elimination sequence and example use.
Plot (jpeg version) comparing adjusted Cp values for the forward addition sequence (FAS), the backward elimination sequence (BES), and the 5 best models from all subsets. The plot mark "b" denotes a BES model and "f" denotes an FAS model; both are jittered away from the true model size. Circles mark the best model of each size in terms of R^2 (or mse). The plot illustrates that the best-fitting models from the FAS and/or BES sequences are often close to the best models found by searching over all subsets. Gilmour's (1996) adjustment of the Cp value takes into account that the full model mse and each model mse are computed from the same data.
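For reference, the unadjusted Mallows Cp that Gilmour's adjustment starts from can be computed from residual sums of squares. The Python sketch below uses the textbook definition Cp = SSE_p / MSE_full - n + 2p, with p counting the intercept; the function name mallows_cp and the example numbers are hypothetical. Gilmour's (1996) correction term is deliberately not reproduced here, so these values are not the adjusted values shown in the plot. Note that SSE_p and MSE_full come from the same data, which is the dependence the adjustment accounts for.

```python
def mallows_cp(sse_p, p, sse_full, k, n):
    """Unadjusted Mallows Cp for a submodel (Gilmour's correction NOT applied).

    sse_p:    residual sum of squares of the submodel
    p:        number of parameters in the submodel, including the intercept
    sse_full: residual sum of squares of the full model
    k:        number of parameters in the full model, including the intercept
    n:        number of observations
    """
    # Full-model mse, used as the estimate of the error variance.
    mse_full = sse_full / (n - k)
    return sse_p / mse_full - n + 2 * p


# Hypothetical numbers: n = 20, full model with k = 5 parameters and SSE 30,
# submodel with p = 3 parameters and SSE 40.
print(mallows_cp(40.0, 3, 30.0, 5, 20))  # 6.0
```

For the full model itself (p = k and SSE_p = SSE_full) the formula returns k up to rounding, which is why a Cp value near p is read as a sign of little bias.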
The backward sequence output of the R program differs somewhat from that of SAS Proc Reg or Proc Glmselect (see below). In particular, the forward and backward R programs give similar-looking output, with model size increasing as you read down the output. For the forward sequence, forward selection (α), denoted FS(α), chooses the largest model such that pvmax < α, where pvmax is the running maximum, taken downward, of the p-values to enter. For example, suppose the first three p-values to enter are .01, .06, .04; then the first three values of the pvmax sequence are .01, .06, .06. Thus, if alpha-to-enter is .05, FS(.05) chooses a model with 1 variable. The backward elimination sequence from regsubsets is displayed similarly, except that we add a second column, pvmin, holding the running minimum of the p-values taken upward from the bottom. These minimum values are used in the same way as in FS(α): backward selection (α), denoted BS(α), chooses the largest model such that pvmin < α. An interesting point is that if pvmax has three values in a row that are the same, then the model corresponding to the middle of those three cannot be selected by FS(α) for any value of α. For example, in the NCAA data, FS(α) cannot select a 4-variable or a 9-variable model. Similarly, BS(α) cannot select a 4-variable or an 11-variable model.
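The FS(α) and BS(α) rules described above can be sketched in a few lines. This is a minimal Python illustration, not the R program itself: the function names forward_select and backward_select are hypothetical, and so is the input layout (one p-value per model size, listed from the 1-variable model down to the full model). pvmax is built as a running maximum taken downward and pvmin as a running minimum taken upward from the bottom, and each rule then picks the largest model whose value is below α.

```python
from itertools import accumulate


def forward_select(p_enter, alpha):
    """Sketch of FS(alpha): the largest model size k with pvmax[k-1] < alpha.

    Assumed layout (hypothetical): p_enter[k-1] is the p-value to enter of the
    variable added at step k of the forward addition sequence.
    Returns (chosen_size, pvmax); chosen_size 0 means no variable is added.
    """
    # pvmax: running maximum taken downward, from the 1-variable model on.
    pvmax = list(accumulate(p_enter, max))
    chosen = 0
    for k, pv in enumerate(pvmax, start=1):
        if pv < alpha:
            chosen = k  # keep the largest size that still passes
    return chosen, pvmax


def backward_select(p_remove, alpha):
    """Sketch of BS(alpha): the largest model size k with pvmin[k-1] < alpha.

    Assumed layout (hypothetical): p_remove[k-1] is the largest p-value in the
    size-k model of the backward elimination sequence, i.e. the p-value of the
    variable that would be dropped next. Returns (chosen_size, pvmin).
    """
    # pvmin: running minimum taken upward from the bottom (the full model).
    pvmin = list(accumulate(reversed(p_remove), min))[::-1]
    chosen = 0
    for k, pv in enumerate(pvmin, start=1):
        if pv < alpha:
            chosen = k  # keep the largest size that still passes
    return chosen, pvmin


if __name__ == "__main__":
    # The forward example from the text: p-values to enter .01, .06, .04.
    print(forward_select([0.01, 0.06, 0.04], 0.05))  # (1, [0.01, 0.06, 0.06])
```

Because both running sequences never decrease as model size grows, choosing the largest size with a value below α amounts to stopping just before the first value that reaches α.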
Next we illustrate standard SAS procedures for forward and backward sequences. As mentioned above, SAS displays the backward sequence in the reverse order of the R program output. Proc Glmselect is a new procedure that must be downloaded separately; it has many more features for variable selection than Proc Reg.
SAS Proc Reg forward addition sequence illustration.
SAS Proc Reg backward elimination sequence illustration.
SAS Proc Reg best subsets illustration.
SAS Proc Glmselect forward addition sequence illustration.
SAS Proc Glmselect backward elimination sequence illustration.