Fitting Regression Models with 2-Way Interaction and Squared Terms

Here we make available a number of SAS macros by Hugh Crews that implement Fast FSR techniques for fitting quadratic models by forward selection in standard linear regression, logistic regression, and Cox proportional hazards regression. The comments at the top of each macro explain how to use it. Part of the comments for fsr_linear is reproduced here:

 fsr_linear is a sas macro that                                               
                                                                       
 0. Is based on proc glmselect - if you don't have it, first go to         
        http://support.sas.com:80/rnd/app/da/glmselect.html                 
                                                                       
 1. Standardizes all predictors to have mean 0 and variance 1 and        
    renames them as x1-xp, numbered as they occur in the data set.       
                                                                         
 2. Fits main effects, interactions, and squared terms according to a    
    modified forward selection procedure that on average has False       
    Selection Rate (FSR) = gamma (default=0.05)                          
                                                                         
 Developed by Hugh Crews, August 2008.                                   
                                                                         
 Typical call (all linear and quadratic terms):                          
                                                                         
 %fsr_linear(dataset=diabetes,model=age|sex|bmi|bp|s1|s2|s3|s4|s5|s6 @q,
          gamma=0.05,y=y,method=5,terms=20,include=0,cbound=2);
                                                                         
 method=1  Fast FSR with Strong Hierarchy                                
 method=2  Fast FSR with No Hierarchy                                    
 method=3  Fast FSR with Weak Hierarchy                                  
 method=4  Fast FSR with No Hierarchy and Iterated Adjustment            
 method=5  Fast FSR with No Hierarchy and Sequential Adjustment          
 method=6  Fast FSR with Weak Hierarchy and Sequential Adjustment        
 method=7  Fast FSR with Main Effects only                               
                                                                         
 The default is method=1, gamma=0.05.  Terms=x restricts the forward     
 sequence to x terms.  The default is the full sequence of terms or      
 (terms=full). Include=k forces the first k terms into the model. The    
 default is to include no terms or (include=0). Cbound=b bounds the      
 adjustment used in Methods 5 and 6 by b*(k_T-p+1)/(p+1). The default    
 is to enforce no limit on c.                                            
                                                                         
 The model statement is meant to be close to general sas usage           
 except for @q which tells the program to add squared terms to 
 the code immediately preceding it. 
 Examples:                                                               
model=age|sex|bmi|bp|s1|s2|s3|s4|s5|s6 @q   all 1st & 2nd order terms    
model=age--s6 @q                                 same as above           
model=age--s6                                    only linear             
model=age|sex|bmi|bp|s1|s2|s3|s4|s5|s6     linear and interactions only  
model=age sex  bmi--s6 @q                   plus include=2               
  includes age and sex, then selects from full quadratic in the others   
model=age sex  age--s6 @q                   plus include=2               
  same as above, but now age and sex interactions and age^2 are possible 
  redundancies like age appearing twice are no problem                    

We recommend method=1, which enforces the strong hierarchy principle: main effects must enter before interactions. A main benefit of this approach is that the models chosen are invariant to centering and rescaling. We center and rescale each variable before running forward selection in order to keep correlations between main effects and second-order terms as low as possible.
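
The centering and rescaling step can be pictured with a short sketch using PROC STANDARD. This is only an illustration of the idea, not the macro's own code: fsr_linear does its own standardization and renaming internally. The sketch assumes a data set named diabetes with the predictors used in the typical call above; the output name diabetes_std is hypothetical.

```sas
/* Illustration only: standardize predictors to mean 0 and variance 1.  */
/* Assumes a data set named diabetes already exists; diabetes_std is a  */
/* hypothetical output name chosen for this sketch.                     */
proc standard data=diabetes mean=0 std=1 out=diabetes_std;
   var age sex bmi bp s1 s2 s3 s4 s5 s6;
run;

/* Check the result: each predictor should report mean 0 and std 1. */
proc means data=diabetes_std mean std;
   var age sex bmi bp s1 s2 s3 s4 s5 s6;
run;
```

After centering, a squared term such as bmi*bmi is typically much less correlated with bmi itself than it would be on the raw scale, which is the motivation stated above.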

If one wants to search for interactions without requiring main effects to enter first, there are three versions of a "no hierarchy" approach (methods 2, 4, and 5). We recommend method=5 because it adjusts for large numbers of interactions and is computationally fast. Method=6 is a compromise between the method=1 strong hierarchy and the no hierarchy approaches: it requires only one of the main effects involved in an interaction to enter before that interaction can enter. Finally, method=7 is a main effects only approach.
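
For concreteness, the recommendations above translate into calls like the following. The parameter names and values come from the macro comments quoted earlier. The sketch assumes that the fsr_linear macro has already been compiled in the session and that a diabetes data set exists; cbound=2 is simply copied from the typical call and is not a tuned recommendation.

```sas
/* Assumes %fsr_linear is already compiled and data set diabetes exists. */

/* Strong hierarchy (the recommended default), full quadratic model. */
%fsr_linear(dataset=diabetes, model=age--s6 @q, gamma=0.05, y=y, method=1);

/* No hierarchy with sequential adjustment; c bounded as in the typical call. */
%fsr_linear(dataset=diabetes, model=age--s6 @q, gamma=0.05, y=y, method=5, cbound=2);

/* Weak hierarchy with sequential adjustment. */
%fsr_linear(dataset=diabetes, model=age--s6 @q, gamma=0.05, y=y, method=6);

/* Main effects only, so no @q is needed. */
%fsr_linear(dataset=diabetes, model=age--s6, gamma=0.05, y=y, method=7);
```

Terms and include are omitted here, so each call relies on the defaults described in the comments (the full sequence of terms and no forced terms).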

Note that we do not have any special way to handle categorical variables. However, we include a dummy creator macro to create dummy 0-1 variables for an arbitrary number of categorical variables.
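
To show what 0-1 dummy coding looks like, here is a minimal data step sketch. It is not the dummy creator macro itself, whose interface is documented in its own comments. The data set credit, the variable purpose, and its three levels are hypothetical names invented for this illustration.

```sas
/* Hypothetical toy data: one categorical variable with three levels. */
data credit;
   input id purpose :$12.;
   datalines;
1 car
2 furniture
3 other
4 car
;
run;

/* Illustration only, not the authors' dummy creator macro. */
data credit_dummies;
   set credit;
   /* A comparison evaluates to 1 when true and 0 when false. */
   purpose_car       = (purpose = 'car');
   purpose_furniture = (purpose = 'furniture');
   /* 'other' serves as the reference level, so it gets no dummy. */
   drop purpose;
run;
```

A variable with k levels needs only k-1 dummies when the model has an intercept; the numeric dummies can then be listed in the model= argument like any other predictor.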

Linear Regression

The macro requires proc glmselect, which must be downloaded and installed if it is not already available.

The SAS macro for forward selection with example call and output for the diabetes data. Summary of diabetes data runs.

Logistic Regression

The SAS macro for forward selection with example call and output for the lucency data. Summary of lucency data runs.

Our logistic regression macro cannot handle data in the events/trials format, but we provide an expansion macro to create a data set with one row for each 0-1 y. Example call and output of the original and expanded data sets.
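
The idea behind the expansion can be sketched with a short data step. This is not the expansion macro provided here, only an illustration of the transformation it performs. The data set names grouped and expanded and the variables dose, events, and trials are hypothetical.

```sas
/* Hypothetical grouped data in events/trials form. */
data grouped;
   input dose events trials;
   datalines;
1 2 5
2 4 6
;
run;

/* Illustration only: write one row per trial with a 0-1 response y. */
data expanded;
   set grouped;
   do i = 1 to trials;
      y = (i <= events);   /* the first 'events' rows get y=1, the rest y=0 */
      output;
   end;
   drop i events trials;
run;
/* expanded has 5 + 6 = 11 rows, of which 2 + 4 = 6 have y=1. */
```

Each covariate value is repeated on every expanded row, so the expanded data set carries the same information as the grouped one.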

The German credit example illustrates using the dummy creator macro before calling the fsr_logistic macro: example call and output.

Proportional Hazards (Cox) Regression

The SAS macro for forward selection with example call and output for the AIDS Clinical Trial Group Protocol 175 (ACTG 175) data. Summary of ACTG 175 runs.

Sample code and output for the Primary Biliary Cirrhosis (PBC) Data. Summary of all runs.

A version of the SAS macro for forward selection based on SAS version 9.2 (PHREG changed between versions 9.1 and 9.2).