Residual analysis of the PGA golf data.

The PGA golf data are here. We have 2007 data on 195 professional golfers, downloaded from espn.com. We would like to determine which golf skills are most strongly associated with success. The outcome variable is prize money won (earnings). The predictors are: average yards per drive (YARDS_DRIVE), driving accuracy (DRIVING_ACC), greens in regulation (GREENS_IN_REG), putting average (PUTTING_AVG), and save percentage (SAVE_PCT). For more detail, visit espn.com.

Make residual plots: Click "plots" and then select the "residual" tab. Check the boxes "Plot residuals vs variables", "standardized", "Predicted Y", "independent variables", and "normal quantile-quantile plot". Then click "OK".

Inspect influence and outlier statistics: Click "plots" and then select the "influence" tab. Check the boxes "Plot influence statistics vs variables", "DFFITS", "LEVERAGE", "Predicted Y", and "independent variables". Then click "OK".
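
For reference, the two influence statistics named above are conventionally defined as follows. The notation (X for the design matrix including the intercept column, e_i for the i-th residual, s_(i) for the residual standard error with observation i deleted, p for the number of parameters, n for the number of observations) is introduced here for illustration and does not come from the handout. The cutoffs are common rules of thumb, not fixed thresholds.

```latex
% Leverage: the i-th diagonal element of the hat matrix
h_{ii} = \left[ X (X^{\top} X)^{-1} X^{\top} \right]_{ii}

% DFFITS: externally studentized residual scaled by leverage
\mathrm{DFFITS}_i = \frac{e_i}{s_{(i)} \sqrt{1 - h_{ii}}} \, \sqrt{\frac{h_{ii}}{1 - h_{ii}}}

% Common rules of thumb for flagging an observation
h_{ii} > \frac{2p}{n}, \qquad \left| \mathrm{DFFITS}_i \right| > 2 \sqrt{\frac{p}{n}}
```

With five predictors plus an intercept (p = 6) and n = 195 golfers, these rules of thumb work out to a leverage above roughly 0.062 and an absolute DFFITS above roughly 0.35.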

SAS code to produce residual diagnostics

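/* With no DATA= option, PROC REG uses the most recently created data set. */
proc reg;
   model earnings = YARDS_DRIVE DRIVING_ACC GREENS_IN_REG
                    PUTTING_AVG SAVE_PCT
         / r p influence;  /* r: residual analysis; p: predicted values; influence: leverage, DFFITS, etc. */
run;
quit;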
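
For readers working in code rather than the menus, the plots described in the point-and-click steps can be approximated with ODS Graphics, and the diagnostics can be saved to a data set for closer inspection. This is a minimal sketch, not the handout's own method. It assumes the golf data have been read into a data set named golf (a hypothetical name), and the output names diag, flagged, pred, resid, std_resid, leverage, and dffits_val are likewise chosen here for illustration.

```sas
ods graphics on;

/* "golf" is a hypothetical data set name; substitute your own. */
proc reg data=golf
         plots(only)=(ResidualByPredicted Residuals QQPlot
                      RStudentByLeverage DFFITS);
   model earnings = YARDS_DRIVE DRIVING_ACC GREENS_IN_REG
                    PUTTING_AVG SAVE_PCT / r influence;
   /* Save per-golfer diagnostics: predicted value, raw residual,
      standardized residual, leverage (hat diagonal), and DFFITS. */
   output out=diag p=pred r=resid student=std_resid
          h=leverage dffits=dffits_val;
run;
quit;

/* Flag observations using common rules of thumb, with p = 6
   parameters (5 predictors + intercept) and n = 195 golfers.
   These cutoffs are conventions, not part of the original handout. */
data flagged;
   set diag;
   high_leverage = (leverage > 2*6/195);
   high_dffits   = (abs(dffits_val) > 2*sqrt(6/195));
run;

/* List only the flagged golfers. */
proc print data=flagged;
   where high_leverage = 1 or high_dffits = 1;
run;
```

Flagged observations are candidates for a closer look, not automatic deletions; whether a golfer is a data error or simply an unusual performer has to be judged from the data.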