Pollution Data

SAS and R code to access the data (once downloaded).

Brief Description

Data are from McDonald and Schwing (1973), "Instabilities of Regression Estimates Relating Air Pollution to Mortality," Technometrics, 15, 463-481. This data set of 15 independent variables (see list below) and a measure of mortality on 60 US metropolitan areas in 1959-1961 was used to illustrate ridge regression (the full X matrix has a huge condition number).
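
As a rough illustration of why a near-singular X matrix motivates ridge regression, the sketch below restricts attention to the two most collinear predictors, x12 and x13, and works in standardized (correlation) form. The inputs are the rounded correlations from the table at the end of this page (0.98 between x12 and x13; -0.18 and -0.08 with y). The function name and the ridge constant k = 0.1 are illustrative assumptions, not values taken from McDonald and Schwing, and a two-predictor fit is only a stand-in for the full 15-variable problem.

```python
def ridge_two_predictors(r12, r1y, r2y, k=0.0):
    """Solve (R + k*I) b = r for two standardized predictors.

    R = [[1, r12], [r12, 1]] is the predictor correlation matrix and
    r = [r1y, r2y] holds the correlations with the response.
    k = 0 gives ordinary least squares; k > 0 gives ridge regression.
    """
    d = 1.0 + k                      # diagonal entry of R + k*I
    det = d * d - r12 * r12          # determinant of the 2x2 system
    b1 = (d * r1y - r12 * r2y) / det
    b2 = (d * r2y - r12 * r1y) / det
    return b1, b2


# Rounded correlations copied from the correlation table (x12 = HC, x13 = NOx).
R12, R1Y, R2Y = 0.98, -0.18, -0.08

ols = ridge_two_predictors(R12, R1Y, R2Y, k=0.0)
ridge = ridge_two_predictors(R12, R1Y, R2Y, k=0.1)   # k = 0.1 is an arbitrary illustration

# Condition number of R: ratio of its eigenvalues 1 + r12 and 1 - r12.
cond_R = (1 + R12) / (1 - R12)

print("OLS   standardized coefficients:", ols)
print("Ridge standardized coefficients:", ridge)
print("Condition number of the 2x2 correlation matrix:", cond_R)
```

With these rounded inputs the unpenalized coefficients are about -2.57 and 2.43, large and opposite in sign, while k = 0.1 shrinks them to about -0.48 and 0.35. The 2x2 correlation matrix alone has a condition number of about 99.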

An interesting feature is that the forward selection sequence differs somewhat from the backward elimination sequence. The table below gives the least squares coefficients for the 5-variable forward and backward models (a 0 estimate means the variable was not selected), along with the LASSO estimates chosen by 5-fold cross-validation. The forward-selected model includes variables x1, x2, x6, x9, and x14, whereas the backward sequence replaces x1 and x14 with x12 and x13. The LASSO solution is essentially a shrunken version of the forward model with two additional small coefficients, on x7 and x8.

            FS(5)         BE(5)         LASSO
int.      1016.433      1145.200       997.584
x1           1.488         0             1.192
x2          -1.623        -1.563        -0.897
x3           0             0             0
x4           0             0             0
x5           0             0             0
x6         -12.764       -19.370       -11.719
x7           0             0            -0.027
x8           0             0             0.002
x9           4.066         4.461         3.396
x10          0             0             0
x11          0             0             0
x12          0            -0.984         0
x13          0             1.992         0
x14          0.284         0             0.217
x15          0             0             0
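
The difference between forward and backward sequences can be reproduced on a small scale. The following is a minimal pure-Python sketch of both greedy searches, assuming residual sum of squares as the selection criterion (this page does not say which criterion produced the models above). The function names and the toy data are hypothetical: the toy response depends exactly on the first two columns, and the third column is a noisy copy of the response. It is not the pollution data.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus the columns in cols."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    ZtZ = [[sum(z[u] * z[v] for z in Z) for v in range(p)] for u in range(p)]
    Zty = [sum(z[u] * yi for z, yi in zip(Z, y)) for u in range(p)]
    beta = solve(ZtZ, Zty)                              # normal equations
    return sum((yi - sum(bk * zk for bk, zk in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))


def forward_selection(X, y, k):
    """Start empty; repeatedly add the variable that lowers RSS the most."""
    chosen, remaining = [], list(range(len(X[0])))
    while len(chosen) < k:
        best = min(remaining, key=lambda j: rss(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


def backward_elimination(X, y, k):
    """Start full; repeatedly drop the variable whose removal raises RSS the least."""
    chosen = list(range(len(X[0])))
    while len(chosen) > k:
        drop = min(chosen, key=lambda j: rss(X, y, [c for c in chosen if c != j]))
        chosen.remove(drop)
    return sorted(chosen)


# Synthetic toy data (NOT the pollution data): a, b, e are mutually orthogonal.
a = [1, -1, 1, -1, 1, -1, 1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
e = [1, 1, 1, 1, -1, -1, -1, -1]
y = [2 * ai + bi for ai, bi in zip(a, b)]               # y = 2a + b exactly
proxy = [yi + 0.5 * ei for yi, ei in zip(y, e)]         # noisy copy of y
X = [[ai, bi, pi] for ai, bi, pi in zip(a, b, proxy)]   # columns 0, 1, 2

fwd = forward_selection(X, y, 2)
bwd = backward_elimination(X, y, 2)
print("forward :", fwd)
print("backward:", bwd)
```

On this toy data forward selection picks the noisy proxy first and keeps it, returning columns [0, 2], while backward elimination discards the proxy and returns the two true predictors, [0, 1]. Highly correlated predictors such as x12 and x13 can produce the same kind of divergence.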

Description of Variables

Y  Total age-adjusted mortality rate
x1 Mean annual precipitation in inches
x2 Mean January temperature in degrees Fahrenheit
x3 Mean July temperature in degrees Fahrenheit
x4 Percent of 1960 SMSA population that is 65 years of age or over
x5 Population per household, 1960 SMSA
x6 Median school years completed for those over 25 in 1960 SMSA
x7 Percent of housing units that are sound and with all facilities
x8 Population per square mile in urbanized area in 1960
x9 Percent of 1960 urbanized area population that is non-white
x10 Percent employment in white-collar occupations in 1960 urbanized area
x11 Percent of families with income under $3,000 in 1960 urbanized area
x12 Relative pollution potential of hydrocarbons, HC
x13 Relative pollution potential of oxides of nitrogen, NOx
x14 Relative pollution potential of sulfur dioxide, SO2
x15 Percent relative humidity, annual average at 1 p.m.

A few descriptive statistics:
                                       The MEANS Procedure

          Variable     N            Mean         Std Dev         Minimum         Maximum
          ------------------------------------------------------------------------------
          x1          60      37.3666667       9.9846775      10.0000000      60.0000000
          x2          60      33.9833333      10.1688985      12.0000000      67.0000000
          x3          60      74.5833333       4.7631768      63.0000000      85.0000000
          x4          60       8.7983333       1.4645520       5.6000000      11.8000000
          x5          60       3.2631667       0.1352523       2.9200000       3.5300000
          x6          60      10.9733333       0.8452994       9.0000000      12.3000000
          x7          60      80.9133333       5.1413731      66.8000000      90.7000000
          x8          60         3876.05         1454.10         1441.00         9699.00
          x9          60      11.8700000       8.9211480       0.8000000      38.5000000
          x10         60      46.0816667       4.6130431      33.8000000      59.7000000
          x11         60      14.3733333       4.1600956       9.4000000      26.4000000
          x12         60      37.8500000      91.9776732       1.0000000     648.0000000
          x13         60      22.6500000      46.3332896       1.0000000     319.0000000
          x14         60      53.7666667      63.3904678       1.0000000     278.0000000
          x15         60      57.6666667       5.3699309      38.0000000      73.0000000
          y           60     940.3585000      62.2066852     790.7300000         1113.16
          ------------------------------------------------------------------------------
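
A least squares fit with an intercept passes through the point of means, so the fitted value at the predictor means should reproduce the mean of y. The short check below applies this to the FS(5) and BE(5) coefficients listed earlier, using the means from the output above. The dictionary layout and the helper name are illustrative; because the published coefficients are rounded, agreement is only approximate.

```python
# Means of the relevant predictors and of y, copied from the MEANS output.
means = {"x1": 37.3666667, "x2": 33.9833333, "x6": 10.9733333, "x9": 11.87,
         "x12": 37.85, "x13": 22.65, "x14": 53.7666667}
y_mean = 940.3585

# Five-variable coefficients, copied from the FS(5) and BE(5) columns.
fs5 = {"int": 1016.433, "x1": 1.488, "x2": -1.623, "x6": -12.764,
       "x9": 4.066, "x14": 0.284}
be5 = {"int": 1145.200, "x2": -1.563, "x6": -19.370, "x9": 4.461,
       "x12": -0.984, "x13": 1.992}


def predict_at_means(coef):
    """Evaluate intercept + sum(b_j * mean_j) for one coefficient set."""
    return coef["int"] + sum(b * means[name]
                             for name, b in coef.items() if name != "int")


for label, coef in (("FS(5)", fs5), ("BE(5)", be5)):
    pred = predict_at_means(coef)
    print(f"{label}: fitted value at the means = {pred:.3f} (mean of y = {y_mean})")
```

Both fitted values agree with the mean of y to within about 0.01, which is consistent with the rounding of the coefficients to three decimals.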

Correlations:
       x1    x2    x3    x4    x5    x6    x7    x8    x9   x10   x11   x12   x13   x14   x15     y
x1   1.00  0.09  0.50  0.10  0.26 -0.49 -0.49  0.00  0.41 -0.30  0.51 -0.53 -0.49 -0.11 -0.08  0.51
x2   0.09  1.00  0.35 -0.40 -0.21  0.12  0.01 -0.10  0.45  0.24  0.57  0.35  0.32 -0.11  0.07 -0.03
x3   0.50  0.35  1.00 -0.43  0.26 -0.24 -0.42 -0.06  0.58 -0.02  0.62 -0.36 -0.34 -0.10 -0.45  0.28
x4   0.10 -0.40 -0.43  1.00 -0.51 -0.14  0.07  0.16 -0.64 -0.12 -0.31 -0.02  0.00  0.02  0.11 -0.17
x5   0.26 -0.21  0.26 -0.51  1.00 -0.40 -0.41 -0.18  0.42 -0.43  0.26 -0.39 -0.36  0.00 -0.14  0.36
x6  -0.49  0.12 -0.24 -0.14 -0.40  1.00  0.55 -0.24 -0.21  0.70 -0.40  0.29  0.22 -0.23  0.18 -0.51
x7  -0.49  0.01 -0.42  0.07 -0.41  0.55  1.00  0.18 -0.41  0.34 -0.68  0.39  0.35  0.12  0.12 -0.43
x8   0.00 -0.10 -0.06  0.16 -0.18 -0.24  0.18  1.00 -0.01 -0.03 -0.16  0.12  0.17  0.43 -0.12  0.27
x9   0.41  0.45  0.58 -0.64  0.42 -0.21 -0.41 -0.01  1.00  0.00  0.70 -0.03  0.02  0.16 -0.12  0.64
x10 -0.30  0.24 -0.02 -0.12 -0.43  0.70  0.34 -0.03  0.00  1.00 -0.19  0.20  0.16 -0.07  0.06 -0.28
x11  0.51  0.57  0.62 -0.31  0.26 -0.40 -0.68 -0.16  0.70 -0.19  1.00 -0.13 -0.10 -0.10 -0.15  0.41
x12 -0.53  0.35 -0.36 -0.02 -0.39  0.29  0.39  0.12 -0.03  0.20 -0.13  1.00  0.98  0.28 -0.02 -0.18
x13 -0.49  0.32 -0.34  0.00 -0.36  0.22  0.35  0.17  0.02  0.16 -0.10  0.98  1.00  0.41 -0.05 -0.08
x14 -0.11 -0.11 -0.10  0.02  0.00 -0.23  0.12  0.43  0.16 -0.07 -0.10  0.28  0.41  1.00 -0.10  0.43
x15 -0.08  0.07 -0.45  0.11 -0.14  0.18  0.12 -0.12 -0.12  0.06 -0.15 -0.02 -0.05 -0.10  1.00 -0.09
y    0.51 -0.03  0.28 -0.17  0.36 -0.51 -0.43  0.27  0.64 -0.28  0.41 -0.18 -0.08  0.43 -0.09  1.00
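
To locate the predictor pairs most likely to cause collinearity trouble, the correlation table can be scanned mechanically. The sketch below parses the rows above (copied verbatim into a string) and lists the predictor pairs whose absolute correlation reaches a cutoff. The cutoff of 0.7 and the function name are arbitrary illustrative choices.

```python
CORR_TEXT = """
x1   1.00  0.09  0.50  0.10  0.26 -0.49 -0.49  0.00  0.41 -0.30  0.51 -0.53 -0.49 -0.11 -0.08  0.51
x2   0.09  1.00  0.35 -0.40 -0.21  0.12  0.01 -0.10  0.45  0.24  0.57  0.35  0.32 -0.11  0.07 -0.03
x3   0.50  0.35  1.00 -0.43  0.26 -0.24 -0.42 -0.06  0.58 -0.02  0.62 -0.36 -0.34 -0.10 -0.45  0.28
x4   0.10 -0.40 -0.43  1.00 -0.51 -0.14  0.07  0.16 -0.64 -0.12 -0.31 -0.02  0.00  0.02  0.11 -0.17
x5   0.26 -0.21  0.26 -0.51  1.00 -0.40 -0.41 -0.18  0.42 -0.43  0.26 -0.39 -0.36  0.00 -0.14  0.36
x6  -0.49  0.12 -0.24 -0.14 -0.40  1.00  0.55 -0.24 -0.21  0.70 -0.40  0.29  0.22 -0.23  0.18 -0.51
x7  -0.49  0.01 -0.42  0.07 -0.41  0.55  1.00  0.18 -0.41  0.34 -0.68  0.39  0.35  0.12  0.12 -0.43
x8   0.00 -0.10 -0.06  0.16 -0.18 -0.24  0.18  1.00 -0.01 -0.03 -0.16  0.12  0.17  0.43 -0.12  0.27
x9   0.41  0.45  0.58 -0.64  0.42 -0.21 -0.41 -0.01  1.00  0.00  0.70 -0.03  0.02  0.16 -0.12  0.64
x10 -0.30  0.24 -0.02 -0.12 -0.43  0.70  0.34 -0.03  0.00  1.00 -0.19  0.20  0.16 -0.07  0.06 -0.28
x11  0.51  0.57  0.62 -0.31  0.26 -0.40 -0.68 -0.16  0.70 -0.19  1.00 -0.13 -0.10 -0.10 -0.15  0.41
x12 -0.53  0.35 -0.36 -0.02 -0.39  0.29  0.39  0.12 -0.03  0.20 -0.13  1.00  0.98  0.28 -0.02 -0.18
x13 -0.49  0.32 -0.34  0.00 -0.36  0.22  0.35  0.17  0.02  0.16 -0.10  0.98  1.00  0.41 -0.05 -0.08
x14 -0.11 -0.11 -0.10  0.02  0.00 -0.23  0.12  0.43  0.16 -0.07 -0.10  0.28  0.41  1.00 -0.10  0.43
x15 -0.08  0.07 -0.45  0.11 -0.14  0.18  0.12 -0.12 -0.12  0.06 -0.15 -0.02 -0.05 -0.10  1.00 -0.09
y    0.51 -0.03  0.28 -0.17  0.36 -0.51 -0.43  0.27  0.64 -0.28  0.41 -0.18 -0.08  0.43 -0.09  1.00
"""

# Parse each row into its label and 16 correlations.
rows = [line.split() for line in CORR_TEXT.strip().splitlines()]
names = [r[0] for r in rows]
corr = {r[0]: dict(zip(names, map(float, r[1:]))) for r in rows}


def strong_pairs(threshold=0.7):
    """Return predictor pairs (y excluded) with |r| >= threshold, strongest first."""
    xs = [n for n in names if n != "y"]
    pairs = [(a, b, corr[a][b])
             for i, a in enumerate(xs) for b in xs[i + 1:]
             if abs(corr[a][b]) >= threshold]
    return sorted(pairs, key=lambda t: -abs(t[2]))


for a, b, r in strong_pairs():
    print(f"{a:>3} {b:>3}  r = {r:+.2f}")
```

At this cutoff the scan flags three pairs: x12 with x13 (0.98), x6 with x10 (0.70), and x9 with x11 (0.70). The first pair is the one that separates the backward model from the forward one.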