Analysis of the LA pollution data.
The Los Angeles pollution data was downloaded from the
NMMAPS database. The data set contains daily values (from 1/1/1987 to 12/31/2000) of the following variables:
- Date: MM/DD/YYYY
- Day of week: 1=Sunday, 2=Monday, ..., 7=Saturday
- Season: 1=Winter, 2=Spring, 3=Summer, 4=Fall
- Deaths: Number of cardiovascular deaths in LA
- Temp: Daily average temperature
- RelHumid: Daily average relative humidity
- O3: Daily average ozone
- SO2: Daily average sulfur dioxide
- NO2: Daily average nitrogen dioxide
- CO: Daily average carbon monoxide
The pollution variables (O3, SO2, NO2, and CO) have been centered to have mean zero.
Several days have missing data; these days may be discarded.
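One way to discard the incomplete days is sketched below. It assumes the data have already been read into a data set named la (the name used by the procedures later in this analysis); the output name la_complete is hypothetical and chosen only for illustration.

```sas
* Keep only the days on which every numeric variable is observed.
  la_complete is a hypothetical name, not part of the original analysis;
data la_complete;
  set la;
  if nmiss(of _numeric_) = 0;
run;

* Compare the number of observations before and after the filter;
proc means data=la n nmiss;
run;
proc means data=la_complete n nmiss;
run;
```

If this filtered copy is used, the later steps would need data=la_complete in place of data=la.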
Our objective is to determine if daily ambient air pollution levels are associated with cardiovascular
mortality. To do this, we must first account for confounders such as day of the week, seasonal
trends, temperature, and humidity. We will account for these confounders using a generalized additive model (GAM).
First we account for long-term trends (e.g., a flu outbreak) using a GAM with
a single predictor, date, modeled as a spline with 100 degrees of freedom.
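proc gam data=la;
  * Smooth long-term trend in daily deaths: spline in Date with 100 df;
  model Deaths = spline(Date,df=100);
  * Save the fitted values (P_Deaths) and all other output statistics;
  output out=predictedval all;
run;
title "Raw data vs Predicted values";
proc gplot data=predictedval;
  * Observed deaths on the left axis, fitted values on the right axis;
  plot Deaths*Date;
  plot2 P_Deaths*Date;
run;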
Next we fit a model that includes dummy variables for season and day of the week
(handled the same way as in proc reg) as well as nonparametric curves for
the long-term trend and temperature.
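proc gam data=la;
  * Treat day of week and season as categorical predictors;
  class Day_of_week season;
  * Parametric terms for the class variables, splines for Date and Temp;
  model Deaths = param(Day_of_week season) spline(Date,df=100) spline(Temp,df=10);
  output out=predictedval all;
run;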
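* proc gam splits each smooth term into a linear component (x*beta) and a
  nonparametric component (s(x)). Add the two to get the entire fitted
  curve (x*beta + s(x)). The slopes below are hard-coded, presumably from the
  linear estimates in the proc gam output above, and the Date slope is negative;
data predictedval;
  set predictedval;
  fitted_Date = P_Date - 0.00379*Date;
  fitted_Temp = P_Temp + 0.19566*Temp;
run;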
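The slopes in the data step above are typed in by hand. As a hedged sketch, the parametric estimates can also be saved to a data set with ODS so that the values can be checked against what was typed. The data set name gam_pe is hypothetical, and the ODS table name ParameterEstimates is assumed to be the one proc gam uses for its parametric part; running ods trace on before the procedure lists the table names actually produced.

```sas
* Sketch: refit the model and save the parametric estimates table.
  gam_pe is a hypothetical data set name;
ods output ParameterEstimates=gam_pe;
proc gam data=la;
  class Day_of_week season;
  model Deaths = param(Day_of_week season) spline(Date,df=100) spline(Temp,df=10);
  output out=predictedval all;
run;

* Print the saved table to read off the linear slopes for Date and Temp;
proc print data=gam_pe;
run;
```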
title "Raw data vs Predicted values";
proc gplot data=predictedval;
  plot Deaths*Date;
  plot2 P_Deaths*Date;
run;
title "Estimated smooth function of Temperature";
proc gplot data=predictedval;
  plot fitted_Temp*Temp;
run;
title "Estimated smooth function of Date";
proc gplot data=predictedval;
  plot fitted_Date*Date;
run;
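With the confounders in the model, a pollutant can be added to address the stated objective. The sketch below is one possible next step, not part of the original analysis: it adds CO (chosen arbitrarily from the four pollutants) as a linear term alongside the same confounder terms used above. Entering the pollutant linearly is an assumption made here for illustration.

```sas
* Sketch: add one pollutant as a linear (parametric) term.
  The choice of CO and of a linear effect are assumptions for illustration;
proc gam data=la;
  class Day_of_week season;
  model Deaths = param(Day_of_week season CO)
                 spline(Date,df=100) spline(Temp,df=10);
run;
```

The estimate and p-value for CO in the parameter estimates table then describe its association with daily deaths after adjusting for day of week, season, long-term trend, and temperature. Humidity, which the objective lists as a confounder, could be added in the same way with another spline term.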