ST512 - Spring, 2005
Activity using Multiple Linear Regression to look at cigarette data
-------------------------------------------------------------------

This is a study of various cigarettes in which Carbon mOnoxide is the
response and is to be modeled in terms of WeighT, TAR, and NICotine.

1. Run

      PROC CORR; VAR CO; WITH TAR NIC WT;

   and plot CO against each of the 3 variables. Do this in SAS/GRAPH
   using the following as one of them:

      PROC GPLOT; PLOT CO*TAR;
      SYMBOL1 V=dot I=R C=red;

   (V is the symbol, I is interpolation; I=R is a regression line.)
   Write the correlation on each plot with a pencil. Which variable is
   most highly related to CO? Also try this 3-D plot:

      PROC G3D; SCATTER TAR*NIC=CO;

   [A comment: When the explanatory variables are highly correlated,
   we refer to this as "multicollinearity." With the fitted surface
   being a plane, it is as though the 3D plot shows the legs for a
   table. Notice how unstable the fitted surface would be here.]

2. Run a multiple regression of CO on the three explanatory variables.
   Which of the three is most significant according to the t
   statistics? Compute a 95% confidence interval for the coefficient
   of TAR. Give a verbal interpretation of this coefficient.

3. Run a simple linear regression of CO on TAR. Compare this to the
   full model of part 2 as follows:

   (A) Compute the difference in model sums of squares. This is the
       sum of squares for NIC and WT adjusted for TAR.
   (B) Compute the difference in model degrees of freedom.
   (C) Compute the mean square associated with (NIC,WT) by dividing
       your A answer by your B answer.
   (D) Now compute F as the mean square from part C divided by the
       error mean square from the full model.
   (E) What is this F testing and is it significant?

4. For your simple model in part 3, compute a 95% confidence interval
   for the mean CO of all possible cigarettes having TAR content 10.
   Also compute a 95% prediction interval for the CO of an individual
   future cigarette with TAR content 10.

5. Run the multiple regression as follows:

      PROC REG; MODEL CO = TAR NIC WT/ss1;
      nic_wt: test NIC=0, WT=0;

   Explain how the answer to 3A can be computed from the list of
   Type I sums of squares. What did the TEST statement do for you?
   Try a different order, say

      PROC REG; MODEL CO = NIC WT TAR/ss1;

   Do the Type I SS change? How about the t-tests? Model F test?
   Fitted parameters?

6. Add to the dataset an observation with CO missing (.) and TAR = 10.
   WT and NIC can be anything you like. Now run

      PROC REG; MODEL CO = TAR/P CLM;

   and compare to part 4. Repeat, changing CLM to CLI.

(For help, see "cigarettes.sas")
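
A note on part 3: steps (A) through (D) amount to the usual partial F statistic for comparing the TAR-only model with the full model. The formula below is a sketch of that calculation. The symbols SSR (model sum of squares), MSE (error mean square), and n (number of cigarettes in the data set) are shorthand introduced here and are not notation from the handout. The reference distribution assumes the standard normal-error regression model with an intercept.

```latex
F \;=\;
\frac{\bigl[\mathrm{SSR}(\mathrm{TAR},\mathrm{NIC},\mathrm{WT})
      - \mathrm{SSR}(\mathrm{TAR})\bigr] \,/\, (3 - 1)}
     {\mathrm{MSE}(\mathrm{TAR},\mathrm{NIC},\mathrm{WT})},
\qquad
F \sim F_{2,\; n-4}
\ \text{ under } H_0:\ \beta_{\mathrm{NIC}} = \beta_{\mathrm{WT}} = 0 .
```

The numerator difference is the answer to 3(A), the divisor 3 - 1 = 2 is the answer to 3(B), and the denominator is the error mean square of the full model, which has n - 4 error degrees of freedom.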