Remember that from within MATLAB you can view the most important M-Lab functions by typing
help mlab
Or you might want to have a netscape window open to the appendix listing of the functions. Only a few new functions are introduced in this lab.
Before giving details, it is helpful to introduce some basic terms: factor, level, and design. An experiment is usually carried out to study one or more variables. These variables are called the factors in the experiment. Each factor may be investigated at several different settings. These settings are called the levels of the factor. An experiment consists of a number of separate ``runs'' or measurements. The manner in which levels of all the factors are changed from one run to the next is the experimental design. There are two more terms that are often used to describe designs. Replications are observations made under identical experimental conditions. If every level of every factor appears an equal number of times, the design is said to be balanced.
In this M-lab we will use a subset of the actuator experiment named jet as an illustration. See M-Lab 2 or type viewdata(actuator) to see the full data set (you can learn more about the function viewdata by typing help viewdata).
viewdata(jet)
Obs act press force order 1 A2 30psi 0.0992 11 2 A1 30psi 0.0708 5 3 A2 30psi 0.0754 12 4 A1 30psi 0.0690 8 5 A2 100psi 0.4875 3 6 A1 100psi 0.4428 2 7 A2 100psi 0.5039 9 8 A1 100psi 0.3700 6 9 A2 30psi 0.0830 7 10 A1 30psi 0.0581 15 11 A2 30psi 0.0677 1 12 A1 30psi 0.0498 4 13 A2 100psi 0.4864 10 14 A1 100psi 0.3908 16 15 A2 100psi 0.4867 14 16 A1 100psi 0.3726 13
You might recall that actuators are small jets using compressed air that are designed for positioning and stabilizing a space platform. For this experiment the factors are 1) act, the particular actuator and 2) press, the amount of air pressure supplied to the actuator. There were two actuators in the experiment, and so this first factor has two levels: using the first actuator (A1) or the second (A2). The second factor, pressure, also has two levels: 30 psi and 100 psi. By looking at the data, it is clear that the experiment consists of 16 ``runs,'' and the design is to consider all different combinations of the levels for the two factors with 4 replications of each of the combinations. Also note that every level appears exactly 8 times in the design, so it is balanced. This is a factorial design because all different combinations of the factors levels have been included.
This experimental design has the desirable feature that both factors are varied simultaneously. At first this may seem to just scramble up the variables and make the results useless. However, this feature is actually very beneficial and gives information about how the two factors interact with each other. With the correct statistical approach the results will be easy to interpret.
For the jet data one would like to estimate the effect on the force as the pressure supply is changed from 30 psi to 100 psi. One way to do this is to find the statistics for all the measurements taken at the same level of pressure.
stats(jet.force,jet.press)
30psi 100psi
N 8.000000 8.000000
Mean 0.071625 0.442590
Std. Dev. 0.015053 0.056683
Q1 0.060500 0.377150
Median 0.069900 0.464600
Q3 0.081100 0.487300
Min 0.049800 0.370000
Max 0.099200 0.503900
Range 0.049400 0.133900
From this output we see that the mean difference between these two settings is 0.4426 - 0.0716 = 0.3710. We could also estimate the effect due to the difference between the two actuators by computing the means for the two levels of the act factor:
stats(jet.force,jet.act)
N 8.00000 8.000000 Mean 0.28623 0.227990 Std. Dev. 0.21930 0.179020 Q1 0.07730 0.060825 Median 0.29280 0.220400 Q3 0.48730 0.386250 Min 0.06770 0.049800 Max 0.50390 0.442800 Range 0.43620 0.393000yielding 0.228 - 0.286 = -.058.
We should also look at the set of means from the four groups obtained at the different combinations of factors. To make this easy we have written a function means that will print out the above means as well as the two-way table of means:
means(jet.force,jet.press,jet.act)
Means of Y variable by X variable
Source N Mean
30psi 8 0.071625
100psi 8 0.442590
Means of Y variable by X variable
Source N Mean
A2 8 0.28623
A1 8 0.22799
Table of means of Y variable by X variables x1 and x2
x1
30psi 100psi
x2 A2 0.081325 0.49113
A1 0.061925 0.39405
The means
function can handle as many factors as needed
(e.g.,
means(y,x1,x2,x3,x4)) and will
print out the mean of the first variable (the response variable)
broken down by the values of
each of the factors as well as by all combinations of two factors.
The means in the first part of the output are often called marginal means, and
the means in the second part are often called cell means.
To analyze the cell means, it helps to add a 3rd column of mean differences (and note that we have rounded to two decimals):
Pressure
100psi-30psi
30psi 100psi diff.
----- ------ -----
A2 .08 .49 .41
Actuator
A1 .06 .39 .33
--- ---
A2-A1 diff. .02 .10
Some conclusions: 1) Looking at the row and column differences of means, we can see that press is a much more important variable than act because .33 and .41, the mean changes in force due to changing pressure levels, are large compared to .02 and .10, the mean changes in force due to changing actuators.
2) The change in mean force is not uniform: the row difference column shows that a change in pressure from 30 to 100 psi for actuator 1 (A1) is .33 compared to a change of .41 for actuator 2 (A2). This nonuniformity is called an interaction between the factors act and press.
In general experiments without interactions are easier to analyze
and allow a much simpler description of how the factors
affect the response variable. (Note, however, that an interaction
is an interesting result in many experiments and may lead to
optimal settings of variables that may not have been found otherwise.)
In the absence of interaction we can express
the dependence of Y on factors
and
with two levels each as given in the following table of population means:
| X1 | |||||
| Level 2 | |||||
| Level 1 | Level 2 | - Level 1 | |||
| X2 | Level 1 | µ | µ + A | A | |
| Level 2 | µ + B | µ + A + B | A | ||
| Level 2 - Level 1 | B | B | |||
This is called an additive model because the affect of changing a factor level is to add a fixed amount (possibly negative) to the mean of Y regardless of the values of the other factors. The key point is that the difference in means for the first factor is A for both rows (levels of the second factor). Similarly, the difference in means for the second factor is B for both levels of the first factor. If these differences were not the same, then an interaction would be present. (Remember that the sample estimates of the population means would rarely exhibit the perfect additivity of the above population means because of random variation.)
As a graphical companion to these summary statistics, one could make up a panel of boxplots where the data are divided by the different factors. However, an alternative plot can be more informative for revealing information about interactions between plots. The function mplot(y,x1,x2) will plot the means of y against the levels of x1 and labeled with the levels of x2. The following plots are given by
mplot(jet.force,jet.act,jet.press)
In this type plot, parallel lines are evidence of no interaction (also called additivity), whereas nonparallel lines suggest an interaction between the factors. Thus, the plot is here suggestive of an interaction similar to the row difference column of the associated means table. Also, horizontal lines are evidence that the factor plotted on the x axis is unimportant. If the lines are close to each other, then the factor represented by different lines is unimportant.
To further iluustrate the use of mplot, make the following plots with the actuator data set:
z=actuator
mplot(z.force,z.act,z.press,z.nozzle)
The previous dicussion was intended to be simple and to emphasize that the analysis of factorial experiments is mainly about comparing mean differences between factor levels. The factors with the largest differences are the most important, and nonuniformity in the differences is called an interaction.
For experiments with more than two levels and/or more than two factors, it is helpful to use more systematic approaches to investigate the mean differences. In this section we introduce "fitted effects" and a function mfit to easily calculate them.
A fitted main effect is just a marginal factor level mean with the overall mean subtracted. For example, from the first part of the means output of the jet data set
Means of Y variable by X variable Source N Mean 30psi 8 0.071625 100psi 8 0.442590 Means of Y variable by X variable Source N Mean A2 8 0.28623 A1 8 0.22799the overall mean is just the average of the marginal means
ybar=(0.071625 + 0.442590)/2 = (0.28623 + 0.22799)/2 = 0.25711
The fitted main effects are then
Fitted
Source N Mean Main Effects
30psi 8 0.071625 - 0.25711 = -0.185
100psi 8 0.442590 - 0.25711 = 0.185
Source N Mean
A2 8 0.28623 - 0.25711 = 0.029
A1 8 0.22799 - 0.25711 = -0.029
Notice that we have rounded to three decimals for ease of viewing.
(Only two decimals would be better, but we need three for calculations below.)
The fitted main effects show us immediately that pressure is
more important than actuator because 0.185 is larger than .029.
With only two levels, the fitted main effects for a factor are
always equal in magnitude and of opposite sign.
If there is no interaction between two factors, the factors are additive, and the predicted cell means would just be the overall mean plus the fitted main effects. For the jet data we find
Predicted Cell Means under the Additive Model
Pressure
30psi 100psi
----- ------
A2 0.257+0.029-0.185=0.10 0.257+0.029+0.185=0.47
Actuator
A1 0.257-0.029-0.185=0.04 0.257-0.029+0.185=0.41
Let's compare these predicted means to the actual cell means
Predicted Means Actual Cell
Additive Model Means
Pressure Pressure
30psi 100psi 30psi 100psi
----- ------ ----- ------
A2 .10 .47 .08 .49
Actuator
A1 .04 .41 .06 .39
If the two sets of numbers are very close, then there is little evidence
of an interaction.
The fitted interaction effects are just the difference between the
actual cell means and these predicted means
Fitted Interaction Effects
Pressure
30psi 100psi
----- ------
A2 -.02 .02
Actuator
A1 .02 -.02
These fitted interaction effects seem fairly small compared to the
fitted main effect .19 for pressure.
It's useful to go throught these calculations once by hand, but after that it get's tedious. Thus we have made a function mfit with the same form as means but that gives fitted effects instead of means.
>> mfit(jet.force,jet.act,jet.press)
Intercept
0.25711
Fitted Main Effect of Y variable , y, by X variable, x1
Source N Main Effect
A2 8 0.029119
A1 8 -0.029119
Fitted Main Effect of Y variable , y, by X variable, x2
Source N Main Effect
30psi 8 -0.18548
100psi 8 0.18548
Table of 2-way x1 by x2 Interaction Effects
x1
A2 A1
x2 30psi -0.019419 0.019419
100psi 0.019419 -0.019419
In these last two sections we have given descriptive statistics
for analyzing experiments with factorial treatment structures.
In the next section we give a more
computationally complicated procedure which is used regularly by
experimenters to decide what factors are important and whether interaction
is present.
The acronym ANOVA stands for ANalysis Of VAriance. This term is somewhat misleading because our goal is to compare means. However, it turns out that in the process of deciding whether two means are different, we end up comparing measures of variability (variances). The procedure is a bit complicated, but remember that MATLAB will do all the routine calculations for you. The most important part is that you know how to interpret the results.
An ANOVA table is obtained by using the function, lm which makes the computations. Here are the steps to get the ANOVA table for the data set jet with a model which includes the factors act and press and their interaction, which is denoted in the formula by act*press. First, we type
lm(jet)
Note that the matlab prompt will change to
lm>>
(To get out of lm>> at any time, just hit the return key.) We use the function class to define the factors
class act press
Then we define the model to be fit using the function model
model force = act+ press+act*press
where the variable force is the response. We obtain the following output
Analysis of Variance Table Source df SS MS F p-val act 1 0.0135660 0.01356600 36.3596 5.9384e-05 press 1 0.5504500 0.55045000 1475.2790 6.2506e-14 act*press 1 0.0060334 0.00603340 16.1702 0.00169570 Error 12 0.0044774 0.00037312 R-square 0.99221
The Analysis of Variance Table organizes the results
for the two factors and their interaction in terms of
df (degrees of freedom), SS (sums of squares),
MS (mean squares), F (F values), and
p-val (p-values).
Here we will give a very brief description of a few entries from this
table.
The MS results for
act and press
are like estimated variances among the means of the response
variable force as the levels of act and press
are changed.
Thus, they measure the variation in the means of the
response variable force for changes in the factor levels.
They are actually a weighted sum of the square of the fitted effects.
For example, recall the fitted main effects -.185 and .185 for pressure.
The .55 for MS above for press comes from
8(-.185)2+8(.185)2=.55. The weight comes from the
number of data points that went into the relevant means, 8 in this case.
The
MS on the Error line is another estimated
variance for comparison purposes. The F values
are just the ratio of the MS for the factors to the MS
of the error. For example, in the first row
F=36.3596=0.01356600/0.00037312.
When a factor is not important, then the MS in the numerator
and denominator of the F should be of similar size. Thus,
if the F value is small (near 1), then we know that changing
the levels of a factor has little effect on the response variable.
But how large does the F have to be for us to be sure that
a factor is important?
The last column of this table, p-val, gives a
quick way to decide if the factors (rows) are statistically
significant. The elements in this column are usually called p-values. If
the p-value is less than .05, then we typically declare that the effect of
the factor associated with that row can be confidently declared to be not
equal to zero. If it is greater than .05, then there is not strong evidence
from the experiment that the particular factor or interaction has an
effect. Since all three p-values in the above table are small compared to
.05, we can be confident that both factors have a nonzero effect as well as
an interaction.
One confusing point about p-values is that they can be small if the sample size is large even when the true effect of the factor is quite small. In order to understand this important distinction between statistically signicant results and practically significant results, we will need to study the theory of statistical hypothesis testing. For now, just realize that practically significant effects can usually be seen by looking at the fitted effects. The p-values from the ANOVA table tell us whether the results we observe are likely to have occurred purely by random chance rather than by true differences in means. Thus the ANOVA table says that there is an detectable interaction betwen actuator type and pressure and a main effect due to actuator type, but the fitted effects from the last section suggest that the interaction and actuator type effects are not important compared to the effect of pressure.
Nevertheless, the ANOVA method of comparing means can is a simple and quick way to identify variables that are important in their effect on a response variable. Let us go back to the full actuator data set and compute an ANOVA table for all four factor variables and their interactions with press.
lm(actuator)
class act press nozzle line
model force = act+press+nozzle+line+act*press+press*nozzle+press*line
Source df SS MS F p-val act 1 0.01357600 0.01357600 43.012500 0.00017691 press 1 0.55048000 0.55048000 1744.058100 1.1907e-10 nozzle 1 0.00095371 0.00095371 3.021600 0.12036000 line 1 0.00095220 0.00095220 3.016800 0.12061000 act*press 1 0.00602970 0.00602970 19.103700 0.00237800 press*nozzle 1 8.8879e-06 8.8879e-06 0.028159 0.87090000 press*line 1 3.926e-05 3.926e-05 0.124380 0.73344000 Error 8 0.00252500 0.00031563 R-square 0.99561
Looking at the last column we are confirmed again that act, press, and their interaction are the most important factor effects. We also used
mfit(actuator.force,actuator.act,actuator.press,actuator.line,actuator.nozzle)
and came to similar conclusions. The output, however, is not as concise as the ANOVA table.
In summary, use the ANOVA table to quickly identify important factors by looking at the F values (are they much larger than 1?) and p-values (are they less than .05?), and then confirm the practical importance of these results by looking at the means and fitted effects.
The previous analyses were for a completely randomized design. Often, though, we have data from a randomized complete block design. Here we will show the correct way to use ANOVA in this situation.
The data set alka has the time in seconds that it takes Alka-Seltzer tablets to dissolve in water and 7UP at two different temperatures.
>> viewdata(alka) Obs liquid temp time block 1 water warm 29 block1 2 water warm 33 block2 3 water cool 97 block1 4 water cool 98 block2 5 7UP warm 80 block1 6 7UP warm 83 block2 7 7UP cool 109 block1 8 7UP cool 115 block2block1 refers to the complete 2 by 2 factorial design run at a particular time, and block2 is a full replication some time later. A completely randomized design would have had all 8 trials run in a random order at one time. Because the experiment was run at two separate times, with a random order within each time, this is a randomized complete block design.
Now the procedures means, mfit, and mplot will all look the same regardless of which kind of design was used. For example,
>> mfit(alka.time,alka.temp,alka.liquid)
Overall Mean
80.5
Fitted Main Effect of Y variable , y, by X variable, x1
Source NMain Effect
warm 4 -24.25
cool 4 24.25
Fitted Main Effect of Y variable , y, by X variable, x2
Source NMain Effect
water 4 -16.25
7UP 4 16.25
Table of 2-way x1 by x2 Interaction Effects
x1
warm cool
x2 water -9 9
7UP 9 -9
This suggests that temperature is more important than type of liquid,
and that the interaction is less important than either of the main effects.
Now the ANOVA for a completely randomized design would ignore the blocking:
>> lm(alka) Variable Names in Input Dataset temp time block liquid Enter class or model statement or type ex for examples lm>> class temp block liquid Enter model statement lm>> model time=liquid+temp+liquid*temp Sequential Sums of Squares ANOVA Table Source df SS MS F p-val liquid 1 2112.5 2112.50 272.5806 7.8816e-05 temp 1 4704.5 4704.50 607.0323 1.6105e-05 liquid*temp 1 648.0 648.00 83.6129 0.00079387 Error 4 31.0 7.75 R-square 0.99586 Standard Error 2.7839The correct analysis that takes the blocking into account is
lm>> model time=block+liquid+temp+liquid*temp Sequential Sums of Squares ANOVA Table Source df SS MS F p-val block 1 24.5 24.5000 11.3077 0.04364600 liquid 1 2112.5 2112.5000 975.0000 7.2171e-05 temp 1 4704.5 4704.5000 2171.3077 2.1761e-05 liquid*temp 1 648.0 648.0000 299.0769 0.00042130 Error 3 6.5 2.1667 R-square 0.99913 Standard Error 1.472Although the p-values are a bit different, the basic conclusions are the same. Note the main difference in the model statement is inclusion of the block variable. We will not be able to explain here why this the correct analysis, but note that by including a block effect that the error MS 2.1667 that goes in the divisor of the F statistics is smaller here than the 7.75 of the previous (and incorrect) completely randomized analysis.
means(z.rate,z.memory,z.resolution,z.processor)