M-Lab 3: Comparing Means in Factorial Studies

An important approach to learning about a system or process is to systematically vary factors that may affect the outcome. The actuator experiment from M-Lab 2 is an example of varying the type of actuator and the amount of air pressure to see how the resulting force may change. Sometimes the experiment produces such clear results that a simple table or plot is all that is needed to summarize the results. Often the results are not as obvious, however, and some statistical analysis is needed. This lab will introduce you to data from a designed experiment and cover some of the statistical methods of analysis. By the end of this lab you will have the skills to a analyze a factorial experiment and to extract a wealth of information from a small set of measurements.

Remember that from within MATLAB you can view the most important M-Lab functions by typing

help mlab

Or you might want to have a netscape window open to the appendix listing of the functions. Only a few new functions are introduced in this lab.

Factorial Designs

Before giving details, it is helpful to introduce some basic terms: factor, level, and design. An experiment is usually carried out to study one or more variables. These variables are called the factors in the experiment. Each factor may be investigated at several different settings. These settings are called the levels of the factor. An experiment consists of a number of separate ``runs'' or measurements. The manner in which levels of all the factors are changed from one run to the next is the experimental design. There are two more terms that are often used to describe designs. Replications are observations made under identical experimental conditions. If every level of every factor appears an equal number of times, the design is said to be balanced.

In this M-lab we will use a subset of the actuator experiment named jet as an illustration. See M-Lab 2 or type viewdata(actuator) to see the full data set (you can learn more about the function viewdata by typing help viewdata).

viewdata(jet)

Obs   act    press    force   order

  1    A2    30psi   0.0992      11
  2    A1    30psi   0.0708       5
  3    A2    30psi   0.0754      12
  4    A1    30psi   0.0690       8
  5    A2   100psi   0.4875       3
  6    A1   100psi   0.4428       2
  7    A2   100psi   0.5039       9
  8    A1   100psi   0.3700       6
  9    A2    30psi   0.0830       7
 10    A1    30psi   0.0581      15
 11    A2    30psi   0.0677       1
 12    A1    30psi   0.0498       4
 13    A2   100psi   0.4864      10
 14    A1   100psi   0.3908      16
 15    A2   100psi   0.4867      14
 16    A1   100psi   0.3726      13

You might recall that actuators are small jets using compressed air that are designed for positioning and stabilizing a space platform. For this experiment the factors are 1) act, the particular actuator and 2) press, the amount of air pressure supplied to the actuator. There were two actuators in the experiment, and so this first factor has two levels: using the first actuator (A1) or the second (A2). The second factor, pressure, also has two levels: 30 psi and 100 psi. By looking at the data, it is clear that the experiment consists of 16 ``runs,'' and the design is to consider all different combinations of the levels for the two factors with 4 replications of each of the combinations. Also note that every level appears exactly 8 times in the design, so it is balanced. This is a factorial design because all different combinations of the factors levels have been included.

This experimental design has the desirable feature that both factors are varied simultaneously. At first this may seem to just scramble up the variables and make the results useless. However, this feature is actually very beneficial and gives information about how the two factors interact with each other. With the correct statistical approach the results will be easy to interpret.


The Effect of a Factor on the Response

For the jet data one would like to estimate the effect on the force as the pressure supply is changed from 30 psi to 100 psi. One way to do this is to find the statistics for all the measurements taken at the same level of pressure.

stats(jet.force,jet.press)

              30psi    100psi
N          8.000000  8.000000
Mean       0.071625  0.442590
Std. Dev.  0.015053  0.056683

Q1         0.060500  0.377150
Median     0.069900  0.464600
Q3         0.081100  0.487300

Min        0.049800  0.370000
Max        0.099200  0.503900
Range      0.049400  0.133900

From this output we see that the mean difference between these two settings is 0.4426 - 0.0716 = 0.3710. We could also estimate the effect due to the difference between the two actuators by computing the means for the two levels of the act factor:

stats(jet.force,jet.act)

N          8.00000  8.000000
Mean       0.28623  0.227990
Std. Dev.  0.21930  0.179020

Q1         0.07730  0.060825
Median     0.29280  0.220400
Q3         0.48730  0.386250

Min        0.06770  0.049800
Max        0.50390  0.442800
Range      0.43620  0.393000
yielding 0.228 - 0.286 = -.058.

We should also look at the set of means from the four groups obtained at the different combinations of factors. To make this easy we have written a function means that will print out the above means as well as the two-way table of means:

means(jet.force,jet.press,jet.act)

Means of Y variable by X variable
Source    N        Mean
30psi     8    0.071625
100psi    8    0.442590


Means of Y variable by X variable
Source    N       Mean
A2        8    0.28623
A1        8    0.22799


Table of means of Y variable  by X variables x1 and x2
                  x1
               30psi     100psi
x2    A2    0.081325    0.49113
      A1    0.061925    0.39405
The means function can handle as many factors as needed (e.g., means(y,x1,x2,x3,x4)) and will print out the mean of the first variable (the response variable) broken down by the values of each of the factors as well as by all combinations of two factors. The means in the first part of the output are often called marginal means, and the means in the second part are often called cell means.

To analyze the cell means, it helps to add a 3rd column of mean differences (and note that we have rounded to two decimals):

                       Pressure
                                    100psi-30psi
                   30psi    100psi      diff.
                   -----    ------      -----
               A2   .08      .49         .41
     Actuator
               A1   .06      .39         .33
                    ---      ---
      A2-A1 diff.   .02      .10

Some conclusions: 1) Looking at the row and column differences of means, we can see that press is a much more important variable than act because .33 and .41, the mean changes in force due to changing pressure levels, are large compared to .02 and .10, the mean changes in force due to changing actuators.

2) The change in mean force is not uniform: the row difference column shows that a change in pressure from 30 to 100 psi for actuator 1 (A1) is .33 compared to a change of .41 for actuator 2 (A2). This nonuniformity is called an interaction between the factors act and press.

In general experiments without interactions are easier to analyze and allow a much simpler description of how the factors affect the response variable. (Note, however, that an interaction is an interesting result in many experiments and may lead to optimal settings of variables that may not have been found otherwise.) In the absence of interaction we can express the dependence of Y on factors and with two levels each as given in the following table of population means:

    X1  
        Level 2
    Level 1 Level 2 - Level 1
X2 Level 1 µ µ + A A
  Level 2 µ + B µ + A + B A
  Level 2 - Level 1 B B  

This is called an additive model because the affect of changing a factor level is to add a fixed amount (possibly negative) to the mean of Y regardless of the values of the other factors. The key point is that the difference in means for the first factor is A for both rows (levels of the second factor). Similarly, the difference in means for the second factor is B for both levels of the first factor. If these differences were not the same, then an interaction would be present. (Remember that the sample estimates of the population means would rarely exhibit the perfect additivity of the above population means because of random variation.)

As a graphical companion to these summary statistics, one could make up a panel of boxplots where the data are divided by the different factors. However, an alternative plot can be more informative for revealing information about interactions between plots. The function mplot(y,x1,x2) will plot the means of y against the levels of x1 and labeled with the levels of x2. The following plots are given by

mplot(jet.force,jet.act,jet.press)

In this type plot, parallel lines are evidence of no interaction (also called additivity), whereas nonparallel lines suggest an interaction between the factors. Thus, the plot is here suggestive of an interaction similar to the row difference column of the associated means table. Also, horizontal lines are evidence that the factor plotted on the x axis is unimportant. If the lines are close to each other, then the factor represented by different lines is unimportant.

To further iluustrate the use of mplot, make the following plots with the actuator data set:

z=actuator
mplot(z.force,z.act,z.press,z.nozzle)


Fitted Effects

The previous dicussion was intended to be simple and to emphasize that the analysis of factorial experiments is mainly about comparing mean differences between factor levels. The factors with the largest differences are the most important, and nonuniformity in the differences is called an interaction.

For experiments with more than two levels and/or more than two factors, it is helpful to use more systematic approaches to investigate the mean differences. In this section we introduce "fitted effects" and a function mfit to easily calculate them.

A fitted main effect is just a marginal factor level mean with the overall mean subtracted. For example, from the first part of the means output of the jet data set

Means of Y variable by X variable
Source    N        Mean
30psi     8    0.071625
100psi    8    0.442590

Means of Y variable by X variable
Source    N       Mean
A2        8    0.28623
A1        8    0.22799
the overall mean is just the average of the marginal means

ybar=(0.071625 + 0.442590)/2 = (0.28623 + 0.22799)/2 = 0.25711

The fitted main effects are then

                                      Fitted
Source    N        Mean             Main Effects
30psi     8    0.071625   - 0.25711 = -0.185
100psi    8    0.442590   - 0.25711 =  0.185

Source    N       Mean
A2        8    0.28623    - 0.25711 =  0.029
A1        8    0.22799    - 0.25711 = -0.029
Notice that we have rounded to three decimals for ease of viewing. (Only two decimals would be better, but we need three for calculations below.) The fitted main effects show us immediately that pressure is more important than actuator because 0.185 is larger than .029. With only two levels, the fitted main effects for a factor are always equal in magnitude and of opposite sign.

If there is no interaction between two factors, the factors are additive, and the predicted cell means would just be the overall mean plus the fitted main effects. For the jet data we find

            Predicted Cell Means under the Additive Model

                                      Pressure

                         30psi                    100psi
                         -----                    ------
               A2  0.257+0.029-0.185=0.10   0.257+0.029+0.185=0.47
     Actuator
               A1  0.257-0.029-0.185=0.04   0.257-0.029+0.185=0.41
Let's compare these predicted means to the actual cell means
                  Predicted Means           Actual Cell
                  Additive Model               Means

                     Pressure                Pressure
                   30psi  100psi           30psi  100psi
                   -----  ------           -----  ------
               A2   .10     .47             .08     .49
     Actuator
               A1   .04     .41             .06     .39
If the two sets of numbers are very close, then there is little evidence of an interaction. The fitted interaction effects are just the difference between the actual cell means and these predicted means
               Fitted Interaction Effects

                      Pressure
                   30psi  100psi
                   -----  ------
               A2   -.02   .02
     Actuator
               A1    .02  -.02
These fitted interaction effects seem fairly small compared to the fitted main effect .19 for pressure.

It's useful to go throught these calculations once by hand, but after that it get's tedious. Thus we have made a function mfit with the same form as means but that gives fitted effects instead of means.

>> mfit(jet.force,jet.act,jet.press)

Intercept
0.25711

Fitted Main Effect of Y variable , y, by X variable, x1
Source    N  Main Effect
A2        8     0.029119
A1        8    -0.029119


Fitted Main Effect of Y variable , y, by X variable, x2
Source    N Main Effect
30psi     8    -0.18548
100psi    8     0.18548


Table of 2-way x1 by x2 Interaction Effects
                       x1
                       A2           A1
x2    30psi     -0.019419     0.019419
      100psi     0.019419    -0.019419
In these last two sections we have given descriptive statistics for analyzing experiments with factorial treatment structures. In the next section we give a more computationally complicated procedure which is used regularly by experimenters to decide what factors are important and whether interaction is present.

The Principles of ANOVA

The acronym ANOVA stands for ANalysis Of VAriance. This term is somewhat misleading because our goal is to compare means. However, it turns out that in the process of deciding whether two means are different, we end up comparing measures of variability (variances). The procedure is a bit complicated, but remember that MATLAB will do all the routine calculations for you. The most important part is that you know how to interpret the results.

An ANOVA table is obtained by using the function, lm which makes the computations. Here are the steps to get the ANOVA table for the data set jet with a model which includes the factors act and press and their interaction, which is denoted in the formula by act*press. First, we type

lm(jet)

Note that the matlab prompt will change to

lm>>

(To get out of lm>> at any time, just hit the return key.) We use the function class to define the factors

class act press

Then we define the model to be fit using the function model

model force = act+ press+act*press

where the variable force is the response. We obtain the following output

Analysis of Variance Table

Source       df           SS            MS            F         p-val

act           1    0.0135660    0.01356600      36.3596    5.9384e-05
press         1    0.5504500    0.55045000    1475.2790    6.2506e-14
act*press     1    0.0060334    0.00603340      16.1702    0.00169570
Error        12    0.0044774    0.00037312

R-square  0.99221

The Analysis of Variance Table organizes the results for the two factors and their interaction in terms of df (degrees of freedom), SS (sums of squares), MS (mean squares), F (F values), and p-val (p-values).

Here we will give a very brief description of a few entries from this table. The MS results for act and press are like estimated variances among the means of the response variable force as the levels of act and press are changed. Thus, they measure the variation in the means of the response variable force for changes in the factor levels. They are actually a weighted sum of the square of the fitted effects. For example, recall the fitted main effects -.185 and .185 for pressure. The .55 for MS above for press comes from 8(-.185)2+8(.185)2=.55. The weight comes from the number of data points that went into the relevant means, 8 in this case.

The MS on the Error line is another estimated variance for comparison purposes. The F values are just the ratio of the MS for the factors to the MS of the error. For example, in the first row F=36.3596=0.01356600/0.00037312. When a factor is not important, then the MS in the numerator and denominator of the F should be of similar size. Thus, if the F value is small (near 1), then we know that changing the levels of a factor has little effect on the response variable. But how large does the F have to be for us to be sure that a factor is important?

The last column of this table, p-val, gives a quick way to decide if the factors (rows) are statistically significant. The elements in this column are usually called p-values. If the p-value is less than .05, then we typically declare that the effect of the factor associated with that row can be confidently declared to be not equal to zero. If it is greater than .05, then there is not strong evidence from the experiment that the particular factor or interaction has an effect. Since all three p-values in the above table are small compared to .05, we can be confident that both factors have a nonzero effect as well as an interaction.

One confusing point about p-values is that they can be small if the sample size is large even when the true effect of the factor is quite small. In order to understand this important distinction between statistically signicant results and practically significant results, we will need to study the theory of statistical hypothesis testing. For now, just realize that practically significant effects can usually be seen by looking at the fitted effects. The p-values from the ANOVA table tell us whether the results we observe are likely to have occurred purely by random chance rather than by true differences in means. Thus the ANOVA table says that there is an detectable interaction betwen actuator type and pressure and a main effect due to actuator type, but the fitted effects from the last section suggest that the interaction and actuator type effects are not important compared to the effect of pressure.

Nevertheless, the ANOVA method of comparing means can is a simple and quick way to identify variables that are important in their effect on a response variable. Let us go back to the full actuator data set and compute an ANOVA table for all four factor variables and their interactions with press.

lm(actuator)
class act press nozzle line
model force = act+press+nozzle+line+act*press+press*nozzle+press*line

Source          df            SS            MS              F         p-val

act             1    0.01357600    0.01357600      43.012500    0.00017691
press           1    0.55048000    0.55048000    1744.058100    1.1907e-10
nozzle          1    0.00095371    0.00095371       3.021600    0.12036000
line            1    0.00095220    0.00095220       3.016800    0.12061000
act*press       1    0.00602970    0.00602970      19.103700    0.00237800
press*nozzle    1    8.8879e-06    8.8879e-06       0.028159    0.87090000
press*line      1    3.926e-05     3.926e-05        0.124380    0.73344000
Error           8    0.00252500    0.00031563

R-square  0.99561

Looking at the last column we are confirmed again that act, press, and their interaction are the most important factor effects. We also used

mfit(actuator.force,actuator.act,actuator.press,actuator.line,actuator.nozzle)

and came to similar conclusions. The output, however, is not as concise as the ANOVA table.

In summary, use the ANOVA table to quickly identify important factors by looking at the F values (are they much larger than 1?) and p-values (are they less than .05?), and then confirm the practical importance of these results by looking at the means and fitted effects.


ANOVA for Randomized Complete Block Designs

The previous analyses were for a completely randomized design. Often, though, we have data from a randomized complete block design. Here we will show the correct way to use ANOVA in this situation.

The data set alka has the time in seconds that it takes Alka-Seltzer tablets to dissolve in water and 7UP at two different temperatures.

>> viewdata(alka)

Obs   liquid   temp   time    block

  1    water   warm     29   block1
  2    water   warm     33   block2
  3    water   cool     97   block1
  4    water   cool     98   block2
  5      7UP   warm     80   block1
  6      7UP   warm     83   block2
  7      7UP   cool    109   block1
  8      7UP   cool    115   block2
block1 refers to the complete 2 by 2 factorial design run at a particular time, and block2 is a full replication some time later. A completely randomized design would have had all 8 trials run in a random order at one time. Because the experiment was run at two separate times, with a random order within each time, this is a randomized complete block design.

Now the procedures means, mfit, and mplot will all look the same regardless of which kind of design was used. For example,

>> mfit(alka.time,alka.temp,alka.liquid)

Overall Mean
80.5

Fitted Main Effect of Y variable , y, by X variable, x1
Source    NMain Effect
warm      4    -24.25
cool      4     24.25


Fitted Main Effect of Y variable , y, by X variable, x2
Source    NMain Effect
water     4    -16.25
7UP       4     16.25


Table of 2-way x1 by x2 Interaction Effects
                 x1
               warm    cool
x2    water      -9       9
      7UP         9      -9
This suggests that temperature is more important than type of liquid, and that the interaction is less important than either of the main effects. Now the ANOVA for a completely randomized design would ignore the blocking:
>> lm(alka)

Variable Names in Input Dataset

  temp
  time
  block
  liquid

Enter class or model statement or type ex for examples

lm>> class temp block liquid

Enter model statement

lm>> model time=liquid+temp+liquid*temp

Sequential Sums of Squares ANOVA Table

Source         df         SS          MS            F         p-val

liquid         1     2112.5     2112.50     272.5806    7.8816e-05
temp           1     4704.5     4704.50     607.0323    1.6105e-05
liquid*temp    1      648.0      648.00      83.6129    0.00079387
Error          4       31.0        7.75

R-square  0.99586

Standard Error  2.7839
The correct analysis that takes the blocking into account is
lm>> model time=block+liquid+temp+liquid*temp

Sequential Sums of Squares ANOVA Table

Source         df         SS            MS             F         p-val

block          1       24.5       24.5000       11.3077    0.04364600
liquid         1     2112.5     2112.5000      975.0000    7.2171e-05
temp           1     4704.5     4704.5000     2171.3077    2.1761e-05
liquid*temp    1      648.0      648.0000      299.0769    0.00042130
Error          3        6.5        2.1667

R-square  0.99913

Standard Error  1.472
Although the p-values are a bit different, the basic conclusions are the same. Note the main difference in the model statement is inclusion of the block variable. We will not be able to explain here why this the correct analysis, but note that by including a block effect that the error MS 2.1667 that goes in the divisor of the F statistics is smaller here than the 7.75 of the previous (and incorrect) completely randomized analysis.


On Your Own

  1. There are two ways one can describe a change in a mean: adding or subtracting an amount or multiplying or dividing by some factor. For example, raising all the NCSU faculty salaries by $10,000 will have a different effect than raising them by 5%. Instead of analyzing the raw force measurements in the jet experiment, consider the logarithms of these values. (Use log(jet.force) in place of jet.force within means or mplot or log(force) in place of force within lm.) How does the model change? Which analysis do you prefer?

  2. The data set magnet is from an experiment to see the effect of voltage and number of turns on the magnetic force of an electromagnet. Make an mplot, a means analysis, and an ANOVA table. The experimenters expected to see a limit to which the number of turns could increase the force. What do you conclude about the effect of voltage and number of turns on magnetic force? (This is actually a randomized complete block experiment where the three electromagnets are the blocks. The experimenters measured the magnetic force 6 times at each combination of volt and turns. Such repetitions are not true replications but rather repeated measurements to reduce measurement error. Therefore we have taken the mean each 6 measurements as our response variable force. Note that you cannot include the interaction term in the lm run because it needs to be used as the error term.

  3. The data set capac is from an experiment to see if the shape of capacitors has any effect on the capacitance. The response variable is capac, and the two factors to consider for this assignment are shape and area. Use the methods of this lab to draw conclusions about the effect of shape and area on capacitance. (You can't include an interaction term in lm because it needs to be used as an error term.)

  4. Use the functions introduced in this lab (means, mplot, lm) and any other appropriate plots (such as lplot) to analyze the data popcorn. Use help popcorn for a description of the experiment and the data. Are there any interactions among the factors? What is the best combination of levels to produce the largest volume of popcorn? (You can figure this out by just looking at the data using viewdata(popcorn). However, a more formal way is to decide on a good model using lm. Then, before getting out of lm, type output p=pred. This puts the predicted values from the model in a data set pred. Now hit return to get out of lm, and type popcorn.pred=pred. This appends pred to popcorn. Then view the results with viewdata(popcorn).)

  5. The data set solar is from an experiment investigating the effect of surface area (cell=0 is 8 sq. cm and cell=1 is 3 sq. cm), distance, and light intensity on the output voltage of photovoltaic cells (solar cells). Analyze this data using cell ,light, and distance as the factors and volt as the response variable. Assume that the experiment was run as a 2 x 3 x 3 completely randomized design.

  6. The data set viscosity is from an experiment to study the effect of temperature on the viscosity of three liquids. Visocisty was measured by the time it took for the liquids to drain out of a container--type help viscosity to learn more details. Use the means and mplot functions to analyze the data. Give a summary of your conclusions. Assume that a completely randomized design was used (although it doesn't appear that the order of different replications was properly randomized).

  7. The data set salt is from class project to look at the effect of amount of salt (in teaspoons) and salt type (RockSalt or TableSalt) on the time to complete melting of cups of ice. Use appropriate methods from this chapter to decide if amount and type are important factors. Is their an interaction?

  8. The data set yarn is from a team project to investigate the effect of count number (24 or 36) and yarn type (Ring, Open-end, Air jet) on the tensile strength of yarn. The Uster Tensorapid 3, a standard machine for measuring tensile strength, was used to take 10 measurements of strength from each of 6 yarn cones (one of each type, 10 feet between measurements on each cone). These 10 measurements for each cone are not true replications because the yarn cones should be the experimental unit. Thus we have reduced the data by averaging over the 10 measurements for each cone and put the data in yarnred. Note that yarnred.tensile is the mean of the 10 measurements and yarnred.std is the standard deviation. Use mplot, mfit, and lm to decide what factors are important and whether there appears to be interaction. (In lm you can't include the interaction term because it is required for the error.)

  9. A team of civil engineering students wanted to see if the suggested 24 hour soak time for measuring the specific gravity and absorption of course aggregate (e.g., granite or limestone) was really necessary. Thus they obtained samples of course aggregate from eight quarries and measured three types of specific gravity (BDSG=dry specific gravity, SSDSG=SSD specific gravity, ASG=apparent specific gravity), and absorption (abs) at five soak times (6, 18, 24, 48, and 72 hours). The data are in SpGravity. Use means and lm with quarry and time as class variables to see if any of the response variables BDSG, SSDSG, ASG, or abs have means that are different for the five time points. Note that the same sample was used for each of the five time points. Quarry (or the sample from each quarry) is a blocking variable here.

  10. What is the effect of processor speed, memory size, and screen resolution on a Riva TNT video card? A team varied these factors and measured frame rates using Quake II, a standard benchmarking program for 3D graphics. The data are in framerate. Use means and mfit to see which of the above factors and possibly interactions are important (ignore other variables). For example, start with

    means(z.rate,z.memory,z.resolution,z.processor)

  11. What cutting method (vertical or horizontal) yields the quickest and smoothest cuts for three types of metal stock (angle, flat, round)? Students from the Biological Engineering Department measured cutting times (seconds) and quality (0=smooth,1=rough) for the six combinations of method and stock. The results are in the data set metalcut. Use some of mfit, mplot, means, and lm to decide what effects are important and what treatment combinations lead to short times. Do a similar analysis for the quality variable. (Even though quality is a 0-1 variable, we can use the methods of M-Lab 3 to analyze it.)

  12. A student team divided a batch of Hodgson Mill Wholesome White Bread mix into three parts and mixed in 0.75, 1.0, and 1.25 teaspoons of yeast, respectively, into the mix. Each part was put into 8 different cupcakes and baked at 350 degrees. After baking, the height of each cupcake was measured. Then the experiment was repeated at 450 degrees. The data set bread contains all 48 measurements. This is really randomized complete block experiment with temperature the blocking factor. The eight cupcakes for each treatment combination are not true replications; so averaging over them produced the reduced data set bread2. First use means, mfit, and mplot to explain the importance of the factors. Note that for these functions, it doesn't matter whether you use bread or bread2. Finally, run an lm on bread2 to assess the statistical significance. Note that you cannot include an interaction term because that is needed for the error term.