Analysis of Pretest and Posttest Scores with Gain Scores and Repeated Measures

Analysis of Pretest and Posttest Differences

I.   Overview

II.  Analysis of Variance of Gain Scores

III. Repeated Measures Analysis of Variance

IV. Discussion

V. References

Datafile: traitanx.sav



I. Overview

In previous sets of notes in this series we analyzed a pretest-posttest, two-group, quasi-experimental design using blocking, matching, and analysis of covariance procedures. Those procedures were used to analyze the differences in posttest scores after any pretest score differences were "held constant." In this set of notes we will take a different approach and look at the change from the pretest to the posttest scores.

Hypothetical pretest and posttest trait anxiety means for a two-group design are shown in Figure 1. The data that we displayed as a scattergram in the analysis of covariance notes are redisplayed here using the pretest and posttest means within each treatment condition. The question of interest is whether the improvement in scores from pretest to posttest is greater for the treatment group than it is for the control group.

The question can be answered by computing the difference between the pretest and posttest scores for each person and then analyzing those differences in a one-way ANOVA using treatment (treatment vs. control) as the only factor. If the treatment main effect is significant, then the change from pretest to posttest is not the same in the two groups. This analysis of difference scores is also called a gain score analysis.

Another way of answering this question is by looking at the interaction effect in a 2 x 2 analysis of variance (ANOVA) with treatment (treatment vs. control) as a between subjects factor and time (pretest vs. posttest) as a within subjects factor.  If the interaction is significant, then the change between pretest and posttest is not the same in the two treatment conditions. 

It will be shown that the treatment by time interaction effect in the 2 x 2 analysis of variance yields identical statistical results to the treatment main effect in the gain score analysis.


II.  Analysis of Variance of Gain Scores

The general approach to a gain score analysis is (a) to compute the gain score, and then (b) to analyze those gain scores in an analysis of variance with treatment as the between-subjects factor.

Compute the Gain Score

The improvement (gain) from pretest to posttest can be computed for each participant by subtracting each person's pretest score from his or her posttest score:

Gain = posttest - pretest

The SPSS syntax for computing the gain score from the pretest (tanxpre) and posttest (tanxpost) variables in the data file is as follows:

COMPUTE gain = tanxpost - tanxpre.

When you compute a gain score in this manner, a positive gain score indicates that the posttest score was greater than the pretest score, and a negative gain score indicates that the posttest score was less than the pretest score. In our example the dependent variable is trait anxiety, so we expect that successful treatment would lead to lower anxiety. The gain score should therefore be negative.
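The same computation can be sketched outside SPSS. The short Python example below uses four made-up participants (the scores are hypothetical and are not taken from traitanx.sav) to show the sign convention described above.

```python
# Hypothetical pretest and posttest trait anxiety scores for four participants.
pretest = [52, 60, 47, 55]
posttest = [40, 41, 45, 38]

# Gain = posttest - pretest, computed person by person.
gain = [post - pre for pre, post in zip(pretest, posttest)]

# Negative values mean that anxiety was lower at posttest than at pretest.
print(gain)
```

Every gain score in this small example is negative, which is the direction we would expect if the treatment reduced anxiety.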

The gain score controls for individual differences in pretest scores by measuring the posttest score relative to each person's pretest score. However, a gain score analysis does not control for the differences in pretest scores between the two groups.

The null hypothesis of no difference in improvement between the treatment and control groups can be tested by an analysis of variance on the gain scores using treatment (treatment vs. control) as a between subjects factor.  If the treatment main effect is significant, then we reject the null hypothesis. 

The Error Term

The sum of squares for the within-cells error term is the amount of error in the gain scores. Recall that in an analysis of variance the sum of squares for error is defined as

SSerror = Σ (Xij - M.j)²

That is, SSerror is the sum of the squared differences between a score and the group mean for that score. In a gain score analysis Xij is the observed gain score and M.j is the mean gain score for a particular treatment group.  Error will be small to the extent that the effect of the treatment is the same for each individual (i.e., the gain score is the same for each person).  The error term will be relatively large when the effect of treatment is not the same for each person.  In treatment outcome studies it is unlikely that the treatment effect will be exactly the same for every individual. 
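The error term can be computed by hand for a small example. The sketch below is a minimal one-way ANOVA written in plain Python; the function name one_way_anova and the gain scores are hypothetical and are used only to illustrate the definition of SSerror given above.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_error, df_between, df_error, F) for a dict of score lists."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = 0.0
    ss_error = 0.0
    for scores in groups.values():
        group_mean = sum(scores) / len(scores)
        # Between-groups part: squared distance of the group mean from the grand mean.
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        # Error part: squared distance of each gain score from its own group mean.
        ss_error += sum((x - group_mean) ** 2 for x in scores)
    df_between = len(groups) - 1
    df_error = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_error / df_error)
    return ss_between, ss_error, df_between, df_error, f_ratio


# Hypothetical gain scores (posttest - pretest) for two small groups.
gains = {
    "treatment": [-12, -19, -15, -18],
    "control": [-2, 1, -4, 1],
}

ss_between, ss_error, df_between, df_error, f_ratio = one_way_anova(gains)
print(ss_between, ss_error, df_between, df_error, f_ratio)
```

If every person in a group had exactly the same gain score, the error sum of squares for that group would be zero; the more the gains vary within a group, the larger the error term becomes.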

The correlation between pretest and posttest scores within the treatment group provides an estimate of the consistency of the treatment effect across individuals. If the pretest-posttest score correlation is high, then the rank ordering of people on the posttest is similar to the rank ordering of people on the pretest and the effect of treatment is similar for every individual. In this instance the error term will be relatively small. If the pretest-posttest correlation is low, then the rank ordering of people on the posttest is not the same as the rank ordering of people on the pretest, the effect of treatment is not the same for each individual, and the error term will be relatively large.
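The link between the pretest-posttest correlation and the size of the error term can be made explicit with the standard formula for the variance of a difference between two correlated scores. The symbols below are introduced here for illustration and do not appear in the SPSS output: the sigma terms are the within-group standard deviations of the pretest and posttest scores, and rho is the within-group pretest-posttest correlation.

```latex
\sigma^2_{gain} = \sigma^2_{pre} + \sigma^2_{post} - 2\,\rho\,\sigma_{pre}\,\sigma_{post}
```

As rho approaches 1 the last term removes more of the variance and the error term shrinks; as rho approaches 0 the gain scores carry the full variance of both tests.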

Running the Analysis

Table 1. SPSS Syntax for the ANOVA

UNIANOVA
gain BY treatgrp
/EMMEANS = TABLES(treatgrp).

The SPSS syntax commands for running the ANOVA are shown in Table 1. The first line (UNIANOVA) tells SPSS to run a univariate analysis of variance. The second line (gain...) defines the dependent variable (gain) and the independent variable (treatgrp). The third line (/EMMEANS ...) will print the means, standard errors and the 95% confidence intervals for the means. 

The Results

Table 2. Tests of Between-Subjects Effects
Dependent Variable: COMPUTE gain = tanxpost - tanxpre

Source      Type III Sum of Squares    df    Mean Square    F         Sig.
TREATGRP     4010.641                   1    4010.641       47.140    .000
Error        6466.038                  76      85.079
Total       16705.000                  78
 
Table 3. Means, Standard Errors, and 95% Confidence Intervals for the Two Treatment Conditions
Dependent Variable: COMPUTE gain = tanxpost - tanxpre

                                       95% Confidence Interval
Condition    Mean       Std. Error     Lower Bound    Upper Bound
Treatment    -15.925    1.458          -18.830        -13.020
Control       -1.579    1.496           -4.559          1.401

The abbreviated analysis of variance output is shown in Table 2. The means, standard errors, and 95% confidence intervals for each mean are shown in Table 3.  The results can be summarized as follows:

Trait anxiety gain scores (posttest - pretest) were analyzed in an analysis of variance with treatment group (treatment vs. control) as the independent variable. The decrease in trait anxiety was greater for participants in the treatment condition (M = -15.93, SE = 1.46) than for those in the control condition (M = -1.58, SE = 1.50), F (1, 76) = 47.14, p < .0005.
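As a quick arithmetic check on Table 2, each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below only re-derives values that are already reported in the table; the variable names are chosen here for illustration.

```python
# Values copied from Table 2 (gain score ANOVA).
ss_treat, df_treat = 4010.641, 1
ss_error, df_error = 6466.038, 76

# Mean square = sum of squares / degrees of freedom.
ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error

# F = mean square for treatment / mean square for error.
f_ratio = ms_treat / ms_error

print(round(ms_error, 3), round(f_ratio, 2))
```

The results agree with the Mean Square and F columns of Table 2 to the precision shown there.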

Interpretation of the 95% Confidence Interval

The 95% confidence intervals provide additional information about the effectiveness of the two conditions.  Because the gain score is computed as a difference score,   no change between pretest and posttest would be indicated by a gain score of zero.   If the 95% confidence interval includes zero, then the gain score mean is not significantly different from zero. 

The 95% confidence interval for the treatment group mean ranges from -18.83 to -13.02. It does not include zero, so the mean gain is different from zero. That is, there was significant improvement for participants in the treatment group. The 95% confidence interval for the control group mean ranges from -4.56 to 1.40. It does include zero, so the mean gain is not different from zero. That is, there was no significant improvement for participants in the control group. This information could be added to the description of the results:

Trait anxiety gain scores (posttest - pretest) were analyzed in an analysis of variance with treatment group (treatment vs. control) as the independent variable. The decrease in trait anxiety was greater for participants in the treatment condition (M = -15.93, SE = 1.46) than for those in the control condition (M = -1.58, SE = 1.50), F (1, 76) = 47.14, p < .0005. Inspection of the 95% confidence intervals around each mean indicated that there was a significant decrease in anxiety for participants in the treatment condition, and no significant decrease in anxiety for participants in the control condition.
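The decision rule used in this interpretation (an interval that excludes zero indicates a mean gain that differs from zero) is simple enough to write down as a check. The helper name excludes_zero is hypothetical; the interval bounds are the ones reported in Table 3.

```python
def excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0


# 95% confidence intervals for the mean gain, copied from Table 3.
intervals = {
    "Treatment": (-18.830, -13.020),
    "Control": (-4.559, 1.401),
}

for condition, (lower, upper) in intervals.items():
    verdict = "differs from zero" if excludes_zero(lower, upper) else "does not differ from zero"
    print(condition, verdict)
```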


III. Repeated Measures Analysis of Variance

An alternative procedure for analyzing the pretest and posttest scores is to run a 2 x 2 ANOVA with time (pretest vs. posttest) as a within-subjects factor and treatment (treatment vs. control) as a between-subjects factor.

Table 4. SPSS Syntax for the ANOVA

GLM
tanxpre tanxpost BY treatgrp
/WSFACTOR = time 2 Repeated
/PLOT = PROFILE( time*treatgrp )
/EMMEANS = TABLES(treatgrp*time)
   COMPARE(time) ADJ(Bonferroni)
/WSDESIGN = time
/DESIGN = treatgrp .

The SPSS syntax commands for running the 2 x 2 ANOVA are shown in Table 4. The first line (GLM) tells SPSS to run the General Linear Model (GLM) procedure. The second line (tanxpre ...) defines the two dependent measures (the pretest score, tanxpre, and the posttest score, tanxpost) and the independent variable (treatgrp). The third line (/WSFACTOR ...) tells the GLM procedure that the two dependent measures should be treated as a within-subjects factor. The fourth line (/PLOT ...) will create a graphic plot of the means, such as the one shown in Figure 1. The fifth (and sixth) line (/EMMEANS ...) will print the treatgrp by time interaction means [TABLES(treatgrp*time)] and run a simple main effects analysis of the effects of time within each treatment group [COMPARE(time)] using a Bonferroni correction when testing the mean differences [ADJ(Bonferroni)]. The seventh line (/WSDESIGN) specifies the within-subjects factor, time. The last line (/DESIGN) specifies the between-subjects factor, treatgrp.

Overall Analysis

Table 5. Tests of Within-Subjects Effects

Source             Type III Sum of Squares    df    Mean Square    F         Sig.
TIME               2985.321                    1    2985.321       70.177    .000
TIME * TREATGRP    2005.321                    1    2005.321       47.140    .000
Error(TIME)        3233.019                   76      42.540
 
Table 6. Tests of Between-Subjects Effects

Source      Type III Sum of Squares    df    Mean Square    F       Sig.
TREATGRP       19.206                   1     19.206        .135    .714
Error       10800.323                  76    142.110

The primary output from the analysis of variance is divided into two tables: the within-subjects effects (see Table 5) and the between-subjects effects (see Table 6). The output has been abbreviated somewhat for the purposes of this discussion.

As shown in Table 5, the interaction between treatment and time is significant, F (1, 76) = 47.14, p < .0005. The interaction will be interpreted with simple main effects analysis looking at the effects of time within each treatment.  The significant time main effect, F (1, 76) = 70.18, p < .0005 must be interpreted in light of the interaction effect.  As shown in Table 6, the main effect for treatment was not significant, F (1, 76) = 0.14, p = .714.

Conceptually, the interaction term in this 2 x 2 ANOVA can be thought of as a comparison of the changes from pretest to posttest within each treatment group (the formula is given in the Discussion section). If the changes from pretest to posttest are identical in each group, e.g., if the improvement is the same for each group, then there is no interaction. If the change from pretest to posttest is greater in one group than in the other group, e.g., if one group improves more than the other group, then there is an interaction. An interaction could also occur if one group improved from pretest to posttest while the other group deteriorated.
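This comparison of changes can be worked through with the four cell means that SPSS reports for this analysis (the estimated marginal means in Table 7). The sketch below is only arithmetic on those reported means; the dictionary layout and variable names are chosen here for illustration.

```python
# Cell means copied from Table 7 (TIME 1 = pretest, TIME 2 = posttest).
cell_means = {
    ("treatment", "pre"): 54.350,
    ("treatment", "post"): 38.425,
    ("control", "pre"): 46.184,
    ("control", "post"): 44.605,
}

# Change from pretest to posttest within each treatment group.
treatment_change = cell_means[("treatment", "post")] - cell_means[("treatment", "pre")]
control_change = cell_means[("control", "post")] - cell_means[("control", "pre")]

# The interaction compares the two changes; zero would mean "no interaction".
interaction_contrast = treatment_change - control_change

print(round(treatment_change, 3), round(control_change, 3), round(interaction_contrast, 3))
```

The two changes are the same values as the mean gain scores in Table 3, and their difference is far from zero, which is what the significant interaction test reflects.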

 

Simple Main Effects

Table 7. Estimated Marginal Means of the Trait Anxiety Scores for the Condition * TIME Interaction

                                               95% Confidence Interval
Condition    TIME    Mean      Std. Error      Lower Bound    Upper Bound
Treatment    1       54.350    1.949           50.468         58.232
Treatment    2       38.425    2.090           34.262         42.588
Control      1       46.184    2.000           42.201         50.167
Control      2       44.605    2.145           40.334         48.877


Table 8. Simple Main Effects of Time Within each Treatment Condition

Condition    F             Hypothesis df    Error df    Sig.
Treatment    119.232(a)    1.000            76.000      .000
Control        1.114(a)    1.000            76.000      .295

Each F tests the multivariate simple effects of TIME within each level combination of the other effects shown. These tests are based on the linearly independent pairwise comparisons among the estimated marginal means.

a Exact statistic

The interaction means, standard errors, and 95% Confidence Intervals for the means are shown in Table 7.  The simple main effects of time within each treatment condition are shown in Table 8. 

The interaction can be described by the following two statements:

The trait anxiety scores for participants in the treatment condition decreased from the pretest (M = 54.35, SE = 1.95) to the posttest (M = 38.43, SE = 2.09), F (1, 76) = 119.23, p < .0005. The trait anxiety scores for participants in the control condition showed no significant change from the pretest (M = 46.18, SE = 2.00) to the posttest (M = 44.61, SE = 2.15), F (1, 76) = 1.11, p = .295.

This interpretation involves showing that the change in scores from the pretest to the posttest was greater for one group than for the other. It seems to me it is an interpretation that is closely related to how people think about treatment outcome studies. In general, we want to know if one treatment produced a greater effect than another treatment.

Note: You need to be careful when you interpret the 95% confidence interval information in SPSS output. The 95% confidence intervals shown in Table 7 are based on the standard errors of the individual means. They are appropriate for making comparisons of between-subjects means (e.g., the treatment pretest mean vs. the control pretest mean), but they are too conservative for comparing the within-subject means (e.g., the treatment pretest mean vs. the treatment posttest mean).
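To get a feel for how conservative those intervals are, the sketch below compares two standard errors for the treatment group's pretest-to-posttest change. The "naive" value combines the two standard errors from Table 7 as if the pretest and posttest means came from independent samples, which is an assumption made here purely for illustration. The second value is the standard error of the mean gain score reported in Table 3, which takes the pairing of scores into account.

```python
import math

# Standard errors of the treatment pretest and posttest means (Table 7).
se_pre, se_post = 1.949, 2.090

# Standard error of the treatment group's mean gain score (Table 3).
se_gain = 1.458

# Standard error of a difference between two means if they were independent.
naive_se = math.sqrt(se_pre ** 2 + se_post ** 2)

print(round(naive_se, 3), se_gain)
```

The naive standard error is roughly twice as large as the one based on the gain scores, so comparing the two within-subject means by eye from the Table 7 intervals understates how reliable the change is.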

 


IV. Discussion

Alternative explanations

Both the gain score analysis and the repeated measures analysis ignore the (significant) pretest differences on trait anxiety.  Can you think of any alternative explanations to this outcome that are based on the existing pretest differences?   For example, can the regression towards the mean effect account for the pattern of results?

Comparison of the gain score results with the time by treatment ANOVA results

The F-test value of the treatment main effect in the gain score analysis, F (1, 76) = 47.14, p < .0005, was the same as the F-test value for the time by treatment interaction in the repeated measures analysis, F (1, 76) = 47.14, p < .0005. Why is this so? Consider the following description of the time by treatment interaction term:

Time by treatment interaction = (treatment posttest - treatment pretest) - (control posttest - control pretest)

The interaction is a comparison of the differences between the posttest and pretest scores in each treatment group. As we noted earlier, if the difference is the same in each treatment group, there is no interaction.  If the difference is not the same in each treatment group, then there is an interaction. Most computer programs such as SPSS handle the within subjects factor, e.g., time,  by literally creating a difference score for each person by subtracting the posttest score from the pretest score.  The test of the main effect of time is a test of whether the overall mean difference score (across both treatment groups) is different from zero.  The test of the interaction is a test of whether the mean difference score for the treatment group is different from the mean difference score for the control group.  In the gain score analysis we first computed the difference between the posttest and pretest scores and then tested whether the differences were the same for each treatment group.  Thus the treatment main effect in the gain score analysis is the same as the time by treatment interaction in the 2 x 2 ANOVA. 
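The equivalence can also be checked numerically on a small data set. The sketch below is a minimal illustration, not the computation SPSS performs internally: the scores are hypothetical, the function names are made up for this example, and the interaction term is computed directly from the cell, person, and marginal means rather than from difference scores.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical (pretest, posttest) pairs; the group sizes are unequal on purpose.
data = {
    "treatment": [(52, 40), (60, 41), (47, 32), (55, 37), (58, 45)],
    "control": [(45, 43), (50, 51), (44, 40), (48, 49)],
}


def gain_score_anova(data):
    """One-way ANOVA on posttest - pretest; returns (SS treatment, SS error, F)."""
    gains = {g: [post - pre for pre, post in rows] for g, rows in data.items()}
    all_gains = [d for ds in gains.values() for d in ds]
    grand = mean(all_gains)
    ss_treat = sum(len(ds) * (mean(ds) - grand) ** 2 for ds in gains.values())
    ss_error = sum((d - mean(ds)) ** 2 for ds in gains.values() for d in ds)
    df_treat = len(gains) - 1
    df_error = len(all_gains) - len(gains)
    return ss_treat, ss_error, (ss_treat / df_treat) / (ss_error / df_error)


def interaction_anova(data):
    """Time x treatment term of the mixed ANOVA, computed from the raw scores."""
    n_total = sum(len(rows) for rows in data.values())
    # Marginal mean at each time point (0 = pretest, 1 = posttest), over all people.
    time_means = [sum(r[k] for rows in data.values() for r in rows) / n_total
                  for k in (0, 1)]
    grand = mean(time_means)
    ss_inter = ss_error = 0.0
    for rows in data.values():
        cell_means = [mean([r[k] for r in rows]) for k in (0, 1)]
        group_mean = mean(cell_means)
        for k in (0, 1):
            # Interaction: what is left of the cell mean after removing both main effects.
            ss_inter += len(rows) * (cell_means[k] - group_mean - time_means[k] + grand) ** 2
            for r in rows:
                # Error: person-by-time residual within the group.
                ss_error += (r[k] - cell_means[k] - mean(r) + group_mean) ** 2
    df_inter = len(data) - 1
    df_error = n_total - len(data)
    return ss_inter, ss_error, (ss_inter / df_inter) / (ss_error / df_error)


gain_ss, gain_err, gain_f = gain_score_anova(data)
int_ss, int_err, int_f = interaction_anova(data)

print(gain_f, int_f)        # the two F values agree up to floating-point rounding
print(gain_ss, 2 * int_ss)  # the gain score SS is twice the interaction SS
```

The factor of two between the sums of squares is the same pattern seen when Table 2 is compared with Table 5.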

The interaction term in the ANOVA was significant. The details of the interaction were analyzed using a simple main effects analysis of the effects of time within each treatment condition.  The simple main effects analysis indicated a significant change from pretest to posttest in the treatment condition, but not in the control condition.   Similarly, the treatment main effect in the gain score analysis was significant.   The details of the main effect were analyzed using the 95% confidence intervals for each of the group means.  The 95% confidence interval analysis indicated a significant change from pretest to posttest in the treatment condition, but not in the control condition.

Technical note. You may have noted that although the F values for the gain score main effect and the ANOVA interaction effect are the same, the sums of squares are not the same. This is due to the way in which SPSS creates the difference scores. Think of creating the difference score by multiplying the individual scores by a coefficient (or weight) called "c":

Gain = c1*posttest + c2*pretest

When we computed the gain scores c1 was set to +1 and c2 was set to -1, that is, we simply subtracted the pretest score from the posttest score:

Gain = (+1)*posttest + (-1)*pretest

SPSS "orthonormalizes" the coefficients so that the sum of the squares of the coefficients is equal to 1.00. The coefficients used by SPSS are as follows:

Gain = (+0.707107)*posttest + (-0.707107)*pretest

If you square each of the coefficients (0.707107² = .5000) and sum them, the result is 1.00.

You could check this out for yourself by using the SPSS coefficients to manually create the gain score and then running the gain score analysis. You would find that both the sums of squares and the F value from the gain score analysis would equal the sums of squares and F value from the interaction term in the ANOVA.
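The effect of rescaling the coefficients can be previewed on a small hypothetical data set before trying it in SPSS. The sketch below (the function name contrast_anova and the scores are made up for this example) runs the same one-way ANOVA twice, once with coefficients of +1 and -1 and once with +0.707107 and -0.707107.

```python
import math

# Hypothetical (pretest, posttest) pairs for two groups.
data = {
    "treatment": [(52, 40), (60, 41), (47, 32), (55, 37), (58, 45)],
    "control": [(45, 43), (50, 51), (44, 40), (48, 49)],
}


def contrast_anova(data, c_post, c_pre):
    """One-way ANOVA on c_post*posttest + c_pre*pretest; returns (SS treatment, SS error, F)."""
    scores = {g: [c_post * post + c_pre * pre for pre, post in rows]
              for g, rows in data.items()}
    all_scores = [s for group in scores.values() for s in group]
    grand = sum(all_scores) / len(all_scores)
    ss_treat = ss_error = 0.0
    for group in scores.values():
        m = sum(group) / len(group)
        ss_treat += len(group) * (m - grand) ** 2
        ss_error += sum((s - m) ** 2 for s in group)
    df_treat = len(scores) - 1
    df_error = len(all_scores) - len(scores)
    return ss_treat, ss_error, (ss_treat / df_treat) / (ss_error / df_error)


c = 1 / math.sqrt(2)  # 0.707107..., so that c**2 + c**2 = 1

raw = contrast_anova(data, +1, -1)    # ordinary gain score
ortho = contrast_anova(data, +c, -c)  # orthonormalized coefficients

print(raw)
print(ortho)
```

Rescaling the coefficients halves both sums of squares, so their ratio, and therefore F, is unchanged.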

Comparison of Gain Scores and Analysis of Covariance

The focus of the difference or gain score analysis is somewhat different from the focus of the analysis of covariance. The gain score analysis focuses on the change that occurs from the pretest to the posttest. By analyzing the change scores within each group you can specify whether both groups improved at different rates, whether one group improved while the other group showed no improvement, or even whether one group improved while the other group deteriorated. The analysis of gain scores makes no assumption about the equivalence of the pretest-posttest regression lines across groups. The interpretation of the gain score analysis becomes somewhat problematic when there are pretest differences.

The analysis of covariance focuses on the posttest differences between the treatment groups while holding constant any differences in the pretest scores. But the analysis of covariance does not tell you anything about how the groups changed from pretest to posttest. If you have met the assumptions of the analysis of covariance, then it is generally considered to be a statistically more powerful analysis than a difference or gain score analysis.

I have recently seen some studies that reported both the difference score analysis and the analysis of covariance. The papers made the argument that the effects seen in the study were robust because both analyses came to the same conclusion.

Additional information on gain score analyses

There is an extensive literature on the analysis of difference or gain scores. It has been argued that difference or gain scores are inherently unreliable. The reference section cites additional reading for anyone who might be interested.


V. References

    Cattell, R. B. (1983). The clinical use of difference scores: Some psychometric problems. Multivariate Experimental Clinical Research, 6, 87-98.

    Gardner, R. C. (1987). Use of the simple change score in correlational analysis. Educational and Psychological Measurement, 47, 849-864.

    Humphreys, L. G. (1989). Some comments on the relationship between reliability and statistical power. Applied Psychological Measurement, 13, 419-425.

    Karabinus, R. A. (1983). The use of ANOVA, multiple regression, repeated ANOVA, and effect size. Evaluation Review, 7, 841-850.

    Lord, F. M. (1956). The measurement of growth. Educational and Psychological Measurement, 16, 421-437.

    Lord, F. M. (1963). Elementary models for measuring change. In C. W. Harris (Ed.), Problems in measuring change. Madison, WI: University of Wisconsin Press.

    Rogosa, D. R., & Willett, J. B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335-343.

    Stemmler, G. (1987). Implicit measurement models in methods for scoring physiological reactivity. Journal of Psychophysiology, 1, 113-125.

    Williams, R. H., Zimmerman, D. W., Rich, J. M., & Steed, J. L. (1984). An empirical study of the relative error magnitude in three measures of change. Journal of Experimental Education, 53, 55-57.


© 1999, Lee A. Becker - revised 03/21/00