1. Overview 
Suppose you have the following quasiexperimental design:
O_{1} X O_{2}
O_{1}   O_{2}
Participants are not randomly assigned to your two groups (treatment vs. control). Two observations are made for each group, one prior to treatment (pretest, O_{1}) and one after treatment (posttest, O_{2}).
If there are no pretest differences between the treatment and control groups, the analysis is straightforward: you can run a t test or an analysis of variance on the posttest scores.
But what if there are pretest differences between the two groups? Then any posttest differences might be attributed to the pretest differences. Blocking, matching, analysis of covariance, and gain score analysis are different statistical techniques for controlling for differences in pretest scores.
Blocking looks at groups (blocks) of participants in the experimental and control conditions who have equivalent pretest scores.
Matching is a slightly more precise procedure that looks at pairs of participants in the experimental and control conditions who have equivalent pretest scores.
Analysis of covariance is more precise still. It "holds constant" the pretest scores by estimating the posttest scores that would have been obtained if everyone had the same pretest score. This is a powerful analysis if the assumptions of the analysis of covariance are met.
Gain score analysis analyzes the difference between each person's pretest and posttest scores. It will be shown that an analysis of the gain scores is the same as the interaction in a 2 (pretest vs. posttest) by 2 (experimental vs. control) analysis of variance.
These techniques can also be used in randomized designs to control for unwanted differences in pretest scores.
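The link between gain scores and the 2 by 2 interaction can be illustrated with a minimal sketch. The scores below are hypothetical, not data from this page, and the function names are illustrative only. The sketch checks only that the difference in mean gains equals the interaction contrast of the four cell means; it does not run the significance test.

```python
from statistics import mean

# Hypothetical (pretest, posttest) pairs; not the data from this page.
treatment = [(9, 12), (10, 14), (11, 13), (12, 16)]
control = [(7, 8), (8, 10), (9, 9)]

def mean_gain(pairs):
    """Mean of the individual gain scores (posttest minus pretest)."""
    return mean(post - pre for pre, post in pairs)

def cell_means(pairs):
    """Return (pretest mean, posttest mean) for one group."""
    pres = [pre for pre, _ in pairs]
    posts = [post for _, post in pairs]
    return mean(pres), mean(posts)

# Difference between the groups in mean gain.
gain_difference = mean_gain(treatment) - mean_gain(control)

# Interaction contrast built from the four cell means of the 2 x 2 design.
t_pre, t_post = cell_means(treatment)
c_pre, c_post = cell_means(control)
interaction_contrast = (t_post - t_pre) - (c_post - c_pre)

print(gain_difference, interaction_contrast)
```

Both quantities come out the same because a group's mean gain is simply its posttest mean minus its pretest mean.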
Figure 1. Overlapping distributions
If there are no differences between the groups at pretest, then posttest differences may be attributed to the treatment. In this figure the experimental group is indicated by the blue scatterplot and the control group by the yellow scatterplot. Pretest scores are shown on the x-axis and posttest scores on the y-axis. The lines that drop to each axis from a scatterplot represent the mean pretest and posttest scores for that condition. In this example there are no pretest differences, and the posttest scores are higher for the treatment group than for the control group.
Figure 2. Nonoverlapping distributions
When there are differences at the pretest, the interpretation of differences at the posttest must take the pretest differences into account. Can the pretest differences account for all the differences in the posttest scores? The blocking approach looks only at the pretest scores from the two conditions that overlap each other. The basic procedure is to find the participants whose pretest scores overlap and then compute the posttest means for only those participants. Pretest scores are held constant by deleting those participants whose pretest scores are too high or too low.
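The blocking procedure just described can be sketched in a few lines. The records, group labels, and function names here are hypothetical and are not taken from this page's data. The overlap is assumed to be the range between the larger of the two group minimums and the smaller of the two group maximums.

```python
from statistics import mean

# Hypothetical records (group, pretest, posttest); not the data from this page.
records = [
    ("treatment", 9, 11), ("treatment", 10, 12), ("treatment", 11, 10),
    ("treatment", 12, 15), ("treatment", 13, 16),
    ("control", 7, 6), ("control", 8, 8), ("control", 9, 9),
    ("control", 10, 11), ("control", 11, 10),
]

def overlap_range(rows):
    """Return (low, high) bounds of the pretest range shared by both groups."""
    t_pre = [pre for g, pre, _ in rows if g == "treatment"]
    c_pre = [pre for g, pre, _ in rows if g == "control"]
    return max(min(t_pre), min(c_pre)), min(max(t_pre), max(c_pre))

def blocked_posttest_means(rows):
    """Posttest mean per group, using only participants inside the overlap."""
    low, high = overlap_range(rows)
    kept = [r for r in rows if low <= r[1] <= high]
    means = {}
    for g in ("treatment", "control"):
        means[g] = mean(post for grp, _, post in kept if grp == g)
    return (low, high), len(kept), means

bounds, n_kept, means = blocked_posttest_means(records)
fraction_lost = 1 - n_kept / len(records)  # share of participants deleted
print(bounds, n_kept, means, fraction_lost)
```

If the two pretest ranges do not overlap at all, no participants are kept and `mean` raises an error, which matches the fact that no blocked comparison exists in that case.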


The t tests indicate that there are significant differences between the treatment and control groups at both the pretest and the posttest.
To control for the pretest differences, we need to select those participants who have overlapping pretest scores. The scatterplot indicates that the overlapping pretest scores fall between 9 and 11. So we would select only those participants who have a pretest score of 9, 10, or 11 and run a t test to see whether the posttest scores differ for those participants. The asterisks next to the identification numbers in the raw data table above indicate the 16 participants who had pretest scores of 9, 10, or 11.
The results of the t test (run on the posttest scores) indicate that there was no difference between the treatment (M = 10.71, SD = 2.75) and control (M = 10.00, SD = 2.74) conditions when the pretest scores were held constant, t(14) = 0.86, p = .405.
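For readers who want to see the arithmetic behind an independent-samples t test on the selected posttest scores, here is a minimal sketch using the pooled-variance formula. The scores are hypothetical and will not reproduce the values reported above. The function name `pooled_t` is illustrative. The p value is left out because it requires the t distribution, which is not in the Python standard library.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Independent-samples t statistic with pooled variance, plus its df."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / se, df

# Hypothetical posttest scores for participants with pretest scores of 9 to 11.
treatment_post = [11, 12, 10, 13, 9]
control_post = [9, 11, 10, 8, 12]

t_value, df = pooled_t(treatment_post, control_post)
print(t_value, df)
```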
The t test was run on a little over half of the original data. This loss of data is a serious problem for the blocking model. In some cases there may be no overlap in pretest scores at all, and in that extreme instance it will not be possible to run this type of analysis. There are no agreed-upon rules of thumb for how much data loss is acceptable. If the loss were more than 25%, I would not feel very comfortable about the analysis.
The SPSS syntax commands for this analysis are as follows:
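* Flag participants whose pretest scores fall in the overlapping range.
COMPUTE filter_$=(pretest >= 9 & pretest <= 11).
FILTER BY filter_$.
* Compare the two groups on the posttest scores.
T-TEST GROUPS=group(1 2) /VARIABLES=posttest.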
You could substitute the following select if command for the compute and filter commands:
SELECT IF (pretest GE 9 and pretest LE 11).
If more data were available, it would be possible to create smaller blocks of similar pretest scores. For example, you might be able to create one block with scores between, say, 9 and 10 (block 1) and another block with scores between 11 and 12 (block 2). Blocks could then be used as a factor in an analysis of variance model. The syntax for the analysis of variance would look like this:
IF (pretest GE 9 AND pretest LE 10) block = 1.
IF (pretest GE 11 AND pretest LE 12) block = 2.
GLM posttest BY group block.
The analysis would yield group and block main effects and a group by block interaction. The group main effect would be the test of whether there were differences between the treatment and control groups. The block main effect and the group by block interaction would remove the variability due to pretest differences from the error term, making the test of the group main effect more powerful.
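The block assignment and the cell structure of the group by block design can be sketched as follows. The records are hypothetical, the coding of group 1 as treatment and group 2 as control is an assumption, and the sketch only tabulates cell means. It does not compute the F tests that the analysis of variance would report.

```python
from collections import defaultdict
from statistics import mean

def assign_block(pretest):
    """Mirror the IF statements: 9-10 is block 1, 11-12 is block 2."""
    if 9 <= pretest <= 10:
        return 1
    if 11 <= pretest <= 12:
        return 2
    return None  # outside both blocks; left out of the analysis

# Hypothetical records (group, pretest, posttest).
# Assumed coding: group 1 = treatment, group 2 = control.
records = [
    (1, 9, 11), (1, 10, 13), (1, 11, 14), (1, 12, 16), (1, 13, 17),
    (2, 8, 7), (2, 9, 9), (2, 10, 11), (2, 11, 12), (2, 12, 12),
]

# Collect posttest scores in each group-by-block cell.
cells = defaultdict(list)
for group, pretest, posttest in records:
    block = assign_block(pretest)
    if block is not None:
        cells[(group, block)].append(posttest)

cell_means = {cell: mean(scores) for cell, scores in sorted(cells.items())}

# Group difference averaged over blocks (unweighted marginal means).
group_effect = (mean([cell_means[(1, 1)], cell_means[(1, 2)]])
                - mean([cell_means[(2, 1)], cell_means[(2, 2)]]))
print(cell_means, group_effect)
```

Participants whose pretest scores fall outside both blocks get no block value and drop out, just as cases with a missing block value would be excluded from the analysis of variance.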
03/12/00