Post Hoc Statistical Procedures: Matching - Effect Size Calculators (Lee Becker) | University of Colorado Colorado Springs

Effect Size Calculators (Lee Becker)

## Post Hoc Statistical Procedures: Matching

Quasi-Analysis: Matching

## I. Overview

The purpose of the blocking, matching, analysis of covariance, and gain score procedures is to control for unwanted differences in pretreatment scores. These procedures are typically used in quasi-experimental designs because pretest differences are more likely to occur when participants are not randomly assigned to conditions. However, these procedures can also be used in experimental designs when participants have been randomly assigned to conditions.

When pretest and posttest scores are correlated, all of these procedures reduce error variance and therefore increase the power of your statistical test.

The blocking procedure selects treatment and control participants who are similar on their pretest scores and then analyzes the posttest scores of those selected in a between-subjects design (e.g., an independent t test or an analysis of variance with group (experimental vs. control) as the independent variable).

The matching procedure uses the pretest scores to pair an experimental participant with a control participant. The posttest scores of the matched pairs are analyzed in a within-subjects design (e.g., a dependent t test or an analysis of variance with group (experimental vs. control) as a repeated measures independent variable).

## II. How to Match

### A. Exact Matching

1. The first step is to rank order the participants according to their pretest scores. You can rank in either ascending (lowest to highest) or descending (highest to lowest) order. The ranking should be done within each of the conditions in the study.

#### Example Data

The data used in this example are stored in the file matching.dat. If you wish to download the data file, see the download instructions in the outline at the beginning of this set of notes. The ranked data from the experimental group are shown in Table 1; the data from the control group are shown in Table 2. In both tables the pretest scores have been ranked in ascending order, from the lowest to the highest score.

Table 1. Experimental (treatment) Participants

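| ID | Treatment Condition | PRETEST | POSTTEST |
|----|---------------------|---------|----------|
| 17 | Treatment | 9 | 13 |
| 20 | Treatment | 9 | 10 |
| 21 | Treatment | 9 | 11 |
| 24 | Treatment | 9 | 11 |
| 25 | Treatment | 9 | 7 |
| 26 | Treatment | 11 | 15 |
| 28 | Treatment | 11 | 8 |
| 18 | Treatment | 12 | 11 |
| 22 | Treatment | 12 | 15 |
| 23 | Treatment | 12 | 9 |
| 27 | Treatment | 12 | 9 |
| 30 | Treatment | 13 | 10 |
| 19 | Treatment | 14 | 14 |
| 16 | Treatment | 15 | 12 |
| 29 | Treatment | 15 | 15 |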

Table 2. Control Participants

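| ID | Treatment Condition | PRETEST | POSTTEST |
|----|---------------------|---------|----------|
| 4 | Control | 5 | 6 |
| 7 | Control | 5 | 7 |
| 6 | Control | 7 | 9 |
| 8 | Control | 7 | 9 |
| 5 | Control | 8 | 9 |
| 10 | Control | 8 | 9 |
| 11 | Control | 9 | 13 |
| 12 | Control | 9 | 9 |
| 13 | Control | 9 | 12 |
| 15 | Control | 9 | 6 |
| 1 | Control | 10 | 13 |
| 2 | Control | 11 | 13 |
| 3 | Control | 11 | 8 |
| 9 | Control | 11 | 8 |
| 14 | Control | 11 | 8 |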
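The ranking in step 1 can be sketched in a few lines of Python. This is only an illustration: the dictionaries `TREATMENT` and `CONTROL` and the helper `rank_by_pretest` are hypothetical names, the scores are typed in from Tables 1 and 2 rather than read from matching.dat, and breaking ties by ID number is an arbitrary choice.

```python
# Each entry maps a participant ID to (pretest, posttest), copied from
# Tables 1 and 2 and listed here in ID order (i.e., not yet ranked).
TREATMENT = {
    16: (15, 12), 17: (9, 13), 18: (12, 11), 19: (14, 14), 20: (9, 10),
    21: (9, 11), 22: (12, 15), 23: (12, 9), 24: (9, 11), 25: (9, 7),
    26: (11, 15), 27: (12, 9), 28: (11, 8), 29: (15, 15), 30: (13, 10),
}
CONTROL = {
    1: (10, 13), 2: (11, 13), 3: (11, 8), 4: (5, 6), 5: (8, 9),
    6: (7, 9), 7: (5, 7), 8: (7, 9), 9: (11, 8), 10: (8, 9),
    11: (9, 13), 12: (9, 9), 13: (9, 12), 14: (11, 8), 15: (9, 6),
}


def rank_by_pretest(group):
    """Return participant IDs sorted by pretest score, lowest first.

    Ties are broken by ID number (an assumption made for this sketch).
    """
    return sorted(group, key=lambda pid: (group[pid][0], pid))


# Rank separately within each condition, as step 1 requires.
treatment_order = rank_by_pretest(TREATMENT)
control_order = rank_by_pretest(CONTROL)
print(treatment_order)
print(control_order)
```

With these data the two lists come out in the same ID order as the rows of Tables 1 and 2.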

2.  Take the person with the lowest score in the experimental group and match that person with someone from the control group who has an identical pretest score. If more than one person in the control group has an identical score, randomly select one of them. Then take the experimental person with the next lowest score and match that person with someone from the control group who has an identical pretest score. Continue this process until all experimental participants have been matched with a control person.

In our data example the experimental scores begin at 9. None of the control participants with pretest scores less than 9 will be used in this matching procedure.

You could make a table to keep track of the matching process:

Table 3. Exact Matching

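| ID # of the Treatment Participant | ID # of the Control Participant (randomly chosen from the available data) | Treatment Group Pretest Score | Posttest Score for the Treatment Participant | Posttest Score for the Control Participant |
|----|----|----|----|----|
| 17 | 12 | 9 | 13 | 9 |
| 20 | 11 | 9 | 10 | 13 |
| 21 | 13 | 9 | 11 | 12 |
| 24 | 15 | 9 | 11 | 6 |
| 25 | no match | 9 | ~~7~~ | |
| 26 | 2 | 11 | 15 | 13 |
| 28 | 3 | 11 | 8 | 8 |
| 18 | no match | 12 | ~~11~~ | |
| 22 | no match | 12 | ~~15~~ | |
| 23 | no match | 12 | ~~9~~ | |
| 27 | no match | 12 | ~~9~~ | |
| 30 | no match | 13 | ~~10~~ | |
| 19 | no match | 14 | ~~14~~ | |
| 16 | no match | 15 | ~~12~~ | |
| 29 | no match | 15 | ~~15~~ | |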

In the experimental group there are 5 pretest scores of "9." In the control group there are only 4 pretest scores of "9." Therefore only four of the five "9"s will be matched. In the experimental group there are 2 pretest scores of "11." There are 4 pretest scores of "11" in the control group, but only two will be randomly chosen as matches. Posttest scores with a strikethrough are not used in the data analysis.
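One way to carry out step 2 programmatically is sketched below. It is a minimal illustration, not the procedure used to build Table 3: the function `exact_match`, the fixed random seed, and the rule that treatment participants with tied pretest scores are processed in ID order are all assumptions made for the example.

```python
import random

# Participant ID -> (pretest, posttest), copied from Tables 1 and 2.
TREATMENT = {
    16: (15, 12), 17: (9, 13), 18: (12, 11), 19: (14, 14), 20: (9, 10),
    21: (9, 11), 22: (12, 15), 23: (12, 9), 24: (9, 11), 25: (9, 7),
    26: (11, 15), 27: (12, 9), 28: (11, 8), 29: (15, 15), 30: (13, 10),
}
CONTROL = {
    1: (10, 13), 2: (11, 13), 3: (11, 8), 4: (5, 6), 5: (8, 9),
    6: (7, 9), 7: (5, 7), 8: (7, 9), 9: (11, 8), 10: (8, 9),
    11: (9, 13), 12: (9, 9), 13: (9, 12), 14: (11, 8), 15: (9, 6),
}


def exact_match(treatment, control, seed=0):
    """Pair each treatment participant with a randomly chosen, not yet
    used control participant who has an identical pretest score."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    available = set(control)
    pairs = {}
    # Work from the lowest to the highest treatment pretest score.
    for t_id in sorted(treatment, key=lambda pid: (treatment[pid][0], pid)):
        candidates = sorted(
            c for c in available if control[c][0] == treatment[t_id][0]
        )
        if candidates:  # otherwise this participant has no match
            c_id = rng.choice(candidates)
            pairs[t_id] = c_id
            available.remove(c_id)  # each control person is used only once
    return pairs


pairs = exact_match(TREATMENT, CONTROL)
print(len(pairs), "of", len(TREATMENT), "treatment participants matched")
```

Because the control partner is drawn at random, the particular control IDs may differ from those in Table 3, but six pairs result whatever the draw. Participant 25 ends up as the unmatched "9" here only because of the ID-order assumption.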

The scores entered into the analysis are the two posttest scores from each matched pair. Data from cases without a match are not entered into the analysis. You can use a paired t test or a repeated measures analysis of variance to analyze the posttest scores. The results could be reported as follows:

Pretest differences between the treatment and control groups were controlled by an exact matching procedure. Exact matching was accomplished for 40% of the participants (6 of the 15 possible pairs). After matching there was no difference between the posttest scores of the experimental group (M = 11.33, SD = 2.42) and the control group (M = 10.17, SD = 2.93), t (5) = 0.93, p = .393. This result should be treated with caution due to the large number of participants who could not be matched.
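The paired t statistic in this write-up can be checked by hand from the six matched pairs in Table 3. The sketch below uses only Python's standard library; the variable names are illustrative, and it computes the t statistic and degrees of freedom but not the p value, which requires the t distribution (for example via SciPy's `scipy.stats.ttest_rel`).

```python
import math
import statistics

# Posttest scores of the six matched pairs in Table 3, in row order.
treatment_post = [13, 10, 11, 11, 15, 8]
control_post = [9, 13, 12, 6, 13, 8]

# A paired t test is a one-sample t test on the within-pair differences.
diffs = [t - c for t, c in zip(treatment_post, control_post)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
t_stat = mean_diff / se_diff
df = n - 1

print(f"t({df}) = {t_stat:.2f}")  # prints: t(5) = 0.93
```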

Q: What happens if you have more participants in one group than the other?

A: The "extra" participants are dropped from consideration. If you have 10 people in one group and 15 people in the other, then, at most, you can have only 10 pairs of data to analyze.

Q: What happens if there are not exact matches for everyone?

A: In reality, it is unlikely that you would find an exact match for everyone. How likely is it that the distribution of scores for one group will exactly match the distribution of scores for the other group? The probability is near zero. Let's look at several possibilities.

First, consider the situation where the distributions have little overlap. There will be some cases in each group that are out of range of the cases from the other group. Those cases will have no matches and they will be discarded from the analysis.

In this example, the overlap between the pretest scores of the two groups is not very large: only 6 of the 15 treatment participants can be matched. The data loss is so large that exact matching does not seem like a reasonable approach. The loss of power is too high, and any "unusual" data points could have a large influence on the statistical test.
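The number of exact matches a data set can supply may be counted before any pairing is done: for each pretest score, the number of possible pairs is the smaller of the two group counts. A minimal sketch, assuming the pretest scores from Tables 1 and 2 are typed in as lists (the variable names are illustrative):

```python
from collections import Counter

# Pretest scores copied from Table 1 (treatment) and Table 2 (control).
treatment_pretest = [9, 9, 9, 9, 9, 11, 11, 12, 12, 12, 12, 13, 14, 15, 15]
control_pretest = [5, 5, 7, 7, 8, 8, 9, 9, 9, 9, 10, 11, 11, 11, 11]

# Counter & Counter keeps, for each score, the smaller of the two counts,
# which is the number of exact pairs that score can supply.
overlap = Counter(treatment_pretest) & Counter(control_pretest)
max_pairs = sum(overlap.values())

print(max_pairs, "exact pairs are possible out of", len(treatment_pretest))
```

For these data the only shared scores are 9 (four pairs) and 11 (two pairs), which gives the 6 of 15 reported above.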

Second, consider those cases where the distributions have moderate overlap. Even where the distributions overlap it is unlikely that you will be able to find an exact match for everyone. One alternative is to use caliper matching.

### B. Caliper Matching

In caliper matching you establish a range of scores that you are willing to consider as "close enough for a match." For example, if you are matching IQ scores, there is not much difference between a score of 105 and a score of 104, so you would probably be willing to say that those two scores were about equally matched. For IQ you might be willing to match any score that was within plus or minus 5 points of any given score.

Let's try caliper matching for this set of data where the width of the caliper is defined as the pretest score ± 1. The experimental data are reproduced in Table 4.

Table 4. Caliper Matching

| ID # of the Treatment Participant | ID # of the Control Participant, randomly chosen (pretest) | Treatment Group Pretest Score | Caliper Range (score ± 1) | Posttest Score for the Treatment Participant | Posttest Score for the Control Participant |
|----|----|----|----|----|----|
| 17 | 12 (9) | 9 | 8-10 | 13 | 9 |
| 20 | 10 (8) | 9 | 8-10 | 10 | 9 |
| 21 | 13 (9) | 9 | 8-10 | 11 | 12 |
| 24 | 11 (9) | 9 | 8-10 | 11 | 13 |
| 25 | 5 (8) | 9 | 8-10 | 7 | 9 |
| 26 | 14 (11) | 11 | 10-12 | 15 | 8 |
| 28 | 1 (10) | 11 | 10-12 | 8 | 13 |
| 18 | 2 (11) | 12 | 11-13 | 11 | 13 |
| 22 | 3 (11) | 12 | 11-13 | 15 | 8 |
| 23 | 9 (11) | 12 | 11-13 | 9 | 8 |
| 27 | no match | 12 | 11-13 | 9 | |
| 30 | no match | 13 | 12-14 | 10 | |
| 19 | no match | 14 | 13-15 | 14 | |
| 16 | no match | 15 | 14-16 | 12 | |
| 29 | no match | 15 | 14-16 | 15 | |

Caliper matching with a range of ± 1 produced matched scores for 10 of the 15 experimental participants, compared with 6 for exact matching. Note that cases without a match (ID #s 27, 30, 19, 16, and 29 in Table 4) are not used in the statistical analysis.
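Caliper matching differs from exact matching only in the test for "close enough." The sketch below is one hedged way to express it: the function `caliper_match`, its `caliper` parameter, the fixed random seed, and the ID-order handling of tied treatment scores are assumptions for the illustration, and the simple one-pass random assignment is not guaranteed to find the largest possible number of pairs.

```python
import random

# Participant ID -> (pretest, posttest), copied from Tables 1 and 2.
TREATMENT = {
    16: (15, 12), 17: (9, 13), 18: (12, 11), 19: (14, 14), 20: (9, 10),
    21: (9, 11), 22: (12, 15), 23: (12, 9), 24: (9, 11), 25: (9, 7),
    26: (11, 15), 27: (12, 9), 28: (11, 8), 29: (15, 15), 30: (13, 10),
}
CONTROL = {
    1: (10, 13), 2: (11, 13), 3: (11, 8), 4: (5, 6), 5: (8, 9),
    6: (7, 9), 7: (5, 7), 8: (7, 9), 9: (11, 8), 10: (8, 9),
    11: (9, 13), 12: (9, 9), 13: (9, 12), 14: (11, 8), 15: (9, 6),
}


def caliper_match(treatment, control, caliper=1, seed=0):
    """Pair each treatment participant with a randomly chosen, unused
    control participant whose pretest is within +/- caliper points."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    available = set(control)
    pairs = {}
    for t_id in sorted(treatment, key=lambda pid: (treatment[pid][0], pid)):
        t_pre = treatment[t_id][0]
        candidates = sorted(
            c for c in available if abs(control[c][0] - t_pre) <= caliper
        )
        if candidates:  # otherwise this participant has no match
            c_id = rng.choice(candidates)
            pairs[t_id] = c_id
            available.remove(c_id)  # each control person is used only once
    return pairs


pairs = caliper_match(TREATMENT, CONTROL, caliper=1)
print(len(pairs), "pairs with a caliper of 1")
```

Depending on the random draws, this one-pass procedure yields 9 or 10 pairs for these data: if control participant 1 (pretest 10) is used up by a treatment "9," or is passed over by both treatment "11"s, one fewer control "11" is left for the treatment "12"s. Setting `caliper=0` reduces the function to exact matching and gives 6 pairs.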

The results of the analysis might be reported as follows:

Preexisting differences between the experimental and control groups were controlled by using caliper matching with a caliper width equal to the pretest score ± 1. Ten of the 15 possible pairs of data (67%) remained after matching. After caliper matching there was no difference between the posttest scores of the experimental group (M = 11.00, SD = 2.71) and the control group (M = 10.20, SD = 2.25), t (9) = 0.63, p = .548.

Matching can reduce the error variance, making your test more sensitive to differences between the groups. The magnitude of the reduction in error variance is partly a function of (a) the width of the caliper you use and (b) the correlation between the pretest and posttest scores. The narrower the caliper, the greater the reduction in error variance, so you should keep the caliper as small as possible while still retaining as many pairs of data as you can. The larger the correlation between the pretest and posttest scores, the smaller the error variance after matching.
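The link between correlation and error variance can be made concrete with the standard expression for the variance of a difference score. The symbols here are introduced only for illustration and do not appear in the original notes: $\sigma_T^2$ and $\sigma_C^2$ are the posttest variances of the matched treatment and control participants, $\rho$ is the correlation between the two posttest scores within a pair, and $\sigma_D^2$ is the variance of the within-pair differences that forms the error term of the paired t test.

$$
\sigma_D^2 = \sigma_T^2 + \sigma_C^2 - 2\rho\,\sigma_T\,\sigma_C
$$

When matching on a pretest that correlates strongly with the posttest produces a positive $\rho$, the last term shrinks $\sigma_D^2$. When $\rho = 0$, pairing gains nothing over treating the groups as independent.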

©1998, 1999, Lee A. Becker - 03/15/99