It is rare that researchers gather information from an entire population. If we did, inferential statistics would be unnecessary. Error is involved whenever an experiment is run or people are sampled for a survey. Confidence intervals give us an estimate of the amount of error involved in our data. They tell us about the precision of the statistical estimates (e.g., means, standard deviations, correlations) we have computed. Confidence intervals are related to the concept of power. The wider the confidence interval, the less power a study has to detect differences between treatment conditions in experiments or between groups of respondents in survey research. SPSS provides confidence intervals for a wide range of the statistics it computes.
A confidence interval is based on three elements: the value of a statistic (the mean, the correlation, etc.); the standard error (SE) of that statistic; and the desired confidence level (e.g., the 95% confidence interval or the 99% confidence interval).
In this set of notes confidence intervals are first defined theoretically in terms of the mean and standard deviation of the means of a large number of samples of equal size from a known population. The standard error is defined as the standard deviation of the means of those samples. This definition provides a good basis for understanding confidence intervals, but it is not practical because we rarely collect that kind of information. Next, the formula is given for estimating the standard error from the standard deviation and size of a single sample. Finally, the formula for estimating the confidence interval around a single mean is given.
Suppose that you randomly selected 200 samples of 30 cases (n = 30) from a population and found the mean for each of the 200 samples. The hypothetical plot of the 200 means is shown in Figure 1.
Suppose that we know that the population mean is zero. We can ask the question, "Is the mean of the distribution of sampled means different from zero?" To answer this question we need to build a confidence interval around the mean of the distribution of means. A confidence interval (C.I.) is defined by the following formula
C.I. = M ± (z * SE)
= M - (z * SE) to M + (z * SE)
where M is the mean of the distribution of means, SE is the standard error of the distribution, and z is the z-score for the particular confidence interval of interest. For example, if you want the 95% confidence interval the value of z would be 1.96. The value 1.96 comes from our understanding of the normal curve. The area between plus and minus 1.96 standard deviations covers 95% of the cases, if the means are normally distributed. Substituting 1.96 for z, the formula for the 95% C.I. would be
95% C. I. = M ± (1.96 * SE)
= M - (1.96 * SE) to M + (1.96 * SE).
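The value of z for a given confidence level can be recovered from the normal curve rather than looked up in a table. The following is a minimal sketch using Python's standard library; the function names z_for_confidence and confidence_interval are illustrative choices, not part of these notes or of SPSS.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Return the two-sided z-score for a confidence level such as 0.95."""
    tail = (1.0 - level) / 2.0          # area left in each tail of the curve
    return NormalDist().inv_cdf(1.0 - tail)


def confidence_interval(mean, se, level=0.95):
    """Return (lower, upper) for M - (z * SE) to M + (z * SE)."""
    z = z_for_confidence(level)
    return mean - z * se, mean + z * se


z95 = z_for_confidence(0.95)
z99 = z_for_confidence(0.99)
print(f"z for 95% C.I. = {z95:.2f}")
print(f"z for 99% C.I. = {z99:.2f}")
```

The 99% interval uses a larger z than the 95% interval, so for the same mean and standard error it is wider.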
The standard deviation of the distribution of means is called the standard error. The standard deviation of this distribution of sample means, and hence the standard error, is 1.0. The 95% confidence interval for this distribution is indicated by the yellow portion of Figure 1. The 95% confidence interval for this set of data ranges from -1.96 to 1.96 because M = 0 and SE = 1.00. About 95% of the 200 means (roughly 190 of them) will fall within the 95% confidence interval.
(Note: ranges are always given from the lowest value to the highest value. Do not state the range as 1.96 through -1.96.)
The blue areas at either end of the distribution represent the areas that are outside the 95% confidence interval. The 95% C.I. includes zero, so the mean of the distribution of means is not significantly different from zero.
Another 200 random samples of 30 cases are drawn. The distribution of the means is shown in Figure 2. In this case the mean of the distribution is 2.5. The standard deviation of the distribution of means is 1.0, hence the standard error is 1.0. Applying the formula for the 95% C.I.
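The thought experiment with 200 samples of 30 cases can be simulated. The sketch below is illustrative only: the normal population, its standard deviation (chosen as the square root of 30 so that the standard error comes out near 1.0), and the random seed are all assumptions, not values taken from these notes.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable (assumption)

N_SAMPLES = 200       # number of samples drawn
N_CASES = 30          # cases per sample (n = 30)
POP_MEAN = 0.0        # known population mean
POP_SD = 30 ** 0.5    # assumed population SD, so SD of the means is near 1.0

# Draw 200 samples and keep the mean of each one.
means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N_CASES)]
    means.append(statistics.fmean(sample))

m = statistics.fmean(means)     # mean of the distribution of means
se = statistics.stdev(means)    # SD of the means = standard error

lower = m - 1.96 * se
upper = m + 1.96 * se
inside = sum(lower <= x <= upper for x in means)

print(f"M = {m:.3f}, SE = {se:.3f}")
print(f"95% C.I. = {lower:.3f} to {upper:.3f}")
print(f"{inside} of {N_SAMPLES} means fall inside the interval")
```

Because the samples are random, the count of means inside the interval will be close to 190 but will vary from run to run when the seed is changed.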
95% C.I. = 2.50 ± (1.96 * 1.00)
= 2.50 ± 1.96
= 2.50 - 1.96 to 2.50 + 1.96
= 0.54 to 4.46
we find that the 95% C.I. ranges from 0.54 to 4.46. The confidence interval does not include zero, so the mean of this distribution of means is different from zero at p < .05.
The distribution could be towards the negative end of the scale. Suppose that the mean of a distribution of 200 randomly sampled sets of 30 cases was -2.2 and the standard deviation of the distribution of means was 1.0. Then the 95% confidence interval would range from -4.16 to -0.24. The 95% C.I. does not include zero, so the mean of -2.2 is significantly lower than zero at p < .05.
95% C.I. = -2.20 ± (1.96 * 1.00)
= -2.20 ± 1.96
= -2.20 - 1.96 to -2.20 + 1.96
= -4.16 to -0.24
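The three examples so far (means of 0, 2.5, and -2.2, each with a standard error of 1.0) can be checked with a few lines of code. This is a small sketch; the helper names ci_95 and includes are hypothetical and introduced only for illustration.

```python
Z_95 = 1.96  # z-score for the 95% confidence interval


def ci_95(mean, se):
    """Return the 95% C.I. as (lower, upper), lowest value first."""
    margin = Z_95 * se
    return mean - margin, mean + margin


def includes(interval, value):
    """True when value lies inside the interval."""
    low, high = interval
    return low <= value <= high


for mean in (0.0, 2.5, -2.2):
    low, high = ci_95(mean, 1.0)
    verdict = "includes zero" if includes((low, high), 0.0) else "excludes zero"
    print(f"M = {mean:5.2f}: {low:.2f} to {high:.2f} ({verdict})")
```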
In everyday research we do not sample a large number of sets of n cases from our population. We take a single sample of n cases, where n is the number of cases in the study, or sometimes the number of cases in each cell of the design. We can find the mean and standard deviation of our sample. But the definition of the standard error presented in the previous section is that it is the standard deviation of the means of a large number of samples from the population. So how can we find the 95% confidence interval for our single mean if we are missing the standard error part of the C.I. formula? The standard error (SE) can be estimated according to the following formula
SE = SD / √n
where SD is the standard deviation of our sample and n is the number of cases. That is, the standard error is estimated by dividing the obtained standard deviation by the square root of the number of cases.
Notice that the standard error becomes smaller as the size of the sample increases. As we increase our sample size, the standard error becomes smaller and hence the confidence interval becomes narrower. In other words, we can detect smaller differences between means if we have larger sample sizes. We can increase the power to detect any difference by increasing the sample size.
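The effect of sample size on the standard error can be seen in a short sketch. The standard deviation of 10 and the sample sizes used here are hypothetical values picked to give round numbers; standard_error is an illustrative helper name.

```python
import math


def standard_error(sd, n):
    """Estimate the standard error as SD divided by the square root of n."""
    return sd / math.sqrt(n)


SD = 10.0  # hypothetical sample standard deviation

for n in (25, 100, 400):  # hypothetical sample sizes
    se = standard_error(SD, n)
    width = 2 * 1.96 * se  # full width of the 95% C.I.
    print(f"n = {n:3d}: SE = {se:.2f}, 95% C.I. width = {width:.2f}")
```

Because n enters through its square root, the sample size has to be quadrupled to cut the standard error, and with it the width of the interval, in half.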
Suppose that the sample mean = 4.52, the standard deviation = 6.28, and the number of cases = 15. Then the standard error is
SE = SD / √n
= 6.28 / √15
= 6.28 / 3.873
= 1.62
Applying the formula for the 95% C.I.:
95% C. I. = M ± (1.96 * SE)
95% C.I. = 4.52 ± (1.96 * 1.62)
= 4.52 ± 3.175
= 4.52 - 3.175 to 4.52 + 3.175
= 1.345 to 7.695
The 95% C.I. ranges from 1.345 to 7.695. It does not include zero, so the mean, 4.52, is different from zero at p < .05. A graphical representation of the 95% confidence interval using the estimated standard error is shown in Figure 3. The area under the 95% C.I. bar represents the 95% confidence interval around the mean of 4.52.
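The whole calculation, from the sample standard deviation to the interval, can be sketched in one function. The name ci_from_sample is hypothetical. Note that the code carries the unrounded standard error through, so its limits (about 1.342 and 7.698) differ in the third decimal place from the hand calculation, which rounds SE to 1.62 first.

```python
import math


def ci_from_sample(mean, sd, n, z=1.96):
    """Return (SE, lower, upper) for a single sample mean."""
    se = sd / math.sqrt(n)   # estimated standard error
    margin = z * se          # half-width of the interval
    return se, mean - margin, mean + margin


se, low, high = ci_from_sample(mean=4.52, sd=6.28, n=15)
print(f"SE = {se:.2f}")
print(f"95% C.I. = {low:.3f} to {high:.3f}")
print("includes zero" if low <= 0.0 <= high else "does not include zero")
```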
It would be nice to have an interactive graphic here. Think about the mean getting closer and closer to zero. The corresponding 95% confidence interval moves towards the left end of the number scale as the mean gets smaller. At some point the 95% confidence interval will include zero. At that point the mean is no longer significantly different from zero.
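In place of a graphic, a short sketch can step the mean towards zero and report when the interval first includes it. The standard error of 1.62 comes from the worked example; the sequence of smaller means is hypothetical and is held against the same standard error only for illustration.

```python
Z_95 = 1.96
SE = 1.62                  # estimated standard error from the worked example
margin = Z_95 * SE         # half-width of the 95% C.I.

results = []
for mean in (4.52, 4.0, 3.5, 3.0, 2.5):  # hypothetical means after the first
    low, high = mean - margin, mean + margin
    has_zero = low <= 0.0 <= high
    results.append((mean, low, high, has_zero))
    label = "includes zero" if has_zero else "excludes zero"
    print(f"M = {mean:.2f}: {low:6.3f} to {high:.3f} ({label})")
```

With this standard error the half-width of the interval is about 3.175, so any mean smaller than that value has a 95% C.I. that reaches zero.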
In these examples we have asked the question, "Is the sample mean different from zero?" Usually we are more interested in the question, "Is mean A different from mean B?" Confidence intervals are also useful for that question. Does the 95% confidence interval for one of the means include the other mean? If so, then the means are not different from each other. If not, then the means are different from each other at p < .05.
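The comparison rule just described can be written as a small check. This sketch simply encodes the rule as stated in these notes; the function name differs_by_ci_rule and the two values used for mean B are hypothetical, while mean A and its standard error reuse the worked example.

```python
def ci_95(mean, se, z=1.96):
    """Return the 95% C.I. around a mean as (lower, upper)."""
    margin = z * se
    return mean - margin, mean + margin


def differs_by_ci_rule(mean_a, se_a, mean_b):
    """True when mean B falls outside the 95% C.I. around mean A."""
    low, high = ci_95(mean_a, se_a)
    return not (low <= mean_b <= high)


MEAN_A, SE_A = 4.52, 1.62      # from the worked example
for mean_b in (6.0, 9.0):      # hypothetical comparison means
    different = differs_by_ci_rule(MEAN_A, SE_A, mean_b)
    print(f"mean B = {mean_b}: different from mean A? {different}")
```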
©Lee A. Becker, 1997, 1998 -revised 07/07/99