Reliability and Validity, Part II

V. Validity

    A. Criterion-related validity
         1.   Predictive validity
         2.   Concurrent validity
    B. Construct validity
         1.   Convergent validity
               a.   Correlational
               b.   Contrasted groups
               c.   Experimental
         2.   Discriminant validity

V. Validity

Validity refers to whether a test measures what it is intended to measure.  Does a test of, say, mathematics ability measure that ability, or is reading comprehension also part of what the test measures?  The validity of a test is constrained by its reliability.  If a test does not consistently measure a construct or domain, then it cannot be expected to have high validity coefficients.
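
The constraint that reliability places on validity can be made concrete. Under classical test theory, a standard result is that the observed correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below illustrates that ceiling; the function name and the reliability values are hypothetical and are not taken from any study discussed here.

```python
import math


def max_validity(reliability_test: float, reliability_criterion: float = 1.0) -> float:
    """Ceiling on an observed validity coefficient under classical test theory.

    Assumes the classical result r_xy <= sqrt(r_xx * r_yy). The default treats
    the criterion as perfectly reliable, which gives the most generous ceiling.
    """
    for r in (reliability_test, reliability_criterion):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(reliability_test * reliability_criterion)


# Hypothetical reliabilities, chosen only for illustration.
for r_xx in (0.49, 0.81):
    print(f"reliability {r_xx:.2f} -> validity ceiling {max_validity(r_xx):.2f}")
```

A test with low reliability therefore has a low ceiling on any validity coefficient, no matter how well chosen the criterion is.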

A. Criterion-related validity  

Predictive validity. Criterion-related validity refers to how strongly the scores on the test are related to other behaviors.  A test may be used to predict future behavior: for example, will people who score high on the "in basket" test be good supervisors if they are promoted, or will people who score high on the GRE be successful in graduate school?  To find the relationship between the test and some behavior, you need to clearly specify the behavior that you want to predict; that is, you need to specify the criterion.  This is often not an easy task.  What do you mean by being a "good supervisor" or by "success in graduate school," and how would you measure those characteristics?

As far as I know there are no studies that have looked at the predictive validity of the PTSD-I.  How would you devise a study to explore the predictive validity of the PTSD-I?

Concurrent validity.  Rather than predicting future behavior, you may be interested in the relationship between your test and other tests that purport to measure the same domain.  For example, if you are creating a new, shorter, less costly test of intelligence, you would be interested in the relationship between your test and a standard test of intelligence such as the WAIS.  If you gave both tests at the same time and computed the correlation between the two sets of scores, you would be determining the concurrent validity of your test.
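
As a minimal sketch of the computation just described, the following code finds the Pearson correlation between scores on a new test and a standard test given in the same session. The scores, the variable names, and the function name are hypothetical and serve only to show the arithmetic.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a sample has no variance")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for eight people who took both tests in one session.
new_short_test = [12, 15, 9, 20, 17, 11, 14, 18]
standard_test = [98, 110, 90, 125, 112, 95, 104, 120]
print(f"concurrent validity r = {pearson_r(new_short_test, standard_test):.2f}")
```

The same function gives a predictive validity coefficient if the second list holds criterion scores collected at a later time.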

Watson et al. (1991) examined the concurrent validity of the PTSD-I and the posttraumatic stress disorder section of the Diagnostic Interview Schedule (DIS; Robins & Helzer, 1985).  As mentioned earlier, the DIS is a commonly used diagnostic measure of PTSD.  They computed the correlations between the individual items of the PTSD-I and the corresponding items of the DIS and found that they ranged from .57 to .99, with a median correlation of .77.  They also looked at the correlation between the total score of the PTSD-I and the diagnosis of PTSD based on the DIS; it was .94.  We know from our earlier discussion that the reliability of a test depends upon the number of items.  So we would expect that the reliability of the total test score would be higher than the reliability of the individual items on the test.
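
One standard way to express how reliability grows with the number of items is the Spearman-Brown formula. Whether the earlier discussion used this exact formula is an assumption here, and the single-item reliability and test lengths below are hypothetical; the sketch only shows the direction and rough size of the effect.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`.

    Spearman-Brown formula: r_kk = k * r / (1 + (k - 1) * r), which assumes
    the added items are parallel to the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical single-item reliability, chosen only for illustration.
single_item = 0.30
for k in (1, 5, 10, 20):
    print(f"{k:2d} items -> predicted reliability {spearman_brown(single_item, k):.2f}")
```

Because a more reliable score has a higher ceiling on its validity, a total score built from many items can be expected to correlate more strongly with a criterion than a single item does.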

Wilson et al. (1994) administered the PTSD-I, the Impact of Events Scale (IES; Horowitz, Wilner, & Alvarez, 1979), and the Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1992) to 80 adult participants in a study of traumatic memory.  The IES measures the PTSD symptoms of intrusion and avoidance.  It was not designed to be used to diagnose PTSD.  The SCL-90-R measures the occurrence of 9 psychological symptoms in psychiatric and medical patients.  Many of the items on the PTSD-I are phrased in terms of the lifetime incidence of PTSD symptoms.  For example, item C-5 reads, "Have you felt more cut off emotionally from other people at some period than you did before (the stressor)?"  This form of the PTSD-I was used at the pretest, prior to the administration of the treatment.  The concurrent validities of the PTSD-I with the IES and the Global Severity Index (GSI) of the SCL-90-R were .54 and .66, respectively.  We belatedly recognized that questions of that form would not be useful in measuring changes due to treatment, so we revised the time frame of the scale to match that used by the IES and the SCL-90-R, which was one week.  Using the one-week time frame for the PTSD-I, the concurrent validities of the PTSD-I with the IES and the GSI were substantially higher, .85 and .88, respectively (see Table 1).

Why do you think that the two versions of the PTSD-I would have such different concurrent validities? Which are the "best" concurrent validities?

Table 1. Concurrent Validities for the PTSD-I and the PDS

Criterion measure                               PTSD-I                          PDS (symptom severity score)
----------------------------------------------  ------------------------------  ----------------------------
Individual items of the DIS                     .57 to .99, median = .77 [1]
PTSD diagnosis made by the DIS                  .94 [1]
PTSD-I symptoms as lifetime measure,
  Impact of Events Scale (IES)                  .54 [2]                         IES-I = .80 [3]
                                                                                IES-A = .66 [3]
PTSD-I symptoms within the past week,
  Impact of Events Scale (IES)                  .85 [2]
PTSD-I symptoms as lifetime measure,
  Global Severity Index (GSI) of the SCL-90-R   .66 [2]
PTSD-I symptoms within the past week,
  Global Severity Index (GSI) of the SCL-90-R   .88 [2]
Beck Depression Inventory (BDI)                                                 .79 [3]
State-Trait Anxiety Scale (STAI-State)                                          .73 [3]
State-Trait Anxiety Scale (STAI-Trait)                                          .74 [3]

[1] Watson et al., 1991, Study 2
[2] Wilson, Tinker, Becker, & Gillette, 1994
[3] Foa, 1995


B. Construct validity

When you ask about construct validity, you are taking a broad view of your test: does the test adequately measure the underlying construct?  The question is asked both in terms of convergent validity (are test scores related to the behaviors and tests that they should be related to?) and in terms of discriminant validity (are test scores unrelated to the behaviors and tests that they should be unrelated to?).

There is no single measure of construct validity.  Construct validity is based on the accumulation of knowledge about the test and its relationship to other tests and behaviors.

Convergent validity

     Correlational approach. The concurrent validities between the test and other measures of the same domain are correlational measures of convergent validity.

    Contrasted groups.  Another way of measuring convergent validity is to look at the differences in test scores between groups of people who would be expected to score differently on the test.  For example, Watson et al. (1991) looked at the differences in PTSD-I scores for those who were and were not diagnosed with PTSD by the DIS.  They found that the PTSD-I score for those diagnosed with PTSD was higher (M = 58.2; SD = 14.5) than for those not diagnosed with PTSD (M = 28.0; SD = 12.2), t(59) = 8.68, p < .0001.
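
A contrasted-groups comparison of this kind can be sketched as an independent-samples t test computed from summary statistics. The means and standard deviations below are the ones quoted above, but the group sizes are not reported in these notes; the values 30 and 31 are hypothetical, chosen only so that the degrees of freedom come to 59, and the pooled-variance form of the test is likewise an assumption. The resulting t therefore need not match the published 8.68.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (pooled variance) from summary statistics.

    Returns the t statistic and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# Means and SDs as quoted from Watson et al. (1991).
# Group sizes (30 and 31) are HYPOTHETICAL: they are not given in these notes.
t, df = pooled_t(58.2, 14.5, 30, 28.0, 12.2, 31)
print(f"t({df}) = {t:.2f}")
```

A large t in the expected direction is evidence that the test separates groups that the construct says should differ.
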
    Experimental.  A meaningful treatment effect size provides experimental evidence of validity for a measure: if a treatment changes the construct, scores on a valid measure of that construct should change with it.  Wilson et al. (1994) computed effect sizes between the EMDR-treated group and the delayed-treatment group by dividing the difference between the two group means by the standard deviation of the delayed-treatment control group (Glass, McGaw, & Smith, 1981).  The effect size for the 7-day version of the PTSD-I was 1.28.  The comparable effect sizes for the IES and GSI were 1.41 and 0.66, respectively.  You might also report significance tests as measures of experimental convergent validity, but effect sizes are more informative.
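
The effect size described above, the difference between group means divided by the control group's standard deviation, is often called Glass's delta. The sketch below shows the arithmetic on hypothetical posttest scores. The sign convention (control minus treated, so that fewer symptoms after treatment gives a positive value) and the use of the sample standard deviation are assumptions, not details confirmed by Wilson et al. (1994).

```python
import statistics


def glass_delta(treated, control):
    """Glass's delta: mean difference scaled by the control group's SD.

    Assumed convention: control mean minus treated mean, so lower symptom
    scores in the treated group give a positive effect size. Uses the
    sample standard deviation (n - 1 in the denominator).
    """
    mean_diff = statistics.mean(control) - statistics.mean(treated)
    return mean_diff / statistics.stdev(control)


# Hypothetical posttest symptom scores, for illustration only.
treated_scores = [22, 30, 18, 27, 25, 20]
delayed_scores = [41, 35, 48, 39, 44, 37]
print(f"effect size = {glass_delta(treated_scores, delayed_scores):.2f}")
```

Scaling by the control group's standard deviation keeps the yardstick free of any change in variability that the treatment itself might produce.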

Discriminant validity

Discriminant validity asks whether test scores are unrelated to tests and behaviors in different domains.  It seems to be assessed less often than convergent validity, but it is important when you are trying to distinguish your theory from another theory.  The subtitle of Daniel Goleman's book Emotional Intelligence is "Why it can matter more than IQ."  His argument is that emotional IQ is different from traditional IQ, and so measures of emotional IQ should not correlate very highly with measures of traditional IQ.  This is a question of discriminant validity.



Derogatis, L. R. (1992). SCL-90-R: Administration, scoring and procedures manual--II. Baltimore, MD: Clinical Psychometric Research.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage Publications.

Goleman, D. (1995). Emotional intelligence: Why it can matter more than IQ. New York: Bantam Books.

Horowitz, M. J., Wilner, N., & Alvarez, W. (1979). Impact of Events Scale: A measure of subjective distress. Psychosomatic Medicine, 41, 209-218.

Robins, L. H., & Helzer, J. E. (1985). Diagnostic Interview Schedule (DIS Version III-A).  St. Louis, MO: Washington University, Department of Psychiatry.

Watson, C. G., Juba, M. P., Manifold, V., Kucala, T., & Anderson, P. E. D. (1991). The PTSD interview: Rationale, description, reliability, and concurrent validity of a DSM-III based technique. Journal of Clinical Psychology, 47, 179-188.

Wilson, S. A., Tinker, R. H., Becker, L. A., & Gillette, C. S. (1994, November). Using the PTSD-I as an outcome measure. Poster presented at the annual meeting of the International Society for Traumatic Stress Studies, Chicago, IL.

-revised 02/16/00 © Lee A. Becker, 1999