Validity Notes

Validity

The Concept of Validity

  • Validity in testing is an evaluation of how well a test measures what it intends to measure within a specific context.

  • A test's validity is specific to its use, the population taking it, and the time it is administered.

  • No test is universally valid for all times, uses, or test-taker populations.

Validation

  • Validation is the process of gathering and assessing evidence to support a test's validity.

  • Both test users and developers are involved in validating a test for a particular purpose.

  • Test developers are responsible for providing validity evidence in the test manual.

  • Test users may conduct their own validation studies with their own test-taker groups; these are called local validation studies.

Local Validation Studies

  • Local validation studies are essential when a test user modifies the test's format, instructions, language, or content.

  • They are also necessary when using a test with a population that differs significantly from the standardization population.

Validity Categories

  • Face validity

  • Content validity

  • Criterion-related validity

    • Concurrent validity

    • Predictive validity

  • Construct validity

Face Validity

  • Face validity is about what a test appears to measure to the test-taker, rather than what it actually measures.

  • It's a judgment of how relevant the test items seem to be.

  • A test with high face validity appears, on the surface, to measure what it claims to measure.

  • Judgments about face validity are from the test-taker's viewpoint, unlike reliability and other forms of validity, which are from the test user's perspective.

  • A lack of face validity can decrease test-taker confidence, cooperation, and motivation.

  • A test lacking face validity may still be relevant and useful, but negative perceptions can arise from test-takers, parents, legislators, etc.

Content Validity

  • Content validity is an evaluation of how well a test samples the behavior representative of the entire range of behavior it was designed to assess.

  • For example, a test of assertiveness must cover a wide variety of situations to have content validity.

Content Validity – Educational Achievement Tests

  • An educational achievement test is considered content-valid when the proportion of material covered in the test matches the proportion covered in the course.

Content Validity – Employment Tests

  • For an employment test to be content-valid, it must be a representative sample of the job-related skills needed for the job.

  • Behavioral observation of successful employees is used to determine the content areas to include in the test.

Measuring Content Validity

  • C.H. Lawshe developed a method to measure content validity by gauging agreement among raters regarding the essential nature of an item.

  • Raters answer the question: Is the skill or knowledge measured by this item:

    • Essential?

    • Useful but not essential?

    • Not necessary to the performance of the job?

  • If more than half the panelists rate an item as essential, it has some content validity.

  • Higher agreement among panelists indicates greater content validity.

  • Lawshe developed the content validity ratio (CVR) formula:

    • CVR = (n_e – (N/2)) / (N/2)

    • Where:

      • n_e = number of panelists indicating essential

      • N = total number of panelists

  • When fewer than half the panelists indicate "essential," the CVR is negative.

  • When exactly half the panelists indicate "essential," the CVR is zero.

  • When more than half, but not all, the panelists indicate "essential," the CVR is positive and ranges between 0.00 and 0.99.

  • When all the panelists indicate "essential," the CVR is 1.00.

  • Lawshe suggested eliminating items if the observed agreement has more than a 5% chance of occurring by chance.
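
The CVR formula above can be sketched as a small function (a minimal illustration, not part of the original notes):

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's content validity ratio: CVR = (n_e - N/2) / (N/2).

    n_essential: number of panelists rating the item "essential" (n_e)
    n_panelists: total number of panelists (N)
    """
    half = n_panelists / 2
    return (n_essential - half) / half

# All 10 panelists agree the item is essential -> CVR = 1.0
print(content_validity_ratio(10, 10))  # 1.0
# Exactly half -> 0.0; fewer than half -> negative
print(content_validity_ratio(5, 10))   # 0.0
print(content_validity_ratio(4, 10))   # -0.2
```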

Measuring Content Validity – Minimal Values of the Content Validity Ratio

  • Minimum CVR values required at the 5% level of chance, by number of panelists:

    Number of panelists    Minimum CVR value
    5                      0.99
    6                      0.99
    7                      0.99
    8                      0.75
    10                     0.62
    12                     0.56
    14                     0.51
    15                     0.49
    20                     0.42
    25                     0.37
    30                     0.33
    35                     0.31
    40                     0.29
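
Combining the CVR formula with the minimum values above, item screening could be sketched like this (the `retain_item` helper is an illustrative assumption; the table values are from the notes):

```python
# Minimum CVR values (Lawshe, .05 level) keyed by panel size, from the table above
MIN_CVR = {5: .99, 6: .99, 7: .99, 8: .75, 10: .62, 12: .56, 14: .51,
           15: .49, 20: .42, 25: .37, 30: .33, 35: .31, 40: .29}

def retain_item(n_essential, n_panelists):
    """Return True if the item's CVR meets Lawshe's minimum for this panel size."""
    half = n_panelists / 2
    cvr = (n_essential - half) / half
    return cvr >= MIN_CVR[n_panelists]

# With 10 panelists, 9 "essential" ratings give CVR = 0.80 (>= 0.62, retain),
# while 8 give CVR = 0.60 (< 0.62, eliminate)
print(retain_item(9, 10))  # True
print(retain_item(8, 10))  # False
```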

Criterion-Related Validity

  • Criterion-related validity assesses how well a test score can be used to predict an individual's standing on a measure of interest (the criterion).

Criterion-Related Validity – Concurrent & Predictive Validity

  • Two types of validity evidence:

    • Concurrent validity: The degree to which a test score relates to a criterion measure obtained at the same time.

    • Predictive validity: The degree to which a test score predicts a criterion measure.

What is a Criterion?

  • A criterion is the standard against which a test or test score is evaluated.

  • It can be a test score, behavior, rating, diagnosis, etc.

Characteristics of a Criterion

  • An adequate criterion should be relevant.

  • An adequate criterion must also be valid for the purpose for which it is being used.

  • If one test (X) is being used as the criterion to validate a second test (Y), then evidence should exist that test X is valid.

  • If the criterion is a rating made by a judge or a panel, then evidence should exist that the rating is valid (e.g., based on the training and experience of the raters).

  • Ideally, a criterion should be uncontaminated

    • Criterion contamination occurs when the criterion measure is based on predictor measures.

    • If a psychiatric diagnosis (criterion) is based on MMPI-2 scores (predictor), the predictor measure has contaminated the criterion measure.

Criterion-Related Validity – Concurrent Validity

  • Concurrent validity is established when test scores and criterion measures are obtained at the same time.

  • It indicates how well test scores estimate an individual's current standing on a criterion.

  • If a psychodiagnostic test is validated against a criterion of already diagnosed psychiatric patients, it is concurrent validation.

  • Once established, the test can provide a faster, less expensive method for diagnosis or classification.

  • Concurrent validity can also be used to evaluate the validity of Test A against Test B, provided Test B has already been shown to be valid.

Criterion-Related Validity – Predictive Validity

  • Test scores are obtained at one time, and criterion measures are obtained in the future.

  • The intervening event between the test scores and criterion measures can be training, experience, therapy, medication, or simply the passage of time.

  • Predictive validity indicates how accurately test scores predict a criterion measure obtained in the future.

Judgments of Criterion-Related Validity

  • Judgments of criterion-related validity (concurrent or predictive) are based on:

    • Validity Coefficient

    • Expectancy Data

Criterion-Related Validity – The Validity Coefficient

  • The validity coefficient is a correlation coefficient measuring the relationship between test scores and criterion scores.

  • A correlation coefficient between a score on a psychodiagnostic test and a criterion score assigned by psychodiagnosticians is an example.

  • Typically, the Pearson correlation coefficient is used to compute the validity coefficient between the two measures.

    • Depending on variables such as the type of data, the sample size, and the shape of the distribution, other correlation coefficients may be used.

  • For example, in correlating self-rankings of performance on some job with rankings made by job supervisors, the Spearman rho rank-order correlation would be employed.
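
As a rough sketch, the validity coefficient is simply the correlation between paired test and criterion scores. The scores below are made up for illustration:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between paired test and criterion scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired scores: a selection test and later supervisor ratings
test_scores = [52, 60, 45, 70, 58, 64]
criterion = [2.8, 3.4, 2.1, 3.9, 3.0, 3.5]
validity_coefficient = pearson_r(test_scores, criterion)
print(round(validity_coefficient, 2))
```

For ranked data (as in the self-ranking example above), the same idea applies, but the Spearman rho would be computed on the ranks rather than the raw scores.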

Criterion-Related Validity – Incremental Validity

  • Test users predicting a criterion from test scores are often interested in the utility of multiple predictors.

  • The value of multiple predictors depends on:

    • Each predictor having criterion-related predictive validity.

    • Additional predictors possessing incremental validity.

    • Incremental validity is the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.

  • Incremental validity may be used when predicting something like academic success in college.

  • Grade-point average (GPA) at the end of the first year may be used as a measure of academic success.

  • A study of potential predictors of GPA may reveal that time spent in the library and time spent studying are highly correlated with GPA.

  • How much sleep a student’s roommate allows the student to have during exam periods correlates with GPA to a smaller extent.

  • One approach, employing the principles of incremental validity, is to start with the best predictor, the predictor that is most highly correlated with GPA.

  • This may be time spent studying. Then, using multiple-regression techniques, one would examine the usefulness of other predictors.

  • Additional predictors should also be efficient, adding enough predictive information to justify the time and cost of collecting them.

  • Even though time in the library is highly correlated with GPA, it may not possess incremental validity if it overlaps too much with the first predictor, time spent studying.

  • If time spent studying and time in the library are so highly correlated with each other that they reflect essentially the same thing, then only one needs to be included as a predictor.

  • By contrast, the variable of how much sleep a student's roommate allows the student to have during exams may have good incremental validity.

  • This is because it reflects a different aspect of preparing for exams (resting) than the first predictor, studying.
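
The GPA example can be made concrete with the standard two-predictor formula for the squared multiple correlation. All correlation values below are hypothetical, chosen only to mirror the studying/library/sleep narrative:

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of criterion y on predictors 1 and 2,
    computed from the three pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

def incremental_validity(r_y1, r_y2, r_12):
    """Gain in explained criterion variance from adding predictor 2
    to a model that already uses predictor 1."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1**2

# Hypothetical correlations with GPA: studying r = .60, library time r = .50.
# Library time overlaps heavily with studying (r = .90), so it adds little:
print(round(incremental_validity(0.60, 0.50, 0.90), 3))
# Roommate-allowed sleep correlates only r = .30 with GPA but is nearly
# independent of studying (r = .10), so it adds more despite the weaker r:
print(round(incremental_validity(0.60, 0.30, 0.10), 3))
```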

Criterion-Related Validity – Expectancy Data

  • Expectancy data provide information that can be used in evaluating the criterion-related validity of a test.

  • Using a score obtained on some test(s) or measure(s), expectancy tables illustrate the likelihood that the test-taker will score within some interval of scores on a criterion measure.

  • An expectancy table shows the percentage of people within specified test-score intervals who subsequently were placed in various categories of a certain criterion (for example, placed in a "passed" or "failed" category).

  • An expectancy table may be constructed from a scatterplot of test scores against criterion scores.
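
Building an expectancy table from paired (test score, criterion score) data can be sketched as a cross-tabulation with row percentages. The data and interval boundaries below are invented for illustration:

```python
from collections import Counter, defaultdict

def expectancy_table(pairs, score_band, grade_band):
    """Cross-tabulate (test score, criterion score) pairs into row percentages.

    score_band / grade_band map a raw value to an interval label.
    Returns {score interval: {criterion interval: % of that row}}.
    """
    counts = defaultdict(Counter)
    for test, crit in pairs:
        counts[score_band(test)][grade_band(crit)] += 1
    table = {}
    for row, c in counts.items():
        total = sum(c.values())
        table[row] = {col: round(100 * n / total) for col, n in c.items()}
    return table

# Hypothetical data: (subtest score, course grade)
pairs = [(45, 95), (52, 85), (41, 92), (48, 78), (33, 71), (35, 88), (31, 65)]

def score_band(s):
    return "40 and above" if s >= 40 else "30-39" if s >= 30 else "below 30"

def grade_band(g):
    if g >= 90:
        return "90-100"
    if g >= 80:
        return "80-89"
    return "70-79" if g >= 70 else "0-69"

table = expectancy_table(pairs, score_band, grade_band)
print(table["40 and above"])
```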

  • Example: the relationship between scores on the Language Usage subtest of the Differential Aptitude Test (DAT) and course grades in American history for eleventh-grade boys.

  • Of the students who scored between 40 and 60 on the DAT, 83% (29% + 54%) earned a grade of 80 or above in their American history course.

  • Expectancy table example (cells show the percentage of students in each subtest-score interval who earned each course grade):

    Language Usage            Course grade
    subtest score       0-69   70-79   80-89   90-100
    40 and above          0      17      29      54
    30-39                 8      46      29      17
    20-29                15      59      24       2
    Below 20             37      57       7       0