Chapter 8 Annotations

In-Depth Notes on Validity

1. Introduction to Validity

Definition: Validity refers to the extent to which a test measures what it is intended to measure, and the degree to which evidence and theory support the interpretations of test scores for their intended use.

Core Question: "Are we accurately measuring the construct we claim to be measuring?"

Importance: A test cannot be useful without validity. Even a highly reliable test is worthless if it does not measure the intended construct. Validity is the most fundamental consideration in developing and evaluating a test.

Example: If a test is purported to measure intelligence using only math and vocabulary questions, its validity hinges on whether math and vocabulary skills truly represent the broader construct of intelligence.

2. The Evolution of Validity: From Types to a Unitary Concept

Traditional View: Validity was divided into three distinct types:

1. Content Validity

2. Criterion-Related Validity

3. Construct Validity

Modern View (Unitary Concept): Validity is now understood as a single, unified concept, with construct validity as the overarching framework.

The old "types" are now considered sources of evidence that contribute to the overall argument for the validity of a test score interpretation.

All accumulated evidence is examined together to build a case for validity.

3. Sources of Validity Evidence

These are the categories of evidence used to build the case for a test's validity.

A. Evidence Based on Test Content

Definition: Evidence that the test content is a representative sample of the broader domain of knowledge, skills, or behaviors that the construct is meant to represent.

Key Process: A qualitative review by expert judges who systematically evaluate the test's items and structure.

What is Evaluated:

Item Relevance: Is each item relevant to the construct being measured?

Content Coverage: Does the test cover all important areas of the construct domain (e.g., via a Table of Specifications or test blueprint)?
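
Once each item has been tagged with a content area, coverage against a test blueprint can be checked mechanically. The sketch below is illustrative only: the content areas, target proportions, item tags, and the 0.10 tolerance are all hypothetical values chosen for the example, not figures from any real test.

```python
from collections import Counter

# Hypothetical blueprint: target share of items per content area.
blueprint = {"algebra": 0.40, "geometry": 0.30, "statistics": 0.30}

# Hypothetical content-area tags for a 10-item test, as assigned by expert reviewers.
item_areas = ["algebra"] * 7 + ["geometry"] * 3


def coverage_gaps(blueprint, item_areas, tolerance=0.10):
    """Return areas whose actual share of items misses the target by more than `tolerance`."""
    total = len(item_areas)
    if total == 0:
        raise ValueError("the test has no items")
    counts = Counter(item_areas)
    gaps = {}
    for area, target in blueprint.items():
        actual = counts.get(area, 0) / total
        if abs(actual - target) > tolerance:
            gaps[area] = {"target": target, "actual": actual}
    return gaps


gaps = coverage_gaps(blueprint, item_areas)
for area, shares in gaps.items():
    print(f"{area}: target {shares['target']:.2f}, actual {shares['actual']:.2f}")
```

In this made-up test, statistics has no items at all, which is the kind of gap an expert review would flag as incomplete coverage of the construct domain.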

Threats to Validity:

Construct Underrepresentation: The test fails to include important aspects of the construct.

Construct-Irrelevant Variance: The test is influenced by factors unrelated to the construct (e.g., confusing wording that tests reading ability instead of the target knowledge).

Quantitative Method: Content Validity Ratio (CVR) – A statistical measure of agreement among experts on whether an item is "essential."
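
A common formulation is Lawshe's CVR, computed per item as CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating the item "essential" and N is the total number of experts. Values range from -1 (no expert rates the item essential) to +1 (all do), with 0 meaning exactly half. A minimal sketch, assuming a hypothetical panel of 10 experts and made-up ratings:

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR for one item: (n_e - N/2) / (N/2)."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical data: number of experts (out of 10) rating each item "essential".
essential_counts = {"item_1": 10, "item_2": 8, "item_3": 5, "item_4": 2}
panel_size = 10

cvr_by_item = {
    item: content_validity_ratio(count, panel_size)
    for item, count in essential_counts.items()
}
for item, cvr in cvr_by_item.items():
    print(f"{item}: CVR = {cvr:+.2f}")
```

Items with low or negative CVR are candidates for revision or removal; the cutoff considered acceptable depends on panel size and is not fixed by this sketch.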

B. Evidence Based on Response Processes

Definition: Evidence that examines the actual cognitive, affective, or behavioral processes that test-takers use when responding to items. It investigates whether respondents are engaging with the test in the way the developers intended.

Example: Using "think-aloud" protocols where participants verbalize their thoughts while solving a problem to ensure they are using the intended reasoning skills and not just test-taking strategies.

C. Evidence Based on Internal Structure

Definition: Evidence that the internal relationships among test items conform to the hypothesized structure of the construct.

Primary Method: Factor Analysis (see section below).

Application: If a test is designed to measure a single, unified trait (unidimensional), items should be highly intercorrelated. If it is designed to measure several distinct sub-traits (multidimensional), items should cluster into the expected subgroups.

D. Evidence Based on Relations to Other Variables

This is the empirical heart of construct validity, showing how the test scores relate to other measures.

Convergent Validity: The extent to which scores on the test are highly correlated with scores on other tests that are designed to measure the same or similar constructs.

Example: A new depression scale should correlate highly with established, valid depression scales.

Discriminant (Divergent) Validity: The extent to which scores on the test are uncorrelated, or only weakly correlated, with scores on tests designed to measure different or unrelated constructs.

Example: A depression scale should not correlate highly with a scale measuring assertiveness.

Criterion-Related Validity: The extent to which test scores can predict or correlate with a specific, concrete criterion (a measure of performance or outcome).

Predictive Validity: The test (predictor) is administered, and the criterion is measured in the future. Used for forecasting.

Example: SAT scores (predictor) correlated with first-year college GPA (criterion).

Concurrent Validity: The test and the criterion are measured at approximately the same time. Used for diagnosing current status.

Example: A quick diagnostic test for a medical condition (predictor) correlated with a full, clinical diagnosis (criterion) given at the same time.

Validity Coefficient: The correlation coefficient (e.g., Pearson's r) that quantifies the relationship between the test and the criterion.
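
As a rough illustration, a validity coefficient can be computed directly from paired test and criterion scores. The sketch below implements Pearson's r from its definition using only the standard library; the admission scores and GPA values are hypothetical numbers invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: admission test scores (predictor) and first-year GPA (criterion).
admission_scores = [1050, 1200, 1340, 980, 1450, 1120]
first_year_gpa = [2.9, 3.1, 3.6, 2.5, 3.8, 3.0]

validity_coefficient = pearson_r(admission_scores, first_year_gpa)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

For predictive validity the criterion values would be collected later than the test scores; for concurrent validity they would be collected at about the same time. The computation itself is the same.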

Multitrait-Multimethod Matrix (MTMM): A matrix of correlation coefficients among several traits, each measured by several methods, used to assess construct validity by examining:

Convergence: High correlations for the same trait measured by different methods.

Discrimination: Low correlations for different traits measured by the same method.
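
The logic of reading an MTMM can be sketched by labeling each correlation according to whether the two measures share a trait, a method, or neither. Everything in the example is hypothetical: the two traits, the two methods, and all correlation values are invented, and the final comparison (every convergent correlation larger in magnitude than every discriminant one) is a simplified reading of the matrix, not a full MTMM analysis.

```python
# Each measure is a (trait, method) pair. All names and values are hypothetical.
DEP_SELF = ("depression", "self_report")
DEP_CLIN = ("depression", "clinician_rating")
ASR_SELF = ("assertiveness", "self_report")
ASR_CLIN = ("assertiveness", "clinician_rating")

correlations = [
    (DEP_SELF, DEP_CLIN, 0.68),
    (ASR_SELF, ASR_CLIN, 0.61),
    (DEP_SELF, ASR_SELF, -0.22),
    (DEP_CLIN, ASR_CLIN, -0.18),
    (DEP_SELF, ASR_CLIN, -0.10),
    (DEP_CLIN, ASR_SELF, -0.12),
]


def classify(measure_a, measure_b):
    """Label a correlation by what the two measures have in common."""
    same_trait = measure_a[0] == measure_b[0]
    same_method = measure_a[1] == measure_b[1]
    if same_trait and same_method:
        return "reliability"  # a measure correlated with itself or a retest
    if same_trait:
        return "convergence"  # same trait, different methods
    if same_method:
        return "discrimination (same method)"
    return "discrimination (different methods)"


convergent = [abs(r) for a, b, r in correlations if classify(a, b) == "convergence"]
discriminant = [
    abs(r) for a, b, r in correlations if classify(a, b).startswith("discrimination")
]

# Simplified check: the weakest convergent correlation should still exceed
# the strongest discriminant correlation.
pattern_supports_validity = min(convergent) > max(discriminant)
print("pattern supports construct validity:", pattern_supports_validity)
```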

E. Evidence Based on Consequences of Testing

Definition: Evidence that investigates the intended and unintended consequences of test use. It questions whether the social outcomes of test interpretation and use align with the intended purpose.

Considerations: Does the test lead to fair decisions? Does it create bias? Does it have positive educational or societal impacts?

4. Factor Analysis

Definition: A family of statistical techniques used to analyze the interrelationships among a large number of variables (test items) and to explain them in terms of their common, underlying dimensions (called factors).

Goal: To reduce data complexity by identifying clusters of items that correlate highly with each other, suggesting they measure the same underlying factor or construct.

Types:

Exploratory Factor Analysis (EFA): Used when the underlying factor structure is unknown. The analysis "explores" the data to discover the number and nature of the factors.

Confirmatory Factor Analysis (CFA): Used to test a pre-specified, hypothesized factor structure. The researcher tests how well the data fit their theoretical model.

Key Concepts:

Eigenvalue: A measure of the amount of variance accounted for by a factor. The Kaiser-Guttman rule suggests retaining factors with eigenvalues greater than 1.0.

Scree Plot: A graphical method for deciding the number of factors to retain by looking for an "elbow" or point where the curve flattens out.
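
To make the eigenvalue idea concrete, the sketch below extracts eigenvalues from a small item correlation matrix and applies the Kaiser-Guttman rule. The 4-item matrix is hypothetical (items 1-2 and items 3-4 are built to form two clusters). The eigenvalues are found with a hand-written Jacobi rotation routine purely to keep the example dependency-free; in practice a statistics or linear algebra library would be used instead.

```python
import math


def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlation matrix for four items forming two clusters.
item_correlations = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]

eigenvalues = symmetric_eigenvalues(item_correlations)
retained = [value for value in eigenvalues if value > 1.0]  # Kaiser-Guttman rule

for index, value in enumerate(eigenvalues, start=1):
    share = value / len(eigenvalues)  # proportion of total variance
    print(f"factor {index}: eigenvalue {value:.3f}, variance share {share:.1%}")
print("factors retained:", len(retained))
```

For a correlation matrix the eigenvalues sum to the number of items, so an eigenvalue above 1.0 means the factor accounts for more variance than a single item does. Plotting these eigenvalues in descending order gives the scree plot.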

5. The Relationship Between Reliability and Validity

This is a critical and hierarchical relationship.

Reliability is Necessary but Not Sufficient for Validity: A test must be reliable to be valid. You cannot accurately measure a construct (validity) if you cannot measure it consistently (reliability).

The Cap Analogy: Reliability sets the upper limit for validity. A test with low reliability cannot have high validity because the scores are too inconsistent to be a true representation of anything.

Key Distinction: A test can be reliable without being valid. It can consistently measure the wrong thing. However, a test cannot be valid without being reliable.
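
In classical test theory the "cap" has a precise form: the observed validity coefficient cannot exceed the square root of the product of the test's reliability and the criterion's reliability, r_xy <= sqrt(r_xx * r_yy). A minimal sketch of that ceiling, using hypothetical reliability values and assuming a perfectly reliable criterion unless stated otherwise:

```python
import math


def max_validity(test_reliability, criterion_reliability=1.0):
    """Upper bound on the validity coefficient: sqrt(r_xx * r_yy)."""
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(test_reliability * criterion_reliability)


# Hypothetical test reliabilities, with a perfectly reliable criterion.
for r_xx in (0.90, 0.64, 0.36):
    print(f"reliability {r_xx:.2f} -> validity at most {max_validity(r_xx):.2f}")

# An unreliable criterion lowers the ceiling further.
print(f"both unreliable -> validity at most {max_validity(0.81, 0.64):.2f}")
```

Note that the bound limits validity without guaranteeing it: a test with reliability 0.90 may still have a validity coefficient near zero if it consistently measures the wrong construct.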

Summary of Key Definitions

Validity: The degree to which evidence and theory support the interpretations of test scores for their intended use.

Construct Validity: The unifying concept to which all validity evidence contributes; it supports interpreting test scores as a measure of a specific psychological construct.

Convergent Validity: Evidence that a test correlates highly with other tests measuring the same construct.

Discriminant Validity: Evidence that a test does not correlate highly with tests measuring different constructs.

Criterion-Related Validity: Evidence that test scores predict or correlate with a specific performance outcome.

Predictive Validity: Criterion is measured in the future.

Concurrent Validity: Criterion is measured at the same time.

Content Validity: Evidence that the test content adequately represents the entire domain of the construct.

Factor Analysis: A statistical method for identifying the underlying factors that explain the pattern of correlations among variables.

Validity Coefficient: The correlation between a test score and a criterion measure.