
Validity in Psychological Testing – Review Flashcards

Concept of Validity

  • Everyday usage: “valid” = meaningful, well-grounded, executed with proper formalities.

  • Psychological testing usage: a judgement of how well a test measures what it purports to measure in a particular context.

    • Focus is on the appropriateness of inferences drawn from scores, not on the test per se.

  • Validity is not universal; bounded by purpose, population, culture, and time.

  • Validity can fade as culture or technology changes → periodic re-validation is essential.

Validation Process & Local Studies

  • Validation = gathering & evaluating evidence.

    • Test developer supplies evidence in the manual.

    • Test user may conduct local validation studies when:

      • Testing a new population.

      • Modifying the test's format (e.g., Braille administration, translation).

  • If a local study is impossible, users must consult the independent literature or seek expert consultation before use.

Classic “Trinitarian” Model (Content, Criterion, Construct)

  1. Content Validity – representativeness of items.

  2. Criterion-Related Validity – relationship with external measure(s).

  3. Construct Validity – integration of all evidence within a theoretical framework.

  • Construct validity functions as “umbrella validity.”

  • Critics of the trinitarian view (e.g., Messick) argue for a unitary model that incorporates the social consequences of test use.

Other Varieties of Validity

  • Ecological Validity – generalizability to real-life contexts and moments (ties to Ecological Momentary Assessment).

  • Face Validity – appearance of relevance to test-taker; PR value rather than technical merit.

    • High face validity can boost cooperation & acceptance (e.g., Introversion/Extraversion questionnaire).

Content Validity in Depth

  • Definition: degree to which items sample the universe of behaviors the construct covers.

  • Achieved through a test blueprint (an illustrative one appears at the end of this section):

    • Specifies topics, item counts, weightings, formats.

    • Informed by syllabi, textbooks, SMEs, job analyses.

  • Example

    • Assertiveness test → items span home, job, and social scenarios.

  • Cultural relativity: what counts as “content” depends on history & politics.

    • Gavrilo Princip example: whether he is presented as a hero or a terrorist differs across the textbooks of Bosnia's ethnic groups, so the keyed "correct" answer differs too.
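
A purely illustrative blueprint for a hypothetical 40-item assertiveness test (all content areas, item counts, and weights invented), in the same array notation used in the Key Formulas section:

  \begin{array}{l|ccc}
  \text{Content area} & \text{Items} & \text{Weight} & \text{Format} \\
  \hline
  \text{Home/family scenarios} & 14 & 35\% & \text{Likert} \\
  \text{Workplace scenarios} & 14 & 35\% & \text{Likert} \\
  \text{Social/peer scenarios} & 12 & 30\% & \text{Likert} \\
  \end{array}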

Criterion-Related Validity

Basics

  • Uses an external criterion (standard) to evaluate test scores.

  • Two forms:

    • Concurrent Validity – scores & criterion collected simultaneously.

    • Predictive Validity – criterion obtained in the future.

Characteristics of a Good Criterion

  1. Relevant – directly linked to construct.

  2. Valid – itself measured accurately.

  3. Uncontaminated – independent of predictor (avoid criterion contamination).

Statistical Evidence

  • Validity Coefficient r_{xy}: correlation between test (X) and criterion (Y).

    • Typically computed with Pearson r (see the sketch after this list):
      r=\frac{\sum (X_i-\bar X)(Y_i-\bar Y)}{\sqrt{\sum (X_i-\bar X)^2}\sqrt{\sum (Y_i-\bar Y)^2}}

    • Affected by range restriction/inflation.

  • Expectancy Data & Charts – show likelihood of specific outcomes per score band.
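
A minimal sketch of computing a validity coefficient with the Pearson formula above; the eight score pairs are invented for illustration:

```python
import numpy as np

# Hypothetical test scores (X) and criterion scores (Y) for 8 examinees.
x = np.array([12, 15, 9, 20, 17, 11, 14, 18], dtype=float)
y = np.array([3.1, 3.5, 2.4, 3.9, 3.6, 2.8, 3.0, 3.8])

# Pearson r: sum of deviation cross-products over the product of
# root sums of squared deviations (the formula above).
dx, dy = x - x.mean(), y - y.mean()
r_xy = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(f"validity coefficient r_xy = {r_xy:.2f}")  # same as np.corrcoef(x, y)[0, 1]
```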

Hit/Miss Terminology

  • Hit rate – proportion correctly classified.

  • Miss rate – proportion misclassified.

    • False Positive – predicted "has trait" but doesn’t.

    • False Negative – predicted "lacks trait" but actually has it.

  • Base Rate – prevalence of trait in population; influences predictive power.
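
A short worked example of the hit/miss arithmetic above; the screening counts and the 10% base rate are hypothetical:

```python
# Hypothetical screening outcomes for 200 people (trait base rate = 10%).
true_pos, false_neg = 15, 5      # the 20 people who actually have the trait
false_pos, true_neg = 30, 150    # the 180 who do not

total = true_pos + false_neg + false_pos + true_neg
hit_rate = (true_pos + true_neg) / total     # proportion correctly classified
miss_rate = (false_pos + false_neg) / total  # proportion misclassified
base_rate = (true_pos + false_neg) / total   # prevalence of the trait

print(hit_rate, miss_rate, base_rate)        # 0.825 0.175 0.1
```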

Incremental Validity

  • Added value of a new predictor (\Delta R^2 via hierarchical regression).

  • Maximal when:

    • Strong correlation with criterion.

    • Low correlation with existing predictors (non-redundant).

  • Emotional Intelligence research: modest incremental validity beyond g and personality.
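
A hedged sketch of assessing incremental validity via hierarchical regression; it uses plain numpy least squares rather than any particular statistics package, and the simulated data and coefficients are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated predictors: x1 (existing test), x2 (new test), criterion y.
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)            # partly redundant with x1
y = 0.5 * x1 + 0.4 * x2 + rng.normal(size=n)

def r_squared(predictors, y):
    """R^2 from an OLS fit with an intercept column."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared([x1], y)        # Step 1: existing predictor only
r2_full = r_squared([x1, x2], y)    # Step 2: add the new predictor
print(f"Delta R^2 = {r2_full - r2_base:.3f}")
```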

Illustrative Studies

  • BDI with adolescents: concurrent validity study against an established instrument for adolescents → adequate validity.

  • Corporate selection (Dr. Shoemaker): a test with high face validity but poor criterion-related validity was retained only as a realistic job preview.

Construct Validity

  • Construct = unobservable trait inferred from theory (e.g., intelligence, anxiety).

  • Evidence types:

    1. Homogeneity / Unidimensionality – internal consistency; factor analysis.

    2. Developmental Changes – scores vary with age/time as theory predicts.

    3. Pretest–Posttest Changes – scores shift after interventions (therapy, training).

    4. Contrasted (Known) Groups – expected score differences among distinct groups (e.g., depressed vs. non-depressed).

    5. Convergent Evidence – high correlation with related measures.

    6. Discriminant Evidence – low correlation with unrelated constructs.

    7. Multitrait-Multimethod Matrix (MTMM) – simultaneous appraisal of convergent & discriminant validity.

    8. Factor Analysis

      • Exploratory (EFA) identifies latent factors.

      • Confirmatory (CFA) tests hypothesized structure.

      • Factor Loading = weight linking item to factor.
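
A rough sketch of extracting factor loadings from simulated two-factor item data; real EFA/CFA would use dedicated software, so this uses a principal-components shortcut, and every number is invented:

```python
import numpy as np

# Simulated responses: items 1-3 driven by factor 1, items 4-6 by a
# weaker factor 2, plus noise.
rng = np.random.default_rng(1)
f1 = rng.normal(size=(500, 1))
f2 = rng.normal(size=(500, 1))
items = np.hstack([1.0 * f1 + 0.5 * rng.normal(size=(500, 3)),
                   0.6 * f2 + 0.5 * rng.normal(size=(500, 3))])

# Loadings via the correlation matrix's top eigenvectors
# (a principal-components shortcut, not full EFA/CFA).
corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)              # ascending order
loadings = eigvecs[:, -2:] * np.sqrt(eigvals[-2:])   # keep top 2 factors

# Each item should load mainly on its own cluster's factor (up to sign).
print(np.round(loadings, 2))
```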

Example: Marital Satisfaction Scale (MSS)

  • Reduced from 73 to 48 items via item–total correlations > .50.

  • Validated through:

    • Homogeneity.

    • Pre-/post-therapy score changes.

    • Contrasted-groups differences between happy and unhappy couples.

    • Convergent r = 0.79 with Marital Adjustment Test.
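
A minimal sketch of the item–total screen described above (retain items with item–total r > .50); the response matrix is simulated so that only some items track a common satisfaction factor:

```python
import numpy as np

# Simulated MSS-style data: 300 respondents, 73 items, of which only the
# first 48 are driven by a common satisfaction factor (all numbers invented).
rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 1))
good = latent + rng.normal(size=(300, 48))
junk = rng.normal(size=(300, 25))
responses = np.hstack([good, junk])

kept = []
for j in range(responses.shape[1]):
    rest = np.delete(responses, j, axis=1).sum(axis=1)   # total with item j excluded
    if np.corrcoef(responses[:, j], rest)[0, 1] > 0.50:  # item-total screen
        kept.append(j)

print(f"{len(kept)} of 73 items retained")   # the noise items fail the screen
```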

Example: Constructive vs. Unconstructive Worry Questionnaire (CUWQ)

  • Reduced from 29 to 18 items after EFA.

  • CFA supported 2-factor model; criterion relations (trait anxiety, punctuality, wildfire preparedness) matched predictions.

Bias, Fairness & Rating Errors

Test Bias

  • Systematic error in test scores or predictions that disadvantages (or advantages) a particular group.

  • Detection: DIF analyses; intercept bias (consistent under-/over-prediction of the criterion for one group); slope bias (the test–criterion correlation is weaker for one group).
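
A hedged sketch of screening for intercept and slope bias with a moderated regression (group main effect plus test-by-group interaction); the data are simulated with a deliberate intercept bias for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=n)              # test scores (predictor)
g = rng.integers(0, 2, size=n)      # group indicator (0/1)
# Simulated criterion with built-in intercept bias for group 1.
y = 1.0 * x + 0.6 * g + rng.normal(size=n)

# Moderated regression: y ~ intercept + x + group + x*group.
X = np.column_stack([np.ones(n), x, g, x * g])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"group coefficient (intercept bias): {beta[2]:+.2f}")
print(f"x*group coefficient (slope bias):   {beta[3]:+.2f}")
# A nonzero group term -> consistent under/over-prediction for one group;
# a nonzero interaction -> the test predicts more weakly for one group.
```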

Test Fairness

  • Broader, value-laden issue: impartial, just, equitable use.

  • A test can be valid yet used unfairly (e.g., for political repression, as in the Cold War-era USSR).

Rating Errors (Criterion Issues)

  1. Leniency / Generosity Error – ratings too high.

  2. Severity Error – systematically low ratings.

  3. Central Tendency Error – avoidance of extremes.

  4. Halo Effect – undifferentiated positive (or negative) impression spills over.

  • Remedies: rater training, forced rankings, behavioural anchors.

Remedies for Adverse Impact / Score Adjustments

Psychometric techniques (each with ethical & legal debates):

  • Addition of constant points.

  • Differential scoring / empirical keying by group.

  • Elimination of items showing Differential Item Functioning (DIF).

  • Differential cutoffs.

  • Separate ranking lists.

  • Within-group (race) norming (now illegal in U.S. employment).

  • Banding & sliding bands.

  • Explicit preference policies.
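
A rough sketch of banding, one of the techniques listed above: scores within a band below the top score are treated as equivalent. The band width here is based on the standard error of the difference (SED = SEM·√2); the reliability and scores are invented:

```python
import math

# Hypothetical selection-test parameters.
sd, reliability = 10.0, 0.90
sem = sd * math.sqrt(1 - reliability)   # standard error of measurement
sed = sem * math.sqrt(2)                # standard error of the difference
band_width = 1.96 * sed                 # scores this close are treated as tied

top_score = 88.0
band_floor = top_score - band_width
scores = [88.0, 86.5, 84.0, 79.0, 75.5]
tied = [s for s in scores if s >= band_floor]
print(f"band = [{band_floor:.1f}, {top_score}] -> treated as equivalent: {tied}")
```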

Policy Debates

  • Pro-adjustment: redress past wrongs, ensure diversity, correct biased items.

  • Anti-adjustment: undermines individual merit, may harm intended beneficiaries, violates legislation (e.g., U.S. Civil Rights Act 1991 §106).

Connections & Implications

  • Validity evidence guides ethical assessment, informs legal defensibility, and affects societal outcomes (employment, admissions, clinical decisions).

  • Cultural context crucial: item interpretations, historical narratives, political climates (e.g., Bosnian textbooks; Palestinian exam censorship).

  • Practical takeaway: Continuous monitoring, transparent reporting, and multi-method evidence are mandatory for responsible test use.

Key Formulas & Statistics (LaTeX)

  • Pearson correlation: r=\frac{\sum (X_i-\bar X)(Y_i-\bar Y)}{\sqrt{\sum (X_i-\bar X)^2}\sqrt{\sum (Y_i-\bar Y)^2}}

  • Regression model with new predictor: Y=\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon; incremental validity assessed via \Delta R^2 when X_2 is added.

  • Hit/Miss classification table (simplified):
    \begin{array}{c|cc}
    & \text{Predicted +} & \text{Predicted -} \\
    \hline
    \text{Actual +} & \text{Hit} & \text{False Negative} \\
    \text{Actual -} & \text{False Positive} & \text{Hit} \\
    \end{array}

Summary Checklist for Exam Prep

  • Understand definitions of all validity types.

  • Be able to design a validation study, choosing appropriate criteria & statistics.

  • Recognize cultural and ethical dimensions of test use.

  • Calculate and interpret r_{xy}, hit/miss rates, and incremental \Delta R^2.

  • Identify rating errors and propose corrective actions.

  • Discuss pros/cons of various bias-mitigation techniques.
