
Chapter 7 – Test Utility and Utility Analysis

Introduction: Everyday vs Psychometric “Utility”

  • Everyday meaning = “usefulness”; in psychometrics, utility = the practical value of using a test, battery, training program, or intervention to aid decision-making.

  • Representative questions utility seeks to answer:

    • How does Test A compare to Test B?

    • Does adding a test to a battery improve screening?

    • Will an admissions/personnel test select better applicants than supervisor judgment alone?

    • Does the test save time, money, or other resources?

Core Definition of Test Utility

  • A measure of efficiency gains when testing is implemented.

  • Applies to single instruments and full testing programs.

  • Utility judgments draw on:

    1. Reliability data

    2. Validity data

    3. Additional information (costs, benefits, logistics, ethics, etc.).

Factors Influencing a Test’s Utility

1. Psychometric Soundness

  • Reliability → sets the ceiling for validity; validity (especially criterion-related) ↑ utility but does not guarantee it.

  • Example: a sweat-patch test for cocaine detection had r = .92 agreement with urine tests when untampered with, yet low utility due to frequent patch tampering (Chawarski et al., 2007).

2. Costs (Economic & Non-Economic)

  • Direct: purchase, protocols, scoring software, staff time, facilities, insurance, legal, overhead.

  • Indirect: cost of not testing or of using an ineffective test (e.g., an airline stops assessing pilots → lawsuits, loss of confidence).

  • Noneconomic: harm, public safety, morale, ethics (e.g. missing child-abuse fractures due to fewer X-ray views).

3. Benefits (Economic & Non-Economic)

  • Economic: ↑ productivity, ↓ waste, ↑ profit, ROI.

  • Noneconomic (often converted to economic terms later): better work climate, fewer accidents, lower turnover, greater public safety from accurate involuntary-hospitalization decisions.

Utility Analysis: Concepts & Purposes

  • Family of cost-benefit techniques; guides choice among testing, training, interventions.

  • Typical decisions:

    • Choose Test A vs Test B vs no test.

    • Add/subtract tools in a battery.

    • Compare training programs or intervention elements.

  • End product = “educated decision” (optimal course of action).

Expectancy Data

  • Scatterplot → expectancy table showing probability of criterion success per predictor band.

  • Classic aids: Taylor–Russell tables & Naylor–Shine tables.

    • Inputs: validity ρ_{xy}; selection ratio; base rate.

    • Output: percentage of hires predicted successful after adding test.

    • Limitation: assumes a linear predictor-criterion relationship and a clear-cut (dichotomous) success criterion; a computational sketch follows this list.
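
  • A minimal Python sketch of the logic behind such tables (an approximation under a bivariate-normal model, not the published lookup values; assumes SciPy is available):

      from scipy.stats import norm, multivariate_normal

      def expected_success_rate(validity, selection_ratio, base_rate):
          """Proportion of selected applicants expected to succeed, under the
          bivariate-normal model that underlies Taylor-Russell tables."""
          x_cut = norm.ppf(1 - selection_ratio)   # predictor cut score (z units)
          y_cut = norm.ppf(1 - base_rate)         # criterion success threshold (z units)
          joint = multivariate_normal(mean=[0, 0], cov=[[1, validity], [validity, 1]])
          # P(selected AND successful) via inclusion-exclusion on the joint CDF
          p_both = 1 - norm.cdf(x_cut) - norm.cdf(y_cut) + joint.cdf([x_cut, y_cut])
          return p_both / selection_ratio

      # The Taylor-Russell illustration cited in these notes: expected value ≈ .88
      print(round(expected_success_rate(validity=.55, selection_ratio=.20, base_rate=.60), 2))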

Close-Up: Flecha Esmeralda Road Test (FERT)

  • Scenario: hiring drivers for a South American courier company.

  • Existing policy = license + no criminal record → 50 % of hires rated “qualified.”

  • New on-road FERT studied (predictive validity r=.40).

  • Three illustrative cut scores:

    1. 18 (low)

    • Selection ratio =.95 (57/60 hired).

    • Positive Predictive Value (PPV) =.526.

    • False-negatives 0 % but utility gain trivial.

    2. 80 (high)

    • Selection ratio =.10 (6/60 hired).

    • PPV =1.00; overall accuracy only 60 %.

    • Requires ≈ 600 applicants for 60 hires (costly recruitment).

    3. 48 (moderate) – chosen

    • Selection ratio =.517 (31/60).

    • Miss rate ↓ from 50 % → 15 %.

    • PPV =.839.

    • Misclassifications cut from 30 to 9 drivers.

    • ROI computed with the BCG formula ≈ 12.5:1 (see below); classification statistics for all three cut scores are sketched after this list.
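
  • A hedged Python sketch of the decision statistics above; the four-cell counts are reconstructed to match the percentages quoted in these notes (illustrative only, not figures from the source):

      def summarize(tp, fp, fn, tn):
          """Selection ratio, PPV, and accuracy from a 2x2 decision table."""
          n = tp + fp + fn + tn
          hired = tp + fp
          return {
              "selection_ratio": hired / n,
              "ppv": tp / hired,            # proportion of hires who succeed
              "hit_rate": (tp + tn) / n,    # overall classification accuracy
              "miss_rate": (fp + fn) / n,
          }

      cut_scores = {
          18: summarize(tp=30, fp=27, fn=0,  tn=3),   # lenient cut
          48: summarize(tp=26, fp=5,  fn=4,  tn=25),  # moderate cut (chosen)
          80: summarize(tp=6,  fp=0,  fn=24, tn=30),  # strict cut
      }
      for cut, stats in cut_scores.items():
          print(cut, {k: round(v, 3) for k, v in stats.items()})
      # 18 → SR .95, PPV .526; 48 → SR .517, PPV .839, miss rate .15; 80 → SR .10, PPV 1.0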

Brogden–Cronbach–Gleser (BCG) Utility Formula

\text{Utility Gain} = (N)(T)(r_{xy})(SD_y)(\bar Z_m) - (N)(C)
where N = number hired, T = average tenure (years), r_{xy} = criterion-related validity, SD_y = standard deviation of job performance in dollars, \bar Z_m = mean standardized test score of those selected, and C = cost of testing one applicant.

  • Example values (FERT):

    • N = 60 drivers hired/yr, T = 1.5 yr average tenure,

    • r_{xy} = .40, SD_y = \$9{,}000 (≈ 40 % of salary),

    • \bar Z_m = 1.0, C = \$200/test → total test cost = \$24{,}000 (≈ 120 applicants tested).

    • \text{Benefit} = 60 \times 1.5 \times .40 \times \$9{,}000 \times 1.0 = \$324{,}000.

    • \text{Utility Gain} = \$324{,}000 - \$24{,}000 = \$300{,}000.

    • Each testing dollar spent returns ≈ \$12.50 in utility gain (worked sketch below).
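
  • A worked Python sketch of the calculation above; the function name and the figure of 120 applicants tested (derived here from the \$24,000 total cost at \$200 per test) are assumptions made for illustration:

      def bcg_utility_gain(n_hired, tenure_yrs, validity, sd_y, mean_z_selected,
                           cost_per_test, n_tested):
          """Brogden-Cronbach-Gleser utility gain in dollars."""
          benefit = n_hired * tenure_yrs * validity * sd_y * mean_z_selected
          total_cost = n_tested * cost_per_test
          return benefit, total_cost, benefit - total_cost

      benefit, cost, gain = bcg_utility_gain(
          n_hired=60, tenure_yrs=1.5, validity=0.40, sd_y=9_000,
          mean_z_selected=1.0, cost_per_test=200, n_tested=120)
      print(benefit, cost, gain)      # 324000.0 24000 300000.0
      print(round(gain / cost, 1))    # ROI ≈ 12.5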

Productivity Variant

\text{Productivity Gain} = (N)(T)(r_{xy})(SD_p)(\bar Z_m) - (N)(C)

  • SD_p = SD of output units (not dollars).
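
  • The bcg_utility_gain sketch above can be reused for this variant by passing SD_p (output units) in place of SD_y; the benefit term is then expressed in units of product rather than dollars.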

Decision Theory & Cut Scores

  • Four potential outcomes: True Positive, False Positive, False Negative, True Negative.

  • Trade-off managed via selection ratio & cut score.

  • Guideline: set stricter cutoffs when false positives are more costly (e.g., airline pilots); a cost-comparison sketch follows.
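
  • A minimal Python sketch of that trade-off: pick the cut score with the lowest expected misclassification cost. Counts reuse the FERT reconstruction above; the per-error dollar costs are hypothetical:

      def expected_cost(fp, fn, cost_fp, cost_fn):
          """Total expected cost of the two error types at a given cut score."""
          return fp * cost_fp + fn * cost_fn

      outcomes = {18: (27, 0), 48: (5, 4), 80: (0, 24)}   # cut score: (FP, FN) counts
      for cut, (fp, fn) in outcomes.items():
          # false positives priced 10x false negatives (e.g., safety-critical hires)
          print(cut, expected_cost(fp, fn, cost_fp=10_000, cost_fn=1_000))
      # Strict cut 80 wins when FP are far costlier; moderate cut 48 wins when costs are equal.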

Cut-Score Taxonomy

  • Fixed/Absolute (criterion-referenced): e.g., a driver’s road test.

  • Relative/Norm-referenced: top 10 % get A.

  • Multiple Cut Scores: tiers (A–F).

  • Multiple Hurdle process: must pass sequential stages to continue (application → test → interview …).

  • Compensatory Model: weighted predictors; a high score in one area offsets a low score in another (implemented via multiple regression); a sketch contrasting the two models follows.
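
  • A minimal Python sketch contrasting the two selection models above; all predictor names, weights, and cutoffs are hypothetical:

      def multiple_hurdle(stages, scores):
          """Applicant must clear every stage's cutoff, in sequence."""
          for name, cutoff in stages:
              if scores[name] < cutoff:
                  return False, name          # screened out at this hurdle
          return True, None

      def compensatory(weights, scores, cutoff):
          """Regression-style weighted composite: a strength offsets a weakness."""
          composite = sum(w * scores[name] for name, w in weights.items())
          return composite >= cutoff, composite

      applicant = {"application": 70, "road_test": 45, "interview": 88}

      print(multiple_hurdle(
          [("application", 60), ("road_test", 48), ("interview", 50)], applicant))
      # (False, 'road_test'): fails the sequential model at the road-test hurdle

      print(compensatory(
          {"application": 0.2, "road_test": 0.5, "interview": 0.3}, applicant, cutoff=55))
      # (True, 62.9): the strong interview compensates for the weaker road test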

Methods for Establishing Cut Scores

  • Angoff: SMEs judge probability minimally competent candidate answers each item correctly; average probabilities = cut.

    • Pros: simple to carry out; Cons: inter-rater reliability can be low (a computational sketch follows this list).

  • Known/Contrasting Groups: test administered to groups already known to succeed or fail on the criterion; cut = score at the intersection of the two groups’ distributions.

    • Sensitive to group definition choices.

  • IRT-Based

    • Item-Mapping: SMEs review histogram columns of equal item difficulty.

    • Bookmark: SMEs place a bookmark in a difficulty-ordered item booklet at the point where a minimally competent examinee would answer correctly 50 % of the time.

    • Advantages: ties cut to item difficulty, not raw % correct.

  • Additional historical methods: Predictive Yield (Thorndike), Decision-Theoretic approaches, discriminant-function analysis.
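
  • A minimal Python sketch of the Angoff procedure described above; the judge ratings are hypothetical:

      def angoff_cut_score(ratings_by_judge):
          """Each inner list holds one SME's per-item probabilities that a minimally
          competent examinee answers correctly; the cut score is the mean of each
          judge's summed probabilities."""
          per_judge = [sum(ratings) for ratings in ratings_by_judge]
          return sum(per_judge) / len(per_judge)

      judges = [
          [0.9, 0.7, 0.6, 0.8, 0.5],   # SME 1
          [0.8, 0.6, 0.7, 0.9, 0.4],   # SME 2
          [0.9, 0.8, 0.5, 0.7, 0.6],   # SME 3
      ]
      print(round(angoff_cut_score(judges), 2))   # ≈ 3.47 correct answers on a 5-item test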

Practical Issues in Utility Studies

  • Applicant Pool Size & Quality: many models assume an unlimited applicant pool and 100 % offer acceptance → utility is often overestimated in practice.

    • Empirical adjustment: projected gains may need to be reduced by as much as 80 % (Murphy, 1986).

  • Job Complexity: higher complexity → greater variability in performance, which raises SD_y and thus potential utility.

  • Base Rates: at extreme values a test adds little incremental accuracy.

Real-World Illustration: Police Body Cameras (Ariel et al., 2015)

  • Randomized controlled trial: 988 shifts, Rialto CA.

  • “Camera” shifts vs “No-Camera” shifts.

  • Results: use-of-force incidents ↓ by > 50 %; citizen complaints ↓ dramatically.

  • Demonstrates high utility of body-worn-camera (BWC) technology despite high initial procurement costs.

Ethical, Philosophical & Practical Implications

  • Misuse of utility arguments can lead to discriminatory or unsafe practices (e.g., dropping assessment to save money).

  • Utility not purely monetary: social justice, safety, individual rights, and morale weigh in.

  • Decision-makers must integrate psychometrics with prudence, vision, and common sense.

Key Formulas & Numerical References

  • Taylor–Russell example: base rate = .60, selection ratio = .20, validity = .55 → projected success = .88.

  • ROI example: \text{ROI}=\frac{\$300,000}{\$24,000}=12.5:1.

  • Selection ratios illustrated: SR=.95,.517,.10 for cut scores 18, 48, 80 respectively.

Vocabulary Quick-Reference

  • Utility, Utility Analysis, Utility Gain, ROI.

  • Costs vs Benefits.

  • Psychometric Soundness.

  • Cut Score (fixed, relative, multiple, hurdle).

  • Angoff, Known-Groups, Bookmark, Item-Mapping.

  • Brogden–Cronbach–Gleser formula.

  • Decision-theory: TP, FP, FN, TN; Sensitivity, Specificity.

  • Compensatory vs Multiple-Hurdle selection.

Connections & Foundational Principles

  • Reliability → ceiling on validity; validity often correlates with utility but context matters.

  • Concepts integrate with earlier chapters on criterion-related validity, expectancy data, selection ratio, base rate.

  • Utility perspective extends psychometrics from “measurement quality” to organizational & societal impact.

Study Prompts

  • Compute utility gain given new r_{xy}, SD_y, and selection ratio values.

  • Contrast Angoff & Bookmark in settings where item difficulty varies widely.

  • Debate ethical boundaries: when is a false-negative more acceptable than a false-positive?

  • Design a multiple-hurdle hiring system for airline pilots incorporating both fixed and compensatory elements.
