Chapter 7 – Test Utility and Utility Analysis

Introduction: Everyday vs Psychometric “Utility”

  • Everyday meaning = “usefulness”; psychometrics = practical value of using a test, battery, training, or intervention to aid decision-making.

  • Representative questions utility seeks to answer:

    • How does Test A compare to Test B?

    • Does adding a test to a battery improve screening?

    • Will an admissions/personnel test select better applicants than supervisor judgment alone?

    • Does the test save time, money, or other resources?

Core Definition of Test Utility

  • A measure of efficiency gains when testing is implemented.

  • Applies to single instruments and full testing programs.

  • Utility judgments draw on:

    1. Reliability data

    2. Validity data

    3. Additional information (costs, benefits, logistics, ethics, etc.).

Factors Influencing a Test’s Utility

1. Psychometric Soundness

  • Reliability → sets ceiling for validity; validity (especially criterion-related) raises utility but does not guarantee it.

  • Example: A sweat-patch test for cocaine detection showed r = .92 agreement with urine tests when untampered, yet had low utility due to frequent patch tampering (Chawarski et al., 2007).

2. Costs (Economic & Non-Economic)

  • Direct: purchase, protocols, scoring software, staff time, facilities, insurance, legal, overhead.

  • Indirect: cost of not testing or of using an ineffective test (e.g. an airline stops assessing pilots → lawsuits, loss of confidence).

  • Noneconomic: harm, public safety, morale, ethics (e.g. missing child-abuse fractures due to fewer X-ray views).

3. Benefits (Economic & Non-Economic)

  • Economic: ↑ productivity, ↓ waste, ↑ profit, ROI.

  • Noneconomic (often convertible to economic benefits later): better work climate, fewer accidents, lower turnover, social safety from accurate involuntary-hospitalization decisions.

Utility Analysis: Concepts & Purposes

  • Family of cost-benefit techniques; guides choice among testing, training, interventions.

  • Typical decisions:

    • Choose Test A vs Test B vs no test.

    • Add/subtract tools in a battery.

    • Compare training programs or intervention elements.

  • End product = “educated decision” (optimal course of action).

Expectancy Data

  • Scatterplot → expectancy table showing probability of criterion success per predictor band.

  • Classic aids: Taylor–Russell tables & Naylor–Shine tables.

    • Inputs: validity ρ_{xy}; selection ratio; base rate.

    • Output: percentage of hires predicted successful after adding test.

    • Limitation: assumes linear predictor-criterion relation & clear pass/fail criterion.
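Taylor–Russell values come from published tables, but the logic behind a lookup can be sketched by simulation under the tables' bivariate-normal assumption. A minimal Monte Carlo illustration (the function name, sample size, and seed are illustrative, not from the chapter):

```python
import math
import random

def simulated_success_rate(validity, selection_ratio, base_rate,
                           n=200_000, seed=7):
    """Monte Carlo sketch of a Taylor-Russell lookup: sample correlated
    (predictor, criterion) pairs from a bivariate normal, hire the top
    `selection_ratio` on the predictor, and report the share of hires
    who clear the criterion cutoff implied by `base_rate`."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # criterion correlated with the predictor at r = validity
        y = validity * x + math.sqrt(1 - validity ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))
    xs = sorted(p[0] for p in pairs)
    ys = sorted(p[1] for p in pairs)
    x_cut = xs[int((1 - selection_ratio) * n)]   # hire this fraction
    y_cut = ys[int((1 - base_rate) * n)]         # success at this base rate
    hired = [p for p in pairs if p[0] >= x_cut]
    return sum(1 for p in hired if p[1] >= y_cut) / len(hired)

# The worked values below: base rate .60, selection ratio .20,
# validity .55 -> tabled success rate .88; the simulation lands nearby.
rate = simulated_success_rate(validity=0.55, selection_ratio=0.20, base_rate=0.60)
print(round(rate, 2))
```

The simulation makes the limitation above concrete: the whole calculation leans on the assumed linear (bivariate-normal) predictor–criterion relation and a single pass/fail criterion cutoff.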

Close-Up: Flecha Esmaralda Road Test (FERT)

  • Scenario: South-American courier hiring.

  • Existing policy = license + no criminal record → 50 % of hires rated “qualified.”

  • New on-road FERT studied (predictive validity r = .40).

  • Three illustrative cut scores:

    1. 18 (low)

      • Selection ratio = .95 (57/60 hired).

      • Positive Predictive Value (PPV) = .526.

      • False negatives 0 %, but the utility gain is trivial.

    2. 80 (high)

      • Selection ratio = .10 (6/60 hired).

      • PPV = 1.00; overall accuracy only 60 %.

      • Requires ≈ 600 applicants to yield 60 hires (costly recruitment).

    3. 48 (moderate) – chosen

      • Selection ratio = .517 (31/60 hired).

      • Miss rate ↓ from 50 % to 15 %.

      • PPV = .839.

      • Misclassifications cut from 30 to 9 drivers.

      • ROI computed with the BCG formula ≈ 12.5 : 1 (see below).

Brogden–Cronbach–Gleser (BCG) Utility Formula

  • Utility gain = (N)(T)(r_xy)(SD_y)(Z̄_m) − (N)(C)

  • Example values (FERT):

    • N = 60 drivers/yr; T = 1.5 yr tenure.

    • r_xy = .40; SD_y = $9,000 (≈ 40 % of salary).

    • Z̄_m = 1.0; C = $200/test → total test cost = $24,000.

    • Benefit = 60 × 1.5 × .40 × $9,000 × 1.0 = $324,000.

    • Utility gain = $324,000 − $24,000 = $300,000.

    • Each testing dollar returns > $12.50.

Productivity Variant

  • Productivity gain = (N)(T)(r_xy)(SD_p)(Z̄_m) − (N)(C)

  • SD_p = SD of performance in output units (not dollars).
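The BCG computation is plain arithmetic; a minimal sketch using the FERT figures (function and argument names are illustrative; the total test cost is passed directly as given in the example):

```python
def bcg_utility_gain(n_hired, tenure_yrs, validity, sd_y,
                     mean_z_selected, total_test_cost):
    """Brogden-Cronbach-Gleser utility gain in dollars:
    (N)(T)(r_xy)(SD_y)(Z-bar_m) - total testing cost."""
    benefit = n_hired * tenure_yrs * validity * sd_y * mean_z_selected
    return benefit - total_test_cost

# FERT figures: N = 60, T = 1.5 yr, r_xy = .40, SD_y = $9,000,
# Z-bar_m = 1.0, total test cost = $24,000
gain = bcg_utility_gain(60, 1.5, 0.40, 9_000, 1.0, 24_000)
print(gain)           # 300000.0 -> $300,000 utility gain
print(gain / 24_000)  # 12.5 -> each testing dollar returns > $12.50
```

The productivity variant is the same computation with SD_p (output units) substituted for SD_y.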

Decision Theory & Cut Scores

  • Four potential outcomes: True Positive, False Positive, False Negative, True Negative.

  • Trade-off managed via selection ratio & cut score.

  • Guidelines: set stricter cutoffs when false-positives are more costly (e.g. airline pilots).
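The four outcomes combine into the familiar screening statistics. A short sketch using counts consistent with the FERT moderate cut score of 48 (31 of 60 hired, 9 misclassifications; the individual cell counts are inferred from the reported PPV and miss rate, not stated in the text):

```python
def decision_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and overall accuracy from the
    four decision-theory outcomes."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # qualified correctly hired
        "specificity": tn / (tn + fp),   # unqualified correctly rejected
        "ppv": tp / (tp + fp),           # hires who turn out qualified
        "accuracy": (tp + tn) / total,
    }

# Inferred FERT cells: 26 true positives, 5 false positives,
# 4 false negatives, 25 true negatives (60 applicants total)
m = decision_metrics(tp=26, fp=5, fn=4, tn=25)
print(round(m["ppv"], 3))       # 0.839
print(round(m["accuracy"], 2))  # 0.85 -> 15 % miss rate
```

Raising the cut score trades false positives for false negatives; the guideline above amounts to moving the cut until the costlier cell is acceptably small.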

Cut-Score Taxonomy

  • Fixed/Absolute (criterion-referenced): e.g. driver’s road test.

  • Relative/Norm-referenced: top 10 % get A.

  • Multiple Cut Scores: tiers (A–F).

  • Multiple Hurdle process: must pass sequential stages to continue (application → test → interview …).

  • Compensatory Model: weighted predictors; high score in one area offsets low in another (implemented via multiple regression).
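The contrast between the two selection models can be sketched in a few lines (the applicant scores, stage cutoffs, and regression-style weights below are hypothetical):

```python
def multiple_hurdle(applicant, hurdles):
    """Sequential screen: failing any single stage eliminates the applicant."""
    return all(applicant[stage] >= cut for stage, cut in hurdles)

def compensatory_score(applicant, weights):
    """Weighted linear composite: a high score on one predictor can
    offset a low score on another."""
    return sum(w * applicant[k] for k, w in weights.items())

# Hypothetical applicant: weak test score, strong interview
applicant = {"application": 70, "test": 55, "interview": 82}
hurdles = [("application", 60), ("test", 60), ("interview", 70)]
weights = {"application": 0.2, "test": 0.5, "interview": 0.3}

passed = multiple_hurdle(applicant, hurdles)
composite = compensatory_score(applicant, weights)
print(passed)               # False: eliminated at the test stage
print(round(composite, 1))  # 66.1: the interview partly offsets the test
```

The same applicant can fail a hurdle system yet earn a competitive composite, which is why the choice of model is itself a utility decision.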

Methods for Establishing Cut Scores

  • Angoff: SMEs judge probability minimally competent candidate answers each item correctly; average probabilities = cut.

    • Pros: simple; Cons: low inter-rater reliability possible.

  • Known/Contrasting Groups: test administered to groups already known pass/fail; cut = score at intersection of distributions.

    • Sensitive to group definition choices.

  • IRT-Based

    • Item-Mapping: SMEs review histogram columns of equal item difficulty.

    • Bookmark: SMEs place bookmark in ordered item booklet at point minimally competent examinee would answer correctly 50 % of time.

    • Advantages: ties cut to item difficulty, not raw % correct.

  • Additional historical methods: Predictive Yield (Thorndike), Decision-Theoretic approaches, discriminant-function analysis.
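The Angoff and known/contrasting-groups procedures reduce to simple computations once the judgments or group scores are collected. A sketch with hypothetical inputs (the intersection of the two distributions is approximated here by the cut that minimizes total misclassifications):

```python
def angoff_cut(sme_item_probs):
    """Angoff: each SME estimates, per item, the probability that a
    minimally competent candidate answers correctly; the cut score is
    the mean of each SME's summed expected score."""
    totals = [sum(probs) for probs in sme_item_probs]
    return sum(totals) / len(totals)

def contrasting_groups_cut(pass_scores, fail_scores):
    """Known/contrasting groups: place the cut where the two score
    distributions cross, approximated by the integer score that
    minimizes total misclassifications."""
    candidates = range(int(min(fail_scores)), int(max(pass_scores)) + 1)
    def errors(cut):
        fn = sum(1 for s in pass_scores if s < cut)   # qualified rejected
        fp = sum(1 for s in fail_scores if s >= cut)  # unqualified accepted
        return fn + fp
    return min(candidates, key=errors)

# Hypothetical judgments: 3 SMEs x 5 items
sme_probs = [[0.8, 0.6, 0.7, 0.5, 0.9],
             [0.7, 0.5, 0.8, 0.6, 0.8],
             [0.9, 0.7, 0.6, 0.5, 0.9]]
cut = angoff_cut(sme_probs)
print(round(cut, 2))  # 3.5 (out of 5 items)

# Hypothetical scores from groups already known to pass or fail
group_cut = contrasting_groups_cut(pass_scores=[72, 80, 85, 90, 78],
                                   fail_scores=[55, 60, 64, 70, 58])
print(group_cut)  # 71: first score separating the two groups cleanly
```

Both functions make the listed weaknesses visible: Angoff's cut moves with every disagreement among SMEs, and the contrasting-groups cut moves with how the pass/fail groups were defined.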

Practical Issues in Utility Studies

  • Applicant Pool Size & Quality: many models assume limitless applicants & 100 % offer acceptance → real-world overestimation.

    • Empirical adjustment: reduce projected gains up to 80 % (Murphy 1986).

  • Job Complexity: higher complexity → wider SD of performance, which affects SD_y and utility.

  • Base Rates: at extreme values a test adds little incremental accuracy.

Real-World Illustration: Police Body Cameras (Ariel et al., 2015)

  • Randomized controlled trial: 988 shifts, Rialto, CA.

  • “Camera” shifts vs “No-Camera” shifts.

  • Results: use-of-force incidents ↓ by > 50 %; citizen complaints ↓ dramatically.

  • Demonstrates high diagnostic/treatment utility of BWC technology despite high initial procurement costs.

Ethical, Philosophical & Practical Implications

  • Misuse of utility arguments can lead to discriminatory or unsafe practices (e.g. dropping assessment to save money).

  • Utility not purely monetary: social justice, safety, individual rights, and morale weigh in.

  • Decision-makers must integrate psychometrics with prudence, vision, common sense.

Key Formulas & Numerical References

  • Taylor–Russell example: base rate = .60, selection ratio = .20, validity = .55 → projected success = .88.

  • ROI example: ROI = $300,000 / $24,000 = 12.5 : 1.

  • Selection ratios illustrated: SR = .95, .517, .10 for cut scores 18, 48, 80 respectively.

Vocabulary Quick-Reference

  • Utility, Utility Analysis, Utility Gain, ROI.

  • Costs vs Benefits.

  • Psychometric Soundness.

  • Cut Score (fixed, relative, multiple, hurdle).

  • Angoff, Known-Groups, Bookmark, Item-Mapping.

  • Brogden–Cronbach–Gleser formula.

  • Decision-theory: TP, FP, FN, TN; Sensitivity, Specificity.

  • Compensatory vs Multiple-Hurdle selection.

Connections & Foundational Principles

  • Reliability → ceiling on validity; validity often correlates with utility but context matters.

  • Concepts integrate with earlier chapters on criterion-related validity, expectancy data, selection ratio, base rate.

  • Utility perspective extends psychometrics from “measurement quality” to organizational & societal impact.

Study Prompts

  • Compute utility gain given new r_xy, SD_y, selection ratio.

  • Contrast Angoff & Bookmark in settings where item difficulty varies widely.

  • Debate ethical boundaries: when is a false-negative more acceptable than a false-positive?

  • Design a multiple-hurdle hiring system for airline pilots incorporating both fixed and compensatory elements.