Signal Detection Theory and Z-scores — Study Notes

Z-scores, standard deviation, and why they matter

Z-score intuition: a z-score tells you how far an observation is from the mean in units of standard deviation. It’s the ratio of the deviation to the data’s variability.
- Concept: for any data point x, with distribution mean μ and standard deviation σ, the z-score is $z = \frac{x - \mu}{\sigma}$.
- When we convert measurement differences into z-scores, units cancel out, giving a unitless, comparable measure.
Why standard deviation is fundamental:
- It quantifies how much your data vary trial-to-trial.
- A big difference between conditions is more meaningful if variability (σ) is small; a big σ can make the same difference look trivial.
- Statistics often compare an effect size (difference) to variability (noise) to judge meaningfulness.
Practical note from the lecture:
- You don’t necessarily need to calculate σ on an exam, but you should understand what σ (the spread) represents and why a standard deviation matters conceptually.
- Z-scores let you compare effects across different measurement units (milliseconds, volts, etc.).
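A minimal sketch of the unit-cancelling idea above. The function name `z_score` and all numbers are made up for illustration; they are not lecture data. A reaction time in milliseconds and an amplitude in volts end up on the same unitless scale.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical values, chosen only for illustration.
rt_z = z_score(x=650.0, mean=500.0, sd=100.0)  # reaction time in ms
volt_z = z_score(x=2.5, mean=1.0, sd=1.0)      # amplitude in volts

print(rt_z, volt_z)  # both are 1.5
```

Both observations sit 1.5 standard deviations above their own mean, so they are directly comparable even though the raw units differ.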

Introduction to Signal Detection Theory (SDT)

Core idea:
- There are real physical events (the stimulus) and perceptual decisions our noisy nervous system makes about them.
- Perception is noisy; we cannot observe the brain's internal response to a stimulus directly, only infer it from behavior.
Reality vs. perception:
- Reality: the target is either present or absent (binary).
- Perception: we may perceive it or not, with errors due to internal noise and external conditions.
The normal curve premise:
- When information from a stimulus is processed by the brain, the resulting perceptual strength is assumed to be normally distributed across trials.
- Noise alone gives a distribution; the presence of a signal shifts that distribution upward (signal+noise).
Two distributions:
- Noise distribution: represents perceptual strength with no target.
- Signal+Noise distribution: represents perceptual strength with the target present.
The four possible outcomes (yes/no task):
- Hit: target present and reported present.
- Miss: target present but reported absent.
- False alarm: target absent but reported present.
- Correct rejection: target absent and reported absent.
Decision criterion (the “threshold”):
- A single criterion (threshold) is used to decide between “target present” and “target absent” on each trial.
- If perceptual strength exceeds the criterion, respond “present”; otherwise respond “absent.”
Why this matters:
- Different people can have the same ability to distinguish signals (same info extraction) but different response biases (tendency to say yes or no).
- SDT provides metrics that separate perceptual sensitivity from response bias.
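The model described in this section (two normal distributions plus a single criterion) can be sketched as a small simulation. This is an illustration under stated assumptions, not the lecture's own procedure: noise is drawn from N(0, 1), signal+noise from N(d_prime, 1), and the function name, parameter values, and seed are all hypothetical.

```python
import random

def simulate_yes_no(d_prime, criterion, n_present, n_absent, seed=0):
    """Simulate a yes/no task under the equal-variance Gaussian model.

    Noise trials ~ N(0, 1); signal+noise trials ~ N(d_prime, 1).
    The observer responds "present" whenever perceptual strength
    exceeds the criterion (measured on the strength axis).
    """
    rng = random.Random(seed)
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_rejection": 0}
    for _ in range(n_present):
        strength = rng.gauss(d_prime, 1.0)
        counts["hit" if strength > criterion else "miss"] += 1
    for _ in range(n_absent):
        strength = rng.gauss(0.0, 1.0)
        counts["false_alarm" if strength > criterion else "correct_rejection"] += 1
    return counts

counts = simulate_yes_no(d_prime=1.5, criterion=0.75, n_present=100, n_absent=100)
print(counts)
```

Rerunning with the same `d_prime` but a different `criterion` changes the mix of the four outcomes, which is the sense in which two observers can share the same sensitivity but differ in bias.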

Yes/No task and SDT geometry

Target present vs absent trials:
- Noise-only trials (absent trials) generate False Alarms and Correct Rejections.
- Target-present trials generate Hits and Misses.
Visual intuition (two overlapping distributions):
- The noise distribution sits to the left (lower perceptual strength); the signal+noise distribution is shifted to the right (higher perceptual strength), and the two overlap.
- The decision criterion sits somewhere along the perceptual-strength axis; moving it changes the balance of Hits and False Alarms without changing the underlying distributions.
What changes with bias:
- A liberal bias (lower criterion) increases Hits but also increases False Alarms.
- A conservative bias (higher criterion) reduces False Alarms but also reduces Hits.
Practical takeaway:
- Accuracy alone mixes sensitivity and bias; SDT aims to separate these components.
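To make the liberal vs. conservative trade-off concrete, here is a sketch of the rates the equal-variance model predicts as the criterion moves. It assumes the noise distribution is centred at 0 and the signal+noise distribution at `d_prime`, both with SD 1; `predicted_rates` and the criterion values are illustrative choices.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def predicted_rates(d_prime: float, criterion: float) -> tuple[float, float]:
    """Return (hit rate, false-alarm rate) predicted by the model.

    Each rate is the area of the relevant distribution that lies
    above the criterion on the perceptual-strength axis.
    """
    hit_rate = 1.0 - STANDARD_NORMAL.cdf(criterion - d_prime)
    fa_rate = 1.0 - STANDARD_NORMAL.cdf(criterion)
    return hit_rate, fa_rate

for label, criterion in [("liberal", 0.0), ("neutral", 0.5), ("conservative", 1.0)]:
    h, fa = predicted_rates(d_prime=1.0, criterion=criterion)
    print(f"{label:>12}: H = {h:.3f}, FA = {fa:.3f}")
```

Hits and false alarms rise and fall together as the criterion moves, while the separation between the distributions stays fixed.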

Two key SDT metrics: d′ and criterion c

Hit rate and False Alarm rate:
- Hit rate: $H = \frac{\text{Hits}}{N_{\text{present}}}$
- False alarm rate: $FA = \frac{\text{False Alarms}}{N_{\text{absent}}}$
Z-transform of rates (assuming normal distributions):
- $z(H) = \Phi^{-1}(H)$: the z-score whose cumulative probability under the standard normal is $H$
- $z(FA) = \Phi^{-1}(FA)$: the z-score whose cumulative probability under the standard normal is $FA$
d′ (d-prime): perceptual sensitivity (distance between the two distributions in SD units)
- Definition: $d' = z(H) - z(FA)$
- Interpretation: larger d′ means greater separation between noise and signal+noise distributions; better perceptual discrimination.
Criterion c (response bias):
- Definition (common convention): $c = -\tfrac{1}{2} \big( z(H) + z(FA) \big)$
- Interpretation: negative c = liberal bias; positive c = conservative bias; zero = unbiased (balanced) criterion.
Relationship between d′ and c:
- d′ measures perceptual sensitivity independent of bias.
- c captures the decision bias (where the criterion lies relative to the two distributions).
How to interpret a fixed d′ when bias changes:
- If you shift the criterion (bias) but keep the same underlying sensitivity, d′ remains the same.
- Accuracy can go up or down with bias even if d′ stays constant.
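The point that d′ stays fixed while accuracy moves with bias can be checked numerically. The sketch below uses Python's `statistics.NormalDist` for the z-transform; the helper `rates_from` simply inverts the two definitions above, and the accuracy line assumes equal numbers of present and absent trials. All names and values are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def d_prime(hit_rate, fa_rate):
    """Sensitivity: z(H) - z(FA)."""
    return Z.inv_cdf(hit_rate) - Z.inv_cdf(fa_rate)

def criterion_c(hit_rate, fa_rate):
    """Bias: -0.5 * (z(H) + z(FA))."""
    return -0.5 * (Z.inv_cdf(hit_rate) + Z.inv_cdf(fa_rate))

def rates_from(d, c):
    """Rates implied by sensitivity d and criterion c (inverts the formulas)."""
    return Z.cdf(d / 2 - c), Z.cdf(-d / 2 - c)

for c in (-0.5, 0.0, 0.5):
    h, fa = rates_from(1.0, c)
    accuracy = (h + (1 - fa)) / 2  # assumes equal present and absent trial counts
    print(f"c = {c:+.1f}: H = {h:.3f}, FA = {fa:.3f}, "
          f"d' = {d_prime(h, fa):.2f}, accuracy = {accuracy:.3f}")
```

Every row recovers the same d′, yet accuracy differs across rows because only the criterion has moved.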

Worked examples (yes/no task)

Example 1 (200 trials: 100 present, 100 absent):
- Hits = 80; False Alarms = 35
- Hit rate: $H = 0.80$; FA rate: $FA = 0.35$
- Compute z-scores: $z(H) = \Phi^{-1}(0.80) \approx 0.842$, $z(FA) = \Phi^{-1}(0.35) \approx -0.385$
- d′: $d' = z(H) - z(FA) \approx 0.842 - (-0.385) \approx 1.23$
- c: $c = -\tfrac{1}{2} [ z(H) + z(FA) ] \approx -\tfrac{1}{2} (0.842 - 0.385) \approx -0.23$
- Interpretation: d′ ≈ 1.23 (moderate-to-good sensitivity); c ≈ -0.23 (liberal bias: more willing to say “present”).
Example 2 (200 trials: 100 present, 100 absent):
- Hits = 75; False Alarms = 22
- Hit rate: $H = 0.75$; FA rate: $FA = 0.22$
- Compute z-scores: $z(H) = \Phi^{-1}(0.75) \approx 0.674$, $z(FA) = \Phi^{-1}(0.22) \approx -0.772$
- d′: $d' = z(H) - z(FA) \approx 0.674 - (-0.772) \approx 1.45$
- c: $c = -\tfrac{1}{2} [ z(H) + z(FA) ] \approx -\tfrac{1}{2} (0.674 - 0.772) \approx 0.049$
- Interpretation: d′ ≈ 1.45 (greater sensitivity than Example 1); c ≈ 0.05 (slightly conservative bias).
Key takeaway from the examples:
- d′ values reflect perceptual information strength; higher d′ means better discrimination.
- c reflects bias toward saying “present” or “absent.”
- Two people with the same d′ can differ in accuracy because of bias alone: a more liberal criterion yields more hits but also more false alarms. d′ is not affected by this shift.
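The two worked examples can be reproduced from the raw counts. This is a sketch using the standard library's `NormalDist.inv_cdf` as the z-transform; the function name `sdt_from_counts` is an illustrative choice, and rates of exactly 0 or 1 are deliberately not handled here.

```python
from statistics import NormalDist

def sdt_from_counts(hits, false_alarms, n_present, n_absent):
    """Return (d', c) from yes/no counts under the equal-variance model.

    Rates of exactly 0 or 1 are not handled; inv_cdf raises for them.
    """
    z = NormalDist().inv_cdf
    z_h = z(hits / n_present)
    z_fa = z(false_alarms / n_absent)
    return z_h - z_fa, -0.5 * (z_h + z_fa)

examples = {"Example 1": (80, 35), "Example 2": (75, 22)}
for name, (hits, fas) in examples.items():
    d, c = sdt_from_counts(hits, fas, n_present=100, n_absent=100)
    accuracy = (hits + (100 - fas)) / 200  # hits plus correct rejections
    print(f"{name}: d' = {d:.2f}, c = {c:.2f}, accuracy = {accuracy:.3f}")
```

The printed d′ and c values should match the hand calculations in Examples 1 and 2 to two decimal places.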

Edge cases and practical adjustments

When hit rate or false alarm rate is 0% or 100%:
- Directly converting 0% or 100% to z-scores is problematic (they map to ±∞).
- Common correction (often called the log-linear correction): add 0.5 to each cell of the 2×2 table (target present/absent × response present/absent). This adds 1 to both $N_{\text{present}}$ and $N_{\text{absent}}$ and nudges both H and FA slightly away from 0 and 1.
- Rationale: this prevents infinite z-scores and yields a finite d′. Apply it to all participants consistently, and decide on it before data collection.
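A sketch of the add-0.5-per-cell correction described above, applied to a hypothetical participant with a perfect hit rate and no false alarms. The function name and the counts are invented for illustration.

```python
from statistics import NormalDist

def corrected_rates(hits, false_alarms, n_present, n_absent):
    """Add 0.5 to each cell, which adds 1 to each trial total."""
    hit_rate = (hits + 0.5) / (n_present + 1)
    fa_rate = (false_alarms + 0.5) / (n_absent + 1)
    return hit_rate, fa_rate

z = NormalDist().inv_cdf

# Hypothetical participant: 50/50 hits and 0/50 false alarms.
h, fa = corrected_rates(hits=50, false_alarms=0, n_present=50, n_absent=50)
d_prime = z(h) - z(fa)
print(f"H = {h:.4f}, FA = {fa:.4f}, d' = {d_prime:.2f}")
```

Without the correction, the raw rates of 1.0 and 0.0 have no finite z-score; with it, both rates fall strictly between 0 and 1 and d′ is finite.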
Negative d′ (theoretical only):
- Taken literally, d′ < 0 means the false alarm rate exceeds the hit rate: you say “present” more often on noise trials than on signal trials, i.e., you respond systematically in the wrong direction.
- In ordinary perception tasks, d′ < 0 is rarely meaningful unless it reflects an illusion or a reversed task; d′ is typically interpreted as ≥ 0.
Alternative measures when variances differ:
- d′ assumes equal-variance normal distributions.
- When this assumption is violated, a measure called d′a (“d′-of-a”) or other ROC-based methods that allow unequal variances can be used.
- d′a comes from fitting an ROC curve with rating data and can handle unequal variances between noise and signal+noise distributions.
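For reference, one common textbook form of this index, given here as a sketch rather than as the lecture's own notation: writing d′a as $d_a$, and letting $s$ be the slope of the z-transformed ROC (the ratio of the noise SD to the signal+noise SD),

$$d_a = \sqrt{\frac{2}{1 + s^2}} \, \big( z(H) - s \, z(FA) \big)$$

When $s = 1$ (equal variances), this reduces to $d' = z(H) - z(FA)$.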

Rating scales and ROC methods

Beyond binary yes/no responses, you can collect confidence or multiple response categories (e.g., definitely present, probably present, guess present, guess absent, probably absent, definitely absent).
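- Each boundary between adjacent categories acts as its own criterion, so cumulating responses across categories yields a separate hit and false alarm rate at each boundary (five pairs for six categories). This gives multiple z-score pairs and an ROC curve with more points.
- You can compute d′ at each point, or fit a line to the z-transformed points (the z-transformed ROC) to summarize sensitivity.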
Benefits of rating-based SDT:
- Provides richer data (more than a single hits/FA pair).
- Allows modeling of unequal variances (via d′a) and a more nuanced view of decision processes.
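A rough sketch of how rating data become ROC points. The response counts below are invented for illustration, and the ordinary least-squares line is a simplification: in practice ROC fits are usually done by maximum likelihood. The final line uses one common unequal-variance index, $d_a = \sqrt{2 / (1 + s^2)} \times \text{intercept}$, where $s$ is the fitted slope.

```python
from statistics import NormalDist

# Hypothetical counts, ordered from "definitely present" to
# "definitely absent". These numbers are invented for illustration.
present_counts = [40, 25, 15, 10, 6, 4]   # target-present trials
absent_counts = [5, 10, 15, 20, 25, 25]   # target-absent trials

def cumulative_rates(counts):
    """Cumulative proportion of responses at or above each confidence level.

    The last category is dropped because its cumulative rate is always 1.
    """
    total = sum(counts)
    running, rates = 0, []
    for count in counts[:-1]:
        running += count
        rates.append(running / total)
    return rates

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

z = NormalDist().inv_cdf
z_hits = [z(p) for p in cumulative_rates(present_counts)]
z_fas = [z(p) for p in cumulative_rates(absent_counts)]

# Fitted zROC line: z(H) = intercept + slope * z(FA).
slope, intercept = fit_line(z_fas, z_hits)
d_a = (2 / (1 + slope**2)) ** 0.5 * intercept
print(f"{len(z_hits)} ROC points, slope = {slope:.2f}, d_a = {d_a:.2f}")
```

Six rating categories give five ROC points, one per boundary between adjacent categories. A slope different from 1 suggests the two distributions do not have equal variances.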
Practical example from research:
- A memory strength study used rating-scale SDT to examine how memory traces differ under conditions; d′a was used to account for unequal variances between memory strengths.

Applications and extensions of SDT concepts

Bias-free memory and perception:
- d′ is a measure of perceptual or memory sensitivity independent of bias (criterion c).
- Higher d′ implies stronger discriminability or memory strength, regardless of where the respondent tends to say “present.”
Real-world and research examples mentioned in the lecture:
- Social psychology and stereotypes: a yes/no task using names (e.g., NBA player names) to study bias and perception; depending on context, an effect can show up as a change in d′ or as a shift in criterion.
- Stereotype bias can lead to a shifted criterion (pseudo-d′), illustrating that bias can affect response tendencies even when underlying sensitivity is constant.
How SDT connects to broader science and practice:
- Provides a principled framework for separating perceptual/memory strength from decision criteria.
- Helps interpret performance changes due to environment (noise) vs. instruction/goal (criterion shifts).
- Useful across domains: perception, memory, psychometrics, clinical decision-making, and even weather/event detection.

Quick recap and takeaways

Z-scores express how far a value lies from the mean in SD units, giving a unitless measure; in SDT, d′ uses them to express the distance between the signal+noise and noise distributions.
Signal Detection Theory decomposes performance into two components:
- Sensitivity (d′): how well you can distinguish signal from noise.
- Bias/criterion (c): your default tendency to say “present” vs. “absent.”
Key formulas:
- Hit rate: $H = \frac{\text{Hits}}{N_{\text{present}}}$
- False alarm rate: $FA = \frac{\text{False Alarms}}{N_{\text{absent}}}$
- $d' = z(H) - z(FA)$
- $c = -\tfrac{1}{2} \big( z(H) + z(FA) \big)$
Examples illustrate how d′ and c can diverge: higher d′ means better discrimination, while c reflects liberal vs. conservative response style.
Practical data issues: zero/one rate corrections, potential unequal variances (d′a), and the use of rating scales to build ROC curves.
SDT concepts extend beyond simple perception tasks to memory strength, stereotypes, and many decision-making contexts.