Multi-Item Scales

Measurement tools in surveys can be classified as either:
- Single-item scales (one question per construct)
- Multi-item scales (several questions combined into one overall score)
Key purpose: translate complex, often abstract concepts (e.g., police bias) into quantitative data

Example of a single-item scale:
- Q17 (B2) Police Bias: “Police officers treat minorities differently than White people.”
- Response format: 7-point Likert, Strongly disagree (1) → Strongly agree (7)
Advantages:
- Quick to administer
- Low respondent burden
Limitations:
- Construct complexity: difficult to capture nuanced, multidimensional ideas with one sentence
- Sensitivity: only 7 response options → limited granularity
- Reliability: internal consistency cannot be assessed, because it requires more than one item

Police bias encompasses fairness, selective enforcement, racial targeting, socio-economic discrimination, procedural justice, etc.
Risk that a single statement reflects only a slice of this broader phenomenon → measurement error ↑

Core idea of a multi-item scale: present several carefully worded statements that address different angles of the same construct, then average (or sum) the individual scores to create one composite index.

Items presented (all on 1–7 Likert):
1. (B1) Police officers usually make fair decisions when enforcing laws.
2. (B2) Police officers treat minorities differently than White people.
3. (B3) Police officers do not unfairly target the poor.
4. (B4) Everyone is treated equally by the police.
5. (B5) Police officers unfairly target racial minorities during their investigations.
6. (B6) Police officers usually have a reason when they stop or arrest people.
7. (B7) People who say they were treated poorly by the police probably did something.
8. (B8) Police do their best to be fair to everyone.
Uniform meaning of end-points: “7 = Strongly agree” must point in the same conceptual direction for every item; items worded in the opposite direction must be reverse coded before scoring (marked “R” on the slide for B2 & B5).
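The scoring steps described above (reverse code the "R" items, then average) can be sketched in Python. This is a minimal illustration, not the course's official scoring procedure: the function names and the respondent's answers are made up for the example, and only the item labels B1–B8, the 1–7 range, and the reversed items B2 and B5 come from the notes.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7      # end-points of the 7-point Likert format
REVERSED_ITEMS = {"B2", "B5"}    # items marked "R" in the notes

def reverse_code(score, low=SCALE_MIN, high=SCALE_MAX):
    """Flip a score so high values point the same way as the other items."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} range")
    return (high + low) - score  # on a 1-7 scale this is 8 - score

def composite_score(responses, reversed_items=REVERSED_ITEMS):
    """Average one respondent's item scores after reverse coding."""
    aligned = [
        reverse_code(value) if item in reversed_items else value
        for item, value in responses.items()
    ]
    return mean(aligned)

# Hypothetical respondent; these answers are invented for illustration.
respondent = {"B1": 6, "B2": 2, "B3": 5, "B4": 6,
              "B5": 1, "B6": 7, "B7": 4, "B8": 6}
score = composite_score(respondent)
print(score)  # 5.875
```

With B2 and B5 flipped, all eight items point toward perceived fairness, so a higher composite here means the respondent sees the police as fairer (that is, perceives less bias). Summing instead of averaging would rank respondents identically as long as nobody skips an item.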

Benefits of Multi-Item Scales

Better coverage of complex construct → captures multiple facets (e.g., fairness, equal treatment, justification of police actions)
More data points per respondent (k>1):
- Finer distinctions between individuals
- Statistical power ↑
Reliability estimation possible via internal consistency statistics such as Cronbach’s α
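The "finer distinctions" benefit listed above can be made concrete by counting how many different scores each approach can produce. The sketch below assumes the eight-item, 1–7 format from these notes and a summed composite; the variable names are illustrative.

```python
K_ITEMS = 8                      # items B1-B8
SCALE_MIN, SCALE_MAX = 1, 7      # 7-point Likert end-points

# A single item can only take one of seven values.
single_item_values = SCALE_MAX - SCALE_MIN + 1

# Build the set of reachable summed scores one item at a time.
composite_sums = {0}
for _ in range(K_ITEMS):
    composite_sums = {
        total + value
        for total in composite_sums
        for value in range(SCALE_MIN, SCALE_MAX + 1)
    }

print(single_item_values)        # 7
print(len(composite_sums))       # 49 distinct totals, from 8 to 56
```

The averaged composite has the same 49 possible values (each total divided by 8), so two respondents who would tie on a single item can still be separated by the composite.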

Same response metric for every item (here: 1–7 Likert)
Reverse coding: any negatively-worded items must be recoded so that high numbers always represent the same conceptual direction.
- Formula: $X' = (\text{highest scale value} + \text{lowest scale value}) - X$ (e.g., for a 1–7 scale, $X' = 8 - X$)
Internal consistency check: Cronbach’s α should ideally be >0.70 for research purposes.
Item redundancy vs. breadth: aim for correlation without duplication; too-similar wording inflates α artificially.
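Cronbach's α is commonly computed as α = k/(k−1) × (1 − Σ item variances / variance of the summed score), where k is the number of items. The sketch below implements that standard formula with the Python standard library; the function name and the tiny data sets are invented for illustration, and the input is assumed to be already reverse coded.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    Each row holds one respondent's k item scores, already reverse coded.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("internal consistency needs at least two items")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must answer the same k items")
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("summed scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Made-up data: four respondents, three items.
rows = [[1, 2, 1], [3, 3, 4], [5, 5, 4], [7, 6, 7]]
alpha = cronbach_alpha(rows)

# Two identical columns, i.e. the same item asked twice.
duplicated = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

For the first data set α is about 0.97. The second call returns 1.0, which illustrates the redundancy warning above: asking the same thing twice pushes α to its maximum without adding any coverage of the construct.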

Precise measurement of police bias can inform policy, training, and accountability.
Poor measurement risks underestimating or misrepresenting community experiences, potentially perpetuating injustice.
Transparent scale construction (publicly available items, clear coding rules) enhances replicability and trust in findings.