measurement reliability = consistency
test-retest: if tested again, the results/patterns will be similar
internal: respondents give similar responses across all of the items, even when the items are worded differently
average inter-item correlation (AIC): mean of all possible correlations between pairs of items
i.e., an AIC between 0.15 and 0.50 means the items are reasonably correlated
Cronbach’s alpha: combines AIC and the # of items in the scale
i.e., the closer to 1.0, the more reliable the scale
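A minimal sketch of how AIC and the number of items combine into alpha. It uses the standardized form, alpha = (k × AIC) / (1 + (k − 1) × AIC), which is one common way to compute it and can differ slightly from the raw, covariance-based alpha. The item scores and function names are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_inter_item_correlation(items):
    """Mean of the correlations between every possible pair of items."""
    pairs = list(combinations(items, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


def standardized_alpha(n_items, aic):
    """Standardized alpha from the number of items and the AIC."""
    return (n_items * aic) / (1 + (n_items - 1) * aic)


# Hypothetical data: each inner list is one item's scores from five respondents.
items = [
    [4, 2, 5, 3, 1],
    [5, 2, 4, 3, 2],
    [4, 1, 5, 2, 2],
]
aic = average_inter_item_correlation(items)
alpha = standardized_alpha(len(items), aic)
print(f"AIC = {aic:.2f}, alpha = {alpha:.2f}")
```

With the AIC held constant, adding items raises alpha, so a long scale can reach a high alpha even when its items are only modestly correlated.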
interrater: different raters interpret and score the same sentence/prompt consistently
i.e., whether results are uniform when multiple administrators use the measure
r: 0.70 or greater for strong, positive correlation
kappa: used when rating categorical variables; closer to 1.0 means stronger agreement between raters
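A rough sketch of kappa for two raters, assuming the notes mean Cohen's kappa: observed agreement corrected for the agreement expected by chance, (p_o − p_e) / (1 − p_e). The function name and ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each categorized the same cases."""
    n = len(rater_a)
    # Proportion of cases where the two raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's category proportions.
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two observers rating the same four clips.
rater_a = ["aggressive", "aggressive", "calm", "calm"]
rater_b = ["aggressive", "aggressive", "calm", "aggressive"]
print(cohens_kappa(rater_a, rater_b))
```

The raters agree on 3 of 4 clips (75%), but half of that agreement is expected by chance, so kappa is lower than the raw agreement rate.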
measurement validity = accuracy
construct: how well variables have been operationalized
face: subjective; does this seem plausible?
e.g., a layperson may be confused by a measure or question its construct validity where an expert would not
content: subjective; does it cover all the (relevant) content?
e.g., a quiz or survey covering only some of the relevant material vs. all or comprehensively
criterion: empirical; does it correlate with behavior (the way it should theoretically)?
e.g., measuring class with relevant behaviors (arriving on time, using a planner, etc.), vs. irrelevant behaviors
known-groups paradigm: testing two groups who are known to differ on the measured variable to ensure they score differently on that variable
convergent: empirical; does it correlate with similar metrics, i.e., do they converge?
e.g., measuring movie quality with IMDb, Rotten Tomatoes, etc., scores (commonly used measurements)
discriminant/divergent: empirical; does it correlate less strongly with dissimilar metrics?
e.g., ensuring that measurement used for “itch” does not converge with measurement used for “pain”
prevents confusion and irrelevant overlap
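A small sketch of the convergent vs. discriminant check: correlate a new measure with a similar measure (r should be high) and with a dissimilar one (r should be near zero). All scores are made up for illustration, and the variable names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six participants.
new_itch_measure = [1, 2, 3, 4, 5, 6]
established_itch_measure = [2, 1, 4, 3, 6, 5]  # similar construct
pain_measure = [2, 3, 1, 1, 3, 2]              # dissimilar construct

convergent_r = pearson_r(new_itch_measure, established_itch_measure)
discriminant_r = pearson_r(new_itch_measure, pain_measure)
print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```

Evidence for validity comes from the pattern of both correlations together: high with the similar measure, low with the dissimilar one.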
Internal Validity – The degree to which a study establishes a cause-and-effect relationship between variables, without confounding factors.
External Validity – The extent to which the study’s findings generalize to other settings, populations, or times.
Ecological Validity – How well the findings apply to real-world settings.
Statistical Conclusion Validity – The extent to which conclusions drawn from statistical analysis are accurate and reliable.
sampling validity
statistical validity: findings are precise, reasonable, and replicable
requires big enough sample
Doesn’t care which wolves are sampled, just how many are sampled
larger sample size → smaller CI
larger sample size → more precise estimate of variable of interest
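A quick sketch of why a larger sample gives a smaller CI, assuming a 95% interval around a mean under a normal approximation (margin of error = z × SD / √n). The SD of 15 and the sample sizes are hypothetical.

```python
from math import sqrt


def margin_of_error(sd, n, z=1.96):
    """Half-width of an approximate 95% CI for a mean (normal approximation)."""
    return z * sd / sqrt(n)


# Hypothetical standard deviation; only the sample size changes.
for n in (25, 100, 400):
    print(f"n = {n:>3}: estimate ± {margin_of_error(15, n):.2f}")
```

Because the margin shrinks with the square root of n, quadrupling the sample size only halves the width of the interval.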
external validity: findings can be generalized to other contexts
requires right kind of sample
Cares which wolves are sampled and how (the number doesn’t matter)
generalizability: extent we can apply claims of sample to entire population of interest
response sets
non-differentiation: picking the same thing every time
solution: mix up the wording
acquiescence (yea-saying): agree with everything
solution: including attention check questions
fence-sitting: picking the neutral option every time
solution: eliminate neutral option on Likert scales
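A rough sketch of screening for these response sets in survey data. It assumes a 1–5 Likert scale and assumes that "mixing up the wording" means including reverse-worded items, which must be reverse-scored before analysis. The function names and responses are hypothetical.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response to a reverse-worded item (on 1-5: 5 -> 1, 4 -> 2, ...)."""
    return scale_max + scale_min - response


def is_non_differentiated(responses):
    """True if the respondent picked the same option for every item."""
    return len(set(responses)) == 1


def is_fence_sitting(responses, scale_min=1, scale_max=5):
    """True if every response is the neutral midpoint of an odd-length scale."""
    midpoint = (scale_min + scale_max) / 2
    return all(r == midpoint for r in responses)


# Hypothetical raw responses from three respondents to four items.
respondents = {
    "p1": [5, 5, 5, 5],  # agrees with everything, including reverse-worded items
    "p2": [3, 3, 3, 3],  # neutral option every time
    "p3": [4, 2, 5, 1],
}
for name, answers in respondents.items():
    print(name, is_non_differentiated(answers), is_fence_sitting(answers))
```

A respondent who gives the same raw answer to both regular and reverse-worded items is contradicting themselves, which is what makes the pattern detectable.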
observational validity solutions
observer bias → clear codebooks: precise operationalization of variables
observer bias/effects → masked (blind) design: observers are unaware of study’s purpose and conditions
reactivity → unobtrusive observations: making observers less noticeable
probability sampling: lets randomness determine who/what is sampled
simple random: list of every single member of population & choose sample randomly
systematic: choose sample based on a system (e.g., every 5th person)
stratified random: dividing population into meaningful subgroups (strata) and sample randomly from subgroups
oversampling: intentionally overrepresenting certain subgroups
cluster: randomly select naturally occurring clusters → include everyone within the selected clusters
multistage: includes 2 samples/stages
random sample of clusters
random sample of people within clusters
nonprobability sampling: researcher/participants choose who/what is sampled
convenience: sampling only easily accessible/available participants; most common sampling technique
purposive: only certain kinds of people included in sample
snowball: participants asked to recommend acquaintances for study
quota: identifies subsets of population and sets target number for each category in sample; samples nonrandomly until quotas are filled
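The probability sampling methods above can be sketched with Python's standard `random` module. This is an illustrative sketch, not a survey-grade implementation: the function names and the population are hypothetical, the systematic sample assumes a random starting point, and the cluster sample assumes everyone in a chosen cluster is included (multistage then samples within the chosen clusters).

```python
import random


def simple_random_sample(population, n, rng):
    """Choose n members at random from a full list of the population."""
    return rng.sample(population, n)


def systematic_sample(population, step, rng):
    """Take every step-th member, starting from a random position."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_random_sample(strata, n_per_stratum, rng):
    """Sample randomly within each meaningful subgroup (stratum)."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: random sample of clusters. Stage 2: random sample within each."""
    sample = []
    for name in rng.sample(sorted(clusters), n_clusters):
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample


# Hypothetical population: 20 students spread across four schools.
rng = random.Random(42)  # seeded only so the sketch is repeatable
students = [f"student_{i}" for i in range(20)]
schools = {f"school_{k}": students[k * 5:(k + 1) * 5] for k in range(4)}

print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 5, rng))
print(stratified_random_sample(schools, 2, rng))
print(cluster_sample(schools, 2, rng))
print(multistage_sample(schools, 2, 2, rng))
```

Oversampling could be sketched by passing a larger per-stratum count for the subgroup of interest, with results weighted afterward.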