Flashcards covering key concepts, definitions, and examples related to validity, validation strategies, bias, and fairness in psychological testing.
What is the general definition of validity in psychological assessment?
The degree to which evidence and theory support the interpretations of test scores for their intended purposes; how well a test measures what it purports to measure in a specific context.
Why is the phrase “valid test” potentially misleading?
Because no test is universally valid for all purposes, populations, and times; validity always refers to a specific use, group, and occasion.
What is validation?
The ongoing process of gathering and evaluating evidence about the appropriateness of inferences drawn from test scores.
Who is responsible for supplying validity evidence in a test manual?
Primarily the test developer, though test users may conduct local validation studies when needed.
When is a local validation study necessary?
When a test is altered (e.g., translated to Braille) or used with a population that differs significantly from the norm group.
Name the three classic (‘trinitarian’) categories of validity evidence.
Content validity, criterion-related validity, and construct validity.
What is content validity?
Judgment of how adequately a test samples the domain of behavior it is intended to measure.
Give an example of ensuring content validity for an educational exam.
Matching the proportion and type of items on a cumulative final to the proportion and type of material actually taught in the course.
What is a test blueprint?
A detailed plan specifying the content areas, number of items, and weightings that guide test construction to ensure content validity.
What is criterion-related validity?
The extent to which test scores relate to an external criterion measure; includes concurrent and predictive validity.
Define concurrent validity.
The degree to which test scores are related to criterion measures obtained at the same time.
Define predictive validity.
The degree to which test scores forecast a criterion measured at some future point.
What is a criterion in validity studies?
The standard against which a test is evaluated, such as job performance ratings, GPA, or clinical diagnosis.
List three desired characteristics of a criterion.
Relevant, valid, and uncontaminated by the predictor test.
What is criterion contamination?
Using information from the predictor test when establishing the criterion, which artificially inflates the validity coefficient.
What statistical index is typically used as a validity coefficient?
A correlation coefficient (often Pearson r) between test scores and criterion scores.
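As a minimal sketch, the validity coefficient can be computed as a Pearson r between test scores and criterion scores; the data below are made up for illustration:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

test_scores = [10, 12, 14, 16, 18]       # hypothetical predictor scores
job_ratings = [2.0, 2.5, 3.5, 3.0, 4.5]  # hypothetical criterion (supervisor ratings)
r = pearson_r(test_scores, job_ratings)  # the validity coefficient
```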
How does restriction of range affect validity coefficients?
It usually lowers them because reduced variability attenuates correlations.
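A small Python sketch with made-up data can show the attenuation: when criterion data exist only for the top-scoring (selected) applicants, the correlation in that restricted group is smaller than in the full pool:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) ** 0.5
                  * sum((y - my) ** 2 for y in ys) ** 0.5)

# hypothetical applicant pool: predictor x, criterion y
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_full = pearson_r(x, y)

# suppose only applicants scoring 6+ were hired, so criterion
# data exist only for this range-restricted subsample
kept = [(a, b) for a, b in zip(x, y) if a >= 6]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])
```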
What is incremental validity?
The amount of variance in a criterion explained by a new predictor beyond that explained by existing predictors.
Which analysis is often used to quantify incremental validity?
Hierarchical (or sequential) multiple regression.
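A sketch of the idea, using made-up data and the standard two-predictor formula for multiple R² rather than a regression library: incremental validity is the change in R² when the new predictor is entered after the existing one:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) ** 0.5
                  * sum((y - my) ** 2 for y in ys) ** 0.5)

# hypothetical data: criterion y, existing predictor x1, new predictor x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y  = [2.0, 2.5, 5.0, 5.5, 8.0, 8.5]

r_y1, r_y2 = pearson_r(x1, y), pearson_r(x2, y)
r_12 = pearson_r(x1, x2)
r2_step1 = r_y1 ** 2                                  # step 1: x1 alone
r2_step2 = ((r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12)
            / (1 - r_12**2))                          # step 2: x1 and x2
incremental = r2_step2 - r2_step1                     # variance added by x2
```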
Define base rate in predictive validity contexts.
The proportion of individuals in the population who naturally possess or exhibit the criterion characteristic.
What is a hit rate?
The proportion of people a test correctly identifies regarding the presence or absence of a criterion attribute.
Differentiate false positive and false negative errors.
False positive: the test predicts the trait is present when it is absent; false negative: the test predicts the trait is absent when it is present.
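These outcome counts can be tallied directly from prediction/criterion pairs; the records below are invented for illustration:

```python
# Each pair: (test predicts trait present, trait actually present)
predicted = [True, True, False, False, True, False]
actual    = [True, False, False, True, True, False]

true_pos  = sum(p and a for p, a in zip(predicted, actual))
true_neg  = sum((not p) and (not a) for p, a in zip(predicted, actual))
false_pos = sum(p and not a for p, a in zip(predicted, actual))   # predicted present, actually absent
false_neg = sum((not p) and a for p, a in zip(predicted, actual)) # predicted absent, actually present

hit_rate  = (true_pos + true_neg) / len(actual)  # proportion correctly classified
base_rate = sum(actual) / len(actual)            # proportion who actually have the trait
```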
What is construct validity?
Evidence and reasoning showing that test scores meaningfully relate to the theoretical construct they are intended to measure.
Give two types of construct validity evidence involving correlations.
Convergent validity (high correlations with related measures) and discriminant validity (low correlations with unrelated measures).
What is the multitrait-multimethod matrix?
A table of correlations among multiple traits measured by multiple methods used to examine convergent and discriminant validity simultaneously.
How does factor analysis aid construct validation?
By identifying underlying factors and showing whether items load on expected constructs, supporting test homogeneity and theoretical structure.
What is a factor loading?
The correlation between an individual test or item and an underlying factor identified in factor analysis.
Define homogeneity in a test.
The degree to which all items measure a single construct; often assessed via internal correlations or factor analysis.
What type of evidence involves demonstrating score differences between theoretically distinct groups?
Evidence from contrasted (distinct) groups.
What is face validity and why is it important?
The extent to which a test appears, on the surface, to measure what it claims; it influences test-taker motivation and public acceptance but is not technical evidence of validity.
Define ecological validity.
The extent to which test findings generalize to real-life, naturalistic settings where the behavior actually occurs.
Name four common rating errors that can affect criterion data.
Leniency (generosity) error, severity error, central tendency error, and halo effect.
How can ranking rather than rating reduce certain errors?
Ranking forces raters to order individuals relative to one another, reducing leniency, severity, and central tendency errors tied to absolute rating scales.
What is test bias in psychometrics?
A systematic factor in a test that prevents impartial, accurate measurement across groups, leading to unfair advantages or disadvantages.
Differentiate intercept bias and slope bias.
Intercept bias: systematic over- or under-prediction for a group; Slope bias: predictor has weaker (or stronger) correlation with criterion for a group.
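A minimal sketch of how the distinction is checked: fit a regression line for each group and compare slopes and intercepts. The data below are fabricated so the groups share a slope but differ in intercept, the intercept-bias pattern, where a single common line would systematically mis-predict one group:

```python
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# made-up data: group A follows y = 2x + 1, group B follows y = 2x - 1
group_a_x, group_a_y = [1, 2, 3, 4], [3, 5, 7, 9]
group_b_x, group_b_y = [1, 2, 3, 4], [1, 3, 5, 7]
slope_a, intercept_a = fit_line(group_a_x, group_a_y)
slope_b, intercept_b = fit_line(group_b_x, group_b_y)
# equal slopes but different intercepts -> intercept bias;
# unequal slopes would instead indicate slope bias
```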
What is test fairness?
The impartial, just, and equitable use of tests in decision-making, considering societal values and consequences.
Give an example of a preventive psychometric technique to address adverse impact.
Eliminating items showing differential item functioning (DIF) between groups before finalizing the test.
What is within-group norming (race-norming)?
Converting raw scores to percentiles based on an examinee’s own demographic group; now prohibited for employment decisions in the U.S.
Describe banding as a selection technique.
Grouping scores within a defined range as equivalent, then using additional criteria (e.g., diversity goals) to choose among candidates within the band.
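A tiny sketch of banding with invented names and scores: everyone within a fixed band width of the top score is treated as equivalent, and secondary criteria are applied within the band:

```python
# made-up candidate scores; band_width might come from the
# standard error of measurement in a real application
scores = {"Ana": 92, "Ben": 90, "Cho": 88, "Dev": 81}
band_width = 4

top = max(scores.values())
band = [name for name, s in scores.items() if top - s <= band_width]
# secondary criteria (seniority, diversity goals, etc.) would then
# be used to choose among the candidates in `band`
```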
Why might a test with low face validity still be worth using?
If strong empirical evidence supports its validity; appearance alone does not determine measurement quality.
What is an expectancy chart?
A table showing the probability of reaching various criterion outcomes for different score ranges, aiding interpretation of predictive validity.
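As a minimal sketch with fabricated records, an expectancy chart reduces to the proportion of successes within each score band:

```python
# (test score, succeeded on the criterion) pairs -- made-up data
records = [(55, False), (58, False), (62, True), (65, False),
           (72, True), (75, True), (82, True), (88, True)]
bins = [(50, 60), (60, 70), (70, 80), (80, 90)]

chart = {}
for lo, hi in bins:
    outcomes = [ok for score, ok in records if lo <= score < hi]
    chart[f"{lo}-{hi}"] = sum(outcomes) / len(outcomes)  # P(success | score band)
```

Higher score bands showing higher success proportions is exactly the pattern a useful predictor should produce.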
Why must validity be re-established over time?
Cultural changes, new populations, or shifts in test content can alter how well scores represent the intended construct.
How can age-related score changes support construct validity?
If a construct is theoretically expected to develop or decline with age, observing corresponding score trends supports validity.
What is a local validation study?
A study conducted by a specific test user to gather validity evidence for their own population or modified test version.