Exam 2: Psychological Assessment
Validity: is the test measuring what you think it is measuring?
Reliability: how close to the true score are we likely to be
Practicality: does it make sense to apply it in this setting? Is it worth it? Related to utility
Cross-Sectional Fairness: is it accurate for someone in this group? Do you get the same answer across other tests? Absence of test or assessor biases
Correlation: the degree of the relationship between 2 variables
Separate from causation
(relationship could reflect causation, but need experiment to know for sure)
Negative, Positive, Strong and Weak correlations
No Correlation: no relationship; no pattern, random, no meaningful predictors
Info about scatter plots:
Some graphs don’t have a line of best fit because the data has no correlation
The closer to -1 or +1, the STRONGER the relationship and the better your prediction, because it's more accurate
The closer to 0 the weaker the relationship!
Correlation does not equal causation!!
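To make this concrete, here is a tiny Python sketch (the study-hours/score data are invented for illustration, not from class) that computes r for a handful of points:

```python
import numpy as np

# Invented example data: hours studied vs. exam score
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([55, 60, 58, 70, 72, 80, 78, 88])

# Pearson correlation coefficient r, ranges from -1 to +1
r = np.corrcoef(hours, score)[0, 1]
print(round(r, 2))  # near +1 -> strong positive relationship -> better prediction
```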
3 types of validity:
2. Criterion-related validity: relationships between test scores and other measures. Do the scores predict performance on a criterion?
Criterion = the outcome measure of interest
Criterion prerequisites: must be VALID and uncontaminated (meaning it's independent of the test and can't share any items with it)
Two Types of Criterion Related Validity:
Concurrent Validity: can predict the criterion score now (at about the same time). Ex: 100 clients take the Beck Depression Inventory (BDI); 500 people take an alcohol-use test
Predictive Validity: can predict scores/performance in the future. Ex: an insurer using your current college GPA to determine your insurance rate, basing it on the potential risk of you crashing given how low or high your GPA is. SAT scores correlated with college GPA. GRE scores correlated with graduation rate.
Both are about predicting
Predictive-validity-related stats: expectancy tables and the standard error of estimate (check the handout)
SEest: the average distance of actual scores from the regression line
High r: strong relationship between the measure (test score) and what you are predicting. No one is far from the line; predict the score on the line and you will be close
Low r: weak relationship between the measure (test score) and what you are predicting
No r: no relationship between measures
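A rough sketch with made-up numbers, using the common shortcut SEest = SD of Y x sqrt(1 - r^2) (assuming that simple version rather than the n - 2 corrected one):

```python
import numpy as np

# Invented example: test scores (predictor) and later job-performance ratings (criterion)
x = np.array([10, 12, 15, 18, 20, 25])
y = np.array([3.0, 3.4, 3.1, 4.0, 4.2, 4.8])

r = np.corrcoef(x, y)[0, 1]
se_est = np.std(y) * np.sqrt(1 - r**2)  # average distance of actual scores from the regression line

print(round(r, 2), round(se_est, 2))  # higher r -> smaller SEest -> predictions hug the line
```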
Standards for prediction vary
Set cut off scores that optimize ‘hit rate’ for situation
Hit rate: hits / (hits + misses)
Hit: accurately predicted classification
Miss: false negatives and false positives
False positives: predict high/pass/presence of the trait, but it's actually absent
False negatives: predict the absence of something, but it's actually there
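A minimal sketch of how a cut-off score produces hits, false positives, and false negatives (the scores, trait labels, and cut-off below are all invented):

```python
# Invented data: (test score, does the person actually have the trait?)
people = [(85, True), (78, True), (72, False), (90, True), (60, False), (75, False)]
cutoff = 80  # predict "has the trait" if score >= cutoff

hits = false_pos = false_neg = 0
for score, has_trait in people:
    predicted = score >= cutoff
    if predicted == has_trait:
        hits += 1          # hit: accurately predicted classification
    elif predicted:
        false_pos += 1     # false positive: predicted the trait, but it's absent
    else:
        false_neg += 1     # false negative: predicted absence, but it's there

misses = false_pos + false_neg
print(hits / (hits + misses))  # hit rate
```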
Construct Validity
Construct: theoretical, intangible quality people vary on (ex: intelligence, leadership, psychopathy, anxiety, hostility, and self-esteem)
We infer that these qualities are real and that they exist. We try to group together predictable patterns of behavioral characteristics over related items
(construct is the assumed reason for the pattern)
Construct validity asks…is this quality measurable, and is this test an accurate measure of it?
This is broader than content or criterion validity
Convergent Validity: scores correlate as expected with other tests (positively or negatively)
Ex: with older, established tests of the construct or related measures
Discriminant Validity: scores show little or no relationship to measures that the theory predicts they should not relate to
Reliability is about consistency: do you get the same results after each test?
It implies that there's very little error and the observed score is close to the true score
Reliability Coefficient: a statistic that quantifies reliability. Ranges from 0 (not reliable) to 1 (perfectly reliable)
Classical Test Theory: Spearman, 1904. (Item Response Theory, by contrast: the probability of getting an item correct should be related to item difficulty and overall skill level)
*** Reliability = variance of T / (variance of T + error variance)
Reliability is a measure of the variability of true scores divided by the variability of the observed scores. (True + Error)
The closer the error variance is to zero, the closer the reliability coefficient is to 1
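A quick simulation of the X = T + E idea (the true-score and error variances below are made up): reliability falls out as the share of observed-score variance that is true-score variance.

```python
import numpy as np

rng = np.random.default_rng(0)

true_scores = rng.normal(100, 15, size=10_000)  # variance of T = 15**2
error = rng.normal(0, 5, size=10_000)           # error variance = 5**2
observed = true_scores + error                  # X = T + E

reliability = true_scores.var() / observed.var()  # var(T) / (var(T) + var(E))
print(round(reliability, 2))  # near 1 when error variance is near zero
```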
Random error can be unpredictable: environmental problems (temperature), examinee state (sleepy, not feeling well), administration errors, rapport issues, test-scoring errors, judgment errors
Standardization (define it)
Different Ways to Measure Reliability:
Test-Retest (the gold standard): correlate scores from the same test given at different times
Possible issues…practice effects, looking up answers between administrations
Alternate Forms:
Same content tapped differently but equally
correlate people’s scores on both versions
harder than you might guess
Interscorer (interjudge/interrater)
Same people, different scorers/administrators
*Use correlation coefficient
Split Half (odd-even):
try to divide the test into 2 equally difficult halves, then correlate the scores
useful if cost or practice effects would impact test-retest (especially if they impact some people more than others)
tends to be lower b/c each half is shorter; can correct for that (Spearman-Brown; see the sketch at the end of this section), but there are still issues with how to pick your halves…led to:
Coefficient Alpha:
mean of all possible split halves
Inter-Item Consistency:
degree of correlation among all items
.9 or .95 is the goal for most tests
as low as .7 can be accepted
research sometimes accepts even lower values
should find these stats in the test manual
Are the items homogeneous or heterogeneous in nature?
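The length correction mentioned under split-half is usually the Spearman-Brown formula; a minimal sketch, assuming that's the correction meant:

```python
def spearman_brown(half_test_r: float) -> float:
    """Estimate full-length test reliability from a split-half correlation."""
    return (2 * half_test_r) / (1 + half_test_r)

print(round(spearman_brown(0.70), 2))  # ~0.82: corrects for each half being shorter than the full test
```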
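And a sketch of coefficient (Cronbach's) alpha from invented item scores, using the standard k/(k-1) x (1 - sum of item variances / total-score variance) form:

```python
import numpy as np

# Invented data: 5 people x 4 related items (e.g., 1-5 ratings)
items = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)      # variance of each item
total_var = items.sum(axis=1).var(ddof=1)  # variance of people's total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 2))  # .9-.95 is the goal for most tests; >= .7 sometimes accepted
```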