Exam 2: Psychological Assessment

Validity: measuring what you think you're measuring

Reliability: how close to the true score are we likely to be 

Practicality: does it make sense to apply it to this setting? Is it worth it? Related to utility

Cross-Sectional Fairness: is it accurate for someone in this group? Across other tests, do you get the same answer? Absence of test or assessor biases

  • Correlation: the degree of relationship between 2 variables

  • Separate from causation 

  • (relationship could reflect causation, but need experiment to know for sure)

  • Negative, Positive, Strong and Weak correlations 

  • No Correlation: no relationship; no pattern, random, no meaningful predictors

  • Info about scatter plots:

  • Some graphs don’t have a line of best fit because the data has no correlation 

  • The closer to -1 or +1, the STRONGER the relationship and the better your prediction, because it's more accurate

  • The closer to 0 the weaker the relationship!

Correlation does not equal causation!!
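A minimal sketch of computing a correlation coefficient (r) in Python; the variables and scores are made-up for illustration, not from the lecture:

```python
# Pearson r between two hypothetical variables; values near +1/-1 mean a
# strong relationship, values near 0 mean a weak one.
import numpy as np

study_hours = np.array([2, 4, 5, 7, 8, 10])       # hypothetical predictor
exam_scores = np.array([55, 60, 70, 75, 82, 90])  # hypothetical criterion

r = np.corrcoef(study_hours, exam_scores)[0, 1]
print(f"r = {r:.2f}")  # close to +1 -> strong positive correlation
```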


3 types of validity:

1. Content validity: do the test items adequately sample the domain the test claims to cover?

2. Criterion-related validity: relationships between test scores and other measures. Do the scores predict performance on a criterion?

  • Criterion = the outcome measure of interest

  • Criterion prerequisites: must be VALID and uncontaminated (meaning it's independent of the test; they can't share any items)

  • Two Types of Criterion Related Validity: 

Concurrent Validity: scores predict standing on a criterion measured now (at roughly the same time). Ex: 100 clients take the Beck Depression Inventory (BDI); 500 people take the alcohol test.

Predictive Validity: scores predict scores/performance in the future. Ex: an insurer using your current college GPA to set your rate, basing it on your potential risk of crashing given how low or high your GPA is. SAT scores correlated w/ college GPA. GRE correlated with graduation rate.

  • Both are about predicting

  • Predictive-validity-related stats: expectancy tables and the standard error of estimate (check the handout)

SEest: the average distance of actual scores from the regression line

High r: strong relationship between the measure (test score) and what you are predicting; no one is that far off the line, so predict the score on the line and you will be close

Low r: weak relationship between the measure (test score) and what you are predicting

No r: no relationship between the measures
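A minimal sketch of the standard error of estimate, assuming the common large-sample formula SEest = s_y * sqrt(1 - r^2) (the handout's exact formula may differ); all numbers are hypothetical:

```python
import math

s_y = 10.0  # standard deviation of the criterion scores (hypothetical)
r = 0.80    # test-criterion correlation (hypothetical)

se_est = s_y * math.sqrt(1 - r**2)
print(f"SEest = {se_est:.1f}")  # 6.0 points: typical distance from the line
# High r -> small SEest (predictions stay close to the line);
# r = 0 -> SEest = s_y (prediction no better than guessing the mean).
```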


  • Standards for prediction vary

  • Set cut-off scores that optimize the ‘hit rate’ for the situation

Hit rate = hits / (hits + misses)

  • Hit: an accurately predicted classification

  • Miss: false negatives and false positives

  • False positives: predict high/pass/trait present, but it's actually absent

  • False negatives: predict the absence of something but it's actually there
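A minimal sketch of hit rate with a cut-off score; the scores, outcomes, and the cut-off of 70 are all hypothetical:

```python
# Classify each person by the cut-off, then count hits and both kinds of miss.
scores = [55, 62, 68, 71, 75, 80, 85, 90]
has_trait = [False, False, True, True, False, True, True, True]
CUTOFF = 70

hits = false_pos = false_neg = 0
for score, actual in zip(scores, has_trait):
    predicted = score >= CUTOFF      # predict the trait is present above cut-off
    if predicted == actual:
        hits += 1                    # accurate classification
    elif predicted:
        false_pos += 1               # predicted present, actually absent
    else:
        false_neg += 1               # predicted absent, actually present

misses = false_pos + false_neg
print(f"hit rate = {hits / (hits + misses):.2f}")  # 0.75 here
```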


Construct Validity  

Construct: a theoretical, intangible quality people vary on (ex: intelligence, leadership, psychopathy, anxiety, hostility, and self-esteem)

  • We infer that these qualities are real and that they exist. We try to group together predictable patterns of behavioral characteristics across related items

  • (construct is the assumed reason for the pattern)

  • Construct validity asks…is your quality measurable? and is this an accurate measure of it? 

  • This is broader in comparison to content or criterion validity 

  • Convergent Validity: scores correlate highly, as expected, with other tests (positively or negatively)

  • Ex: with older, established tests of the construct or related measures

  • Discriminant Validity: scores show little or no relationship to measures that the theory predicts they should not be related to


Reliability is about consistency: do you get the same results after each test?

  • It implies that there's very little error and the observed score is near the true score


Reliability Coefficient: a stat that quantifies reliability. Ranges from 0 (not reliable) to 1 (perfectly reliable)

  • Classical Test Theory: Spearman, 1904. (Contrast: Item Response Theory, where the probability of getting an item correct is related to item difficulty and overall skill level.)

*** Reliability = variance of T / (variance of T + error variance)

  • Reliability is a measure of the variability of true scores divided by the variability of the observed scores (observed = true + error)

  • The closer the error variance is to zero, the closer reliability is to 1

  • Random error can be unpredictable: environmental problems (temperature), examinee state (sleepy, not feeling well), administration errors, rapport issues, test scoring errors, judgment errors

  • Standardization: uniform administration and scoring procedures, which help minimize these sources of error
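A minimal sketch of the reliability formula above; the variance numbers are hypothetical:

```python
# reliability = var(T) / (var(T) + error variance)
true_var = 80.0   # hypothetical variance of true scores
error_var = 20.0  # hypothetical variance of random error

reliability = true_var / (true_var + error_var)
print(f"reliability = {reliability:.2f}")  # 0.80
# As error_var -> 0, reliability -> 1 (observed scores match true scores).
```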


Different Ways to Measure Reliability:

  • Test-Retest (the gold standard): correlate scores from the same test given at different times

  • Possible issues…practice effects, looking up answers between sessions


Alternate Forms:

  • Same content tapped differently but equally 

  • correlate people’s scores on both versions

  • building two truly equivalent forms is harder than you might guess


Interscorer (interjudge/interrater):

  • Same people, different administrators/scorers

  • *Use correlation coefficient 


Split Half (odd-even):

  • try to divide the test into 2 equally difficult halves, correlate the scores

  • useful if cost or practice effects would impact test-retest (especially if they impact some people more than others)

  • split-half reliabilities tend to be lower b/c each half is shorter; you can correct for that (sketched below), but there are still issues with how to pick your halves…which led to coefficient alpha
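The usual length correction here is the Spearman-Brown formula (named as an assumption; the notes only say the lower value can be corrected). A minimal sketch with a hypothetical half-test correlation:

```python
# Spearman-Brown: estimate full-length reliability from a split-half r.
r_half = 0.70  # hypothetical correlation between the two halves

r_full = 2 * r_half / (1 + r_half)
print(f"corrected reliability = {r_full:.2f}")  # ~0.82, higher than r_half
```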


Coefficient Alpha:

  • the mean of all possible split-half correlations
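A minimal sketch of computing coefficient (Cronbach's) alpha from a hypothetical item-response matrix, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores):

```python
import numpy as np

items = np.array([   # rows = people, columns = test items (hypothetical data)
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 5, 5],
])
k = items.shape[1]
sum_item_vars = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores

alpha = (k / (k - 1)) * (1 - sum_item_vars / total_var)
print(f"alpha = {alpha:.2f}")  # ~0.93: high inter-item consistency
```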


Inter-Item Consistency:

  • degree of correlation among all items


.9 or .95 is the goal for most tests

as low as .7 can be accepted 

  • research sometimes accepts even lower values 

  • you should find these stats in the test's manual

Are the items homogeneous or heterogeneous in nature?