Psychological Assessment Exam 3
Validity: measuring what you think you are measuring
Reliability: how close to the true score are we likely to be
Practicality: Does it make sense to apply it to this setting? Is it worth it? Related to utility
Cross-Sectional Fairness: Is it accurate for someone in this group? Across other tests, do you get the same answer? Absence of test or assessor biases
Correlation: the degree of the relationship between 2 variables
Separate from causation
(relationship could reflect causation, but need experiment to know for sure)
Negative, Positive, Strong and Weak correlations
No Correlation: no relationship; no pattern, random, no meaningful predictors
Info about scatter plots:
Some graphs don’t have a line of best fit because the data has no correlation
The closer to -1 or +1, the STRONGER the relationship and the better your prediction, because it’s more accurate
The closer to 0 the weaker the relationship!
Correlation does not equal causation!!
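A quick sketch of how r is actually computed, in case it helps to see the numbers. The data are made up, and the function name pearson_r is just for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up data, for illustration only
hours_studied = [1, 2, 3, 4, 5]
hours_gaming = [5, 4, 3, 2, 1]
exam_score = [52, 60, 61, 70, 77]

r_pos = pearson_r(hours_studied, exam_score)  # strong positive, close to +1
r_neg = pearson_r(hours_gaming, exam_score)   # strong negative, close to -1
print(round(r_pos, 2), round(r_neg, 2))
```

Both values are equally strong; the sign only gives the direction. Neither one says anything about causation.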
3 types of validity: content, criterion-related, and construct
2. Criterion related validity: relationships between scores and other measures. Do the scores predict the performance on a criterion?
Crit= outcome measure of interest
Crit pre-requisites: must be VALID and noncontaminated (meaning it’s independent of the test; they can't share any items)
Two Types of Criterion Related Validity:
Concurrent Validity: can predict score now (very quickly). Ex: 100 clients take the Beck Depression Inventory (BDI). 500 people take the alcohol test.
Predictive Validity: can predict scores/performance in the future. Ex: using your current college GPA to determine your insurance claim, basing it on the potential risk of you crashing given how low or high your GPA is. SAT scores correlated w/ college GPA. GRE correlated with graduation rate.
Both are about predicting
Predictive validity related stats: expectancy tables and standard error of estimate (check the handout)
SEest: average distance from the regression line
High r: strong relationship between the measure (test score) and what you are predicting. No one is that far off; predict a score on the line and you will be close
Low r: weak relationship between the measure (test score) and what you are predicting
No r: no relationship between measures
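To see how r changes SEest, here is a small sketch using the common formula SEest = SD of the criterion * sqrt(1 - r^2). That formula is an assumption on my part (the handout may use a slightly different version), and the SD of 0.50 GPA points is made up:

```python
from math import sqrt

def standard_error_of_estimate(sd_criterion, r):
    """SEest = SD of the criterion * sqrt(1 - r^2)."""
    return sd_criterion * sqrt(1 - r ** 2)

# Hypothetical criterion: college GPA with SD = 0.50
high_r = standard_error_of_estimate(0.50, 0.90)  # small error, predictions are close
low_r = standard_error_of_estimate(0.50, 0.20)   # error almost as big as the SD
no_r = standard_error_of_estimate(0.50, 0.00)    # error equals the SD, test adds nothing
print(round(high_r, 3), round(low_r, 3), no_r)
```

With no correlation, SEest equals the SD of the criterion, so using the test is no better than guessing the mean for everyone.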
Standards for prediction vary
Set cut off scores that optimize ‘hit rate’ for situation
Hit rate: hits / (hits + misses)
Hit: accurate predicted classification
Miss: false negatives and false positives
False positives: predict high/pass/trait present, but it's actually not
False negatives: predict the absence of something, but it's actually there
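A sketch of picking a cut-off score by hit rate. Everything here is made up for illustration (the scores, the outcomes, the candidate cut-offs, and the function names):

```python
def classify(scores, outcomes, cutoff):
    """Count hits, false positives, and false negatives for one cut-off.

    scores   -- test scores (we predict "pass" when score >= cutoff)
    outcomes -- True if the person actually passed on the criterion
    """
    hits = false_pos = false_neg = 0
    for score, passed in zip(scores, outcomes):
        predicted_pass = score >= cutoff
        if predicted_pass == passed:
            hits += 1        # prediction matched what happened
        elif predicted_pass:
            false_pos += 1   # predicted pass, actually failed
        else:
            false_neg += 1   # predicted fail, actually passed
    return hits, false_pos, false_neg

def hit_rate(scores, outcomes, cutoff):
    """Hit rate = hits / (hits + misses), where misses = FP + FN."""
    hits, false_pos, false_neg = classify(scores, outcomes, cutoff)
    return hits / (hits + false_pos + false_neg)

# Made-up data: 8 people, test score and whether they actually passed
scores = [40, 45, 50, 55, 60, 65, 70, 75]
outcomes = [False, False, False, True, False, True, True, True]

# Try a few candidate cut-offs and keep the one with the best hit rate
best_cutoff = max([45, 55, 70], key=lambda c: hit_rate(scores, outcomes, c))
print(best_cutoff, hit_rate(scores, outcomes, best_cutoff))
```

A low cut-off produces more false positives and a high one produces more false negatives, so which miss matters more depends on the situation.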
Construct Validity
Construct: theoretical, intangible quality people vary on (ex: intelligence, leadership, psychopathy, anxiety, hostility, and self-esteem)
We infer that these qualities are real and that they exist. We try to group together predictable patterns of behavioral characteristics over related items
(construct is the assumed reason for the pattern)
Construct validity asks…is your quality measurable? and is this an accurate measure of it?
This is broader in comparison to content or criterion validity
Convergent Validity: scores correlate highly, as expected, with other tests (positively or negatively)
Ex: older, established tests of the construct or related measures
Discriminant Validity: scores show little or no relationship to measures that the theory predicts they should not be related to
Reliability is about consistency: do you get the same results after each test?
It implies that there’s very little error and it’s near to the true score
Reliability Coefficient: a stat that quantifies reliability. Ranges from 0 (not reliable) to 1 (reliable)
Classical Test Theory: Spearman 1904. Item response theory: probability of getting an item correct should be related to item difficulty and overall skill level
*** Reliability = variance of T / (variance of T + error variance)
Reliability is the variability of true scores divided by the variability of the observed scores (True + Error)
Closer to a value of zero
Random error can be unpredictable: environmental problems (temperature), examinee state (sleepy, not feeling well), administration error, rapport issues, test scoring errors, judgment errors
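A tiny sketch of the reliability ratio, with made-up variances. In real life the true score variance can't be observed directly, so this only illustrates the idea:

```python
def reliability(true_variance, error_variance):
    """Reliability = true-score variance / observed-score variance."""
    observed_variance = true_variance + error_variance  # observed = True + Error
    return true_variance / observed_variance

little_error = reliability(true_variance=90, error_variance=10)
lots_of_error = reliability(true_variance=90, error_variance=90)
print(little_error, lots_of_error)
```

The less error variance there is, the closer the coefficient gets to 1.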
Standardization (define it)
Different Ways to Measure Reliability:
Test Retest (The gold standard): correlates scores from the same test given at different times
Possible issues…practice effects, look up answers
Alternate Forms:
Same content tapped differently but equally
correlate people’s scores on both versions
harder than you might guess
Interscorer (interjudge/interrater)
Same people different administrators
*Use correlation coefficient
Split Half (odd-even):
try to divide into 2 equally difficult halves, correlate the scores
useful if cost or practice effects would impact test-retest (especially if they impact some people more than others)
tends to be lower b/c shorter; can correct for that, but still issues w/ how to pick your halves…led to
Coefficient Alpha:
mean of all possible split halves
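A sketch of odd-even split half and coefficient alpha on made-up item scores. Assumptions: the correction for the shorter halves is the Spearman-Brown formula (the notes don't name it), and alpha is computed with the usual item-variance formula rather than by literally averaging every possible split:

```python
from math import sqrt
from statistics import pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Estimate full-length reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)

def split_half(item_scores):
    """Odd-even split: correlate totals on items 1, 3, ... with items 2, 4, ..."""
    odd = [sum(person[0::2]) for person in item_scores]
    even = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half)

def coefficient_alpha(item_scores):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores[0])
    item_vars = [pvariance([person[i] for person in item_scores]) for i in range(k)]
    total_var = pvariance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Made-up data: 5 people (rows) answering 4 items (columns), each scored 1-5
responses = [
    [5, 4, 5, 4],
    [4, 4, 3, 4],
    [3, 2, 3, 3],
    [2, 2, 1, 2],
    [1, 1, 2, 1],
]

r_half, corrected = split_half(responses)
alpha = coefficient_alpha(responses)
print(round(r_half, 2), round(corrected, 2), round(alpha, 2))
```

The corrected value is higher than the raw half-test correlation, which is the point of the correction. Alpha avoids having to choose one particular way of splitting the test.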
Inter-Item Consistency:
degree of correlation among all items
.9 or .95 is the goal for most tests
as low as .7 can be accepted
research sometimes accepts even lower values
should find these stats in the test manuals
Are the items homogeneous or heterogeneous in nature?
UNIT 3: CHAPTERS 8 AND 9
The Assessment of Ability
Aptitude Tests: estimate your potential for learning
Achievement Tests: what you already know
Intelligence…
to learn from experience
to acquire knowledge
to recognize and solve problems/adapt to environment
to think abstractly, to reason, to understand complexity
and the speed you learn and gather data
Others include additional concepts: interpersonal skills, morality, compassion, loyalty
most agree nature and nurture contribute (modifiable within LARGE genetic based limits)
Eastern Cultures Emphasize: benevolence, humility, doing what’s right
African Cultures Emphasize: maintaining harmonious and stable intergroup relations
Western Cultures Emphasize: Generally expanding
1. Factor analytic theories (psychometric approach): identify the ability or groups of abilities that constitute intelligence (if more than one, are they related?)
Factor analysis: correlational technique allowing us to identify clusters of items that are related, possibly indicating a meaningful concept
Many exist; some are over 100 years old
Single trait vs. Independent traits vs. Hierarchical
2. Information Processing theory/model: identifies specific mental processes that are applied during problem solving (how we process info; these models are not interested in what we process)
Metacognition: thinking about thinking
Spearman’s Two Factor Theory (early 1900’s)
little “g”, “g factor” or “g”
“g” is influencing everything
little g is the ability to reason and solve problems: impacts everything, explains high correlations among skills
General intelligence: electrochemical mental energy or power (CS), physiological integrity of NS (SMS) (the efficiency and effectiveness of the NS)
The best measure of little g is abstract reasoning problems
Best measure: nature and nurture, experience, trauma, SES, being socially isolated
S Factors: the ability to excel in certain areas; specific intelligences (art, music, business, etc). The two factor model was too simple
Group Factor: an intermediate construct that impacts many but not all skills. Not general g or specific s
Spearman vs. Thurstone
Thurstone believed in Primary Mental Abilities: he believed that there were seven independent factors…later found out that these traits are connected
1. Verbal Comprehension, 2. Word Fluency, 3. Number, 4. Space, 5. Associative Memory, 6. Perceptual Speed, 7. Inductive Reasoning
Most people agree with the hierarchical model
Cattell-Horn-Carroll (CHC Theory)
Started with Cattell (1940s)
Revised by Horn (1968)
Expanded by Carroll (1993)
Factor analytic theory is dominant method
Crystallized Intelligence: reflects the ability to accumulate knowledge, use verbal skills, and use strategies (like school info); a repository; culture/schooling dependent. Somebody else taught you how to do this
Fluid Intelligence: used when reasoning, ability to solve problems, see relationships, to reason abstractly, and spatial. Nonverbal, free of instruction, independent of culture
People argue that fluid intelligence is the best proxy for little g
Synthesizing most other factors results in one model. Good overall but still refining details
Planning: selection, use and monitoring of effective problem solving strategies. Ex: using feedback
Attention: receptivity to info Ex: ability to keep attention and block out distractions
Simultaneous: parallel processing; think and perceive at the same time. Ex: drawing, order doesn't matter
Successive: sequential processing; thinking that requires a specific order. Ex: language and calculations
Many things take simultaneous and successive learning. Can be difficult to do a task that tackles just one
Cognitive Assessment System attacks all 4^^
New Theories:
Gardner: Theory of multiple intelligences. Howard Gardner thought intelligence was too vast and too complex to measure as we do. Initially came up with 7 forms of intelligence and then added more throughout the years. (7, 8, 9, 10)
Sternberg: Triarchic Theory of Successful Intelligence
Both felt traditional IQ tests were too limited
Expanded to include things like creativity, music, people skills, body intelligence
The information processing approach is more fair
Gardner and Sternberg thought intelligence tests were too limited
Existential: sensitivity and capacity to tackle deep questions about the meaning of life, human existence and why we die.
Pedagogical: understanding how learning happens and knowing how to shape one's own learning to help others learn
Evaluation:
people like this method because it seems fair and optimistic
Most psychologists recognize the value of expanding "intelligence" along the lines of the theory of multiple intelligences
Also look at the broader
Sternberg’s Triarchic Theory of Successful Intelligence:
Analytical: ability to acquire knowledge and break problems into parts. Problem solving
Creative: ability to think outside the box. Also tracks problem solving. The speed of learning new things
Practical: also known as emotional intelligence; social skills, common sense, can you “read the room,” adapting and shifting to your environment
Now he talks about wisdom. Adaptive Intelligence includes all 4**
Intelligence Quotient (IQ): how you compare to others your age on a specific intelligence test. Note: IQs from most accepted tests are correlated
Does it = intelligence?
It’s your current best estimate, it is useful but it’s not w/o problems. (need to know test limits and possible administration limits)
Taking the test and seeing how they compared to other people in their age group
We expect people to gain more knowledge as they age so that’s why we compare intelligence based on age
Does it = intelligence? This is our current best estimate but it is abstract. Know the limits. Be aware of the strength and weaknesses
Why are we testing intelligence?
To assist people, from those who are struggling to those who are excelling
Used to identify intellectual disabilities/learning disorders
Guide admission to private schools
Helps predict academic success
Helps predict occupational success
It DOES NOT predict mental health
Assess intellectual ability after brain injury
To assess/document impairment and/or legal competence
To diagnose disability: access to benefits
Personality testing as a part of full assessment for diagnosis and/or therapeutic intervention
For guidance in career
Test people on items assumed to tap into intelligence (items vary per test based on theory; typically a broad range)
Raw scores are converted to IQ scores based on the raw score distribution for age. (Scores fall along normal curve; can manipulate test if not)
arbitrarily but consistently use 100 as the mean with an SD of 15. Called a deviation IQ (new way)
Old way…
(ratio). IQ = MA/CA * 100
MA = Mental Age (what you score like; could be lower, higher or average)
CA =Chronological Age (Actual Age)
WE ONLY USE DEVIATION NOW**
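A sketch comparing the two ways of getting an IQ. The numbers are made up, and real tests use norm tables rather than a straight formula, so the deviation version below is only a simplification that assumes raw scores are normally distributed within the age group:

```python
def ratio_iq(mental_age, chronological_age):
    """Old way: IQ = MA / CA * 100."""
    return mental_age / chronological_age * 100

def deviation_iq(raw_score, age_group_mean, age_group_sd):
    """New way: how far the raw score is from the age-group mean, in SD units,
    rescaled so the mean is 100 and the SD is 15."""
    z = (raw_score - age_group_mean) / age_group_sd
    return 100 + 15 * z

# An 8-year-old who scores like a typical 10-year-old
old_style = ratio_iq(mental_age=10, chronological_age=8)

# A raw score of 60 when same-age peers average 50 with an SD of 10
new_style = deviation_iq(raw_score=60, age_group_mean=50, age_group_sd=10)
print(old_style, new_style)
```

A score exactly at the age-group mean comes out as 100, and one SD above the mean comes out as 115.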
We’re moving away from archaic labels like “superior”, “borderline”, “idiot”, “imbecile”, “moron”, “feeble minded” and “mental retardation” and using less harsh identifiers.
Verbal Comprehension: ability to access and find words, to retrieve info
Perceptual Reasoning:
Working Memory:
Processing Speed:
Standardization (2200 cooperative, healthy individuals; used race, gender, age, etc); Reliability, Validity
SB5 (Stanford-Binet 5): mean 100, SD 15; 5 factors, 2 domains
The Cognitive Assessment System II:
Ages 5-18
12 subtests, full scale and 4 process scores (based on PASS)
Stratified US sample 2200
Acceptable reliability and Validity
White/Black race differences smaller than on traditional tests (controlling for variables like SES). Correlates highly with grades for both groups. Use may help decrease overrepresentation of Black children in special ed
There’s typically a 15 point difference between black and white Americans.
Non-verbal Tests of Intelligence:
Comprehensive Test of Nonverbal Intelligence, Peabody Picture,
Raven’s Progressive Matrices (they say this is the purest measure of “g”). This test is supposed to be culture free. For ages 5+. You have to choose the missing piece
Flynn Effect (Group) apparently “we are smarter than our parents”
Stability over age (Individual)
Flynn Effect: increase in average IQ score; about 3 pts/decade (then they retake the test); biggest jump is in problem solving, not general knowledge
Start off as measured raw scores
Environmental impact on IQ
Better nutrition, parental care, medical care
Overall environ changes, teaching styles, overall ed experiences, tech advances, complexity of life, smaller families, more time in cognitive based leisure activities
On older tests you seem smarter. On new tests you seem less smart. As time progresses society learns new things. Old tests get outdated and we learn from historical mistakes
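A rough sketch of why old norms make you look smarter. It assumes the gain is a steady 3 points per decade, which is a simplification of the "about 3pts/decade" figure; the function name is made up:

```python
POINTS_PER_DECADE = 3  # approximate average gain, treated as constant (assumption)

def expected_inflation(years_since_norming):
    """Roughly how many IQ points too high a score looks on outdated norms."""
    return POINTS_PER_DECADE * years_since_norming / 10

print(expected_inflation(20))  # test normed 20 years ago
```

Under that assumption, someone tested against 20-year-old norms would look about 6 points higher than against current norms.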
Infant/Childhood trend.
Early intervention is useful to identify developmental delays. Measure multiple domains: cognitive, social and motor
We don’t identify if a young child is gifted because infants change so rapidly so they may easily fall behind as fast as they excel
By age 4 you can predict adolescent scores; from middle childhood to the teenage years you can better predict their intellectual level as an adult
Crystallized skills are maintained much longer than fluid skills. Fluid maxes around the 20’s and 30’s, then declines slowly, and faster around 75+
If good health, older = wiser. see multiple perspectives; respect — know their limits, less distorted by negative emotions
Contribution to Intelligence:
Strong evidence that both nature (genes) and environment matter
Comparing twin research, comparing siblings, and comparing non-related siblings. They test these groups while they’re really young and close in age
Adopted kids’ scores correlate less with the adoptive family and more with the bio family over time (IQ and verbal measures)
Similarity between identical twins increases over time
2+3 are NOT saying environment doesn’t matter; the correlations do not become 1.0, but these shifts over time are certainly thought provoking
Genetic hypothesis for group differences >> No real support
Standard IQ for races: Asian, White, Native/Latin, Black
A large part of that is due to SES^
Social Identity threat: if you are reminded of your group membership then you are more likely to score along the lines of the stereotype that’s attached to your identity. (Ex: whites scoring higher if they’re reminded they have higher IQ test scores and the opposite for black Americans)
Within groups there are generational differences (ex: Flynn, and average IQ for blacks is increasing more than for whites)
Given the same info Blacks and Whites show similar info processing skills (PASS-CAS) (FAGAN)
In different eras, different ethnic groups have experienced of remarkable