The 3 aspects in designing a good test
Standardization
one aspect of designing a good test
= comparing a data point against a data set
ex. grades from a harsh teacher vs. an easy teacher, don’t mean anything without _
so that’s why there are tests (MCAS, SAT, AP)
Reliability
one aspect of designing a good test
= score consistency
Split-in-Half Technique
One way to establish reliability
= assessed by splitting the test items in half, then calculating the scores for each half separately
ex. 100 Q test, see how well students did in first 50 and the last 50 (or split by evens and odds)
if scores on both halves are similar = reliable
if one half is good and the other is bad = unreliable (b/c one part is *statistically* significantly more difficult)
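A minimal sketch of the odd/even split described above, assuming made-up item scores for five students on a 6-question test (1 = correct, 0 = wrong). The helper names `pearson` and `split_half_scores` are hypothetical, written only for this illustration. A correlation near 1 between the two halves suggests the halves behave consistently.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def split_half_scores(item_scores):
    """Return (odd-item total, even-item total) for one student's item scores."""
    odd_total = sum(item_scores[0::2])   # questions 1, 3, 5, ...
    even_total = sum(item_scores[1::2])  # questions 2, 4, 6, ...
    return odd_total, even_total

# Made-up data for illustration only: one row per student.
students = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]

halves = [split_half_scores(s) for s in students]
odd_totals = [h[0] for h in halves]
even_totals = [h[1] for h in halves]

# Correlation between the two half-test scores across students.
split_half_r = pearson(odd_totals, even_totals)
print(round(split_half_r, 2))
```

Splitting by evens and odds (rather than first half vs. last half) keeps fatigue near the end of the test from affecting only one half.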
Test-Retest Reliability
One way to establish reliability
= administering the same test (but shuffled a bit to avoid test-retest bias AKA memorization) twice over a period of time
The higher the correlation, the higher the reliability!
Alternate Forms Reliability
One way to establish reliability
= Does the score you receive correlate with the score on another test covering the same material?
ex. taking the October SAT vs. the December SAT: scores shouldn’t change drastically b/c the SAT is reliable
Inter-Rater Reliability
One way to establish reliability
= does the score one grader assigns to your assessment correlate with the score another grader gives for the same test?
ex. more than one “blind” grader gives scores for FRQs to ensure _ reliability where both scores should be the same
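One simple way to check agreement between two graders is the share of responses where they gave the exact same score. This is only a sketch: the scores are made up, and `agreement_rate` is a hypothetical helper, not a standard statistic used by any particular exam.

```python
def agreement_rate(scores_a, scores_b):
    """Fraction of responses where both graders gave the exact same score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both graders must score the same responses")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)

# Made-up FRQ scores from two "blind" graders for the same six responses.
grader_1 = [5, 3, 4, 2, 5, 1]
grader_2 = [5, 3, 3, 2, 5, 1]

rate = agreement_rate(grader_1, grader_2)
print(f"{rate:.0%} exact agreement")
```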
Intra-Rater Reliability
One way to establish reliability
= Does an individual rater agree with themselves when measuring the same item multiple times?
= the consistency of the data recorded by ONE rater over several trials
Validity
one aspect of designing a good test
= the extent to which a test actually assesses what it claims to assess
Content Validity
One way to establish validity
= Does the assessment have content relevant to the construct?
AKA how representative the results are of the content being tested
Face Validity
One way to establish validity
= at first glance, does the test seem to evaluate what it claims to?
ex. a test on musical ability, but the first page is just pictures of food → appropriate here to question the _ validity
ex. AP Psych exam includes a graph of normal distribution, and you (a psych student who didn’t study at all) think it looks like a math test, but the graph is actually related to intelligence testing and statistics
Construct Validity
One way to establish validity
= whether or not an assessment measures an idea (or “construct”) that it is designed to measure
construct = something intangible, so test makers have to come up with a tool (or “operational definition”)
so the main question is: Does the operational definition really measure what it’s supposed to?
Criterion Validity
One way to establish validity
= measures how well the test correlates with the outcome
ex. student gets “genius” on an intelligence test, BUT always misspells the word “intelligence” → low criterion validity
ex. you score really high on an intelligence test, BUT, you have trouble multitasking → bad criterion validity
has 2 types:
Predictive Validity
a type of criterion validity
= does the test accurately predict the level of some future performance?
ex. Does the performance on the SAT correlate with later college performance?
Concurrent Validity
a type of criterion validity
= do the results from the test correlate with results from OTHER measures designed to assess similar topics/concepts?
ex. if results from a test that I created were similar to results from the WAIS, then my test has concurrent validity, a type of criterion validity (b/c the scores had a positive correlation with another valid measure of intelligence)
Verbal Tests
Tests that use word problems to assess abilities
Abstract tests
Tests that use non-verbal measures to assess abilities
Speed of Processing
= the time it takes a person to do a mental task
Binet Test
an intelligence test that compares a child against what most children their age can do
ex. an average 7 year old can tie their shoes, ride a bike, do basic math, etc.
is ratio-based:
IQ = (mental age ÷ chronological age) × 100
ex. (7 ÷ 7) × 100 = 100 IQ (the average)
ex. a mental age of 8 in a biological 10 yr old: (8 ÷ 10) × 100 = 80 IQ (below average)
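The ratio formula above can be sketched as a small function. `ratio_iq` is a hypothetical name used only for this illustration; the two calls reproduce the examples in the card.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    # Multiply first so whole-number ages give exact results.
    return mental_age * 100 / chronological_age

print(ratio_iq(7, 7))   # 7-year-old performing like an average 7-year-old
print(ratio_iq(8, 10))  # 10-year-old with a mental age of 8
```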
Stanford-Binet Test
= an intelligence test that compares an individual against a large bank of acquired scores on a bell curve
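Comparing one person against a bank of scores on a bell curve can be sketched as a z-score conversion. This assumes the common convention of an IQ scale with mean 100 and standard deviation 15 (the SD of 15 is an assumption, not stated in these notes), and the norm sample below is made up. `deviation_iq` is a hypothetical helper.

```python
from statistics import mean, pstdev

def deviation_iq(raw_score, norm_scores, iq_mean=100, iq_sd=15):
    """Place a raw score on an IQ-style scale relative to a norm group."""
    # z = how many standard deviations the score sits from the group mean.
    z = (raw_score - mean(norm_scores)) / pstdev(norm_scores)
    return iq_mean + iq_sd * z

# Made-up raw scores standing in for the "large bank of acquired scores".
norm_sample = [40, 45, 50, 55, 60]

print(deviation_iq(50, norm_sample))  # a score right at the group average
print(deviation_iq(60, norm_sample))  # a score above the group average
```

Unlike the ratio approach, this works the same way for adults, since it does not depend on mental age.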
Wechsler Adult Intelligence Scale (WAIS)
= an IQ test designed to measure intelligence and cognitive ability
Francis Galton
= this person correlated reaction time to intelligence
HOWEVER, this person also used intelligence tests to support eugenics
Raven’s Matrices
= a non-verbal IQ test that measures intellectual development + logical thinking
Flynn Effect
= an increase in population Intelligence Quotient (IQ) throughout the 20th century
Growth Mindset
= when you DO believe you can improve intellectually
Fixed Mindset
= when you DON’T believe you can improve intellectually
idk, u think
Are tests predictive?
idk, u think
Are tests biased?
55% of intelligence is heritable (biological/nature)
However, intelligence can be modified by changing the environment
ex. improving nutrition, removing toxins, better schools, the ratio of encouraging comments to reprimands, amount of attention from adults, etc.
What portion of intelligence is due to nature? due to nurture?
Between-Group Differences
= the average of group 1 compared to the average of group 2
ex. average height of women = 5’4” vs. men = 5’9”
Within-Group Differences
= the range of differences within 1 group (individual 1 compared to individual 2)
ex. Ms. Georges height vs. Mrs. Silipo’s height (within the female SHS teachers population)
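A small sketch of the two ideas, using made-up heights in inches chosen so the group averages match the 5’4” and 5’9” example. The between-group difference compares averages, while the within-group difference looks at the spread inside one group.

```python
from statistics import mean

# Made-up heights in inches, for illustration only.
women = [61, 63, 64, 65, 67]   # average 64 in = 5'4"
men = [66, 68, 69, 70, 72]     # average 69 in = 5'9"

# Between-group: average of group 1 vs. average of group 2.
between_group_difference = mean(men) - mean(women)

# Within-group: range of individual differences inside one group.
within_women_range = max(women) - min(women)
within_men_range = max(men) - min(men)

print(between_group_difference, within_women_range, within_men_range)

# Individuals can overlap even when the group averages differ.
print(max(women) > min(men))
```

In this made-up sample the spread inside each group is larger than the gap between the group averages, so a group average says little about any one individual.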
Question Familiarity
one criticism of standardized tests
= the Qs are more familiar to middle/upper-middle class than others
or
= the Qs might reflect common knowledge of the majority group / the interests of one specific group
ex. only 1 student knowing about Ash Wednesday (majority group = Christians)
ex. a Q about using instrumental aggression in football (biased towards men)
ex. MCAS question about snow days given to the Midwest region (probably created by people in northern regions)
Motivation
recall Maslow’s hierarchy:
also:
Self-fulfilling Prophecy
people often conform to what’s expected of them
Stereotype Threat
= the tendency for members of a group about which a negative stereotype exists to perform poorly on an instrument designed to assess an ability related to that stereotype
AKA members who are thought to be ____ will conform to that expectation when tested about _
Peer Pressure and Group Norms
= different groups have different beliefs about school success
based on research from the 2000s:
Biological Reactions to Stress
= physical stress reactions (changes in cortisol, blood pressure, etc.) hurt memory, attention, and executive functioning, all necessary components for academic success
Achievement Tests
= any norm-referenced standardized test intended to measure skill/knowledge in a certain subject
Aptitude Tests
= attempts to determine a person’s ability to acquire (through future training) specific skills
ex. career tests for high school students
Personality Tests
= designed to systematically elicit information about a person's motivations, preferences, interests, emotional make-up, and style of interacting with people and situations
ex. MMPI-2, MBTI, etc.
Objective Tests
= tests that are easily scored, can be given in groups, are forced choice (test taker has to choose from multiple choice or true/false)
ex. Achievement tests (intelligence), Aptitude tests (intelligence), MMPI (personality), MBTI (personality)
Projective Tests
= tests that are unstructured where the subject is shown ambiguous stimuli
ex. Thematic Apperception Test (TAT)
ex. Rorschach inkblot
Thematic Apperception Test (TAT)
= a type of projective test that involves describing ambiguous scenes to learn more about a person's emotions, motivations, and personality
Rorschach Inkblot
= a projective psychological test in which subjects' perceptions of inkblots are recorded and then analyzed using psychological interpretation, complex algorithms, or both
Intellectual Disability
= limited cognitive and adaptive functioning
= IQ less than 70, needs support to live
Adaptive Functioning
= ability to care for self and meet general social expectations
Executive Functioning
= planning strategies, multi-step problems, metacognition
Intermittent
level 1 of support needed for those with intellectual disabilities
= as needed basis (usually only needed when person starts something new in life)
Limited
level 2 of support needed for those with intellectual disabilities
= for limited time/activities (ex. job coach)
Extensive
level 3 of support needed for those with intellectual disabilities
= long-term involvement (help w/daily living)
Pervasive
level 4 of support needed for those with intellectual disabilities
= intense, long-term care (possibly needed to keep a person alive) for all parts of their life
Savants
= people (usually with intellectual disabilities) who have a genius-like ability in a very narrow area
Prodigy
= a child with amazing ability
IQ > 135
Genius
scoring 2 standard deviations above the mean (roughly the top 2%)
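As a rough check of what "2 standard deviations above the mean" means on a normal curve, the sketch below uses Python's `statistics.NormalDist`. The IQ scale of mean 100 and SD 15 is an assumed convention, not something stated in these notes.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of the population scoring more than 2 SDs above the mean.
share_above_2sd = 1 - standard_normal.cdf(2)
print(f"{share_above_2sd:.1%} of scores fall above +2 SD")

# On an assumed IQ scale (mean 100, SD 15), +2 SD corresponds to this score.
iq_cutoff = 100 + 2 * 15
print(iq_cutoff)
```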