Traditional interview
Interviewer pursues different areas of inquiry with each job candidate.
Differential validity
test results mean different things for different groups
Single Group validity
test scores predict the criterion for one group but not for the other
Organizational Settings
Employment interview, Personality inventories, Situation Judgment Test (SJT), Pre-employment testing
Structured interview
standardized, with the same questions asked of each job candidate.
Behavioral interview
focus on past behaviors
rather than attitudes or opinions.
Formative assessments
help teachers determine what information students are and are not learning during the instructional process
– Helps teachers identify where students need help and decide whether it is appropriate to move to the next unit of instruction.
Norm-Referenced Tests
standardized tests in which one test taker’s score is compared with the scores of a group of test takers who took the test previously.
-Ex: Top 25% passing
Criterion-Referenced Tests
tests that compare a test taker’s scores with an objectively stated standard of achievement
-Ex: Need a 70 to pass
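A minimal sketch (not from the course materials) contrasting the two passing rules above; the scores and the 70-point standard are hypothetical:

```python
import numpy as np

# Hypothetical class scores (illustrative only)
scores = np.array([55, 62, 68, 71, 74, 78, 81, 85, 90, 96])

# Norm-referenced rule: pass if you score in the top 25% of this group
norm_cutoff = np.percentile(scores, 75)   # cutoff depends on who else took the test
passes_norm = scores >= norm_cutoff

# Criterion-referenced rule: pass if you meet the fixed standard of 70
passes_criterion = scores >= 70           # same standard regardless of the group

print(f"norm-referenced cutoff: {norm_cutoff:.1f}, pass: {passes_norm.sum()}")
print(f"criterion-referenced cutoff: 70, pass: {passes_criterion.sum()}")
```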
Authentic assessment
assessing a student’s ability to perform real-world tasks by applying the knowledge and skills he or she has learned
Cognitive testing
use of tests that measure global and narrow intellectual abilities
Memory testing
specific questions about memory functioning
– Often used with older adults who report memory concerns
Personality inventories
– Often used as part of pre-employment testing
– Individual personality features may not be as important
■ Overall personality has predictive validity for a variety of outcomes
– Provide incremental validity (e.g., above cognitive ability)
Validity coefficient (r)
Correlation between test scores (predictors) and performance (criterion) representing the strength of the validity evidence
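A minimal sketch of computing a validity coefficient, using hypothetical predictor and criterion data (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical data: selection test scores (predictor) and later
# supervisor ratings of job performance (criterion) for the same people
test_scores = np.array([52, 60, 65, 70, 75, 80, 85, 90])
performance = np.array([ 3,  2,  4,  5,  4,  6,  7,  6])

# Validity coefficient: Pearson correlation between predictor and criterion
r = np.corrcoef(test_scores, performance)[0, 1]
print(f"validity coefficient r = {r:.2f}")
```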
Range restriction
The reduction in the range of scores that results when some people are dropped from a validity study, such as when low performers are not hired, causing the validity coefficient to be lower than it would be if all persons were included in the study
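A small simulation (hypothetical data, not from the course) showing why dropping low scorers lowers the observed validity coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a predictor and criterion that correlate about .50 in the full applicant pool
n = 1000
test = rng.normal(size=n)
performance = 0.5 * test + rng.normal(scale=np.sqrt(1 - 0.5**2), size=n)
r_full = np.corrcoef(test, performance)[0, 1]

# Range restriction: only the top 30% of test scorers are hired, so only
# their performance is ever observed in the validity study
hired = test >= np.quantile(test, 0.70)
r_restricted = np.corrcoef(test[hired], performance[hired])[0, 1]

print(f"r in full applicant pool: {r_full:.2f}")       # ~ .50
print(f"r among those hired only: {r_restricted:.2f}")  # noticeably smaller
```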
Coefficient of determination (R^2)
– The amount of shared variance between predictor and criterion
– Squared correlation
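A one-line illustration, using a hypothetical validity coefficient of .40:

```python
r = 0.40                   # hypothetical validity coefficient
r_squared = r ** 2         # coefficient of determination
print(f"{r_squared:.2f}")  # 0.16 -> ~16% of criterion variance is shared with the predictor
```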
Employment interview
– Traditional interview: interviewer pursues different areas of inquiry with each job candidate.
– Structured interviews: standardized, with the same questions asked of each job candidate.
– Behavioral interviews: focus on past behaviors rather than attitudes or opinions.
Situation Judgment Test (SJT)
– Written or video-based scenarios on work-related dilemmas
– Asked to identify effective course of action
– May ask to rank actions from most to least effective
Pre-employment testing
Who should we hire for a job?
■ What individual characteristics are indicative of
– successful performance on the job
– satisfaction with the job
– successful performance while training for the job
– remaining committed to the job and organization
– staying with the job for the long term
Clinical/Counseling Settings
Cognitive and Memory Testing, Comprehensive Clinically Oriented Self-Report, Symptom Self-Report Tests, Symptom Checklists, Behavior Rating Scales, Interviews
Cognitive and Memory Testing ex
Wechsler scales (e.g., WAIS, WMS)
Comprehensive, Clinically Oriented Self-Report
– Gather information on symptoms, functioning, personality, and more
– Often used to help diagnose and plan treatment
– Not frequently used to monitor progress
– Examples: MMPI, PAI, MCMI-III
Symptom Self-Report Tests
More specific, narrow tests
Ex: Likert-type scale item
– 1.
0 I do not feel sad.
1 I feel sad.
2 I am sad all the time and I can't snap out of it.
3 I am so sad and unhappy that I can't stand it.
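A minimal sketch of how such self-report items are commonly scored: sum the selected option (0–3) across items to get a total symptom score. The responses and cutoff below are hypothetical, not an actual scoring key:

```python
# Hypothetical responses to five 0-3 symptom items
responses = [2, 1, 0, 3, 1]

# Total symptom score = sum of item responses; higher = more severe
total = sum(responses)
print(total)  # 7

# A clinician might compare the total to a severity cutoff (illustrative value)
if total >= 10:
    print("above the illustrative cutoff")
```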
Symptom Checklist
– Client answers questions about their symptoms
– Can cover a broad area of symptoms
Ex: Generalized Anxiety Disorder and CBU
Behavior Rating Scales
– Often used with children
– Outside informant (parent, teacher, etc.) answers questions about the individual's behavior
Semi-structured interviews
some structure, but allow more flexibility
– Sometimes questions can change based on previous answers
Interviews
Allow the psychologist to gather information by asking the client questions
Structured clinical interviews
require the interviewer to follow a fixed format in asking questions.
– Often accompanied by a formal scoring plan
Evidence-based practice
the integration of the best available research with clinical expertise in the context of patient characteristics, culture, and preferences.
Educational Settings
Placement, Formative, Diagnostic, Summative, and Authentic Assessments; Norm- and Criterion-Referenced Tests
Authentic assessment examples
■ Dissertations
■ Journal writing
■ Projects
■ Presentations
■ Experiments
Traditional assessments
Norm- and criterion-referenced tests
Summative assessments:
– Determine what students do and do not know
– Gauge student learning
– Assign earned grades
Decisions made at the end of instruction
summative assessments
Diagnostic assessment
– Assessments that involve an in-depth evaluation of an individual to identify characteristics for treatment or enhancement
Formative assessment ex
Midterm
Decisions made during instruction
formative and diagnostic assessments
Placement assessments are
assessments that are used to:
– determine whether students have the skills or knowledge necessary to understand new material
– determine how much information students already know about the new material
Decisions made at the beginning of instruction
Placement assessments
Making decisions in the classroom
– What knowledge and skills do students already possess?
– What knowledge and skills are my students learning?
– Where do my students need help?
– What knowledge and skills have my students learned?
– What grade have my students earned?
Range Restriction
– A reason not to raise the standard for better accuracy
– You only do the selection once
– Higher chance of error
Simple linear regression
Y = a + bX
– Where:
■ Y = predicted criterion score
■ a = intercept
■ b = slope
■ X = predictor score
Simple
one predictor
Line of best fit
Use this line to make predictions
– That is, the regression equation
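A minimal sketch, using hypothetical calibration data, of estimating the line of best fit and then using the regression equation Y = a + bX to predict a new person's criterion score:

```python
import numpy as np

# Hypothetical calibration data: test scores (X, predictor) and later GPA (Y, criterion)
X = np.array([1000, 1100, 1150, 1200, 1300, 1350, 1400, 1500])
Y = np.array([ 2.4,  2.6,  2.9,  3.0,  3.2,  3.1,  3.5,  3.7])

# Line of best fit: estimate slope (b) and intercept (a) for Y = a + bX
b, a = np.polyfit(X, Y, deg=1)

# Use the regression equation to predict the criterion for a new predictor score
new_x = 1250
predicted_y = a + b * new_x
print(f"Y = {a:.3f} + {b:.5f} * X")
print(f"predicted criterion score for X = {new_x}: {predicted_y:.2f}")
```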
True positives
Predicted to succeed and were successful
False Negatives
Predicted to fail and were successful
High Standard of Accuracy
Ex: Malingering
The best way: increase validity
Only way to maximize accuracy
Calibration sample (a.k.a Training Set)
Sample for which regression parameters are estimated
Measurement bias
Scores on a test taken by different subgroups in the population (e.g., men and women; minority and majority) need to be interpreted differently because of some characteristic of the test not related to the construct being measured
The source of bias could be
in the criterion
– E.g., a rater systematically rates women higher than men
Measurement Bias in Cognitive Ability Testing
A controversial topic
– Some studies have found subpopulation differences in intelligence test scores
Why do these differences exist?
Subjective vs. objective job performance criteria; bias in performance ratings
■ Subjective ratings may favor one group of participants
■ Objective ratings may favor another group of participants
Validity is a
statistical concept
Group differences do not
always indicate unfair testing
Fairness is a
social concept
Purpose of testing is
to identify individual differences
– But people should be evaluated on what the test purports to measure, and nothing more.
Why is measurement bias /bias in performance rating important?
– Relative validity differences seem small in the absolute value of the correlations, but are large in relative percentages
– Differences in who gets selected for a job, accepted into college, receives services, etc.
While the differences discussed exist, it is important to note:
– There is more variation within subpopulation groups than between groups
– Bias inherent in a test may also cause differences
The source of bias may also
represent true subpopulation differences
– E.g., differences between men and women in spatial rotation ability
The source of bias is
not always the test
Types of Measurement Bias
Differential and Single-Group validity
Validation sample (a.k.a Test Set)
Sample used to predict criterion scores
Cross-validation
– The process of administering a test to another sample of test takers, representative of the target population
■ Because of chance factors that contribute to random error, this second administration can be expected to yield lower correlations with criterion measures.
– Can also simply gather a large enough dataset and randomly split it into two samples.
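A rough sketch of the split-sample approach mentioned in the last bullet: fit the regression in a calibration (training) sample, then check the predictions in the validation (test) sample, where the correlation is usually somewhat lower. All data here are simulated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated dataset: predictor and criterion for 400 people
n = 400
x = rng.normal(size=n)
y = 0.4 * x + rng.normal(scale=0.9, size=n)

# Randomly split into calibration (training) and validation (test) samples
idx = rng.permutation(n)
calib, valid = idx[:n // 2], idx[n // 2:]

# Estimate the regression equation in the calibration sample only
b, a = np.polyfit(x[calib], y[calib], deg=1)

# Apply that equation to the validation sample and correlate predicted
# scores with the actual criterion scores (expect some shrinkage)
y_pred = a + b * x[valid]
r_cv = np.corrcoef(y_pred, y[valid])[0, 1]
print(f"cross-validated r = {r_cv:.2f}")
```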
Low Standard of Accuracy
Ex: TSA being overly cautious
Why not just raise the standard for better accuracy?
It depends
False positives
Predicted to succeed and were failures
True negatives
Predicted to fail and were failures
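Pulling the four outcomes together, a minimal sketch (hypothetical data) of counting true/false positives and negatives from selection predictions:

```python
import numpy as np

# Hypothetical selection results: 1 = predicted to succeed / actually succeeded
predicted = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0])
actual    = np.array([1, 0, 1, 1, 0, 1, 0, 0, 0, 0])

tp = np.sum((predicted == 1) & (actual == 1))  # predicted to succeed, were successful
fp = np.sum((predicted == 1) & (actual == 0))  # predicted to succeed, were failures
fn = np.sum((predicted == 0) & (actual == 1))  # predicted to fail, were successful
tn = np.sum((predicted == 0) & (actual == 0))  # predicted to fail, were failures

print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")
print(f"overall accuracy = {(tp + tn) / len(actual):.2f}")
```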
Psychology:
describe, explain, predict
Discriminant validity
Test scores are not related to unrelated constructs
Ex of discriminant validity
self-esteem and intelligence
How do we know different measures of the same construct are truly measuring the same thing?
– E.g., two different tests of intelligence
– E.g., two different tests of personality
Traditional view of validity
Content and criterion-related
What is validity?
Evidence that the interpretations that are being made from the scores on a test are appropriate for their intended purpose
A test must first be
reliable before it can be valid
1. Reliability → 2. Validity
Traditional model of validity
“Measures what it is designed to measure”
There are truly no “types” of validity,
only different sources of validity evidence
Careful of validity inference
E.g., valid measure of personality may not predict sales performance.
Three forms of validity evidence
– Content validity
– Construct validity
– Criterion-related validity
Validity is
a unitary/single concept
Sources of Validity Evidence
– Evidence based on test content
– Evidence based on response process
– Evidence based on internal structure
– Evidence based on relationships with other variables
– Evidence based on the consequences of testing
Evidence based on test content
What is actually on the test
Evidence based on response process
Observations or interviews to understand mental processes that test takers use to respond
Factor analysis:
Is the test accounted for by one factor?
Criterion-related validity
The extent to which scores on a test correlate with scores on a measure of performance or behavior
– Evidence that test scores correlate with or predict independent behaviors, attitudes, or events
– Correlate test scores with other measures
Evidence based on relationships with other variables
Criterion-related validity: correlate test scores with other measures
Evidence based on internal structure
Factor analysis: Is the test accounted for by one factor?
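A minimal sketch (not the course's prescribed procedure) of one simple internal-structure check: examine the eigenvalues of the item correlation matrix; if one eigenvalue dominates, a single factor largely accounts for the test. Data are simulated:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate 300 people answering 6 items that all reflect one underlying factor
n = 300
factor = rng.normal(size=n)
items = np.column_stack(
    [0.7 * factor + rng.normal(scale=0.7, size=n) for _ in range(6)]
)

# Eigenvalues of the item correlation matrix; one large value suggests one factor
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(np.round(eigenvalues, 2))
```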
Evidence based on the consequences of testing
Intended and unintended consequences: is the test biased?
Content validity (evidence based on test content)
The extent to which the questions on a test are representative of the material that should be covered by the test
Logically examining and evaluating the content of a test (including the test questions, format, wording, and tasks required of test takers)
to determine the extent to which the content is representative of the concepts that the test is designed to measure without either underrepresenting those concepts or including elements that are irrelevant to their measurement. (Content validity)
Construct
– An attribute, trait, or other characteristic that is abstracted from observable behavior
– Attribute, trait, or characteristic the test is designed to measure
Example of Construct
Aggression
Examples of content validity
– Psych Tests and Measurement Final – items should assess knowledge of course material
Example of Construct in a test
– FIRO-B – test of interpersonal interaction in organizations
■ Inclusion
■ Control
■ Affection
Purpose of most tests is to
make predictions
SAT → College grades
Personality → Job performance