explain some of the major laws that govern the use of tests and measurement in the United States
compare the motivations and consequences of various lawsuits related to testing
contrast the motivations and consequences of various lawsuits related to testing
list several important ethical rules of conduct for measurement professionals
Individuals with Disabilities Education Act (IDEA, 1997)
all children are entitled to a free and appropriate public education
testing used primarily to help place children in correct programs and measure progress
each child given an Individual Education Program (IEP)
children with disabilities educated in the least restrictive environment
both students and parents are involved in decision-making
mechanisms needed to ensure the above five principles
Truth in Testing Law (1979)
passed after an investigation of the Educational Testing Service by the New York Public Interest Research Group
requires testing companies to:
disclose all validity studies of a test
fully disclose the meaning of scores and how they are calculated
provide copies of test questions, correct answers, and the student’s answers if the student requests them
No Child Left Behind (NCLB, 2002)
assure that all children meet or exceed their state’s level of academic achievement
issues:
standardized achievement tests used to measure “school performance”
unrealistic standards
everyone must be tested at grade-level
expensive with limited funding
only applied to public schools
Every Student Succeeds Act (ESSA, 2015)
revised NCLB
relaxes requirements about testing every student in a school
extra funding and interventions for high schools with more needs
states given more control over how to help students and schools in need
issues:
still using standardized achievement testing to evaluate school performance
still only applied to public schools
still limited in funding
Family Educational Rights and Privacy Act (FERPA)
parents/students can inspect student’s education records, but schools are not required to provide copies
parents/students can request records be corrected
schools need written permission from parents/students to release any information
exceptions:
school officials
destination schools after transfer
specified officials for audit/evaluation
etc.
What was Hobson v. Hansen (1967) about?
standardized tests used to place students in different learning tracks
African American children were disproportionately placed into lower (“basic”) tracks while white children were placed into higher tracks
What was the consequence in Hobson v. Hansen (1967)?
grouping would be permissible if based on innate ability
problem: tests used were influenced by cultural experiences
What was Diana v. State Board of Education about?
intelligence tests used to place students in EMR tracks
problematic for bilingual children
tests were standardized for only white children
What was the consequence in Diana v. State Board of Education?
further research revealed that bilingual children receive higher IQ scores if tested in their primary language
What was Larry P. v. Wilson Riles (1979) about?
1/6th of African American elementary-school children tracked to EMR classes based on IQ scores
side 1 argued:
retesting done by African American psychologists yielded higher IQ scores, EMR placement was detrimental long-term
side 2 argued:
IQ scores were valid and unbiased
retesting was not standardized
What was the consequence in Larry P. v. Wilson Riles (1979)?
practice of IQ tests for EMR tracking ended
mixed feelings about outcome
What was Parents in Action on Special Education v. Hannon (1980) about?
racial bias only found for a subset of items on the WISC, WISC-R, and Stanford-Binet IQ tests
What was the consequence in Parents in Action on Special Education v. Hannon (1980)?
racial bias findings didn’t justify removal of tests
conflicted with Larry P. v. Wilson Riles
What was Griggs v. Duke Power Company about?
raised concerns about segregation in the workplace
company claimed education was needed for advancement and created a test
nobody passed this test
concerns about validity of the test
What was the consequence in Griggs v. Duke Power Company?
employment test results must be valid and reliable
What was Watson v. Fort Worth Bank and Trust about?
underrepresentation of African American personnel
passed over for promotion multiple times
What was the consequence in Watson v. Fort Worth Bank and Trust?
lower courts held that statistical evidence of bias applied only to standardized tests
Supreme Court disagreed
key ethical principles
no physical, emotional, or psychological harm
consent is important
reasonable and appropriate incentives
responses are made anonymous
confidentiality must be ensured
careful reporting of information
use of appropriate assessment techniques
test scores must be sufficiently valid and reliable
tests should have a purpose
define test fairness as described by psychometricians
define test bias as described by psychometricians
compare test fairness and bias
contrast test fairness and bias
describe how threats to test fairness weaken validity arguments
describe the ways test developers ensure test fairness
principles for making assessments using universal design
describe various ways of detecting bias and their limitations
test fairness
validity issue combining morality, philosophy, and legality
What are views on test fairness?
equitable treatment during testing
accessibility to the measured constructs
validity of individual test score interpretations for intended uses
lack of measurement bias
What is equitable treatment composed of?
standardization and consistency of administration
qualified test administrators
flexibility
What is accessibility composed of?
respondents can accurately record their responses
congruence of construct intended to be measured
congruence of constructs needed to respond to the measure
What is validity of individual test scores composed of?
heterogeneity within groups
group level accommodations or modifications are not always appropriate
What are the kinds of threats to fairness?
content
context
response process
lack of opportunity to learn
What is content? How does it threaten fairness?
problems with words or vocabulary inside an item
terms may be more likely known by another group
offensive language
representativeness within a question
What is context? How does it threaten fairness?
problems surrounding a test or measurement
stereotype threat
unclear instruction
advanced or unfamiliar language
differential treatment
What is response process? How does it threaten fairness?
problems with the processes used to read an item, interpret it, and respond
occur between test-taker and item
faking good
misinterpreted communication
lack of accessibility
What is lack of opportunity to learn? How does it threaten fairness?
test takers are assessed on content they had no opportunity to learn
universal design
assessment development approach that maximizes accessibility of the test for all of the intended takers
begins by defining constructs precisely with clear differentiation from construct-irrelevant variance
What are best practices for content and wording?
test takers share the same experience
appropriate complexity of sentences and vocabulary
shorter sentences
What are best practices for formatting?
text formatting
typefaces
white space
contrast
illustrations
accommodation
changes made to a test to improve accessibility
doesn’t affect the measured construct
modifications
changes made to a test to improve accessibility
does affect the measured construct
When are accommodations necessary?
not appropriate if the affected ability is directly relevant to the construct being measured
not appropriate for an assessment if the purpose of the test is to assess the presence and degree of the disability
not necessary for all students with disabilities
test bias
systematic difference in scores between groups due to some factor unrelated to the measured construct
empirical observation
What are methods for detecting test bias in total scores?
difference-difference bias
Cleary model
What are methods for detecting item-specific test bias?
content examination
differential item functioning (DIF)
difference-difference bias
bias evidenced by differences in scores among groups
Cleary model
test scores are unbiased if equivalent scores from different groups equally predict some criterion
linear regression with interaction effect
problems with the Cleary model
assumes all relevant predictors or covariates are included
assumes an unbiased criterion
interaction/model tests are often underpowered
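The Cleary check above amounts to ordinary least squares with a group main effect and a score-by-group interaction: a nonzero group coefficient suggests intercept bias, a nonzero interaction suggests slope bias. A minimal sketch with simulated data (sample size, coefficients, and group labels are all illustrative, not from the source):

```python
import numpy as np

# Simulated data: test scores predicting a criterion in two groups.
rng = np.random.default_rng(0)
n = 200
group = np.repeat([0, 1], n // 2)            # 0 = reference, 1 = focal
score = rng.normal(50, 10, n)
# criterion generated with the SAME slope/intercept for both groups (no bias)
criterion = 1.0 + 0.05 * score + rng.normal(0, 0.3, n)

# Design matrix: intercept, score, group, score x group interaction
X = np.column_stack([np.ones(n), score, group, score * group])
beta, *_ = np.linalg.lstsq(X, criterion, rcond=None)

# beta[2] (group) and beta[3] (interaction) should be near zero here,
# since the data were generated with one common regression line.
print(np.round(beta, 3))
```

In a real analysis the coefficients would be tested for significance, which is exactly where the underpowered-test caveat above bites.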
content examination
review items for obvious cultural, racial, or gender related bias
differential item functioning (DIF)
occurs when an item behaves differently among groups
respondents from different groups are matched on total score
Mantel-Haenszel Test
simplest approach for detecting DIF
procedure:
group respondents into score groups
create a contingency table, for each score range group, of incorrect/correct responses and comparison group membership
calculate expected counts and variances of counts within each score range group
use all information to calculate chi-square statistic
limitation: the choice of score ranges is arbitrary
advantage: can be used with smaller samples
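The procedure above can be sketched in plain Python; the 2x2 tables are made-up counts for two score strata, with rows = reference/focal group and columns = correct/incorrect:

```python
def mantel_haenszel_chi2(strata):
    """Continuity-corrected Mantel-Haenszel chi-square (1 df) for one item.

    strata: list of 2x2 tables, one per matched score-range group, each as
    ((ref_correct, ref_incorrect), (focal_correct, focal_incorrect)).
    """
    total_a = 0.0      # observed correct responses in the reference group
    total_ea = 0.0     # expected correct responses under no DIF
    total_var = 0.0    # variance of that count within each stratum
    for (a, b), (c, d) in strata:
        t = a + b + c + d              # stratum size
        if t < 2:
            continue                   # a one-person stratum carries no information
        n_ref, m1, m0 = a + b, a + c, b + d
        total_a += a
        total_ea += n_ref * m1 / t
        total_var += n_ref * (c + d) * m1 * m0 / (t * t * (t - 1))
    return (abs(total_a - total_ea) - 0.5) ** 2 / total_var

# two score strata; the reference group answers correctly more often in each
tables = [((30, 10), (20, 20)), ((45, 5), (35, 15))]
print(round(mantel_haenszel_chi2(tables), 2))  # → 10.27
```

A large chi-square (compared against a 1-df chi-square distribution) flags the item as functioning differently across groups at matched ability levels.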
parent model
allow item parameters to differ between groups
nested models
constrain a single item’s parameters to be equal across groups
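The parent/nested comparison above is typically carried out as a likelihood-ratio test. A minimal sketch, using hypothetical log-likelihood values in place of an actual IRT fitting routine (the numbers and the 2-parameter assumption are illustrative only):

```python
from scipy.stats import chi2

# Hypothetical log-likelihoods standing in for the output of an IRT fit.
ll_parent = -1520.4   # parent model: studied item's parameters free per group
ll_nested = -1524.9   # nested model: studied item's parameters constrained equal

lr = 2 * (ll_parent - ll_nested)   # likelihood-ratio statistic
df = 2                             # number of constrained parameters (e.g., a and b)
p = chi2.sf(lr, df)
print(round(lr, 1), round(p, 4))   # → 9.0 0.0111
```

A small p-value means constraining the item's parameters to be equal across groups significantly worsens fit, i.e., the item shows DIF.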
bias and fairness
evidence of test bias does not mean a test is unfair
DIF may be detected, but might not cause impactful differences in scores
understand how to create table of specifications for item development
understand how to use a table of specifications for item development
describe different item formats
what kind of tests are different item formats suited for
describe and write good achievement test items
be familiar with Bloom’s taxonomy
alternative ways of defining the cognitive demands of items
describe and write good survey items
How is a test made?
choose and define constructs
determine the best method to use
develop possible questions or items
What are types of tests or measurements?
achievement
aptitude
ability or intelligence
personality
neuropsychology
career interests
achievement test
assess an individual’s level of knowledge in a particular domain
aptitude test
measure an individual’s potential to succeed in an activity requiring a particular skill or set of skills and can predict future outcomes
ability or intelligence test
assess one’s level of skill or competence in a wide variety of areas
personality test
assess an individual’s unique and stable set of characteristics, traits, or attitudes
neuropsychological test
assess the functioning of the brain as it relates to everyday behaviors, including emotions and thinking
vocational or career test
assess an individual’s interests and help classify those interests as they relate to particular jobs and careers
content/table of specifications
define the construct or content domain you are measuring in excruciating detail
What must be defined in clinical/psychological assessments?
define the construct and describe the associated observable behaviors
What must be defined in organizational tests?
define the knowledge and skills needed to do a job successfully
What must be defined in educational assessments?
describe the curriculum to be assessed
What are selected-response formats?
Likert format
category format
multiple-choice
What are constructed-response formats?
essay questions
interview questions
performance assessment
Likert format
people presented with a statement and asked to use a rating scale to respond according to the anchor
anchor
labels for different positions on the Likert scale
category format
rating scale between two endpoint values (e.g., 1 to 10)
How might category format lead to reliability and validity issues?
too many categories: respondents cannot reliably distinguish adjacent options
guidelines for survey items
every item is important and requires a response
item should apply to all respondents unless filter questions are used to exclude a participant
avoid double-barreled items
item should be technically accurate
item should be a complete question or sentence with a simple structure
use as few words as possible in each item stem and options
use simple, familiar words
use specific, concrete words to specify concepts clearly
avoid negatively worded items and options with connotatively inconsistent wording
avoid leading or loaded items that suggest an appropriate response
guidelines for ordinal scales
balance the item stem
choose an appropriate rating scale length
avoid the middle or neutral category
provide balanced scales where categories are relatively equal distance apart conceptually
verbally label all response categories
align response options in one column (single item) or horizontally on one row (multiple items)
response categories should be exhaustive, including all plausible responses
response categories should be mutually exclusive
response categories should approximate the actual distribution of the characteristic in the population
guidelines for nominal scales
avoid the “other” option
use forced-choice items instead of check-all-that-apply items
multiple choice
stem
options
correct: correct answer or key
incorrect: distractors, foils, or misleads
Bloom’s taxonomy
used to gauge the cognitive demand of test items
What are parts of the cognitive dimension of Bloom’s taxonomy?
remember
understand
apply
analyze
evaluate
create
cognitive demands
poor reliability in labeling questions using Bloom’s taxonomy
revised taxonomy:
recall
comprehend
use (or apply)
guidelines for achievement tests
content concerns
style concerns
writing the stem
writing the options
content concerns
base each item on one type of content and cognitive demand
use new material to elicit higher-level thinking
keep the content of items independent of one another
avoid opinions unless qualified
style concerns
edit and proof items
keep linguistic complexity appropriate to the group being tested
minimize the amount of reading in each item
writing the stem
state the central idea clearly and concisely in the stem and not in the options
word the stem positively, avoid negative phrasing
writing the options
use only options that are plausible and discriminating
make sure that only one of these options is the right answer
place options in logical or numerical order
keep options independent
avoid using options none-of-the-above, all-of-the-above, or I don’t know
word options positively, avoiding negative words such as NOT
avoid giving clues to the right answer
make all distractors plausible
avoid the use of humor