true
True or false: Reducing random error is the goal of increasing a survey’s reliability.

test-retest, split-half
two most common methods of furnishing evidence of survey reliability

survey validity
degree to which the survey reflects or assesses the concept that a researcher is attempting to measure

survey reliability
degree to which the survey is free from measurement error

false
True or false: Surveys focus on individual outcomes.

false
True or false: Surveys report results as an overall derived score or scale score.

testing universe, target audience, test purpose
The first step of test development includes defining what three characteristics?

testing universe
body of knowledge or behaviors that the test represents

working definition of construct, literature review
two aspects of defining the testing universe

test purpose
what the test will measure and how test users will utilize the test scores

criterion
The ___ approach uses test scores to evaluate achievement.

normative
The ___ approach uses test scores to compare test takers.

test plan
specifies the characteristics of the test, including an operational definition of the construct and the content to be measured (the testing universe), the format for the questions, and the administration and scoring of the test

stimulus (item), mechanism (response)
two components of test format

cumulative model
Model of scoring in which:
* test takers receive points for correct answers (such as on a multiple-choice test)
* the more a test taker responds in a particular fashion, the more they exhibit the measured attribute

categorical model
Model of scoring that:
* places test takers in a particular group or class
* is based on the pattern of responses the test taker displays

ipsative model
Model of scoring in which test takers must make forced choices (such as when responding “most like me” or “least like me”)

cumulative model
most common model of test scoring
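A minimal sketch of the cumulative model in Python, assuming a hypothetical answer key and one test taker's responses:

```python
# Cumulative scoring: one point per correct answer; the higher the
# total, the more the test taker exhibits the measured attribute.
answer_key = ["b", "d", "a", "c", "a"]   # hypothetical key
responses  = ["b", "d", "c", "c", "a"]   # hypothetical answers

raw_score = sum(given == correct
                for given, correct in zip(responses, answer_key))
print(raw_score)  # 4 -> 4 of 5 items answered correctly
```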
pilot test
a scientific evaluation of a test’s performance, carried out to gather evidence that the test scores are reliable and valid for their specified purpose

true
True or false: Forced-choice items are mostly used in personality tests.

performance assessment, simulation, portfolio
three complex item formats

faking
answering in a way that will result in a specific outcome or diagnosis

faking
The article concerning the validity of PTSD disability claims was relevant to which response set?

acquiescence
Balancing positive and negative questions in a test (then reverse-coding) counters what response set?
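A minimal sketch of the reverse-coding step, assuming hypothetical responses on a 1-5 Likert scale where some items are negatively worded:

```python
# Reverse-code a 1-5 Likert response: 1 <-> 5, 2 <-> 4, 3 stays 3.
SCALE_MIN, SCALE_MAX = 1, 5

def reverse_code(score: int) -> int:
    return SCALE_MIN + SCALE_MAX - score

item_scores   = [5, 2, 4, 1]                 # hypothetical responses
reverse_keyed = [False, True, False, True]   # negatively worded items

# Flip the negatively worded items so an acquiescent "agree with
# everything" pattern no longer inflates the total in one direction.
total = sum(reverse_code(s) if rev else s
            for s, rev in zip(item_scores, reverse_keyed))
print(total)  # 5 + 4 + 4 + 5 = 18
```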
100
In a test that aims to have 50 items, how many items should be written during test development?

true
True or false: All test response options should be similar in length and detail.

false
True or false: The words “always” and “never” should be used in test items for clarity.

true
True or false: The order of test items should be randomized.

subjective
Validity based on content is more likely to suffer in an (objective/subjective) test format.

objective
(Objective/subjective) tests require more thought and development time.

subjective
(Objective/subjective) tests are easier to construct and revise.

subjective
(Objective/subjective) tests are better suited for testing higher-order skills.

*p* = (number who got the item correct) / (number who responded to the item)
item difficulty equation (as part of Quantitative Item Analysis)
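The equation translates directly into code; a minimal sketch with hypothetical item responses, including a check against the screening ranges given in the cards that follow:

```python
# Item difficulty: p = (number correct) / (number who responded).
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # 1 = correct, 0 = incorrect

p = sum(responses) / len(responses)
print(p)  # 0.7

# Rough screening against the ranges in the next cards:
if p >= 0.9:
    print("item may be too easy")
elif p <= 0.2:
    print("item may be too difficult")
```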
.5
What is the ideal value for item difficulty?

.9 to 1.0
quantitative value of an item that is too easy

0 to .2
quantitative value of an item that is too difficult

false
True or false: Quantitative item analysis seeks to minimize variation.

item discrimination
extent to which items differentiate high from low scorers

positive
Item discrimination: *D* = *U* - *L*, where
*U* = \[(number in upper group who responded correctly) / (number in upper group)\] x 100
*L* = \[(number in lower group who responded correctly) / (number in lower group)\] x 100
Ideal *D* should be a high (positive/negative).
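A minimal sketch of the *D* computation, assuming hypothetical upper and lower scoring groups (how the groups are formed, e.g., top and bottom 27%, varies by convention):

```python
# Item discrimination: D = U - L, the difference between the
# percentage correct in the upper and lower scoring groups.
upper_correct, upper_n = 18, 20   # hypothetical upper group
lower_correct, lower_n = 8, 20    # hypothetical lower group

U = upper_correct / upper_n * 100   # 90.0
L = lower_correct / lower_n * 100   # 40.0
D = U - L
print(D)  # 50.0 -> a high positive D; the item discriminates well
```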
internal consistency
What do inter-item correlations measure?

item-total correlation
measure of overall test consistency AND an item’s ability to discriminate high- from low-scoring individuals

positive
Item-total correlations should be (positive/negative).

subtle questions
questions used for item-criterion correlations that have no apparent relation to the criterion, but may have sample-specific properties

phi correlation
correlation used between two binary items

point-biserial correlation
correlation used between a binary item and a quantitative score

point-biserial
Item-total correlations use (phi/point-biserial) coefficients.
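A minimal sketch of an item-total correlation using SciPy's point-biserial function; the data are hypothetical, and in practice the item is often removed from the total first (the corrected item-total correlation):

```python
from scipy.stats import pointbiserialr

item_scores  = [1, 1, 0, 1, 0, 0, 1, 0]          # binary item (hypothetical)
total_scores = [42, 39, 25, 45, 28, 22, 40, 30]  # total test scores

r, p_value = pointbiserialr(item_scores, total_scores)
print(round(r, 2))  # positive r: high scorers tend to answer the item correctly
```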
item response theory (IRT)
theory that produces estimates of respondent ability as well as item difficulty and discrimination, independent of each other

Item-Characteristic Curve
line that results when we graph the probability of answering an item correctly against the level of ability on the construct being measured

ability; probability of correct response
Item-Characteristic Curve:
x-axis = ___
y-axis = ___

true score
“Ability” in item response theory corresponds to what value in classical test theory?

item difficulty
The location of an Item-Characteristic Curve along the ability axis (*b*) indicates what parameter?

item discrimination
The slope of an Item-Characteristic Curve (*a*) indicates what parameter?
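A minimal sketch of a two-parameter logistic (2PL) item-characteristic curve with hypothetical parameter values, where *a* sets the slope (discrimination) and *b* the location (difficulty):

```python
import math

# 2PL item-characteristic curve: probability of a correct response
# at ability theta, with discrimination a and difficulty b.
def icc(theta: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

a, b = 1.2, 0.5   # hypothetical item parameters
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta, a, b), 2))
# P(correct) is exactly 0.5 at theta = b; a larger a makes the curve steeper.
```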
construct bias
What kind of bias arises when items do not have the same meaning from one culture or subculture to another?

method bias
What kind of bias arises when the mechanics of the test work differently for various cultural groups?

differential item functioning
What kind of bias arises when test takers from different cultures have the same ability level on the test construct, but the item or test yields very different scores for the two cultures?

qualitative
Expert panels and questionnaires for test takers are both examples of ___ item analysis.

validity
Replication and cross-validation are both methods of demonstrating evidence of ___.

cross-validation
method of test validation that seeks to generalize regression results to a new sample of test takers

replication
method of test validation that seeks to ensure that criterion validity is the same in another sample

predictive bias
What is indicated by varying validity coefficients between subgroups?

slope bias, intercept bias
Using a common regression line will overpredict for some groups and underpredict for others in the event of what kinds of bias?

slope bias
What kind of bias will result in differential validity for the groups being predicted?

differential item functioning
occurs when people from different groups with the same latent trait (ability/skill) have a different probability of giving a certain response on a questionnaire or test

norm
group of scores that indicates the average performance of a group and the distribution of scores above and below this average
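One common use of a norm group is computing a percentile rank; a minimal sketch with hypothetical scores:

```python
# Percentile rank of one raw score within a hypothetical norm group.
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 33]
raw_score = 25

percentile = sum(s < raw_score for s in norm_group) / len(norm_group) * 100
print(percentile)  # 60.0 -> higher than 60% of the norm group
```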
cut scores
decision points for dividing test scores into pass/fail groupings; they provide information that assists the test user in interpreting test results
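A minimal sketch of applying a cut score, assuming a hypothetical passing threshold of 70:

```python
# Divide test scores into pass/fail groupings with a cut score.
CUT_SCORE = 70
scores = [85, 62, 70, 91, 55]

results = ["pass" if s >= CUT_SCORE else "fail" for s in scores]
print(results)  # ['pass', 'fail', 'pass', 'pass', 'fail']
```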