Test development
An umbrella term for all that goes into the process of creating a test.
Test conceptualization
The thought that “there ought to be a test for…” is the impetus for developing a new test.
Norm-referenced
Generally, a good item on a __________________ achievement test is an item for which high scorers on the test respond correctly and low scorers respond incorrectly.
Criterion-oriented test
Ideally, each item on a ___________________ addresses the issue of whether the respondent has met certain criteria.
Pilot work
Refers to the preliminary research surrounding the creation of a prototype of the test.
Scaling
The process of setting rules for assigning numbers in measurement.
Scales
Are instruments to measure some trait, state, or ability and may be categorized in many ways.
Rating scale
Defined as a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the test taker.
Summative scale
The final test score is obtained by summing the ratings across all the items.
Likert Scale
Each item presents the test taker with five alternative responses (sometimes seven), usually on an agree–disagree or approve–disapprove continuum.
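For example, a five-item Likert-type scale could be scored summatively as in this minimal Python sketch (the numeric coding and ratings are hypothetical):

```python
# Hypothetical 5-point coding: 1 = strongly disagree ... 5 = strongly agree
ratings = [4, 5, 3, 4, 2]  # one testtaker's ratings across five items

# Summative scoring: the final score is the sum of the ratings across items
total_score = sum(ratings)
print(total_score)  # 18
```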
Method of Paired Comparisons
Test takers must choose between two alternatives according to some rules.
Comparative scaling
This entails judgments of a stimulus in comparison with every other stimulus on the scale.
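Because each stimulus is judged against every other stimulus, n stimuli require n(n − 1)/2 comparisons; for example, 10 stimuli yield 45 paired judgments.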
Categorical scaling
Stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum.
Guttman scale
Items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.
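In a perfectly cumulative Guttman scale, endorsing a stronger item implies endorsement of every weaker one. A minimal sketch of that consistency check, with hypothetical item ordering and responses:

```python
# Items ordered from weakest (index 0) to strongest expression of the
# attitude; 1 = endorsed, 0 = not endorsed (hypothetical responses)
responses = [1, 1, 1, 0, 0]

def is_guttman_consistent(resp):
    # A perfect Guttman pattern never endorses a stronger item after
    # failing to endorse a weaker one, i.e., it never steps from 0 up to 1
    return all(not (later and not earlier)
               for earlier, later in zip(resp, resp[1:]))

print(is_guttman_consistent(responses))        # True
print(is_guttman_consistent([1, 0, 1, 0, 0]))  # False
```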
Item pool
The reservoir or well from which items will or will not be drawn for the final version of the test.
Item format
Variables such as the form, plan, structure, arrangement, and layout of individual test items.
Selected-response Item Format
Requires test takers to select a response from a set of alternative responses.
Multiple-choice Format
Has three elements:
A stem
A correct alternative or option
Several incorrect alternatives or options, variously referred to as distractors or foils.
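Expressed as a simple data structure, the three elements might look like this (the item content is invented for illustration):

```python
# A multiple-choice item: stem, keyed correct option, and distractors/foils
item = {
    "stem": "The proportion of testtakers answering an item correctly is the",
    "correct": "item-difficulty index",
    "distractors": ["item-validity index",
                    "item-reliability index",
                    "item-discrimination index"],
}
```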
Binary-choice Item
Contains only two possible responses.
Matching Item
The testtaker is presented with two columns: premises on the left and responses on the right.
True-False Item
This type of selected-response item usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact.
Constructed-response Item Format
Requires testtakers to supply or to create the correct answer, not merely to select it.
Completion Item
Requires the examinee to provide a word or phrase that completes a sentence.
Short-answer Item
It is desirable for ____________________ to be written clearly enough that the testtaker can respond succinctly.
Essay Item
A test item that requires the testtaker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation.
Item bank
A relatively large and easily accessible collection of test questions.
Computerized Adaptive Testing (CAT)
Refers to an interactive, computer-administered testtaking process wherein items presented to the testtaker are based in part on the testtaker’s performance on previous items.
Computerized Adaptive Testing (CAT)
__________________________ tends to reduce the floor effect and the ceiling effect.
Floor Effect
Refers to the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured.
Ceiling Effect
Refers to the diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured.
Item Branching
The ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items.
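A minimal sketch of one branching rule, assuming a pool sorted from easiest to hardest and a simple move-up/move-down policy (real CAT programs estimate ability with IRT rather than stepping one item at a time):

```python
# Items are sorted from easiest (0) to hardest (pool_size - 1); branch to a
# harder item after a correct response and an easier one after an error
def next_item_index(current, correct, pool_size):
    step = 1 if correct else -1
    return max(0, min(pool_size - 1, current + step))

index = 5  # start near the middle of an 11-item pool
for was_correct in [True, True, False, True]:
    index = next_item_index(index, was_correct, pool_size=11)
print(index)  # 7
```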
Class Scoring
Testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way.
Ipsative Scoring
A typical objective in ________________ is comparing a testtaker’s score on one scale within a test to another scale within that same test.
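For example, with ipsative scoring an interest inventory might compare a testtaker’s science-scale score with that same testtaker’s art-scale score, rather than with other testtakers’ scores.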
Test Tryout
The test should be tried out on people who are similar in critical respects to the people for whom the test was designed.
5, 10
There should be no fewer than ___ subjects and preferably as many as ___ for each item on the test.
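By this rule of thumb, a 30-item test would be tried out on no fewer than 150 subjects and preferably as many as 300.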
Phantom factors
Factors that actually are just artifacts of the small sample size.
Item-Difficulty Index
It is obtained by calculating the proportion of the total number of testtakers who answered the item correctly.
Easier
A lowercase italic “p” (p) is used to denote item difficulty, and a subscript refers to the item number (so p1 is read “item-difficulty index for item 1”).
p refers to the proportion of people passing an item; the higher the p for an item, the ________ the item.
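A minimal sketch of the computation over a small matrix of scored responses (the data are hypothetical; 1 = correct, 0 = incorrect):

```python
# Rows are testtakers, columns are items
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

n_takers = len(responses)
# p for each item: proportion of testtakers who answered it correctly
p = [sum(row[j] for row in responses) / n_takers
     for j in range(len(responses[0]))]
print(p)  # [0.75, 0.75, 0.25] -- the higher the p, the easier the item
```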
Item-Reliability Index
Provides an indication of the internal consistency of a test; the higher this index, the greater the test’s internal consistency.
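This index is commonly computed as the product of the item-score standard deviation and the correlation between the item score and the total test score (for example, with an item-score standard deviation of .50 and an item-total correlation of .60, the index would be .30).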
Factor Analysis
A statistical tool useful in determining whether items on a test appear to be measuring the same thing(s).
Item-Validity Index
Is a statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure.
s1
Item-validity index: the item-score standard deviation is denoted by the symbol ___.
p1
Item-validity index: the index of the item’s difficulty is denoted by the symbol ___.
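Putting the two symbols together: for a dichotomously scored item, s1 can be derived from p1, and the item-validity index is then the product of s1 and the item-criterion correlation. A worked sketch with hypothetical numbers:

```python
import math

p1 = 0.7   # item-difficulty index for item 1 (proportion passing)
r1c = 0.6  # correlation between item 1 score and the criterion (hypothetical)

s1 = math.sqrt(p1 * (1 - p1))  # item-score standard deviation for item 1
item_validity_index = s1 * r1c
print(round(s1, 3), round(item_validity_index, 3))  # 0.458 0.275
```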
Item-Discrimination Index
A measure of how well an item separates high scorers from low scorers on the test as a whole; the quality of each alternative within a multiple-choice item can also be assessed by comparing the performance of upper and lower scorers.
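One common form of the index, d, compares the proportions of upper and lower scorers who answered the item correctly; a minimal sketch with hypothetical counts:

```python
# d = (U - L) / n, where U and L are the numbers of testtakers in the upper
# and lower scoring groups who answered the item correctly, and n is the
# number of testtakers in each group
U, L, n = 24, 9, 32
d = (U - L) / n
print(d)  # 0.46875 -- a positive d means high scorers pass more often
```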
Item-Characteristic Curves (ICCs)
A graphic representation of item difficulty and discrimination.
Greater
The steeper the slope, the __________ the item discrimination.
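One common way to model an ICC is the two-parameter logistic function, in which b locates the item’s difficulty on the ability scale and a sets the slope, and thus the discrimination. A minimal sketch with hypothetical parameters:

```python
import math

def icc(theta, a, b):
    """Two-parameter logistic ICC: probability of a correct response at
    ability level theta, given discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A steeper slope (larger a) discriminates more sharply around b
for theta in (-1.0, 0.0, 1.0):
    print(round(icc(theta, a=2.0, b=0.0), 2))  # 0.12, 0.5, 0.88
```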
Guessing
Methods designed to detect ___________, minimize the effects of ___________, and statistically correct for ___________ have been proposed, but no such method has achieved universal acceptance.
Item Fairness
Refers to the degree, if any, to which a test item is biased.
Speed tests
Item analyses of tests taken under speed conditions yield misleading or uninterpretable results.
The closer an item is to the end of the test, the more difficult it may appear to be.
Qualitative Methods
Are techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures.
Qualitative Methods
Involve exploration of the issues through verbal means such as interviews and group discussions conducted with test takers and other relevant parties.
Qualitative Item Analysis
Is a general term for various nonstatistical procedures designed to explore how individual test items work.
“Think Aloud” Test Administration
Cohen et al. (1988) proposed the use of ______________________ as a qualitative research tool designed to shed light on the test taker’s thought processes during the administration of a test.
“Think Aloud” Test Administration
On a one-to-one basis with an examiner, examinees are asked to take a test, thinking aloud as they respond to each item.
Expert Panel
_____________ may provide qualitative analyses of test items. They may also play a role in the development of new tools of assessment for members of underserved populations.
Sensitivity review
Is a study of test items, typically conducted during the test development process, in which items are examined for fairness to all prospective test takers and for the presence of offensive language, stereotypes, or situations.
Test Revision
When the item analysis of data derived from a test administration indicates that the test is not yet in finished form, the steps of revision, tryout, and item analysis are repeated until the test is satisfactory and standardization can occur.
Steps to revise an existing test
Test Conceptualization → Test Construction → Test Tryout → Item Analysis → Test Revision
Cross-validation
A key step in the development of all tests—brand-new or revised editions.
Cross-Validation
Refers to the revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion.
Co-Validation
Test validation process conducted on two or more tests using the same sample of test takers.
Validity Shrinkage
The decrease in item validities that inevitably occurs when a test is administered to a second sample of testtakers.
Co-Norming
Co-validation conducted in conjunction with the creation of norms or the revision of existing norms.
Anchor protocol
A test protocol scored by a highly authoritative scorer, designed as a model for scoring and a mechanism for resolving scoring discrepancies, thereby ensuring consistency in scoring.
Scoring Drift
A discrepancy between the scoring in an anchor protocol and the scoring of another protocol.
Item Response Theory (IRT)
Could be applied in the evaluation of the utility of tests and testing programs.
Item-Characteristic Curves (ICCs)
Provide information about the relationship between the performance of individual items and the presumed underlying ability (or trait) level in the testtaker.
Item-Characteristic Curves (ICCs)
Using IRT, test developers evaluate individual item performance with reference to _______________________.
Use of IRT in Building and Revising Tests
Evaluating the properties of existing tests and guiding test revision
Determining measurement equivalence across testtaker populations
Developing item banks
Differential Item Functioning (DIF)
An item functions differently in one group of testtakers as compared to another group of testtakers known to have the same (or similar) level of the underlying trait.
DIF Analysis
Test developers scrutinize group-by-group item response curves.
DIF Items
Items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership.
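In terms of the logistic ICC sketched earlier, a DIF item is one whose group-specific curves diverge: if, at the same trait level, one group’s curve gives (say) a .62 probability of endorsement while another group’s gives .38, the item functions differently across groups despite the matched trait level (the probabilities here are hypothetical).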