Vocabulary flashcards about test development, covering concepts from initial progress to qualitative item analysis.
Conceptualization (Test Development)
The process of defining what a test is designed to measure, including how the test developer defines the construct and how it differs from similar tests.
Stimulus for Test Development
An emerging social phenomenon or pattern of behavior that might inspire the development of a new test.
Pilot Work
Preliminary research surrounding the creation of a test prototype, involving evaluating test items for inclusion in the final instrument.
Scaling
The process of setting rules for assigning numbers in measurement; the design and calibration of a measuring device.
Age-based Scale
A scale measuring a testtaker’s performance as a function of age.
Grade-based Scale
A scale measuring a testtaker’s performance as a function of their grade level.
Stanine Scale
A scale used to transform raw test scores into scores ranging from 1 to 9.
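A minimal sketch of how a stanine conversion can work, assuming the conventional percentage bands (4, 7, 12, 17, 20, 17, 12, 7, 4 percent of a rank-ordered distribution); the raw scores are hypothetical:

```python
# Sketch: convert raw scores to stanines (1-9) by rank, using the
# conventional percentage bands. Scores below are illustrative only.

def to_stanines(scores):
    """Rank-order scores, then assign 1-9 by cumulative percentage band."""
    # Cumulative upper bounds of each stanine band, as proportions.
    bounds = [0.04, 0.11, 0.23, 0.40, 0.60, 0.77, 0.89, 0.96, 1.00]
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # ascending by score
    stanines = [0] * n
    for rank, idx in enumerate(order):
        pct = (rank + 0.5) / n          # midpoint percentile of this rank
        for stanine, bound in enumerate(bounds, start=1):
            if pct <= bound:
                stanines[idx] = stanine
                break
    return stanines

raw = [55, 62, 70, 71, 75, 78, 80, 85, 92, 99]
print(to_stanines(raw))  # [2, 3, 4, 4, 5, 5, 6, 6, 7, 8]
```

With only ten testtakers no midpoint percentile falls in the extreme bands, so stanines 1 and 9 are not assigned here; larger samples fill out the full 1-9 range.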
Unidimensional Test
A test where only one dimension is presumed to underlie the ratings.
Multidimensional Test
A test where more than one dimension is thought to guide the testtaker’s responses.
Comparative Scaling
Entails judgments of a stimulus in comparison with every other stimulus on the scale.
Categorical Scaling
Stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum.
Rating Scale
A grouping of words, statements, or symbols on which judgments of the strength of a trait, attitude, or emotion are indicated.
Summative Scale
A scale in which the final test score is obtained by summing the ratings across all items.
Likert Scale
A scale where each item presents testtakers with five to seven alternate responses, usually on an agree-disagree continuum.
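Because a Likert scale is typically scored summatively, the two definitions above can be combined in a short sketch; the items, responses, and reverse-keying rule here are hypothetical:

```python
# Sketch: summative scoring of a 5-point Likert scale (1 = strongly
# disagree, 5 = strongly agree). Negatively worded items are reverse-keyed
# before summing. All data are illustrative.

def score_likert(responses, reverse_keyed, points=5):
    """Return the summative scale score for one testtaker."""
    total = 0
    for item, rating in enumerate(responses):
        if item in reverse_keyed:
            rating = (points + 1) - rating   # 1<->5, 2<->4, 3 stays 3
        total += rating
    return total

# Four items; item 2 (0-indexed) is negatively worded.
print(score_likert([4, 5, 2, 3], reverse_keyed={2}))  # 4 + 5 + 4 + 3 = 16
```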
Method of Paired Comparisons
Testtakers are presented with pairs of stimuli and must select one according to a rule.
Guttman Scale
Items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.
Method of Equal-Appearing Intervals
A scaling method used to obtain data presumed to be interval in nature, involving collecting and evaluating statements reflecting positive and negative attitudes.
Item Pool
The reservoir from which items will or will not be drawn for the final version of a test.
Selected-Response Format
Requires testtakers to select a response from a set of alternative responses.
Matching Item
Testtakers are presented with two columns: premises and responses, and the task is to associate each response with the correct premise.
Binary-Choice Item
Usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact.
Constructed-Response Format
Requires testtakers to supply or create the correct answer rather than merely select it.
Completion Item
Requires the examinee to provide a word or phrase that completes a sentence.
Short-Answer Item
A variant of the completion item that requires identification of a specific word or term rather than completion of a sentence.
Essay Item
A test item requiring the testtaker to respond to a question by writing a composition, typically demonstrating recall, understanding, analysis, or interpretation.
Item Bank
A large and easily accessible collection of test questions classified by subject area or item statistics.
Computerized Adaptive Testing (CAT)
An interactive, computer-administered test-taking process where items presented are based on the testtaker’s performance on previous items.
Item Branching
Ability of the computer to tailor the content and order of test items based on responses to previous items.
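One simple form of item branching can be sketched as a difficulty-stepping rule: step up after a correct response, down after an incorrect one. The rule, levels, and response sequence below are hypothetical illustrations, not any specific CAT algorithm:

```python
# Minimal sketch of item branching: the next item's difficulty level
# depends on whether the previous response was correct. Illustrative only.

def next_difficulty(current, was_correct, lowest=1, highest=5):
    """Step difficulty up after a correct answer, down after an error."""
    if was_correct:
        return min(current + 1, highest)
    return max(current - 1, lowest)

# Simulated run: start at medium difficulty.
level = 3
for correct in [True, True, False, True]:
    level = next_difficulty(level, correct)
print(level)  # 3 -> 4 -> 5 -> 4 -> 5
```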
Cumulative Model
The higher the score on the test, the higher the testtaker is on the characteristic being measured.
Class Scoring/Category Scoring
Testtaker responses earn credit toward placement in a class or category with others showing similar response patterns.
Ipsative Scoring
Comparing a testtaker’s score on one scale within a test to another scale within that same test.
Phantom Factors
Factors that are merely artifacts of a small sample size.
Item-Difficulty Index (p)
The proportion of testtakers who answered the item correctly.
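The index follows directly from the definition; the response vector below (1 = correct, 0 = incorrect) is hypothetical:

```python
# Item-difficulty index p: proportion of testtakers who answered the item
# correctly. Responses are illustrative (1 = correct, 0 = incorrect).

def item_difficulty(responses):
    return sum(responses) / len(responses)

# 7 of 10 testtakers answered this item correctly.
print(item_difficulty([1, 1, 0, 1, 1, 0, 1, 1, 0, 1]))  # 0.7
```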
Item-Reliability Index
Indicates the internal consistency of a test, equal to the product of the item-score standard deviation and the correlation between the item score and the total test score.
Item-Validity Index
Statistic designed to indicate the degree to which a test measures what it purports to measure.
Item-Discrimination Index (d)
Indicates how adequately an item separates or discriminates between high scorers and low scorers on an entire test.
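One common computing formula is d = (U - L) / n, where U and L are the numbers of correct responses in the upper- and lower-scoring groups and n is the size of each group; the counts below are hypothetical:

```python
# Item-discrimination index d = (U - L) / n. Counts are illustrative.

def item_discrimination(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

# 9 of 10 high scorers and 3 of 10 low scorers answered correctly.
print(item_discrimination(9, 3, 10))  # 0.6
```

A d near +1 means the item sharply separates high from low scorers; a negative d flags an item that low scorers answer correctly more often than high scorers.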
Qualitative Item Analysis
Nonstatistical procedures designed to explore how individual test items work, comparing them to each other and to the test as a whole.
Think Aloud Test Administration
Designed to shed light on the testtaker’s thought processes during the administration of a test.
Sensitivity Review
Study of test items, typically during test development, examining fairness and offensive content.
Cross-validation
Revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion.
Validity Shrinkage
Decrease in item validities that inevitably occurs after cross-validation of findings.
Co-validation
Test validation process conducted on two or more tests using the same sample of testtakers.