(6) TEST DEVELOPMENT

Last updated 1:19 PM on 6/8/26

135 Terms


Test Development

an umbrella term for all that goes into the process of creating a test


Test Development

Test Conceptualization
Test Construction
Test Tryout
Item Analysis
Test Revision


Test Conceptualization

the process of conceptualizing the construct to be measured, the items to be included, and the design of the test


Test Conceptualization - Step 1

describe the purpose and rationale for the test


Test Conceptualization - Step 2

describe the target population for the test


Test Conceptualization - Step 3

clearly define the key variables of interest


Test Conceptualization - Step 4

create item specifications


Test Conceptualization - Step 5

choose item format


Test Conceptualization - Step 6

specify the administration procedures to be considered


Some Preliminary Questions

What is the test designed to measure?
What is the objective of the test?
Is there a need for this test?
Who will use this test?
Who will take this test?


Some Preliminary Questions

What content will the test cover?
How will the test be administered?
What is the ideal format of the test?
Should more than one form of the test be developed?
How will meaning be attributed to scores on this test?


Some Preliminary Questions

What special training will be required of test users for administering or interpreting the test?
What types of responses will be required of test takers?
Who benefits from the administration of this test?
Is there any potential for harm as the result of an administration of this test?


Pilot Testing

preliminary research surrounding the creation of a prototype of the test


Norm-referenced

norm-referenced comparisons typically are insufficient and inappropriate when knowledge of mastery is required


Criterion-referenced

development may involve pilot work with at least two groups of test takers: one group known to have mastered the knowledge or skill being measured and another group known not to have mastered such knowledge or skill


Test Construction

stage in the process that entails writing test items (or rewriting or revising existing test items), as well as formatting items, setting scoring rules, and otherwise designing and building a test


Item Pool

the reservoir or well from which items will or will not be drawn for the final version of the test


Item Pool

It is usually advisable that the first draft contain approximately twice the number of items that the final version of the test will contain


Item Banks

relatively large and easily accessible collection of test questions


Computerized Adaptive Testing (CAT)

items presented to test takers are based in part on the test taker’s performance on previous items


Floor Effect

the diminished utility of a test for distinguishing test takers at the low end of the ability, trait, or other measurable attribute


Ceiling Effect

the diminished utility of a test for distinguishing test takers at the high end of the ability, trait, or other measurable attribute


Item Branching

the ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items


Item Format

form, plan, structure, arrangement, and layout of individual test items


Dichotomous Format

offers two alternatives for each item


Polychotomous Format

each item has more than two alternatives


Category Format

a format in which respondents are asked to rate a construct using a set of ordered categories (for example, a 10-point rating scale)


Checklist

the subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself


Guttman Scale

items are arranged sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured


Selected-Response Format

select a response from a set of alternatives


Multiple Choice

three elements: (1) a stem, (2) a correct alternative or option, and (3) several incorrect alternatives or options, called distractors or foils (chance score with four options: 25%)


Effective Distractors

a distractor chosen by both high- and low-performing groups; such distractors enhance the consistency of test results


Ineffective Distractors

may hurt the reliability of the test because they are time-consuming to read and can limit the number of good items


Cute Distractors

less likely to be chosen, so they may lower the reliability of the test: test takers can eliminate them and guess from the remaining options


Matching Item

test takers are presented with two columns: premises on the left and responses on the right


Binary Choice

contains only two possible responses, as in a true-false or forced-choice item (chance score: 50%)


Constructed-Response Format

supply or create the correct answer


Completion Item

provide a word or phrase that completes a sentence


Short-Answer

a word, term, sentence, or paragraph may qualify as an answer


Essay

respond to a question by writing a composition


Scaling

process of setting rules for assigning numbers in measurement


Notion of Absolute Scaling

a procedure for obtaining a measure of item difficulty across samples of test takers who vary in ability


Types of Scaling

Age-based
Grade-based
Stanine
Unidimensional
Multidimensional


Paired Comparison

Produces ordinal data by presenting respondents with pairs of stimuli, which they are asked to compare
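A sketch of how paired comparisons yield ordinal data: every pair of stimuli is presented once (n(n-1)/2 pairs for n stimuli), and each stimulus is ranked by how many comparisons it wins. The `prefer` callback standing in for the respondent is hypothetical.

```python
from collections import Counter
from itertools import combinations


def paired_comparison_ranks(stimuli, prefer):
    """Rank stimuli by the number of pairwise comparisons each one wins.

    `prefer(a, b)` is a hypothetical stand-in for the respondent: it returns
    whichever of a or b the respondent chose.
    """
    wins = Counter({s: 0 for s in stimuli})
    for a, b in combinations(stimuli, 2):  # each pair is presented exactly once
        wins[prefer(a, b)] += 1
    # Sorting by win count gives an order (ordinal data), not interval spacing.
    order = sorted(stimuli, key=lambda s: wins[s], reverse=True)
    return order, dict(wins)
```

For three beverages, a respondent whose hidden preference is tea > coffee > juice produces the order tea, coffee, juice with 2, 1, and 0 wins.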


Rank Order

Respondents are presented with several items simultaneously and asked to rank them in order of priority


Constant Sum

Respondents are asked to allocate a constant sum of units, such as points, among a set of stimulus objects with respect to some criterion
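A minimal sketch for checking a constant-sum response and converting it to proportions. The 100-point total and the attribute names in the example are assumptions; the card only says the sum is constant.

```python
def constant_sum_shares(allocation, total=100):
    """Validate a constant-sum response and convert points to proportions.

    `total=100` is an assumed default for the fixed number of points.
    """
    if any(points < 0 for points in allocation.values()):
        raise ValueError("allocations cannot be negative")
    if sum(allocation.values()) != total:
        raise ValueError(f"allocations must sum to {total}")
    return {item: points / total for item, points in allocation.items()}
```

A respondent who gives 50 points to price, 30 to quality, and 20 to service yields shares of 0.5, 0.3, and 0.2; an allocation that does not add up to the constant is rejected rather than silently rescaled.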


Q-Sort Technique

Sort objects based on similarity with respect to some criterion


Continuous Rating

Rate the objects by placing a mark at the appropriate position on a continuous line that runs from one extreme of the criterion variable to the other


Itemized Rating

Each category has a number or brief description associated with it


Likert Scale

Respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements that range from very positive to very negative toward the attitudinal object


Visual Analogue Scale

A 100-mm line that allows subjects to express the magnitude of an experience or belief


Semantic Differential Scale

Measures an individual’s attitudes or emotional reactions to a concept or object by using a series of bipolar adjectives or phrases


Stapel Scale

Uses a numerical scale, typically presented vertically, with a single adjective in the middle to measure respondents' attitudes or opinions about a specific subject


Summative Scale

The total score is obtained by summing the ratings across all items
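A minimal sketch of summative scoring, assuming a 1-5 Likert-type rating per item. Flipping negatively worded (reverse-keyed) items before summing is a common practice that goes with the guideline to mix positively and negatively worded items; which items are reverse-keyed is hypothetical here.

```python
def summative_score(responses, reverse_keyed=(), scale_min=1, scale_max=5):
    """Sum item ratings into one total score.

    Reverse-keyed items are flipped first (5 -> 1, 4 -> 2, ...) so that a
    higher total always means more of the attribute. The 1-5 range is assumed.
    """
    total = 0
    for item, rating in responses.items():
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"{item}: rating {rating} is out of range")
        if item in reverse_keyed:
            rating = scale_min + scale_max - rating
        total += rating
    return total
```

With ratings q1 = 5, q2 = 4, q3 = 2 and q3 reverse-keyed, the total is 5 + 4 + 4 = 13; without reverse keying it would be 11.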


Thurstone Scale

Uses a set of statements about a topic, with each statement assigned a numerical value reflecting the respondent’s attitude towards it
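Under Thurstone's method of equal-appearing intervals, each statement's numerical (scale) value is usually fixed in advance from judges' ratings, and a respondent's score is commonly taken as the median (some sources use the mean) of the scale values of the statements endorsed. A minimal sketch, assuming the scale values are already known; the statement labels and values are hypothetical.

```python
from statistics import median


def thurstone_score(scale_values, endorsed):
    """Median scale value of the statements a respondent agrees with.

    `scale_values` maps each statement to its pre-established scale value.
    """
    chosen = [scale_values[s] for s in endorsed]
    if not chosen:
        raise ValueError("respondent endorsed no statements")
    return median(chosen)
```

A respondent who endorses statements valued 4.0, 6.5, and 9.0 receives a score of 6.5.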


Ipsative Scale

A format in which respondents distribute a fixed number of points across different attributes, highlighting their relative strengths and weaknesses within themselves


Class Scoring (Category Scoring)

test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is presumably similar in some way


Cumulative Model

the higher the score on the test, the higher the test taker is on the ability, trait, or other characteristic that the test purports to measure


Ipsative Scoring

comparing a test taker’s score on one scale within a test to his or her score on another scale within that same test
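One simple way to illustrate ipsative scoring is to express each scale score relative to the same test taker's average across scales. This is an illustrative choice, not a scoring rule stated in the source, and the scale names and scores are hypothetical.

```python
from statistics import mean


def ipsative_profile(scale_scores):
    """Express each scale score relative to the same person's own average.

    Positive values mark relative strengths and negative values relative
    weaknesses within this person; nothing here compares the person with
    other test takers.
    """
    own_mean = mean(scale_scores.values())
    deviations = {scale: score - own_mean for scale, score in scale_scores.items()}
    strongest = max(deviations, key=deviations.get)
    return deviations, strongest
```

Scores of 18, 12, and 15 on three scales give deviations of +3, -3, and 0, so the first scale is this person's relative strength, even if 18 would be low compared with other test takers.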


Test Tryout

administered to a representative sample of test takers under conditions that simulate the conditions under which the final version of the test will be administered


Test Tryout

The test should be tried out on people who are similar in critical respects to the people for whom the test was designed


Test Tryout

Informal rule of thumb: no fewer than 5 subjects and preferably as many as 10 for each item on the test—the more subjects in the tryout, the better
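The rule of thumb is simple arithmetic: multiply the number of items by 5 for the minimum sample and by 10 for the preferred sample. A sketch (the function name is hypothetical):

```python
def tryout_sample_range(n_items, min_per_item=5, preferred_per_item=10):
    """Return (minimum, preferred) tryout sample sizes for a test of n_items.

    Follows the informal 5-to-10-subjects-per-item rule of thumb.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return n_items * min_per_item, n_items * preferred_per_item
```

For a 30-item test this gives at least 150 and preferably 300 tryout subjects.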


Phantom Factors

factors that actually are just artifacts of the small sample size


Phantom Factors

A definite risk in using too few subjects during test tryout comes during factor analysis of the findings


“Good Item”

an item that is reliable and valid and that tends to be answered correctly by high scorers on the test as a whole


Pseudobulbar Affect (PBA)

a neurological disorder characterized by frequent and involuntary outbursts of laughing or crying that may or may not be appropriate to the situation


Item Analysis

statistical procedures used to analyze and evaluate test items


Item Analysis

Employed to assist in judging which items are good as they are and which items need to be revised or discarded


Item

suggests a sample of an individual’s behavior


Table of Specification

a blueprint of the test in terms of number of items per difficulty, topic importance, or taxonomy
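A table of specification often starts from topic weights. The sketch below, assuming hypothetical topic names and weights, splits a fixed number of items across topics in proportion to those weights; largest-remainder rounding is one of several ways to handle fractional item counts.

```python
from math import floor


def allocate_items(weights, total_items):
    """Split total_items across topics in proportion to their weights.

    Largest-remainder rounding makes the counts add up exactly to total_items.
    """
    weight_sum = sum(weights.values())
    raw = {topic: total_items * w / weight_sum for topic, w in weights.items()}
    counts = {topic: floor(x) for topic, x in raw.items()}
    leftover = total_items - sum(counts.values())
    # Hand the remaining items to the topics with the largest fractional parts.
    by_fraction = sorted(raw, key=lambda topic: raw[topic] - counts[topic], reverse=True)
    for topic in by_fraction[:leftover]:
        counts[topic] += 1
    return counts
```

With weights of 3, 3, 2, and 2 and a 50-item test, the topics receive 15, 15, 10, and 10 items.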


Guidelines for Writing Items

Define clearly what to measure
Generate item pool
Avoid long items


Guidelines for Writing Items

Keep the level of reading difficulty appropriate for those who will complete the test
Avoid double-barreled items
Consider mixing positively and negatively worded items


Double-Barreled Items

items that convey more than one idea at the same time


Item-Difficulty Index

obtained by calculating the proportion of the total number of test takers who answered the item correctly


Item-Difficulty Index

Denoted as “p”


Item-Difficulty Index

The higher the value, the easier the question
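A minimal sketch of the item-difficulty index p, assuming item responses are scored 1 (correct) or 0 (incorrect). The same proportion computed on "endorsed / not endorsed" responses gives the item-endorsement index.

```python
def item_difficulty(responses):
    """Item-difficulty index p: proportion of test takers answering correctly.

    `responses` holds 1 for a correct answer and 0 for an incorrect one.
    """
    if not responses:
        raise ValueError("no responses")
    return sum(responses) / len(responses)
```

If 7 of 10 test takers answer an item correctly, p = 0.70; a higher p means an easier item.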


Item-Difficulty Index

If a particular item is too easy or too difficult, it must be rewritten or discarded


Item Endorsement Index

a statistic that provides not a measure of the percent of people passing the item but a measure of the percent of people who said yes to, agreed with, or otherwise endorsed the item


Level of Difficulty - Very Difficult

Item Difficulty Range - 0.00 - 0.19


Level of Difficulty - Difficult

Item Difficulty Range - 0.20 - 0.39


Level of Difficulty - Average

Item Difficulty Range - 0.40 - 0.60


Level of Difficulty - Easy

Item Difficulty Range - 0.61 - 0.79


Level of Difficulty - Very Easy

Item Difficulty Range - 0.80 - 1.00
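The difficulty ranges in these cards can be applied mechanically. The cutoffs below follow those ranges; values strictly between 0.60 and 0.61, which the ranges do not cover, are treated as Easy here as an assumption.

```python
def difficulty_level(p):
    """Map an item-difficulty index p to the verbal labels in the cards."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.20:
        return "Very Difficult"  # 0.00 - 0.19
    if p < 0.40:
        return "Difficult"       # 0.20 - 0.39
    if p <= 0.60:
        return "Average"         # 0.40 - 0.60
    if p < 0.80:
        return "Easy"            # 0.61 - 0.79 (assumed to start just above 0.60)
    return "Very Easy"           # 0.80 - 1.00
```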


Optimal Level of Item Difficulty (Optimal Difficulty)

refers to the ideal level of difficulty for the questions on a test or assessment, designed to maximize the effectiveness and quality of the test


Optimal Level of Item Difficulty (Optimal Difficulty)

It is usually the midpoint between 1.00 and the chance of success proportion, defined by the probability of answering correctly by random guessing


Chance Score

the proportion of correct answers expected if test-takers were simply guessing.


0.25

For a 4-option multiple-choice question, the chance score is 1/4 or ___


0.20

For a 5-option question, it's 1/5 or ___


1.0 (or 100% correct)

Perfect Score


ideal difficulty level (multiple choice)

the optimal difficulty of an item is slightly higher than midway between the chance score and 1.0


4-option multiple-choice

the midpoint between the chance score (0.25) and 1.0 is 0.625. A slightly higher value, around 0.74, is often cited as an optimal difficulty level for a four-response multiple-choice item


5-option multiple-choice

the midpoint between the chance score (0.20) and 1.0 is 0.60. A difficulty level around 0.70 is often considered optimal
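The midpoint rule can be computed directly from the number of options. The function names are hypothetical. Note that the slightly higher values the cards cite (about 0.74 and 0.70) are not produced by this formula; they sit a little above the midpoint it returns.

```python
def chance_score(n_options):
    """Proportion correct expected from blind guessing on one item."""
    return 1 / n_options


def optimal_difficulty_midpoint(n_options):
    """Midpoint between the chance score and a perfect score of 1.0."""
    return (chance_score(n_options) + 1.0) / 2
```

A 4-option item gives (0.25 + 1.0) / 2 = 0.625, a 5-option item gives (0.20 + 1.0) / 2 = 0.60, and a true-false item gives (0.50 + 1.0) / 2 = 0.75.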


Item-Reliability Index

indication of the internal consistency of a test


Item-Reliability Index

The higher the index, the greater the test’s internal consistency


Internal Consistency - May have limited applicability

Item Reliability Index - 0.69 & below


Internal Consistency - Adequate

Item Reliability Index - 0.70 - 0.79


Internal Consistency - Good

Item Reliability Index - 0.80 - 0.89


Internal Consistency - Excellent

Item Reliability Index - 0.90 & above


Item-Validity Index

designed to provide an indication of the degree to which a test is measuring what it purports to measure