(6) TEST DEVELOPMENT

Last updated 1:19 PM on 6/8/26

135 Terms

1

Test Development

an umbrella term for all that goes into the process of creating a test

2

Test Development

  • Test Conceptualization

  • Test Construction

  • Test Tryout

  • Item Analysis

  • Test Revision

3

Test Conceptualization

process of conceptualizing the construct to be measured, the items to be included, and the design of the test

4

Test Conceptualization - Step 1

describe the purpose and rationale for the test

5

Test Conceptualization - Step 2

describe the target population for the test

6

Test Conceptualization - Step 3

clearly define the key variables of interest

7

Test Conceptualization - Step 4

create item specifications

8

Test Conceptualization - Step 5

choose item format

9

Test Conceptualization - Step 6

specify the administration procedures and conditions to consider

10

Some Preliminary Questions

  • What is the test designed to measure?

  • What is the objective of the test?

  • Is there a need for this test?

  • Who will use this test?

  • Who will take this test?

11

Some Preliminary Questions

  • What content will the test cover?

  • How will the test be administered?

  • What is the ideal format of the test?

  • Should more than one form of the test be developed?

  • How will meaning be attributed to scores on this test?

12

Some Preliminary Questions

  • What special training will be required of test users for administering or interpreting the test?

  • What types of responses will be required of test takers?

  • Who benefits from the administration of this test?

  • Is there any potential for harm as the result of an administration of this test?

13

Pilot Testing

preliminary research surrounding the creation of a prototype of the test

14

Norm-referenced

Norm-referenced comparisons typically are insufficient and inappropriate when knowledge of mastery is required

15

Criterion-referenced

development may entail exploratory work with at least two groups of test takers: one group known to have mastered the knowledge or skill being measured and another group known not to have mastered such knowledge or skill

16

Test Construction

stage in the process that entails writing test items (or rewriting or revising existing test items), as well as formatting items, setting scoring rules, and otherwise designing and building a test

17

Item Pool

the reservoir or well from which items will or will not be drawn for the final version of the test

18

Item Pool

It is usually advisable that the first draft contain approximately twice the number of items that the final version of the test will contain

19

Item Banks

relatively large and easily accessible collection of test questions

20

Computerized Adaptive Testing (CAT)

the items presented to test takers are based in part on the test taker’s performance on previous items

21

Floor Effect

diminished utility of a test for distinguishing test takers at the low end of the ability, trait, or other attribute being measured

22

Ceiling Effect

diminished utility of a test for distinguishing test takers at the high end of the ability, trait, or other attribute being measured

23

Item Branching

the ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items
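
A minimal sketch of item branching, assuming a hypothetical item bank in which each item carries a difficulty value on an arbitrary scale, and a simple up/down rule: after a correct answer the next item is chosen near a higher target level, after an incorrect answer near a lower one. Operational CAT programs usually estimate ability with item response theory instead; `run_branching_test`, `bank`, and `step` are illustrative names, not a published algorithm.

```python
from collections.abc import Callable


def run_branching_test(
    bank: dict[str, float],
    answer: Callable[[str], bool],
    n_items: int = 5,
    start_level: float = 0.0,
    step: float = 1.0,
) -> list[tuple[str, bool]]:
    """Administer up to n_items, choosing each item on the basis of the previous response.

    bank maps a hypothetical item id to its difficulty; answer(item) returns True
    if the simulated test taker answers that item correctly.
    """
    level = start_level
    remaining = dict(bank)  # items not yet administered
    history: list[tuple[str, bool]] = []
    for _ in range(min(n_items, len(bank))):
        # Pick the unused item whose difficulty is closest to the current target level.
        item = min(remaining, key=lambda i: abs(remaining[i] - level))
        del remaining[item]
        correct = answer(item)
        history.append((item, correct))
        # Branch: aim higher after a correct answer, lower after an incorrect one.
        level += step if correct else -step
    return history


# Hypothetical five-item bank and a simulated test taker who answers
# every item with difficulty <= 0.5 correctly.
bank = {"a": -2.0, "b": -1.0, "c": 0.0, "d": 1.0, "e": 2.0}
print(run_branching_test(bank, lambda item: bank[item] <= 0.5))
# [('c', True), ('d', False), ('b', True), ('e', False), ('a', True)]
```

The simulated test taker keeps receiving items near the boundary of what he or she can answer, which is the point of tailoring content and order to previous responses.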

24

Item Format

form, plan, structure, arrangement, and layout of individual test items

25

Dichotomous Format

offers two alternatives for each item

26

Polychotomous Format

each item has more than two alternatives

27

Category Format

a rating format in which respondents are asked to rate a construct using a set number of categories (e.g., a 10-point scale)

28

Checklist

the subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself

29

Guttman Scale

items are arranged sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured
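
Because the statements run from weaker to stronger, a Guttman scale is cumulative: a respondent who endorses a stronger statement should also endorse every weaker one. The sketch below scores one respondent and checks whether the pattern is perfectly cumulative; the function name and the 0/1 coding are illustrative assumptions.

```python
def guttman_score(responses: list[int]) -> tuple[int, bool]:
    """Score one respondent on a Guttman scale.

    responses[i] is 1 if statement i is endorsed, else 0, with statements
    ordered from the weakest to the strongest expression of the attitude.
    Returns (number endorsed, whether the pattern is perfectly cumulative).
    """
    score = sum(responses)
    # A perfectly cumulative pattern endorses exactly the `score` weakest statements.
    expected = [1] * score + [0] * (len(responses) - score)
    return score, responses == expected


print(guttman_score([1, 1, 1, 0, 0]))  # (3, True)
print(guttman_score([1, 0, 1, 0, 0]))  # (2, False): a stronger statement endorsed without a weaker one
```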

30

Selected-Response Format

select a response from a set of alternatives

31

Multiple Choice

three elements: (1) a stem, (2) a correct alternative or option, and (3) several incorrect alternatives or options, called distractors or foils (chance of guessing correctly on a four-option item: 25%)

32

Effective Distractors

a distractor chosen by both high- and low-performing groups, which enhances the consistency of test results

33

Ineffective Distractors

may hurt the reliability of the test because they are time-consuming to read and can limit the number of good items

34

Cute Distractors

less likely to be chosen, but may affect the reliability of the test because test takers may guess from the remaining options

35

Matching Item

the test taker is presented with two columns: premises on the left and responses on the right

36

Binary Choice

contains only two possible responses, as in a true-false or forced-choice item (chance of guessing correctly: 50%)

37

Constructed-Response Format

supply or create the correct answer

38

Completion Item

provide a word or phrase that completes a sentence

39

Short-Answer

a word, term, sentence, or paragraph may qualify as an answer

40

Essay

respond to a question by writing a composition

41

Scaling

process of setting rules for assigning numbers in measurement

42

Notion of Absolute Scaling

a procedure for obtaining a measure of item difficulty across samples of test takers who vary in ability

43

Types of Scaling

  • Age-based

  • Grade-based

  • Stanine

  • Unidimensional

  • Multidimensional

44

Paired Comparison

Produces ordinal data by presenting respondents with pairs of stimuli, which they are asked to compare
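
A minimal sketch of how paired-comparison judgments become an ordinal ranking, assuming the simplest scoring rule (count how often each stimulus is preferred across all pairs). The `prefer` callback and the `liking` values stand in for a hypothetical respondent; they are not part of any standard procedure.

```python
from collections.abc import Callable
from itertools import combinations


def rank_by_paired_comparison(
    stimuli: list[str], prefer: Callable[[str, str], str]
) -> tuple[list[str], dict[str, int]]:
    """Present every pair once, count how often each stimulus is preferred,
    and return the stimuli ordered from most to least preferred."""
    wins = {s: 0 for s in stimuli}
    for a, b in combinations(stimuli, 2):  # k stimuli give k * (k - 1) / 2 pairs
        wins[prefer(a, b)] += 1
    # Ties keep their original order (sorted() is stable).
    ranking = sorted(stimuli, key=lambda s: wins[s], reverse=True)
    return ranking, wins


# Hypothetical respondent whose underlying liking for each stimulus is known.
liking = {"A": 2, "B": 3, "C": 1}
ranking, wins = rank_by_paired_comparison(
    ["A", "B", "C"], lambda a, b: a if liking[a] > liking[b] else b
)
print(ranking, wins)  # ['B', 'A', 'C'] {'A': 1, 'B': 2, 'C': 0}
```

The result is ordinal: it says B is preferred over A and A over C, but not by how much.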

45

Rank Order

Respondents are presented with several items simultaneously and asked to rank them in order of priority

46

Constant Sum

Respondents are asked to allocate a constant sum of units, such as points, among a set of stimulus objects with respect to some criterion

47

Q-Sort Technique

Sort objects based on similarity with respect to some criterion

48

Continuous Rating

Rate the objects by placing a mark at the appropriate position on a continuous line that runs from one extreme of the criterion variable to the other

49

Itemized Rating

Each category has a number or brief description associated with it

50

Likert Scale

Respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements that range from very positive to very negative toward the attitudinal object

51

Visual Analogue Scale

A 100-mm line that allows subjects to express the magnitude of an experience or belief

52

Semantic Differential Scale

Measures an individual’s attitudes or emotional reactions to a concept or object by using a series of bipolar adjectives or phrases

53

Stapel Scale

Uses a numerical scale, typically presented vertically, with a single adjective in the middle to measure respondents' attitudes or opinions about a specific subject

54

Summative Scale

Scores are obtained by summing the ratings across all the items
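
A minimal sketch of summative scoring for Likert-type items, assuming a 1-to-5 response scale and that negatively worded items are reverse-scored before summing (the usual companion to the guideline of mixing positively and negatively worded items). The item positions and ratings are hypothetical.

```python
def summative_score(
    ratings: list[int],
    reverse_keyed: set[int],
    scale_min: int = 1,
    scale_max: int = 5,
) -> int:
    """Sum Likert-type ratings across items, reverse-scoring negatively worded items.

    reverse_keyed holds the (0-based) positions of negatively worded items.
    """
    total = 0
    for i, r in enumerate(ratings):
        if not scale_min <= r <= scale_max:
            raise ValueError(f"rating {r} at item {i} is outside {scale_min}-{scale_max}")
        # Reverse-keyed items are flipped (5 -> 1, 4 -> 2, ...) so a higher
        # number always points in the same direction as the other items.
        total += (scale_min + scale_max - r) if i in reverse_keyed else r
    return total


# Four 5-point items; the third item (position 2) is negatively worded.
print(summative_score([5, 4, 2, 5], reverse_keyed={2}))  # 5 + 4 + (6 - 2) + 5 = 18
```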

55

Thurstone Scale

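Uses a set of statements about a topic, each assigned a numerical scale value (typically derived from judges' ratings) reflecting how favorable the statement is toward the topic; a respondent's score is based on the scale values of the statements he or she endorses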
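
A minimal sketch of scoring a Thurstone-type scale, assuming the scale values have already been established and that the respondent's score is the median scale value of the statements he or she endorses (some treatments use the mean instead). The statement ids and values are hypothetical.

```python
from statistics import median


def thurstone_score(scale_values: dict[str, float], endorsed: list[str]) -> float:
    """Return the median scale value of the statements a respondent endorsed."""
    values = [scale_values[s] for s in endorsed]
    if not values:
        raise ValueError("the respondent endorsed no statements")
    return median(values)


# Hypothetical statements with scale values from 1 (least favorable) to 11 (most favorable).
scale_values = {"s1": 1.5, "s2": 4.0, "s3": 6.0, "s4": 9.5}
print(thurstone_score(scale_values, ["s1", "s2", "s3"]))  # 4.0
```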

56

Ipsative Scale

Where respondents distribute a fixed number of points across different attributes, highlighting their relative strengths and weaknesses within themselves

57

Class Scoring (Category Scoring)

test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is presumably similar in some way

58

Cumulative Model

the higher the score on the test, the higher the test taker is on the ability, trait, or other characteristic that the test purports to measure

59

Ipsative Scoring

comparing a test taker’s score on one scale within a test to another scale within that same test
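
A minimal sketch of an ipsative comparison, assuming one simple convention: express each scale score relative to the same test taker's own mean, so the profile shows relative strengths and weaknesses within the person. The scale names and scores are hypothetical.

```python
def ipsative_profile(scale_scores: dict[str, float]) -> dict[str, float]:
    """Express each scale score relative to the same test taker's own mean."""
    if not scale_scores:
        raise ValueError("need at least one scale score")
    own_mean = sum(scale_scores.values()) / len(scale_scores)
    return {scale: score - own_mean for scale, score in scale_scores.items()}


profile = ipsative_profile({"achievement": 30, "affiliation": 20, "dominance": 25})
print(profile)  # {'achievement': 5.0, 'affiliation': -5.0, 'dominance': 0.0}
print(max(profile, key=profile.get))  # achievement: this person's relative strength
```

Such a profile says nothing about how this person compares with other test takers; it only ranks the scales within one person.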

60

Test Tryout

administered to a representative sample of test takers under conditions that simulate the conditions under which the final version of the test will be administered

61

Test Tryout

The test should be tried out on people who are similar in critical respects to the people for whom the test was designed

62

Test Tryout

Informal rule of thumb: no fewer than 5 subjects and preferably as many as 10 for each item on the test; the more subjects in the tryout, the better
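
The two rules of thumb in this deck (a first draft about twice the length of the final test, and 5 to 10 tryout subjects per item) can be combined into a quick planning calculation. This sketch assumes the tryout form contains the entire draft item pool; the function names and the 30-item final length are hypothetical.

```python
def draft_pool_size(final_length: int) -> int:
    """Rule of thumb: the first draft has about twice as many items as the final test."""
    return 2 * final_length


def tryout_sample_range(n_items: int) -> tuple[int, int]:
    """Rule of thumb: no fewer than 5, preferably as many as 10, subjects per item."""
    return 5 * n_items, 10 * n_items


draft = draft_pool_size(30)  # hypothetical 30-item final test
print(draft, tryout_sample_range(draft))  # 60 (300, 600)
```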

63

Phantom Factors

factors that actually are just artifacts of the small sample size

64

Phantom Factors

A definite risk in using too few subjects during test tryout arises when the data are factor-analyzed: the findings may include phantom factors

65

“Good Item”

reliable and valid; it tends to be answered correctly by test takers who score high on the test as a whole

66

Pseudobulbar Affect (PBA)

a neurological disorder characterized by frequent and involuntary outbursts of laughing or crying that may or may not be appropriate to the situation

67

Item Analysis

statistical procedures used to analyze and evaluate test items

68

Item Analysis

Employed to assist in making judgments about which items are good as they are and which need to be revised or discarded

69

Item

suggests a sample of an individual’s behavior

70

Table of Specification

a blueprint of the test in terms of number of items per difficulty, topic importance, or taxonomy
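
One common way to fill in a table of specifications is to give each topic a number of items proportional to its weight (for example, its share of instructional time). The sketch below does that with largest-remainder rounding so the counts always add up to the intended test length; the weighting scheme, topic names, and test length are assumptions for illustration.

```python
def allocate_items(weights: dict[str, float], total_items: int) -> dict[str, int]:
    """Split total_items across topics in proportion to their weights.

    Uses largest-remainder rounding so the counts always add up to total_items.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0 or total_items < 0:
        raise ValueError("weights must sum to a positive number and total_items must be >= 0")
    raw = {t: total_items * w / weight_sum for t, w in weights.items()}
    alloc = {t: int(r) for t, r in raw.items()}  # round each share down first
    leftover = total_items - sum(alloc.values())
    # Hand the remaining items to the topics with the largest fractional parts.
    for t in sorted(raw, key=lambda t: raw[t] - alloc[t], reverse=True)[:leftover]:
        alloc[t] += 1
    return alloc


# Hypothetical blueprint: topic weights (percent of instructional time) for a 40-item test.
print(allocate_items({"reliability": 25, "validity": 35, "test development": 40}, 40))
# {'reliability': 10, 'validity': 14, 'test development': 16}
```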

71

Guidelines for Writing Items

  • Define clearly what to measure

  • Generate item pool

  • Avoid long items

72

Guidelines for Writing Items

  • Keep the level of reading difficulty appropriate for those who will complete the test

  • Avoid double-barreled items

  • Consider including both positively and negatively worded items

73

convey more than one idea at the same time

Double-Barreled Items

74

Item-Difficulty Index

obtained by calculating the proportion of the total number of test takers who answered the item correctly

75

Item-Difficulty Index

Denoted as “p”

76

Item-Difficulty Index

The higher the value, the easier the question

77

Item-Difficulty Index

If a particular item is too easy or too difficult, it should be rewritten or discarded
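
A minimal sketch of computing p for every item from a matrix of scored responses (1 = correct, 0 = incorrect). If 1 instead codes "endorsed" on a yes/no or agree/disagree item, the same arithmetic yields the item-endorsement index. The data are hypothetical.

```python
def item_difficulty(responses: list[list[int]]) -> list[float]:
    """Item-difficulty index p for each item.

    responses[person][item] is 1 for a correct answer and 0 otherwise;
    p is the proportion of all test takers who answered the item correctly.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(person[j] for person in responses) / n_people for j in range(n_items)]


# Hypothetical scored answers of four test takers to three items.
scored = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
print(item_difficulty(scored))  # [1.0, 0.5, 0.25]
```

Item 1 (p = 1.0) was answered correctly by everyone, so it cannot distinguish among these test takers; it is a candidate for rewriting or discarding.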

78

Item Endorsement Index

the statistic provides not a measure of the percent of people passing the item but a measure of the percent of people who said yes to, agreed with, or otherwise endorsed the item

79

Level of Difficulty - Very Difficult

Item Difficulty Range - 0.00 - 0.19

80

Level of Difficulty - Difficult

Item Difficulty Range - 0.20 - 0.39

81

Level of Difficulty - Average

Item Difficulty Range - 0.40 - 0.60

82

Level of Difficulty - Easy

Item Difficulty Range - 0.61 - 0.79

83

Level of Difficulty - Very Easy

Item Difficulty Range - 0.80 - 1.00
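
The five difficulty ranges listed above can be applied mechanically. This sketch assumes the boundaries are read as p < 0.20, p < 0.40, p ≤ 0.60, p < 0.80, and p ≥ 0.80, so a value falling between the printed ranges (such as 0.605) is labeled Easy; that reading of the gaps is an assumption.

```python
def difficulty_level(p: float) -> str:
    """Map an item-difficulty index p to the verbal labels listed above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.20:
        return "Very Difficult"
    if p < 0.40:
        return "Difficult"
    if p <= 0.60:
        return "Average"
    if p < 0.80:
        return "Easy"
    return "Very Easy"


for p in (0.10, 0.25, 0.50, 0.70, 0.95):
    print(p, difficulty_level(p))
```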

84

Optimal Level of Item Difficulty (Optimal Difficulty)

refers to the ideal level of difficulty for the questions on a test or assessment, designed to maximize the effectiveness and quality of the test

86

Optimal Level of Item Difficulty (Optimal Difficulty)

It is usually the midpoint between 1.00 and the chance of success proportion, defined by the probability of answering correctly by random guessing

87

Chance Score

the proportion of correct answers expected if test-takers were simply guessing.

88

0.25

For a 4-option multiple-choice question, the chance score is 1/4 or ___

89

0.20

For a 5-option question, it's 1/5 or ___

90

1.0 (or 100% correct)

Perfect Score

91

ideal difficulty level (multiple choice)

the ideal difficulty level of a multiple-choice item is slightly higher than midway between the chance score and 1.0

92

4-option multiple-choice

the midpoint between the chance score (0.25) and 1.0 is 0.625. A slightly higher value, around 0.74, is often cited as an optimal difficulty level for a four-response multiple-choice item

93

5-option multiple-choice

the midpoint between the chance score (0.20) and 1.0 is 0.60. A difficulty level around 0.70 is often considered optimal
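
The midpoint rule above is simple arithmetic: the chance score is 1 divided by the number of options, and the midpoint is (chance score + 1.0) / 2. This sketch computes only that midpoint; the slightly higher values cited above (about 0.74 and 0.70) are not produced by the formula. Function names are illustrative.

```python
def chance_score(n_options: int) -> float:
    """Proportion correct expected from pure guessing on an item with n_options choices."""
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    return 1 / n_options


def midpoint_difficulty(n_options: int) -> float:
    """Midpoint between the chance score and a perfect score of 1.0."""
    return (chance_score(n_options) + 1.0) / 2


for k in (2, 4, 5):
    print(k, chance_score(k), round(midpoint_difficulty(k), 3))
# 2 0.5 0.75
# 4 0.25 0.625
# 5 0.2 0.6
```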

94

Item-Reliability Index

provides an indication of the internal consistency of a test

95

Item-Reliability Index

The higher the index, the greater the test’s internal consistency

96

Internal Consistency - May have limited applicability

Item Reliability Index - 0.69 & below

97

Internal Consistency - Adequate

Item Reliability Index - 0.70 - 0.79

98

Internal Consistency - Good

Item Reliability Index - 0.80 - 0.89

99

Internal Consistency - Excellent

Item Reliability Index - 0.90 & above

100

Item-Validity Index

designed to provide an indication of the degree to which a test is measuring what it purports to measure