Test Development
an umbrella term for all that goes into the process of creating a test
Test Development
Test Conceptualization
Test Construction
Test Tryout
Item Analysis
Test Revision
Test Conceptualization
process of conceptualizing the construct, items included, and design of the test
Test Conceptualization - Step 1
describe the purpose and rationale for the test
Test Conceptualization - Step 2
describe the target population for the test
Test Conceptualization - Step 3
clearly define the key variables of interest
Test Conceptualization - Step 4
create item specifications
Test Conceptualization - Step 5
choose item format
Test Conceptualization - Step 6
specify administration considerations
Some Preliminary Questions
What is the test designed to measure?
What is the objective of the test?
Is there a need for this test?
Who will use this test?
Who will take this test?
Some Preliminary Questions
What content will the test cover?
How will the test be administered?
What is the ideal format of the test?
Should more than one form of the test be developed?
How will meaning be attributed to scores on this test?
Some Preliminary Questions
What special training will be required of test users for administering or interpreting the test?
What types of responses will be required of test takers?
Who benefits from the administration of this test?
Is there any potential for harm as the result of an administration of this test?
Pilot Testing
preliminary research surrounding the creation of a prototype of the test
Norm-referenced
comparisons typically are insufficient and inappropriate when knowledge of mastery is required
Criterion-referenced
pilot work requires at least two groups of test takers: one group known to have mastered the knowledge or skill being measured and another group known not to have mastered it
Test Construction
stage in the process that entails writing test items (or rewriting or revising existing test items), as well as formatting items, setting scoring rules, and otherwise designing and building a test
Item Pool
the reservoir or well from which items will or will not be drawn for the final version of the test
Item Pool
It is usually advisable that the first draft contain approximately twice the number of items that the final version of the test will contain
Item Banks
relatively large and easily accessible collection of test questions
Computerized Adaptive Testing (CAT)
items presented to test takers are based in part on the test taker’s performance on previous items
Floor Effect
diminished utility of a test for distinguishing test takers at the low end of the ability, trait, or other measurable attribute
Ceiling Effect
diminished utility of a test for distinguishing test takers at the high end of the ability, trait, or other measurable attribute
Item Branching
the ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items
Item Format
form, plan, structure, arrangement, and layout of individual test items
Dichotomous Format
offers two alternatives for each item
Polychotomous Format
each item has more than two alternatives
Category Format
a format where respondents are asked to rate a construct
Checklist
subject receives a long list of adjectives and indicates whether each one is characteristic of himself/herself
Guttman Scale
items are arranged sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured
Selected-Response Format
select a response from a set of alternatives
Multiple Choice
three elements: (1) a stem, (2) a correct alternative or option, and (3) several incorrect alternatives or options, called distractors or foils (25% chance of a correct guess with four options)
Effective Distractors
are chosen by both high- and low-performing groups, which enhances the consistency of test results
Ineffective Distractors
may hurt the reliability of the test because they are time-consuming to read and can limit the number of good items
Cute Distractors
less likely to be chosen; may affect the reliability of the test because test takers may guess from the remaining options
Matching Item
presented with two columns: premises on the left and responses on the right
Binary Choice
contains only two possible responses (true-false item or forced-choice item) (50%)
Constructed-Response Format
supply or create the correct answer
Completion Item
provide a word or phrase that completes a sentence
Short-Answer
word, term, sentence, or paragraph may qualify
Essay
respond to a question by writing a composition
Scaling
process of setting rules for assigning numbers in measurement
Notion of Absolute Scaling
procedure for obtaining a measure of item difficulty across samples of test takers who vary in ability
Types of Scaling
Age-based
Grade-based
Stanine
Unidimensional
Multidimensional
Paired Comparison
Produces ordinal data by presenting respondents with pairs of stimuli, which they are asked to compare
Rank Order
Respondents are presented with several items simultaneously and asked to rank them in order of priority
Constant Sum
Respondents are asked to allocate a constant sum of units, such as points, among a set of stimulus objects with respect to some criterion
Q-Sort Technique
Sort objects based on similarity with respect to some criterion
Continuous Rating
Rate the objects by placing a mark at the appropriate position on a continuous line that runs from one extreme of the criterion variable to the other
Itemized Rating
Having numbers or brief descriptions associated with each category
Likert Scale
Respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements that range from very positive to very negative toward the attitudinal object
Visual Analogue Scale
A 100-mm line that allows subjects to express the magnitude of an experience or belief
Semantic Differential Scale
Measures an individual’s attitudes or emotional reactions to a concept or object by using a series of bipolar adjectives or phrases
Stapel Scale
Uses a numerical scale, typically presented vertically, with a single adjective in the middle to measure respondents' attitudes or opinions about a specific subject
Summative Scale
Summing the ratings across all the items
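A minimal sketch of summative scoring, assuming each item is rated on the same numeric scale. The function name and the five ratings are hypothetical, chosen only for illustration.

```python
def summative_score(ratings):
    """Return the total score: the sum of the ratings across all items."""
    if not ratings:
        raise ValueError("at least one item rating is required")
    return sum(ratings)


# Hypothetical ratings on five items of a 5-point scale.
ratings = [4, 5, 3, 4, 2]
total = summative_score(ratings)
print(total)  # 18
```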
Thurstone Scale
Uses a set of statements about a topic, with each statement assigned a numerical value reflecting the respondent’s attitude towards it
Ipsative Scale
Where respondents distribute a fixed number of points across different attributes, highlighting their relative strengths and weaknesses within themselves
Class Scoring (Category Scoring)
test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is presumably similar in some way
Cumulative Model
the higher the score on the test, the higher the test taker is on the ability, trait, or other characteristic that the test purports to measure
Ipsative Scoring
comparing a test taker’s score on one scale within a test to another scale within that same test
Test Tryout
administered to a representative sample of test takers under conditions that simulate the conditions under which the final version of the test will be administered
Test Tryout
The test should be tried out on people who are similar in critical respects to the people for whom the test was designed
Test Tryout
Informal rule of thumb: no fewer than 5 subjects, and preferably as many as 10, for each item on the test. The more subjects in the tryout, the better
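The rule of thumb above is simple arithmetic. A small sketch, assuming the 5-per-item and 10-per-item figures are taken literally; the function name and the 30-item test are hypothetical.

```python
def tryout_sample_range(n_items, min_per_item=5, preferred_per_item=10):
    """Return (minimum, preferred) tryout sample sizes for a test of n_items."""
    if n_items < 1:
        raise ValueError("a test needs at least one item")
    return n_items * min_per_item, n_items * preferred_per_item


# Hypothetical 30-item test.
minimum, preferred = tryout_sample_range(30)
print(minimum, preferred)  # 150 300
```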
Phantom Factors
factors that actually are just artifacts of the small sample size
Phantom Factors
A definite risk of using too few subjects during test tryout arises during factor analysis of the findings
“Good Item”
reliable and valid, and answered correctly by test takers who score high on the test as a whole
Pseudobulbar Affect (PBA)
a neurological disorder characterized by frequent and involuntary outbursts of laughing or crying that may or may not be appropriate to the situation
Item Analysis
statistical procedures used to analyze and evaluate test items
Item Analysis
Employed to assist in making judgments about which items are good as they are and which items need to be revised, or discarded
Item
suggests a sample of an individual's behavior
Table of Specification
a blueprint of the test in terms of number of items per difficulty, topic importance, or taxonomy
Guidelines of Writing Items
Define clearly what to measure
Generate item pool
Avoid long items
Guidelines of Writing Items
Keep the level of reading difficulty appropriate for those who will complete the test
Avoid double-barreled items
Consider including both positively and negatively worded items
Double-Barreled Items
convey more than one idea at the same time
Item-Difficulty Index
obtained by calculating the proportion of the total number of test takers who answered correctly
Item-Difficulty Index
Denoted as “p”
Item-Difficulty Index
The higher the value, the easier the question
Item-Difficulty Index
If a particular item is too easy or too difficult, it must be rewritten or discarded
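A minimal sketch of the item-difficulty index described above: p is the number of test takers who answered the item correctly divided by the total number of test takers. The function name and the response data are hypothetical.

```python
def item_difficulty(responses):
    """Return p, the proportion of test takers who answered the item correctly.

    `responses` holds one boolean per test taker (True = correct).
    """
    if not responses:
        raise ValueError("at least one response is required")
    return sum(1 for r in responses if r) / len(responses)


# Hypothetical item answered by 10 test takers, 7 of them correctly.
item_responses = [True, True, False, True, False, True, True, False, True, True]
p = item_difficulty(item_responses)
print(f"p = {p:.2f}")  # p = 0.70
```

A higher p means more test takers answered correctly, so the item is easier.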
Item Endorsement Index
a statistic that provides not a measure of the percent of people passing the item but a measure of the percent of people who said yes to, agreed with, or otherwise endorsed the item
Level of Difficulty - Very Difficult
Item Difficulty Range - 0.00 - 0.19
Level of Difficulty - Difficult
Item Difficulty Range - 0.20 - 0.39
Level of Difficulty - Average
Item Difficulty Range - 0.40 - 0.60
Level of Difficulty - Easy
Item Difficulty Range - 0.61 - 0.79
Level of Difficulty - Very Easy
Item Difficulty Range - 0.80 - 1.00
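The difficulty bands listed above can be turned into a lookup. This is only a sketch: the function name is hypothetical, and the handling of values that fall between two listed bands (for example 0.195) is an assumption, since the cards give ranges to two decimal places only.

```python
def difficulty_level(p):
    """Map an item-difficulty index p (0.00 to 1.00) to its descriptive level."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0.00 and 1.00")
    if p < 0.20:
        return "Very Difficult"   # 0.00 - 0.19
    if p < 0.40:
        return "Difficult"        # 0.20 - 0.39
    if p <= 0.60:
        return "Average"          # 0.40 - 0.60
    if p < 0.80:
        return "Easy"             # 0.61 - 0.79
    return "Very Easy"            # 0.80 - 1.00


for value in (0.10, 0.35, 0.60, 0.61, 0.80):
    print(value, difficulty_level(value))
```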
Optimal Level of Item Difficulty (Optimal Difficulty)
refers to the ideal level of difficulty for the questions on a test or assessment, designed to maximize the effectiveness and quality of the test
Optimal Level of Item Difficulty (Optimal Difficulty)
It is usually the midpoint between 1.00 and the chance of success proportion, defined by the probability of answering correctly by random guessing
Chance Score
the proportion of correct answers expected if test-takers were simply guessing.
0.25
For a 4-option multiple-choice question, the chance score is 1/4 or ___
0.20
For a 5-option question, it's 1/5 or ___
1.0 (or 100% correct)
Perfect Score
ideal difficulty level (multiple choice)
item is slightly higher than midway between the chance score and 1.0
4-option multiple-choice
the midpoint between the chance score (0.25) and 1.0 is 0.625. A slightly higher value, around 0.74, is often cited as an optimal difficulty level for a four-response multiple-choice item
5-option multiple-choice
the midpoint between the chance score (0.20) and 1.0 is 0.60. A difficulty level around 0.70 is often considered optimal
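The midpoint arithmetic in the cards above can be checked with a short sketch. It assumes every option is equally likely under random guessing, so the chance score is 1 divided by the number of options. The function names are hypothetical. The "slightly higher" values quoted in the cards (about 0.74 and 0.70) are not produced by this formula; only the midpoints are computed here.

```python
def chance_score(n_options):
    """Proportion correct expected from random guessing among n_options."""
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    return 1.0 / n_options


def midpoint_difficulty(n_options):
    """Midpoint between the chance score and a perfect score of 1.0."""
    return (chance_score(n_options) + 1.0) / 2.0


for k in (2, 4, 5):
    print(k, round(chance_score(k), 3), round(midpoint_difficulty(k), 3))
# 2 0.5 0.75
# 4 0.25 0.625
# 5 0.2 0.6
```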
Item-Reliability Index
indication of the internal consistency of a test
Item-Reliability Index
The higher the index, the greater the test’s internal consistency
Internal Consistency - May have limited applicability
Item Reliability Index - 0.69 & below
Internal Consistency - Adequate
Item Reliability Index - 0.70 - 0.79
Internal Consistency - Good
Item Reliability Index - 0.80 - 0.89
Internal Consistency - Excellent
Item Reliability Index - 0.90 & above
Item-Validity Index
designed to provide an indication of the degree to which a test is measuring what it purports to measure