PSYASS 2 - 8 Test Development

72 Terms

1. Test development

An umbrella term for all that goes into the process of creating a test.

2. Test conceptualization

The thought that “there ought to be a test for…” is the impetus for developing a new test.

3. Norm-referenced

Generally, a good item on a __________________ achievement test is an item for which high scorers on the test respond correctly and low scorers respond incorrectly.

4. Criterion-oriented test

Ideally, each item on a ___________________ addresses the issue of whether the respondent has met certain criteria.

5. Pilot work

Refers to the preliminary research surrounding the creation of a prototype of the test.

6. Scaling

The process of setting rules for assigning numbers in measurement.

7. Scales

Instruments used to measure some trait, state, or ability; they may be categorized in many ways.

8. Rating scale

Defined as a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the test taker.

9. Summative scale

The final test score is obtained by summing the ratings across all the items.

10. Likert Scale

Each item presents the test taker with five alternative responses (sometimes seven), usually on an agree–disagree or approve–disapprove continuum.
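
For a concrete sense of how a summative, Likert-type scale is scored, here is a minimal Python sketch. The responses, the 1–5 coding, and the reverse-keyed item are hypothetical illustrations, not values from the cards above.

```python
# A minimal sketch of summative (Likert-type) scoring: reverse-score any
# negatively worded items, then sum the ratings across all items.

responses = [4, 5, 3, 2, 4]   # five items coded 1 (strongly disagree) .. 5 (strongly agree)

reverse_keyed = {3}           # hypothetical: the item at index 3 is negatively worded

# Reverse-score negatively worded items (1 <-> 5, 2 <-> 4), then sum.
scored = [(6 - r) if i in reverse_keyed else r for i, r in enumerate(responses)]
total = sum(scored)

print(total)                  # 20: the summative scale score
```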

11. Method of Paired Comparisons

Testtakers must choose between two alternatives according to some rule.

12. Comparative scaling

This entails judgments of a stimulus in comparison with every other stimulus on the scale.

13. Categorical scaling

Stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum.

14. Guttman scale

Items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.

15. Item pool

The reservoir or well from which items will or will not be drawn for the final version of the test.

16. Item format

Variables such as the form, plan, structure, arrangement, and layout of individual test items.

17. Selected-response Item Format

Requires testtakers to select a response from a set of alternative responses.

18. Multiple-choice Format

Has three elements:

  • A stem

  • A correct alternative or option

  • Several incorrect alternatives or options, variously referred to as distractors or foils

19. Binary-choice Item

Contains only two possible responses.

20. Matching Item

The testtaker is presented with two columns: premises on the left and responses on the right.

21. True-False Item

This type of selected-response item usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact.

22. Constructed-response Item Format

Requires testtakers to supply or create the correct answer, not merely to select it.

23. Completion Item

Requires the examinee to provide a word or phrase that completes a sentence.

24. Short-answer Item

It is desirable for ____________________ to be written clearly enough that the testtaker can respond succinctly.

25. Essay Item

A test item that requires the testtaker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation.

26. Item bank

A relatively large and easily accessible collection of test questions.

27. Computerized Adaptive Testing (CAT)

Refers to an interactive, computer-administered testtaking process wherein items presented to the testtaker are based in part on the testtaker’s performance on previous items.

28. Computerized Adaptive Testing (CAT)

__________________________ tends to reduce floor effects and ceiling effects.

29. Floor Effect

Refers to the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured.

30. Ceiling Effect

Refers to the diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured.

31. Item Branching

The ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items.
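
A minimal sketch of the branching idea behind adaptive testing follows, assuming a simple "step up after a correct answer, step down after an incorrect one" rule. The step size, starting difficulty, and response pattern are hypothetical simplifications of what a real CAT algorithm does.

```python
# A minimal sketch of item branching in adaptive testing: move to a harder
# item after a correct response, an easier one after an incorrect response.

def next_difficulty(current: float, correct: bool, step: float = 0.5) -> float:
    """Target difficulty of the next item, given the last response."""
    return current + step if correct else current - step

difficulty = 0.0                            # start near average difficulty
for correct in [True, True, False, True]:   # simulated response pattern
    difficulty = next_difficulty(difficulty, correct)

print(difficulty)                           # 1.0
```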

32. Class Scoring

Testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way.

33. Ipsative Scoring

A typical objective in ________________ is comparing a testtaker’s score on one scale within a test to another scale within that same test.

34. Test Tryout

The test should be tried out on people who are similar in critical respects to the people for whom the test was designed.

35. 5, 10

There should be no fewer than ___ subjects and preferably as many as ___ for each item on the test.

36. Phantom factors

Factors that actually are just artifacts of the small sample size.

37. Item-Difficulty Index

Obtained by calculating the proportion of the total number of testtakers who answered the item correctly.

38. Easier

A lowercase italic p is used to denote item difficulty, and a subscript refers to the item number (so p1 is read “item-difficulty index for item 1”).

p refers to the percent of people passing an item; the higher the p for an item, the ________ the item.
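
Computing p is simple enough to show directly. The response matrix below is hypothetical (rows = testtakers, columns = items; 1 = correct).

```python
# A minimal sketch of the item-difficulty index: for each item, p is the
# proportion of testtakers who answered it correctly.

responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

n_testtakers = len(responses)
p = [sum(item_scores) / n_testtakers for item_scores in zip(*responses)]

print(p)  # [0.75, 0.75, 0.25] -- the higher the p, the easier the item
```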

39. Item-Reliability Index

Provides an indication of the internal consistency of a test; the higher this index, the greater the test’s internal consistency.
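
As commonly formulated, the index for a given item is the product of the item-score standard deviation and the correlation between the item score and the total test score; the subscript notation below (for item 1, with r1T as the item-total correlation) is illustrative:

\[ \text{item-reliability index}_1 = s_1 \, r_{1T} \]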

40. Factor Analysis

A statistical tool useful in determining whether items on a test appear to be measuring the same thing(s).

41. Item-Validity Index

Is a statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure.

42. s1

Item-validity index: the item-score standard deviation is denoted by the symbol ___.

43. p1

Item-validity index: the index of the item’s difficulty is denoted by the symbol ___.
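
Tying cards 41–43 together: in a common formulation, the item-score standard deviation of a dichotomously scored item can be computed from its item-difficulty index, and the item-validity index is that standard deviation times the item-criterion correlation (r1C here is illustrative notation for the correlation between the item 1 score and the criterion score):

\[ s_1 = \sqrt{p_1 (1 - p_1)} \qquad \text{item-validity index}_1 = s_1 \, r_{1C} \]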

44. Item-Discrimination Index

Symbolized by d, it compares performance on an item by high scorers with performance by low scorers on the test as a whole; the quality of each alternative within a multiple-choice item can likewise be assessed with reference to the comparative performance of upper and lower scorers.
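
A minimal sketch of d using the common upper/lower group comparison, d = (U − L) / n, where U and L are the numbers of upper and lower scorers who passed the item and n is the size of each (equal-sized) group. All counts below are hypothetical.

```python
# A minimal sketch of the item-discrimination index d = (U - L) / n.

U = 20   # of the 25 highest scorers on the whole test, 20 passed the item
L = 5    # of the 25 lowest scorers, only 5 passed the item
n = 25   # testtakers per group

d = (U - L) / n
print(d)  # 0.6 -- a positive d means high scorers outperform low scorers
```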

45. Item-Characteristic Curves (ICCs)

A graphic representation of item difficulty and discrimination.

46. Greater

The steeper the slope, the __________ the item discrimination.
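
To make the slope idea concrete: one common way to write an ICC is the two-parameter logistic model, in which b locates the curve along the trait continuum θ (difficulty) and a controls the slope (discrimination). This particular parameterization is illustrative rather than something stated on the cards:

\[ P(\theta) = \frac{1}{1 + e^{-a(\theta - b)}} \]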

47. Guessing

Methods designed to detect ___________, minimize the effects of ___________, and statistically correct for ___________ have been proposed, but no such method has achieved universal acceptance.
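
One classic example of such a statistical correction (proposed, but, as the card notes, not universally accepted) deducts a fraction of the wrong answers from the number right, with R the number of right answers, W the number of wrong answers, and k the number of options per item:

\[ \text{corrected score} = R - \frac{W}{k - 1} \]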

48. Item Fairness

Refers to the degree, if any, to which a test item is biased.

49. Speed tests

Item analyses of tests taken under speed conditions yield misleading or uninterpretable results.

The closer an item is to the end of the test, the more difficult it may appear to be.

50. Qualitative Methods

Are techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures.

51. Qualitative Methods

Involve exploration of the issues through verbal means such as interviews and group discussions conducted with testtakers and other relevant parties.

52. Qualitative Item Analysis

Is a general term for various nonstatistical procedures designed to explore how individual test items work.

53. “Think Aloud” Test Administration

Cohen et al. (1988) proposed the use of ______________________ as a qualitative research tool designed to shed light on the test taker’s thought processes during the administration of a test.

54. “Think Aloud” Test Administration

On a one-to-one basis with an examiner, examinees are asked to take a test, thinking aloud as they respond to each item.

55. Expert Panel

_____________ may provide qualitative analyses of test items. They may also play a role in the development of new tools of assessment for members of underserved populations.

56. Sensitivity review

Is a study of test items, typically conducted during the test development process, in which items are examined for fairness to all prospective test takers and for the presence of offensive language, stereotypes, or situations.

57. Test Revision

When the item analysis of data derived from a test administration indicates that the test is not yet in finished form, the steps of revision, tryout, and item analysis are repeated until the test is satisfactory and standardization can occur.

58. Steps to revise an existing test

Test Conceptualization → Test Construction → Test Tryout → Item Analysis → Test Revision

59. Cross-validation

A key step in the development of all tests, whether brand-new or revised editions.

60. Cross-validation

Refers to the revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion.

61. Co-Validation

Test validation process conducted on two or more tests using the same sample of test takers.

62. Validity Shrinkage

The phenomenon whereby item validities inevitably become smaller when the test is administered to a second sample.

63. Co-Norming

Co-validation conducted in conjunction with the creation of norms or the revision of existing norms.

64. Anchor protocol

A model test protocol, scored by an authoritative scorer, that serves as a mechanism to ensure consistency in scoring.

65. Scoring Drift

Discrepancy between scoring in an anchor protocol and the scoring of another protocol.

66. Item Response Theory (IRT)

Could be applied in the evaluation of the utility of tests and testing programs.

67. Item-Characteristic Curves (ICCs)

Provide information about the relationship between the performance of individual items and the presumed underlying ability (or trait) level in the testtaker.

68. Item-Characteristic Curves (ICCs)

Using IRT, test developers evaluate individual item performance with reference to _______________________.

69. Use of IRT in Building and Revising Tests

  • Evaluating the properties of existing tests and guiding test revision

  • Determining measurement equivalence across testtaker populations

  • Developing item banks

70. Differential Item Functioning (DIF)

The phenomenon whereby an item functions differently in one group of testtakers as compared to another group of testtakers known to have the same (or similar) level of the underlying trait.

71. DIF Analysis

A process in which test developers scrutinize group-by-group item response curves, looking for DIF items.

72. DIF Items

Items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership.
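
A minimal sketch of a DIF screen follows: within each band of the matching variable (here, total-score bands standing in for trait level), compare each group's endorsement rate for the item. A large gap at the same trait level flags the item for review. The data, groups, and band definitions are all hypothetical; real DIF analyses use more formal procedures.

```python
# A minimal sketch of a DIF screen: compare endorsement rates across groups
# at matched levels of the underlying trait (approximated by score bands).

data = [
    # (group, score_band, endorsed_item)
    ("A", "low", 1), ("A", "low", 0), ("B", "low", 0), ("B", "low", 0),
    ("A", "high", 1), ("A", "high", 1), ("B", "high", 1), ("B", "high", 0),
]

for band in ("low", "high"):
    for group in ("A", "B"):
        endorsements = [e for g, b, e in data if g == group and b == band]
        rate = sum(endorsements) / len(endorsements)
        print(f"band={band} group={group} endorsement rate={rate:.2f}")
```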