A. Psychometric Properties and Principles

Last updated 5:55 PM on 6/2/26

113 Terms

1

Reliability

the extent to which a test yields consistent results, as assessed by the consistency of scores on two halves of the test, on alternate forms of the test, or on retesting

2

reliability coefficient

an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
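
As a rough illustration of that ratio, the sketch below assumes the classical test theory decomposition in which total variance is the sum of true variance and error variance. The function name and the figures are hypothetical.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Ratio of true score variance to total (true + error) variance."""
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical figures: 80 units of true variance, 20 units of error variance.
r_xx = reliability_coefficient(80.0, 20.0)
print(r_xx)  # 0.8
```

With no error variance the ratio is 1.0, which matches the "perfect reliability" card.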

3

Perfect reliability indicating redundancy

1.0

4

≥ 0.90 for clinical decisions; between 0.80 and 0.90 for general use

What is a good reliability?

5

True score

An individual's actual score on a variable being measured, as opposed to the score the individual obtained on the measure itself.

6

Carryover effects

occur when participants' experience in one condition affects their behavior in another condition of a study

7

Practice effects

Improvements in test performance that result from prior exposure to, or repeated practice with, the same or similar test items.

8

Test Sophistication

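An increase in scores that comes from familiarity with test formats and test-taking strategies rather than from a change in the trait being measured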

9

Fatigue effects

Repeated testing reduces overall mental energy or motivation to perform on a test.

10

Construct score

A person's standing on a theoretical variable independent of any particular measurement.

11

Variance

The standard deviation squared

Useful for describing sources of test score variability

12

True variance

Variance in test scores that comes from true differences among testtakers

13

Error variance

The amount of variability among the scores caused by chance or uncontrolled variables.

14

Measurement Error

an error that occurs when there is a difference between the information desired by the researcher and the information provided by the measurement process

15

Random error

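a source of error caused by unpredictable fluctuations and inconsistencies in the measurement process; it raises or lowers scores without a consistent direction (error from an unrepresentative sample is sampling error, not random error)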

16

Systematic error

Error that shifts all measurements in a consistent direction or by a consistent amount. Decreases accuracy. Can result in bias

17

Test environment

The physical conditions under which a test is administered (for example room temperature, lighting, noise, and ventilation); a potential source of error variance.

18

Testtaker Variables

Personal factors affecting test performance.

19

Examiner-related Variables

physical appearance, demeanor, eye contact are examples of _________

20

Test Retest Reliability

a method for determining the reliability of a test by comparing a test taker's scores on the same test taken on separate occasions
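
A minimal sketch of that comparison, assuming the two sets of scores are compared with a Pearson correlation (the usual choice, though the card does not name a statistic). The scores below are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five testtakers on two occasions
first_testing = [10, 12, 15, 18, 20]
second_testing = [11, 13, 14, 19, 21]
test_retest_r = pearson_r(first_testing, second_testing)
print(round(test_retest_r, 2))
```

The same function applies to alternate-forms reliability if the second list holds scores from the other form.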

21

Coefficient of Stability

An estimate of test-retest reliability obtained when the interval between the two administrations is longer than six months

22

Parallel Forms

a method of establishing the reliability of a measurement instrument by correlating scores on two different but equivalent versions of the same instrument

23

Alternate Forms

if a teacher gives out multiple forms of an exam with different questions, the overall scores should be similar for each form

24

Immediate Form

Alternate forms administered at the same time (in one session).

25

Delayed Form

Alternate forms administered with a time interval between the two administrations.

26

Split-Half Reliability

A measure of reliability in which a test is split into two parts and an individual's scores on both halves are compared.

27

Spearman Brown Formula

Used to estimate internal consistency reliability from a correlation between two halves of a test
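
A short sketch of the formula in its general form, r_SB = n·r / (1 + (n − 1)·r), where r is the correlation between the halves and n is the factor by which the test is lengthened (n = 2 for a split-half estimate). The function and parameter names are illustrative.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Estimate full-length reliability from a shortened-test correlation."""
    denominator = 1.0 + (length_factor - 1.0) * r_half
    if denominator == 0:
        raise ValueError("formula is undefined for this input")
    return length_factor * r_half / denominator


# A half-test correlation of 0.60 steps up to roughly 0.75 for the full test.
r_full = spearman_brown(0.6)
print(round(r_full, 2))
```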

28

Coefficient Alpha

A measure of internal-consistency reliability that is the average of all possible split-half coefficients resulting from different splittings of the scale items

For Non-dichotomous items

Answers how similar sets of data are
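
In practice alpha is usually computed from item variances rather than by averaging split halves. The sketch below uses that commonly cited variance form, alpha = k/(k − 1) · (1 − Σ item variances / total-score variance); the response matrix is hypothetical (rows are respondents, columns are items).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a respondents-by-items matrix of item scores."""
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_variance_sum = sum(variance(item) for item in zip(*scores))
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point ratings from five respondents on three items
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```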

29

Kuder-Richardson Formula

Used to calculate interitem consistency when items are dichotomous (yes/no, true/false)

30

KR20

Dichotomous items with varying levels of difficulty

31

KR21

dichotomous items; all the test items have approximately the same degree of difficulty.
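
To compare the two Kuder-Richardson formulas, here is a hedged sketch using their commonly cited forms: KR-20 = k/(k − 1) · (1 − Σpq / σ²) and KR-21 = k/(k − 1) · (1 − M(k − M) / (kσ²)). It assumes items are scored 0/1 and uses the population variance of total scores; the response matrix is invented.

```python
from statistics import mean, pvariance


def kr20(responses):
    """KR-20 for a respondents-by-items matrix of 0/1 scores."""
    k = len(responses[0])
    n = len(responses)
    total_variance = pvariance([sum(row) for row in responses])
    pq_sum = 0.0
    for item in zip(*responses):
        p = sum(item) / n  # proportion passing the item
        pq_sum += p * (1 - p)
    return k / (k - 1) * (1 - pq_sum / total_variance)


def kr21(responses):
    """KR-21: same idea, but treats every item as equally difficult."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))


# Hypothetical data: four items of clearly different difficulty
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
kr20_value = kr20(answers)
kr21_value = kr21(answers)
print(round(kr20_value, 2), round(kr21_value, 2))
```

Because the item difficulties differ here, KR-21 comes out lower than KR-20, which is why KR-21 is reserved for items of roughly equal difficulty.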

32

Average Proportional Distance (APD)

a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores

33

interrater reliability

the amount of agreement in the observations of different raters who witness the same behavior

34

Kappa statistics

A family of agreement statistics used with nominal (categorical) data.

35

Cohen's Kappa

Used to measure the level of agreement between two raters or judges only
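
A small sketch of the usual formula, kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's category proportions. The ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no judgments from two raters on ten cases
judge_1 = ["y", "y", "y", "y", "y", "y", "n", "n", "n", "n"]
judge_2 = ["y", "y", "y", "y", "n", "n", "n", "n", "n", "y"]
kappa = cohens_kappa(judge_1, judge_2)
print(round(kappa, 2))
```

Here the raters agree on 7 of 10 cases, but half of that agreement is expected by chance, so kappa is about 0.40.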

36

Fleiss' Kappa

Determine the level of agreement between two or more raters

37

Kendall's W

Used for ranking or ordinal data
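
As a sketch, Kendall's coefficient of concordance can be computed as W = 12S / (m²(n³ − n)), where m raters each rank n objects and S is the sum of squared deviations of the rank sums from their mean. This version assumes complete rankings with no ties; the rankings are made up.

```python
def kendalls_w(rankings):
    """Kendall's W for a raters-by-objects matrix of ranks (no ties)."""
    m = len(rankings)      # number of raters
    n = len(rankings[0])   # number of ranked objects
    if m < 2 or n < 2:
        raise ValueError("need at least 2 raters and 2 objects")
    rank_sums = [sum(column) for column in zip(*rankings)]
    mean_rank_sum = sum(rank_sums) / n
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three raters in complete agreement, then two raters in complete disagreement
w_agree = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
w_oppose = kendalls_w([[1, 2, 3, 4], [4, 3, 2, 1]])
print(w_agree, w_oppose)  # 1.0 0.0
```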

38

Homogeneous Test

A test that measures only one trait or characteristic.

39

Heterogeneous Test

a test that measures more than one trait or characteristic

40

Dynamic Test

tests based on Vygotsky's theory that emphasize potential rather than past learning

41

Static Test

an individual is assessed at a given point in time, and the results of a test are used to determine what the person can and cannot do on his or her own

42

Speed Tests

large number of relatively easy items in limited test period

43

Power Tests

reflects the level of difficulty of items the test takers answer correctly

44

Criterion-Referenced Tests

Tests where the student's performance is compared to a standard or criterion. The student's score is not based on how he/she compared with other students, but rather on how the student did as measured by the criteria or standards. Criterion-referenced test will yield such scores as percentages or number of correct answers.

45

Classical Test Theory

Each testtaker has a true score on a test that would be obtained but for the action of measurement error.

46

Domain Sampling Theory

Seeks to estimate the extent to which specific sources of variation under defined conditions contribute to the test score.

47

Generalizability Theory

based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation

48

Item Response Theory (IRT)

a mathematical approach to choosing test items in which the probability of a positive response to an item is determined by the person's estimated position on the underlying trait being measured, as well as by characteristics of the item
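
One way to picture this is the two-parameter logistic (2PL) model, a common IRT model chosen here as an assumption since the card does not name one. The probability of a positive response depends on the person's trait level (theta) and on two item characteristics, difficulty and discrimination.

```python
from math import exp


def p_positive_response(theta: float, difficulty: float,
                        discrimination: float = 1.0) -> float:
    """2PL model: probability of a positive response to one item."""
    return 1.0 / (1.0 + exp(-discrimination * (theta - difficulty)))


# Hypothetical item with difficulty 0.0 and discrimination 1.5
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_positive_response(theta, 0.0, 1.5), 3))
```

When theta equals the item's difficulty the probability is 0.5, and it rises as the person's standing on the trait increases.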

49

The person who has ability 1 would be able to perform the ability 2

Explain IRT

50

Latent-Trait Theory

Another name for IRT

51

Item discrimination

the degree to which a test item is able to correctly differentiate test-takers who vary according to the construct measured by the test.
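
One common way to quantify this is the item-discrimination index d = (U − L) / n, where U and L are the numbers of testtakers in the upper and lower scoring groups who answered the item correctly and n is the size of each group. This is only one of several discrimination statistics; the counts below are hypothetical.

```python
def discrimination_index(upper_correct: int, lower_correct: int,
                         group_size: int) -> float:
    """Item-discrimination index d for equal-sized upper and lower groups."""
    if group_size <= 0:
        raise ValueError("group size must be positive")
    return (upper_correct - lower_correct) / group_size


# 20 of 32 high scorers and 16 of 32 low scorers answered the item correctly
d = discrimination_index(20, 16, 32)
print(d)  # 0.125
```

Values near +1 mean the item separates high and low scorers well; a negative value flags an item that low scorers pass more often than high scorers.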

52

Polytomous Item

A test item for which more than two outcomes are possible, such as "disagree," "neutral," and "agree."

53

Dichotomous Item

Binary item.

54

Confidence Interval

a range of values so defined that there is a specified probability that the value of a parameter lies within it.

In testing, a range of scores that is likely to contain the true score

55

can aid a test user in determining how large a difference should be before it is considered statistically significant

Standard Error of the Difference
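
A hedged sketch of one commonly cited formula, SE_diff = SD · √(2 − r₁ − r₂), which assumes both scores are on the same scale with the same standard deviation and that r₁ and r₂ are the reliability coefficients of the two tests. The values are illustrative.

```python
from math import sqrt


def standard_error_of_difference(sd: float, r1: float, r2: float) -> float:
    """Standard error of the difference between two scores on a common scale."""
    return sd * sqrt(2.0 - r1 - r2)


# Two tests on a scale with SD = 15, each with reliability 0.90
se_diff = standard_error_of_difference(15.0, 0.90, 0.90)
print(round(se_diff, 2))
```

A difference between two scores is usually treated as meaningful only if it is large relative to this value, for example about twice as large for roughly 95% confidence.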

56

the standard deviation of the differences between predicted and observed values (the typical size of a prediction error)

Standard Error of Estimate

57

Validity

A judgment or estimate of how well a test measures what it is supposed to measure

58

≥ 0.35

What validity coefficient is considered acceptable?

59

Face Validity

extent to which respondents can tell what the items are measuring

60

Content validity

The degree to which the content of a test is representative of the domain it's supposed to cover.

61

Test blueprint

A plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, etc.

62

Underrepresentation

failure of a test to capture needed components of the construct or content domain

63

Overrepresentation

disproportionately higher incidence or greater presence of a characteristic than expected; may be desired to ensure inclusion of minority groups; impacts generalizability of findings as proportions do not match what would be found typically or generally

64

Construct Validity

The ability of a test to represent the underlying construct (the theory developed to organize and explain some aspects of existing knowledge and observations).

65

Irrelevant variance

Factors other than the intended construct influence the test scores (construct-irrelevant variance).

66

Method of Contrasted groups

Demonstrate that scores on the test vary in a predictable way as a function of membership in a group.

67

Divergent

Constructs are not expected to correlate

68

Convergent

constructs are expected to correlate

69

Factor Analysis

Statistical tool used to analyze interrelationships among constructs

Identify the factor/s in common between test scores on sub-scales within a particular test

70

Factor loading

Conveys info about the extent to which the factor determines the test score or scores.

71

Criterion-Related Validity

Evaluates test based on an external source

72

Concurrent Validity

Extent to which test scores may be used to estimate an individual's present standing on a criterion

73

Predictive Validity

The success with which a test predicts the behavior it is designed to predict; it is assessed by computing the correlation between test scores and the criterion behavior.

74

Validity coefficient

correlation coefficient between a test score (predictor) and a performance measure (criterion)

75

Incremental validity

the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use

76

Criterion contamination

Occurs when the criterion measure includes aspects of performance that are not part of the job or when the measure is affected by construct-irrelevant factors.

77

Leniency Error

occurs when ratings of all employees fall at the high end of the scale

78

Rating Error

Intentional or unintentional misuse of the scale.

79

Severity Error

Rater is strict in scoring.

80

Central Tendency Error

Rater's rating would tend to cluster in the middle of the rating scale.

81

Halo effect

tendency of an interviewer to allow positive characteristics of a client to influence the assessments of the client's behavior and statements

82

Normative sample

a group of individuals who were given the test to identify standards of performance at specific age levels

83

Norm

Test performance data of a particular group of test takers that are designed for use as a reference when evaluating and interpreting individual test scores

84

Norming

Deriving norms

85

Percentile Norms

Raw data from a test's standardization sample converted to percentile form.
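
A minimal sketch of the conversion, assuming a percentile is defined as the percentage of scores in the standardization sample that fall below a given raw score (some sources count scores at or below instead). The sample is hypothetical.

```python
def percentile_rank(raw_score: float, norm_scores) -> float:
    """Percentage of norm-sample scores falling below the given raw score."""
    if not norm_scores:
        raise ValueError("norm sample is empty")
    below = sum(score < raw_score for score in norm_scores)
    return 100 * below / len(norm_scores)


# Hypothetical standardization sample of five raw scores
norm_sample = [10, 20, 30, 40, 50]
rank = percentile_rank(40, norm_sample)
print(rank)  # 60.0
```

Note the contrast with percentage correct: a percentile describes standing relative to other testtakers, not the share of items answered correctly.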

86

percentage correct

a distribution of raw scores expressed as the number of items answered correctly, multiplied by 100 and divided by the total number of items

87

Developmental Norms

Norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life

88

Age norms

age equivalent scores; indicate the average performance of different test takers who were at various ages at the time the test was administered

89

Grade norms

Indicate the average test performance of testtakers in a given school grade

90

National Norms

Norms derived from a standardization sample that was nationally representative of the population

91

National Anchor Norms

An equivalency table for scores on two nationally standardized tests designed to measure the same thing

92

Subgroup Norms

Normative sample can be segmented by any criteria initially used in selecting subjects for the sample.

93

Local Norms

provide normative information with respect to the local population's performance on some test

94

Expectancy Data

provide an indication of the likelihood that a test taker will score within some interval of scores on a criterion measure, such as passing, acceptable, or failing

95

Taylor-Russell tables

Provide an estimate of the extent to which including a particular test in the selection system will improve selection, based on the test's validity coefficient, the selection ratio, and the base rate.

96

Selection ratio

Numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired.

97

Base rate

Percentage of current employees who are considered successful.

98

Naylor-Shine Tables

Entails obtaining the difference between the means of the selected and unselected groups to derive an index of what the test is adding to already established procedures.

99

Brogden-Cronbach-Gleser Formula

Used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument.
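
A sketch of the formula as it is commonly presented in testing textbooks: utility gain = (N)(T)(r_xy)(SD_y)(Z_m) − (N)(C). The first N is the number of applicants selected, T is average tenure in years, r_xy is the validity coefficient, SD_y is the standard deviation of job performance in dollars, Z_m is the mean standardized test score of those selected, and the second term is the number of applicants tested times the cost per applicant. The parameter names and figures are hypothetical.

```python
def utility_gain(n_selected: int, tenure_years: float, validity: float,
                 sd_performance_dollars: float, mean_z_selected: float,
                 n_tested: int, cost_per_applicant: float) -> float:
    """Brogden-Cronbach-Gleser estimate of utility gain in dollars."""
    benefit = (n_selected * tenure_years * validity
               * sd_performance_dollars * mean_z_selected)
    cost = n_tested * cost_per_applicant
    return benefit - cost


# Hypothetical case: 10 hires from 50 tested applicants, staying 2 years
gain = utility_gain(
    n_selected=10, tenure_years=2, validity=0.4,
    sd_performance_dollars=10_000, mean_z_selected=1.0,
    n_tested=50, cost_per_applicant=100,
)
print(round(gain))  # 75000
```

Dropping the cost term gives the benefit alone, which shows how strongly the estimate depends on the validity coefficient and on how selective the cutoff is.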

100

Utility gain

Estimate of the benefit of using a particular test.