Chapter 5 - Reliability


60 Terms

1

reliability

refers to consistency in measurement

2

reliability coefficient

is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance

3

error

refers to the component of the observed test score that does not have to do with the testtaker’s ability

4

variance

a statistic useful in describing sources of test score variability;

the standard deviation squared

5

true variance

variance from true differences

6

error variance

variance from irrelevant, random sources

7

reliability

refers to the proportion of the total variance attributed to true variance
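
As a rough illustration of this proportion, the sketch below assumes the classical decomposition in which total variance is the sum of true variance and error variance. The function name and the numbers are hypothetical, chosen only for the example.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Proportion of total variance attributable to true variance.

    Assumes total variance = true variance + error variance.
    """
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical values: 80 units of true variance, 20 units of error variance.
print(reliability_coefficient(80.0, 20.0))  # 0.8
```

The less error variance there is relative to true variance, the closer the value gets to 1.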

8

measurement error

refers to, collectively, all the factors associated with the process of measuring some variable, other than the variable being measured;

can be categorized as being either systematic or random

9

random error

a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process

10

systematic error

a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured

11

item sampling or content sampling

refers to the variation among items within a test as well as to variation among items between tests

12

(1) test construction (2) administration (3) scoring (4) interpretation (5) sampling error (6) methodological error

what are the different sources of error variance?

13

test-retest reliability

an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
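
A minimal sketch of the correlation step, assuming the Pearson product-moment coefficient is the correlation used. The helper name `pearson_r` and the two score lists are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same five people on two administrations.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]
print(round(pearson_r(first_administration, second_administration), 3))
```

Each position in the two lists must belong to the same person, since the estimate rests on pairs of scores.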

14

coefficient of stability

referred to as the estimate of test-retest reliability when the interval between testing is greater than six months

15

coefficient of equivalence

the degree of the relationship between various forms of a test, evaluated by means of an alternate-forms or parallel-forms coefficient of reliability

16

parallel forms

these exist when, for each form of the test, the means and the variances of observed test scores are equal

17

parallel forms reliability

refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when the means and variances of observed test scores are equal

18

alternate forms

are simply different versions of a test that have been constructed so as to be parallel;

typically designed to be equivalent with respect to variables such as content and level of difficulty

19

alternate forms reliability

refers to an estimate of the extent to which these different forms of the same test have been affected by item sampling error or other error

20

internal consistency estimate of reliability or estimate of inter-item consistency

evaluation of the internal consistency of the test items

21

split-half reliability

is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once;

it is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice

22

odd-even reliability

a method of splitting a test by assigning odd-numbered items to one half of the test and even-numbered items to the other half
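
The split itself can be sketched as below. The function name is hypothetical, and the code assumes item scores are stored in test order, with item 1 first.

```python
def odd_even_half_scores(item_scores):
    """Return (odd-half score, even-half score) for one testtaker.

    item_scores lists the testtaker's item scores in test order,
    so index 0 is item 1, index 1 is item 2, and so on.
    """
    odd_half = sum(item_scores[0::2])   # items 1, 3, 5, ...
    even_half = sum(item_scores[1::2])  # items 2, 4, 6, ...
    return odd_half, even_half


# Hypothetical right/wrong scores on a six-item test.
print(odd_even_half_scores([1, 0, 1, 1, 0, 1]))  # (2, 2)
```

Computing both half scores for every testtaker and correlating the two resulting columns gives the half-test correlation.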

23

Spearman-Brown formula

allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test;

a specific application of a more general formula to estimate the reliability of a test
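
A small sketch of the general formula, r_SB = n * r / (1 + (n - 1) * r), where n is the factor by which test length changes. With n = 2 it gives the split-half correction. The function name and the example correlation are assumptions for illustration.

```python
def spearman_brown(r: float, n: float = 2.0) -> float:
    """Estimated reliability of a test n times as long as the one that gave r.

    With n = 2, r is the correlation between two halves of a test and the
    result estimates the reliability of the whole test.
    """
    denominator = 1 + (n - 1) * r
    if denominator == 0:
        raise ValueError("formula is undefined for this combination of r and n")
    return n * r / denominator


# Hypothetical half-test correlation of .70.
print(round(spearman_brown(0.70), 3))  # 0.824
```

Values of n below 1 estimate the effect of shortening a test instead.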

24

inter-item consistency

refers to the degree of correlation among all the items on a scale

25

homogeneity

an index of inter-item consistency is useful in assessing the _________ of the test

26

homogeneity

is the degree to which a test measures a single factor;

is the extent to which items in a scale are unifactorial

27

heterogeneity

describes the degree to which a test measures different factors

28

Kuder-Richardson formula 20

where test items are highly homogeneous, KR-20 and split-half reliability estimates will be similar;

the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong
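
One way to sketch the computation, using the common form KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores). The function name and the data are hypothetical, and using the population variance of total scores is an assumption of this sketch.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for dichotomous items.

    responses holds one row per testtaker; each row lists 0/1 item scores.
    """
    n_people = len(responses)
    k = len(responses[0])  # number of items
    total_variance = pvariance([sum(row) for row in responses])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_people  # proportion passing item j
        sum_pq += p * (1 - p)                            # q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Hypothetical right/wrong data: five testtakers, four items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3))  # 0.8
```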

29

coefficient alpha

developed by Cronbach;

the mean of all possible split-half correlations, corrected by the Spearman-Brown formula
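
In practice alpha is usually computed from variances rather than by averaging split halves. The sketch below uses that computational form, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The function name and the ratings are hypothetical, and population variances are an assumption of this sketch.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient alpha from a table with one row per testtaker."""
    k = len(responses[0])  # number of items
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from five testtakers on three 5-point items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
print(round(cronbach_alpha(ratings), 3))
```

Unlike KR-20, this form also works for items that are not scored right or wrong.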

30

average proportional distance method

a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores

31

inter-scorer reliability

is the degree of agreement or consistency between two or more scorers with regard to a particular measure

32

coefficient of inter-scorer reliability

the simplest way of determining the degree of consistency among scorers in the scoring of a test is to calculate a coefficient of correlation, which is the __________

33

dynamic characteristic

is a state, trait, or ability presumed to be ever-changing as a function of situational and cognitive experiences

34

static characteristic

a trait, state, or ability presumed to be relatively unchanging, such as intelligence

35

restriction of range or restriction of variance

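an important issue in using and interpreting a coefficient of reliability; if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, the resulting correlation coefficient tends to be lower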

36

inflation of range or inflation of variance

what is the opposite of restriction of range or restriction of variance?

37

power test

a test in which the time limit is long enough to allow testtakers to attempt all items, and some items are so difficult that no testtaker is able to obtain a perfect score

38

speed test

generally contains items of uniform level of difficulty so that, when given generous time limits, all testtakers should be able to complete all the test items correctly

39

criterion-referenced test

designed to provide an indication of where a testtaker stands with respect to some variable or criterion

40

classical test theory

referred to as the true score model of measurement

41

true score

a value that, according to classical test theory, genuinely reflects an individual’s ability level as measured by a particular test

42

domain sampling theory

proponents of this theory seek to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score;

a test’s reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample

43

generalizability theory

is based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation

44

universe

the particular test situation

45

facets

the universe is described in terms of its ______, which include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration

46

universe score

as Cronbach noted, it is analogous to a true score in the true score model

47

generalizability study

examines how generalizable scores from a particular test are if the test is administered in different situations

48

coefficients of generalizability

the influence of particular facets on the test score is represented by this;

these coefficients are similar to reliability coefficients in the true score model

49

decision study

involves application of information from the generalizability study;

developers examine the usefulness of test scores in helping the test user make decisions

50

latent-trait theory

Because so often the psychological or educational construct being measured is physically unobservable (stated another way, is latent) and because the construct being measured may be a trait (it could also be something else, such as an ability), a synonym for IRT in the academic literature is

51

item-response theory

provides a way to model the probability that a person with X ability will be able to perform at a level of Y

52

discrimination

signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured

53

dichotomous test items

test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions

54

polytomous test items

test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct

55

Rasch model

is a reference to an IRT model with very specific assumptions about the underlying distribution;

each item on the test is assumed to have an equivalent relationship with the construct being measured by the test
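
A minimal sketch of the dichotomous Rasch model in its usual logistic form, P(correct) = 1 / (1 + exp(-(ability - difficulty))). Items differ only in difficulty, which is one reading of the "equivalent relationship" assumption above. The function and parameter names are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    ability and difficulty are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# When ability equals item difficulty, the chance of success is one half.
print(rasch_probability(0.0, 0.0))  # 0.5
```

The probability rises as ability exceeds difficulty and falls as difficulty exceeds ability.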

56

Georg Rasch

a Danish mathematician who developed the Rasch model

57

standard error of measurement

is the tool used to estimate or infer the extent to which an observed score deviates from a true score
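
The usual formula is SEM = SD * sqrt(1 - r), where SD is the standard deviation of test scores and r is the reliability coefficient. The sketch below applies it to hypothetical values; the function name is an assumption.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM from the test's standard deviation and reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test with a standard deviation of 15 and reliability of .91.
print(round(standard_error_of_measurement(15.0, 0.91), 2))  # 4.5
```

Higher reliability means a smaller SEM for the same standard deviation.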

58

standard error of a score

the standard deviation of a theoretically normal distribution of test scores obtained by one person on equivalent tests

59

confidence interval

a range or band of test scores that is likely to contain the true score
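
A common way to build such a band is observed score plus or minus z times the standard error of measurement. The sketch assumes normally distributed error and z = 1.96 for roughly 95% confidence. The function name and the numbers are hypothetical.

```python
def score_confidence_interval(observed: float, sem: float, z: float = 1.96):
    """Band around an observed score: observed plus or minus z * SEM."""
    margin = z * sem
    return observed - margin, observed + margin


# Hypothetical observed score of 100 with a standard error of measurement of 4.5.
low, high = score_confidence_interval(100.0, 4.5)
print(round(low, 2), round(high, 2))  # 91.18 108.82
```

A larger z widens the band and raises the confidence level.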

60

standard error of the difference

used for comparisons between scores;

a statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant
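
One standard formula combines the two scores' standard errors of measurement: SE_diff = sqrt(SEM1^2 + SEM2^2). The sketch below assumes both scores are on the same scale; the function name and values are hypothetical.

```python
import math


def standard_error_of_difference(sem_1: float, sem_2: float) -> float:
    """Standard error of the difference between two scores on the same scale."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


# Hypothetical standard errors of measurement for two tests.
print(standard_error_of_difference(3.0, 4.0))  # 5.0
```

Because it pools error from both scores, the result is larger than either standard error of measurement alone.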