Flashcards on Reliability
Reliability Coefficient
An index of reliability; a proportion that indicates the ratio between true score variance and total variance. Estimates in the range of .70 to .80 are good enough for most basic research purposes; in clinical settings, a reliability of .90 might not be good enough, and a reliability greater than .95 should be sought.
Classical Test Theory
Assumes that a testtaker's observed score is the sum of a true score and error. It also assumes the true score will not change with repeated applications of the same test; error is a component of the observed test score unrelated to the testtaker's ability.
Formula for Classical Test Theory
X = T + E, where X is the observed score, T is the true score, and E is the error.
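To make the decomposition concrete, here is a minimal simulation sketch (the sample size, means, and standard deviations are illustrative assumptions, not from the source): observed scores are generated as true score plus independent random error, and the reliability coefficient falls out as the ratio of true variance to total variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: 10,000 testtakers, true scores ~ N(100, 15),
# independent random error ~ N(0, 5).
T = rng.normal(100, 15, size=10_000)  # true scores
E = rng.normal(0, 5, size=10_000)     # random measurement error
X = T + E                             # observed scores: X = T + E

# Reliability = true variance / total variance
print(round(T.var() / X.var(), 3))    # approx 15**2 / (15**2 + 5**2) = 0.90
```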
Variance
A statistic useful in describing sources of test score variability. It can be broken into true variance (from true differences) and error variance (from irrelevant, random sources).
True Variance
Variance from true differences.
Error Variance
Variance from irrelevant and random sources.
Reliability
Proportion of total variance attributed to true variance. The greater the proportion of the total variance attributed to true variance, the more reliable the test.
Random Error
A source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
Systematic Error
A source of error in measuring a variable that is typically constant or proportionate to the true value of the variable being measured. It DOES NOT affect score consistency.
Item Sampling
Refers to variation among test items within a test as well as to variation among items between tests.
Test-Retest Reliability
Obtained by correlating pairs of scores from the same people on two different administrations of the same test. Appropriate for stable traits.
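As a quick sketch (the scores below are hypothetical), the test-retest reliability coefficient is simply the Pearson correlation between the two administrations:

```python
import numpy as np

# Hypothetical scores for the same five testtakers on two administrations
time1 = np.array([88, 92, 75, 81, 95])
time2 = np.array([90, 89, 78, 80, 97])

r_test_retest = np.corrcoef(time1, time2)[0, 1]  # Pearson r
print(round(r_test_retest, 2))
```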
Coefficient of Stability
Used when the interval between testing in test-retest reliability is greater than six months.
Carryover Effect
Occurs when the first testing session influences scores from the second session, usually causing the test-retest correlation to overestimate the true reliability. Carryover effects are of concern only when changes over time are random.
Practice Effects
Some skills improve with practice. When a test is given a second time, testtakers score better because they have sharpened their abilities by having taken the test the first time.
Parallel Forms
For each form of the test, the means and the variances of observed test scores are equal.
Alternate Forms
Different versions of a test that have been constructed so as to be parallel, although they do not meet the requirements for the legitimate designation "parallel."
Coefficient of Equivalence
The degree of relationship between various forms of a test, evaluated by means of an alternate-forms or parallel-forms coefficient of reliability.
Split-Half Reliability
Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
Spearman-Brown Formula
Used to adjust the half-test correlation in split-half reliability estimates upward, to estimate the reliability of the full-length test.
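A minimal sketch of both steps (item scores hypothetical): split the test into odd- and even-numbered halves, correlate the half scores, then apply the Spearman-Brown correction r_SB = 2r / (1 + r) to estimate the reliability of the full-length test.

```python
import numpy as np

# Hypothetical dichotomous item scores: rows = testtakers, columns = items
items = np.array([
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = (2 * r_half) / (1 + r_half)     # Spearman-Brown adjusted estimate
print(round(r_half, 2), round(r_full, 2))
```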
Inter-Item Consistency
Degree of correlation among all the items on a scale, calculated from a single administration of a single form of a test.
Homogeneity
Describes the degree to which a test measures a single factor or trait.
Heterogeneity
Describes the degree to which a test measures different factors or is composed of items that measure more than one trait.
Kuder-Richardson Formula 20 (KR-20)
Statistic of choice for determining the inter-item consistency of dichotomous items (those that can be scored right or wrong).
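A sketch of the KR-20 computation under its usual statement, KR-20 = (k / (k - 1)) * (1 - sum(pq) / total variance), where p is the proportion passing each item and q = 1 - p (data hypothetical):

```python
import numpy as np

def kr20(items):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.
    items: 2-D array with rows = testtakers, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                   # number of items
    p = items.mean(axis=0)               # proportion passing each item
    q = 1 - p                            # proportion failing each item
    total_var = items.sum(axis=1).var()  # variance of total test scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Hypothetical right/wrong scores for four testtakers on four items
scores = np.array([[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1]])
print(round(kr20(scores), 2))
```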
Cronbach’s Coefficient Alpha
May be thought of as the mean of all possible split-half coefficients; appropriate for use on tests containing nondichotomous items.
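Coefficient alpha generalizes KR-20 by replacing sum(pq) with the sum of the individual item variances: alpha = (k / (k - 1)) * (1 - sum of item variances / total variance). A minimal sketch with hypothetical rating-scale data:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0).sum()  # sum of individual item variances
    total_var = items.sum(axis=1).var()  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-point ratings: rows = respondents, columns = items
ratings = np.array([[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 2, 3], [1, 2, 1]])
print(round(cronbach_alpha(ratings), 2))
```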
Average Proportional Distance (APD)
A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
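A hedged sketch of one common description of the APD procedure (the pairwise-difference reading; the data and scale width are assumptions): average the absolute differences between scores on every pair of items, then divide by the number of response options minus one. Lower values indicate greater internal consistency.

```python
import numpy as np
from itertools import combinations

def average_proportional_distance(scores, n_options):
    """APD sketch: mean absolute difference across all item pairs,
    scaled by (number of response options - 1). Lower = more consistent."""
    scores = np.asarray(scores, dtype=float)  # rows = testtakers, cols = items
    pair_means = [np.abs(scores[:, i] - scores[:, j]).mean()
                  for i, j in combinations(range(scores.shape[1]), 2)]
    return np.mean(pair_means) / (n_options - 1)

# Hypothetical 7-point ratings for four testtakers on three items
ratings = np.array([[6, 5, 6], [3, 4, 3], [7, 6, 7], [2, 2, 3]])
print(round(average_proportional_distance(ratings, n_options=7), 2))
```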
Inter-Scorer Reliability
Degree of agreement or consistency between two or more scorers (or raters) with regard to a particular measure.
Kappa Statistic
Best method for assessing level of agreement among several observers, used in inter-scorer reliability.
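For two scorers assigning categories, Cohen's kappa corrects observed agreement for chance agreement: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch using scikit-learn (the category labels are hypothetical); extensions such as Fleiss' kappa handle more than two raters:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical categories assigned by two scorers to six protocols
rater_a = ["anx", "mood", "anx", "none", "mood", "anx"]
rater_b = ["anx", "mood", "none", "none", "mood", "anx"]

print(round(cohen_kappa_score(rater_a, rater_b), 2))
```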
Transient Error
Source of error attributable to variations in the testtaker’s feelings, moods, or mental state over time.
Dynamic Characteristic
A trait, state, or ability that is presumed to be ever-changing.
Static Characteristic
A trait, state, or ability that is relatively unchanging.
Power Test
A test in which the time limit is long enough to allow testtakers to attempt all items, but some items are so difficult that no testtaker is able to obtain a perfect score.
Speed Test
A test that generally contains items of a uniform (typically low) level of difficulty so that, when given generous time limits, all testtakers should be able to complete all the test items correctly; in practice, the time limit is set so that few, if any, testtakers finish.
Attenuation
Potential correlations are attenuated, or diminished, by measurement error.
Domain Sampling Theory
Seeks to estimate the extent to which specific sources of variation under defined conditions contribute to the test score.
Generalizability Theory
Based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation.
Item Response Theory (IRT)
Provides a way to model the probability that a person with X amount of ability will be able to perform at a level of Y.
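The "X ability, Y performance" idea is usually expressed as a logistic item characteristic curve. A sketch of the two-parameter logistic (2PL) model, where a is item discrimination and b is item difficulty (parameter values illustrative):

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability that a person with
    ability theta answers correctly, given discrimination a and difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Average-ability person (theta = 0) on a relatively easy item (b = -0.5)
print(round(p_correct(theta=0.0, a=1.2, b=-0.5), 2))  # ~0.65
```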
Dichotomous test items
Test items that can be answered with only one of two alternative responses.
Polytomous test items
Test items with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct.
The Standard Error of Measurement (SEM)
Provides a measure of the precision of an observed test score.
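Under classical test theory, the SEM is commonly estimated from the score standard deviation and the reliability coefficient: SEM = SD * sqrt(1 - r). A sketch with illustrative numbers:

```python
import math

sd = 15.0           # standard deviation of test scores (illustrative)
reliability = 0.90  # reliability coefficient (illustrative)

sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
print(round(sem, 2))  # ~4.74; roughly 68% of observed scores fall within +/- 1 SEM of the true score
```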
Standard Error of Difference
A statistical measure that can aid a test user in determining how large a difference between two scores should be before it is considered statistically significant.
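Because both scores carry error, the standard error of the difference combines their SEMs: SED = sqrt(SEM1**2 + SEM2**2), which with a common standard deviation reduces to SD * sqrt(2 - r1 - r2). A sketch (numbers illustrative):

```python
import math

sd = 15.0            # common standard deviation of the two scores (illustrative)
r1, r2 = 0.90, 0.85  # reliability coefficients of the two measures

sed = sd * math.sqrt(2 - r1 - r2)  # standard error of the difference
print(round(sed, 2))  # a difference larger than ~1.96 * SED is significant at p < .05
```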