Flashcards on Reliability
Reliability Coefficient
An index of reliability; a proportion that indicates the ratio between true score variance and total variance. Estimates in the range of .70 to .80 are good enough for most basic research purposes; in clinical settings, a reliability of .90 might not be good enough, and a reliability greater than .95 should be sought.
Classical Test Theory
Assumes that a testtaker's observed score is the sum of a true score and error. It also assumes the true score will not change with repeated applications of the same test; error is a component of the observed test score unrelated to the testtaker's ability.
Formula for Classical Test Theory
X = T + E, where X is the observed score, T is the true score, and E is the error.
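To make the decomposition concrete, here is a minimal simulation sketch (the sample size, means, and standard deviations are illustrative assumptions, not from the source): observed scores are generated as true score plus independent random error, and the reliability coefficient falls out as the ratio of true variance to total variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: 10,000 testtakers, true scores ~ N(100, 15),
# independent random error ~ N(0, 5).
T = rng.normal(100, 15, size=10_000)  # true scores
E = rng.normal(0, 5, size=10_000)     # random measurement error
X = T + E                             # observed scores: X = T + E

# Reliability = true variance / total variance
print(round(T.var() / X.var(), 3))    # approx 15**2 / (15**2 + 5**2) = 0.90
```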
Variance
A statistic useful in describing sources of test score variability. It can be broken into true variance (from true differences) and error variance (from irrelevant, random sources).
True Variance
Variance from true differences.
Error Variance
Variance from irrelevant and random sources.
Reliability
Proportion of total variance attributed to true variance. The greater the proportion of the total variance attributed to true variance, the more reliable the test.
Random Error
A source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
Systematic Error
A source of error in measuring a variable that is typically constant or proportionate to the true value of the variable being measured. It DOES NOT affect score consistency.
Item Sampling
Refers to variation among test items within a test as well as to variation among items between tests.
Test-Retest Reliability
Obtained by correlating pairs of scores from the same people on two different administrations of the same test. Appropriate for stable traits.
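As a quick sketch (the scores below are hypothetical), the test-retest reliability coefficient is simply the Pearson correlation between the two administrations:

```python
import numpy as np

# Hypothetical scores for the same five testtakers on two administrations
time1 = np.array([88, 92, 75, 81, 95])
time2 = np.array([90, 89, 78, 80, 97])

r_test_retest = np.corrcoef(time1, time2)[0, 1]  # Pearson r
print(round(r_test_retest, 2))
```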
Coefficient of Stability
Used when the interval between testing in test-retest reliability is greater than six months.
Carryover Effect
Occurs when the first testing session influences scores from the second session, usually causing the test-retest correlation to overestimate the true reliability. Carryover effects are of concern only when changes over time are random.
Practice Effects
Some skills improve with practice. When a test is given a second time, testtakers score better because they have sharpened their abilities by having taken the test the first time.
Parallel Forms
For each form of the test, the means and the variances of observed test scores are equal.
Alternate Forms
Different versions of a test that have been constructed so as to be parallel, although they do not meet the requirements for the legitimate designation "parallel."
Coefficient of Equivalence
The degree of relationship between various forms of a test, evaluated by means of an alternate-forms or parallel-forms coefficient of reliability.
Split-Half Reliability
Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
Spearman-Brown Formula
Used to adjust the half-test correlation in split-half reliability estimates upward, to estimate the reliability of the full-length test.
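A minimal sketch of both steps (item scores hypothetical): split the test into odd- and even-numbered halves, correlate the half scores, then apply the Spearman-Brown correction r_SB = 2r / (1 + r) to estimate the reliability of the full-length test.

```python
import numpy as np

# Hypothetical dichotomous item scores: rows = testtakers, columns = items
items = np.array([
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = (2 * r_half) / (1 + r_half)     # Spearman-Brown adjusted estimate
print(round(r_half, 2), round(r_full, 2))
```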
Inter-Item Consistency
Degree of correlation among all the items on a scale, calculated from a single administration of a single form of a test.
Homogeneity
Describes the degree to which a test measures a single factor or trait.
Heterogeneity
Describes the degree to which a test measures different factors or is composed of items that measure more than one trait.
Kuder-Richardson Formula 20 (KR-20)
Statistic of choice for determining the inter-item consistency of dichotomous items (those that can be scored right or wrong).
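A sketch of the KR-20 computation under its usual statement, KR-20 = (k / (k - 1)) * (1 - sum(pq) / total variance), where p is the proportion passing each item and q = 1 - p (data hypothetical):

```python
import numpy as np

def kr20(items):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.
    items: 2-D array with rows = testtakers, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                   # number of items
    p = items.mean(axis=0)               # proportion passing each item
    q = 1 - p                            # proportion failing each item
    total_var = items.sum(axis=1).var()  # variance of total test scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Hypothetical right/wrong scores for four testtakers on four items
scores = np.array([[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1]])
print(round(kr20(scores), 2))
```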
Cronbach’s Coefficient Alpha
May be thought of as the mean of all possible split-half coefficients; appropriate for use on tests containing nondichotomous items.
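Coefficient alpha generalizes KR-20 by replacing sum(pq) with the sum of the individual item variances: alpha = (k / (k - 1)) * (1 - sum of item variances / total variance). A minimal sketch with hypothetical rating-scale data:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0).sum()  # sum of individual item variances
    total_var = items.sum(axis=1).var()  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-point ratings: rows = respondents, columns = items
ratings = np.array([[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 2, 3], [1, 2, 1]])
print(round(cronbach_alpha(ratings), 2))
```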
Average Proportional Distance (APD)
A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
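A hedged sketch of one common description of the APD procedure (the pairwise-difference reading; the data and scale width are assumptions): average the absolute differences between scores on every pair of items, then divide by the number of response options minus one. Lower values indicate greater internal consistency.

```python
import numpy as np
from itertools import combinations

def average_proportional_distance(scores, n_options):
    """APD sketch: mean absolute difference across all item pairs,
    scaled by (number of response options - 1). Lower = more consistent."""
    scores = np.asarray(scores, dtype=float)  # rows = testtakers, cols = items
    pair_means = [np.abs(scores[:, i] - scores[:, j]).mean()
                  for i, j in combinations(range(scores.shape[1]), 2)]
    return np.mean(pair_means) / (n_options - 1)

# Hypothetical 7-point ratings for four testtakers on three items
ratings = np.array([[6, 5, 6], [3, 4, 3], [7, 6, 7], [2, 2, 3]])
print(round(average_proportional_distance(ratings, n_options=7), 2))
```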
Inter-Scorer Reliability
Degree of agreement or consistency between two or more scorers (or raters) with regard to a particular measure.
Kappa Statistic
Best method for assessing level of agreement among several observers, used in inter-scorer reliability.
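For two scorers assigning categories, Cohen's kappa corrects observed agreement for chance agreement: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch using scikit-learn (the category labels are hypothetical); extensions such as Fleiss' kappa handle more than two raters:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical categories assigned by two scorers to six protocols
rater_a = ["anx", "mood", "anx", "none", "mood", "anx"]
rater_b = ["anx", "mood", "none", "none", "mood", "anx"]

print(round(cohen_kappa_score(rater_a, rater_b), 2))
```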
Transient Error
Source of error attributable to variations in the testtaker’s feelings, moods, or mental state over time.
Dynamic Characteristic
A trait, state, or ability that is presumed to be ever-changing.
Static Characteristic
A trait, state, or ability that is relatively unchanging.
Power Test
A test in which the time limit is long enough to allow testtakers to attempt all items, but some items are so difficult that no testtaker is able to obtain a perfect score.
Speed Test
A test that generally contains items of a uniform (typically low) level of difficulty so that, when given generous time limits, all testtakers should be able to complete all the test items correctly; in practice, the time limit is set so that few, if any, testtakers finish.
Attenuation
Potential correlations are attenuated, or diminished, by measurement error.
Domain Sampling Theory
Seeks to estimate the extent to which specific sources of variation under defined conditions contribute to the test score.
Generalizability Theory
Based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation.
Item Response Theory (IRT)
Provides a way to model the probability that a person with X amount of ability will be able to perform at a level of Y.
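The "X ability, Y performance" idea is usually expressed as a logistic item characteristic curve. A sketch of the two-parameter logistic (2PL) model, where a is item discrimination and b is item difficulty (parameter values illustrative):

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability that a person with
    ability theta answers correctly, given discrimination a and difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Average-ability person (theta = 0) on a relatively easy item (b = -0.5)
print(round(p_correct(theta=0.0, a=1.2, b=-0.5), 2))  # ~0.65
```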
Dichotomous test items
Test items that can be answered with only one of two alternative responses.
Polytomous test items
Test items with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct.
The Standard Error of Measurement (SEM)
Provides a measure of the precision of an observed test score.
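Under classical test theory, the SEM is commonly estimated from the score standard deviation and the reliability coefficient: SEM = SD * sqrt(1 - r). A sketch with illustrative numbers:

```python
import math

sd = 15.0           # standard deviation of test scores (illustrative)
reliability = 0.90  # reliability coefficient (illustrative)

sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
print(round(sem, 2))  # ~4.74; roughly 68% of observed scores fall within +/- 1 SEM of the true score
```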
Standard Error of Difference
A statistical measure that can aid a test user in determining how large a difference between two scores should be before it is considered statistically significant.
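Because both scores carry error, the standard error of the difference combines their SEMs: SED = sqrt(SEM1**2 + SEM2**2), which with a common standard deviation reduces to SD * sqrt(2 - r1 - r2). A sketch (numbers illustrative):

```python
import math

sd = 15.0            # common standard deviation of the two scores (illustrative)
r1, r2 = 0.90, 0.85  # reliability coefficients of the two measures

sed = sd * math.sqrt(2 - r1 - r2)  # standard error of the difference
print(round(sed, 2))  # a difference larger than ~1.96 * SED is significant at p < .05
```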