RELIABILITY
Consistency of scores obtained by the same persons on
different occasions, or with
different sets of equivalent items, or
under other varying examining conditions.
Essentially, any condition irrelevant to the test's purpose represents error variance.
Error Variance
Variability in scores that is caused by variables other than the one the test is intended to measure
e.g., extraneous variables
How do we control error variance?
Uniformity in test procedures and testing conditions
Since all types of reliability are concerned with the degree of consistency or agreement between two independently derived sets of scores, they can all be expressed in terms of a correlation coefficient.
TYPES OF RELIABILITY
Test-Retest Reliability
the reliability of test scores is found by repeating the identical test on a second occasion.
shows the extent to which scores on a test can be generalized over different occasions
the higher the reliability, the less susceptible the scores are to random day-to-day changes in the test taker or the testing environment
reliability coefficient (r_tt)
the correlation between the scores obtained by the same person in the two administrations of the test.
Error variance corresponds to the random fluctuations in performance
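As a minimal sketch of how a test-retest coefficient could be computed, the snippet below correlates the scores of the same persons on two administrations. The helper `pearson_r` and the score lists are hypothetical and for illustration only; they are not taken from these notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores of five persons on the first and second administration
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

r_tt = pearson_r(first_administration, second_administration)
print(round(r_tt, 2))
```

The closer `r_tt` is to 1, the more closely the persons keep their relative standing from one occasion to the other.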
What considerations should guide the choice of interval?
Only short-range, random fluctuations during the interval are generally included under the error variance of the test score.
[In young children]
the period should be even shorter than for older persons
Due to children’s progressive developmental changes
For any type of person, the interval between retests should rarely exceed six months.
[Criticisms of Test-Retest Reliability]
Practice effect
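practice will probably produce varying amounts of improvement in the retest scores
test takers may recall many of their former responses.
when responses are simply recalled, the two sets of scores are not independently obtained, and the correlation between them will be spuriously high.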
Alternate-Form Reliability
Using or making an alternate form of the original test
Give the same person one form on the first occasion and another, equivalent form on the second.
[reliability coefficient]
correlation between the scores obtained on the two forms
a measure of both temporal stability and consistency of response to different item samples (or test forms).
[length of the interval]
If the two forms are administered in immediate succession, the resulting correlation shows reliability across forms only, not across occasions.
In that case, the error variance represents fluctuations in performance from one set of items to another, but not fluctuations over time.
To measure temporal stability as well, leave an appropriate interval between the two administrations.
[Development of Alternate Forms]
ensure that they are truly parallel
Participants should understand the alternative form in the same way as the original form
Parallel forms of tests should be independently constructed and designed to meet the same specifications.
Same number of items
Items should be in the same form
Same type of content
Same range and level of difficulty of the items
[Limitations of Alternate Forms]
if the test is subject to a large practice effect, the use of alternate forms will reduce but not eliminate such an effect.
Changing the specific content of the items in the second form would not suffice to eradicate this carry-over from the first form.
practical difficulties of constructing truly equivalent forms.
Split-Half Reliability
two scores are obtained for each person by dividing the test into equivalent halves.
Less difficult to obtain: requires only a single administration of one form of the test
how to split the test to obtain the most nearly equivalent halves?
[Procedure: Odd-Even Split]
find the scores on the odd and even items of the test
[Precaution: Odd-Even Split]
when a group of items deals with a single problem (e.g., questions on the same reading passage), the whole group should be assigned intact to one or the other half
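The odd-even procedure can be sketched as follows. The item data are hypothetical, and the final Spearman-Brown step is an addition not covered in these notes: it is the correction commonly applied because the raw correlation describes a test only half as long as the full one.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

def odd_even_split_half(item_scores):
    """item_scores: one list per person, holding item scores in test order."""
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown step-up (assumption: not part of the original notes)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical data: 5 persons x 6 items, scored 1 (right) or 0 (wrong)
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
]

r_half, r_full = odd_even_split_half(responses)
print(round(r_half, 2), round(r_full, 2))
```

If some items belong together (for example, all questions on one passage), they would need to be moved as a block into one half before summing, rather than being split by position as this sketch does.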
Scorer Reliability (or Inter-Rater Reliability)
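the consistency of the scores assigned to the same performance by different scorers or raters
mainly used where scoring depends on subjective judgment, so that different scorers may reach inconsistent assessments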
Individuals are assessed independently by observers or interviewers who make use of rating scales agreed upon beforehand.
Kuder-Richardson Reliability (K-R 20) and Coefficient Alpha
The two formulas most frequently used to calculate inter-item consistency.
both K-R 20 and coefficient alpha produce estimates of reliability that are equivalent to the average of all the possible split-half coefficients that would result from all the possible different ways of splitting the test in half.
[Formulas for Calculating Internal Consistency]
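In their standard textbook form, the two formulas can be written as below. The notation is assumed here rather than taken from these notes: n is the number of items, SD_t^2 the variance of total test scores, p the proportion of persons passing an item, q = 1 - p, and SD_i^2 the variance of scores on item i.

```latex
% K-R 20 (items scored right/wrong or otherwise all-or-none)
r_{tt} = \left(\frac{n}{n-1}\right)\frac{SD_t^2 - \sum pq}{SD_t^2}

% Coefficient alpha (multiple-scored items)
r_{tt} = \left(\frac{n}{n-1}\right)\frac{SD_t^2 - \sum SD_i^2}{SD_t^2}
```

For a dichotomous item the variance equals pq, so the two expressions give the same result on right/wrong items; alpha is the general case.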

[Applications]
K-R 20
Applied to tests with dichotomous items
Tests whose items are scored as right or wrong, or according to some other all-or-none system.
Coefficient Alpha
Tests whose items are multiple-scored (more than two possible scores per item)
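A minimal sketch of both coefficients, using the standard textbook formulas and hypothetical data. The function names are illustrative, and population variances (dividing by the number of persons) are an assumption made so that K-R 20 and alpha agree exactly on right/wrong items.

```python
def variance(values):
    """Population variance (divides by N, not N - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def coefficient_alpha(item_scores):
    """Alpha from a persons x items table; works for multiple-scored items."""
    n_items = len(item_scores[0])
    total_var = variance([sum(person) for person in item_scores])
    item_vars = [variance([person[i] for person in item_scores])
                 for i in range(n_items)]
    return (n_items / (n_items - 1)) * (total_var - sum(item_vars)) / total_var

def kr20(item_scores):
    """K-R 20 from a persons x items table of 0/1 scores."""
    n_items = len(item_scores[0])
    n_persons = len(item_scores)
    total_var = variance([sum(person) for person in item_scores])
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in item_scores) / n_persons  # proportion passing
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (total_var - sum_pq) / total_var

# Hypothetical data: 5 persons x 6 items, scored 1 (right) or 0 (wrong)
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
]

print(round(kr20(responses), 2), round(coefficient_alpha(responses), 2))
```

On dichotomous data the two functions return the same value; `coefficient_alpha` alone also accepts items scored on, say, a 1 to 5 scale.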
OVERVIEW
Any reliability coefficient may be interpreted directly in terms of the percentage of score variance attributable to different sources.
A reliability coefficient of .85 signifies that 85% of the variance in the test scores depends on true variance in the trait measured, and 15% depends on error variance.
As a rule of thumb, a test should have a reliability coefficient of at least .70.
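The percentage interpretation amounts to simple arithmetic, sketched below. The function name and the observed variance of 100 are hypothetical values chosen for illustration.

```python
def split_variance(reliability, observed_variance):
    """Divide observed score variance into true and error components."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    true_variance = reliability * observed_variance
    error_variance = observed_variance - true_variance
    return true_variance, error_variance

# Hypothetical test with an observed score variance of 100 and r_tt = .85
true_var, error_var = split_variance(0.85, 100.0)
print(round(true_var, 1), round(error_var, 1))
```

With a reliability of .85, 85 of the 100 variance units are attributed to true differences in the trait and the remaining 15 to error.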
Techniques for Measuring Reliability, in Relation to Test Form and Testing Session
Testing Sessions Required | One Test Form Required | Two Test Forms Required |
One | Split-Half; Kuder-Richardson and Coefficient Alpha; Scorer | Alternate-Form (Immediate) |
Two | Test-Retest | Alternate-Form (Delayed) |
Sources of Error Variance in Relation to Reliability Coefficients
Type of Reliability Coefficient | Error Variance |
Test-Retest | Time Sampling |
Alternate-Form (Immediate) | Content Sampling |
Alternate-Form (Delayed) | Time Sampling and Content Sampling |
Split-Half | Content Sampling |
Kuder-Richardson and Coefficient Alpha | Content Sampling and Content Heterogeneity |
Scorer | Interscorer Differences |