RELIABILITY

  • Consistency of scores obtained by the same persons on

    • different occasions, or with

    • different sets of equivalent items, or

    • under other variable examining conditions.

Essentially, any condition irrelevant to the test's purpose represents error variance.

  • Error Variance

    • Variability of scores that is caused by variables other than the independent variable

    • e.g., extraneous variables

How do we control error variance?

  • Uniformity in test procedures and test

Since all types of reliability are concerned with the degree of consistency or agreement between two independently derived sets of scores, they can all be expressed in terms of a correlation coefficient.

TYPES OF RELIABILITY

Test-Retest Reliability

  • finds the reliability of test scores by repeating the identical test on a second occasion.

  • shows the extent to which scores on a test can be generalized over different occasions

    • the higher the reliability, the more standardized the test

  • reliability coefficient (r_tt)

    • the correlation between the scores obtained by the same person in the two administrations of the test.
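Because the test-retest coefficient is a correlation between two sets of scores, it can be computed directly from paired scores. The sketch below assumes the Pearson product-moment correlation is the coefficient intended; the function name `pearson_r` and the six pairs of scores are hypothetical, made up only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each score from its own mean
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores of the same six persons on two administrations
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 12]

r_tt = pearson_r(first_administration, second_administration)
print(f"test-retest reliability: {r_tt:.2f}")
```

The same function applies to alternate-form data by passing the scores on the two forms instead of the two administrations.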

Error variance corresponds to the random fluctuations in performance from one test session to the other.

What considerations should guide the choice of interval?

  • Short-range

    • random fluctuations during intervals are generally included under the error variance of the test score.

[In young children]

  • the period should be even shorter than for older persons

    • Due to children’s progressive developmental changes

For any type of person, the interval between retests should rarely exceed six months.

[Criticisms of Test-Retest Reliability]

  • Practice effect

    • produces varying amounts of improvement in the retest scores

    • test takers may recall many of their former responses.

    • as a result, the correlation between the two administrations will be spuriously high.

Alternate-Form Reliability

  • Using or making an alternate form of the original test

  • Give the same person one form on the first occasion and another, equivalent form on the second.

[reliability coefficient]

  • correlation between the scores obtained on the two forms

  • a measure of both temporal stability and consistency of response to different item samples (or test forms).

[length of the interval]

  • If the two forms are administered in immediate succession, the resulting correlation shows reliability across forms only, not across occasions.

    • To measure temporal stability, consider the right interval

With immediate administration, the error variance represents fluctuations in performance from one set of items to another, but not fluctuations over time.

[Development of Alternate Forms]

  • ensure that they are truly parallel

    • Participants should understand the alternate form in the same way as the original form

  • Parallel forms of tests should be independently constructed and designed to meet the same specifications.

  • Same number of items

  • Items should be in the same form

  • Same type of content

  • Same range and level of difficulty of the items

[Limitations of Alternate Forms]

  1. large practice effect

    • the use of alternate forms will reduce but not eliminate such an effect.

  2. Changing the specific content of the items in the second form would not suffice to eradicate this carry-over from the first form.

  3. practical difficulties of constructing truly equivalent forms.

Split-Half Reliability

  • two scores are obtained for each person by dividing the test into equivalent halves.

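  • Less difficult to carry out than test-retest or alternate-form methods, since only one administration of a single form is needed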

How should the test be split to obtain the most nearly equivalent halves?

[Procedure: Odd-Even Split]

  • find the scores on the odd and even items of the test

[Precaution: Odd-Even Split]

  • when a group of items deals with a single problem (e.g., questions on the same passage), the whole group should be assigned intact to one or the other half
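The odd-even procedure can be sketched as follows. Everything here is illustrative: the helper names, the 0/1 scoring, and the five-person response matrix are hypothetical, and Pearson correlation is assumed as the coefficient. The result is the correlation between the two half-test scores; any adjustment for full test length is outside this sketch.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def odd_even_half_scores(item_scores):
    """Return (odd_total, even_total) for one person's item scores.

    Items are numbered from 1, so list index 0 holds item 1 (an odd item).
    """
    odd_total = sum(item_scores[0::2])
    even_total = sum(item_scores[1::2])
    return odd_total, even_total


# Hypothetical data: 5 persons x 8 items, 1 = correct, 0 = incorrect
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

halves = [odd_even_half_scores(person) for person in responses]
odd_scores = [odd for odd, _ in halves]
even_scores = [even for _, even in halves]

r_half = pearson_r(odd_scores, even_scores)
print(f"correlation between odd and even halves: {r_half:.2f}")
```

If some items belong to one group (for example, several questions on one passage), the slicing above would need to be replaced by an assignment that keeps each group in a single half.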

Scorer Reliability (or Inter-Rater Reliability)

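  • the degree of agreement between independent scorers; its error variance comes from interscorer differences rather than from the test items

  • mainly used where scoring depends on subjective judgment and may therefore be inconsistent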

  • Individuals are assessed independently by observers or interviewers who make use of rating scales agreed upon beforehand.

Kuder-Richardson Reliability (K-R 20) and Coefficient Alpha

  • The two formulas most frequently used to calculate inter-item consistency.

  • both K-R 20 and coefficient alpha produce estimates of reliability that are equivalent to the average of all the possible split-half coefficients that would result from all the possible different ways of splitting the test in half.

[Formulas for Calculating Internal Consistency]
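The standard textbook forms of the two formulas are sketched below. The symbols are assumptions chosen for this sketch: n is the number of items, SD_t^2 is the variance of total test scores, p and q are the proportions of persons passing and failing each item, and SD_i^2 is the variance of scores on item i.

```latex
% Kuder-Richardson formula 20 (items scored pass/fail)
r_{tt} = \left(\frac{n}{n-1}\right)\frac{SD_t^{2} - \sum pq}{SD_t^{2}}

% Coefficient alpha (items with multiple score values)
r_{tt} = \left(\frac{n}{n-1}\right)\frac{SD_t^{2} - \sum SD_i^{2}}{SD_t^{2}}
```

The only difference is the item-variance term: for a pass/fail item the variance equals pq, so K-R 20 can be read as the special case of coefficient alpha for dichotomous items.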

[Applications]

  • K-R 20

    • Applied to dichotomous tests

    • Tests whose items are scored as right or wrong, or according to some other all-or-none system.

  • Coefficient Alpha

    • Tests with multiple-scored items
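The two applications can be illustrated with a short sketch. The function names and both data sets are hypothetical, and population variances (dividing by the number of persons) are assumed so that the item variance of a pass/fail item equals p times q.

```python
def population_variance(values):
    """Variance computed with N (not N - 1) in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def kr20(responses):
    """K-R 20 for a persons x items matrix of 0/1 item scores."""
    n_persons = len(responses)
    n_items = len(responses[0])
    total_var = population_variance([sum(person) for person in responses])
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in responses) / n_persons  # proportion passing
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (total_var - sum_pq) / total_var


def coefficient_alpha(responses):
    """Coefficient alpha for a persons x items matrix of item scores."""
    n_items = len(responses[0])
    total_var = population_variance([sum(person) for person in responses])
    item_vars = [
        population_variance([person[i] for person in responses])
        for i in range(n_items)
    ]
    return (n_items / (n_items - 1)) * (total_var - sum(item_vars)) / total_var


# Hypothetical dichotomous test: 5 persons x 8 items
right_wrong = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

# Hypothetical multiple-scored test: 5 persons x 3 items rated 1 to 5
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

print(f"K-R 20 (right/wrong items): {kr20(right_wrong):.2f}")
print(f"alpha  (rated items):       {coefficient_alpha(ratings):.2f}")
```

On the dichotomous data, `coefficient_alpha(right_wrong)` gives the same value as `kr20(right_wrong)`, which is why alpha is usually described as the more general of the two.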

OVERVIEW

Any reliability coefficient may be interpreted directly in terms of the percentage of score variance attributable to different sources.

  • A reliability coefficient of .85 signifies that 85% of the variance in the test scores depends on true variance in the trait measured, and 15% depends on error variance.

    • As a rule of thumb, a test should have a reliability coefficient of at least .70
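The .85 example can be written out as a variance decomposition. The sigma symbols below are notation assumed for this sketch (true, error, and total score variance), not symbols taken from the notes.

```latex
r_{tt} = \frac{\sigma^{2}_{\text{true}}}{\sigma^{2}_{\text{total}}} = .85
\qquad\Longrightarrow\qquad
\frac{\sigma^{2}_{\text{error}}}{\sigma^{2}_{\text{total}}} = 1 - r_{tt} = 1 - .85 = .15
```

By the same arithmetic, a coefficient of .70 leaves 30% of the score variance attributable to error.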

Techniques for Measuring Reliability, in Relation to Test Form and Testing Session

Testing Sessions      Test Forms Required
Required              One                                       Two
One                   Split-Half; Kuder-Richardson and          Alternate-Form (Immediate)
                      Coefficient Alpha
Two                   Test-Retest                               Alternate-Form (Delayed)

Sources of Error Variance in Relation to Reliability Coefficients

Type of Reliability Coefficient             Error Variance
Test-Retest                                 Time Sampling
Alternate-Form (Immediate)                  Content Sampling
Alternate-Form (Delayed)                    Time Sampling and Content Sampling
Split-Half                                  Content Sampling
Kuder-Richardson and Coefficient Alpha      Content Sampling and Content Heterogeneity
Scorer                                      Interscorer Differences