PSYC 3377 CHAPTER 3 RELIABILITY

Chapter 3: Getting it Right Every Time: Reliability and Its Importance

  • Content referenced from Salkind, "Tests and Measurement, 3e," SAGE Publishing (2018).

Reliability

Definition

  • Reliability refers to the consistency of a test or measurement tool.

Key Points

  • Reliability is all about consistency and can be evaluated in several ways.
  • Types of reliability include:
    • Consistency of scores
    • Consistency among raters
    • Consistency across time

Test Scores: Components of Reliability

Components of a Test Score

  1. Observed Score (Os): The actual score obtained on the test.

  2. True Score (Ts): The true reflection of what the test taker knows.

  3. Error Score (Es): The discrepancy between the Observed Score and the True Score.

    Os = Ts + Es
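    For example (hypothetical values): if a test taker's true score is 80 and test-day distractions introduce an error of −3 points, the observed score is Os = 80 + (−3) = 77.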

Importance of Error in Measurement

  • The reliability of a test hinges on how accurately the True Score is measured.
  • Key Insights:
    • Observed score equals True score only if there is no error in measurement.
    • As error increases, reliability decreases; as error decreases, reliability increases.

Sources of Error in Reliability

Types of Errors

  1. Trait Error: Errors related to the individual test taker, such as:
    • Lack of preparation
    • Distractions during the test.
  2. Method Error: Errors associated with the testing environment, such as:
    • Poor instructions
    • Room temperature issues.

Impact of Error on Reliability

  • Reliability can be conceptually illustrated using:
    • Reliability = True Score / (True Score + Error Score)
  • As the error value decreases, the reliability value increases.
  • Perfect reliability (1.00) is achieved only when there is no error.
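  • Illustration (hypothetical values): if the true score component is 90 and the error component is 10, reliability = 90 / (90 + 10) = .90; cutting the error component to 5 raises reliability to 90 / 95 ≈ .95.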

The Reliability Coefficient

Definition and Range

  • Reliability Coefficient: A correlation coefficient used to quantify the reliability of a test.
  • Ranges from 0.00 to 1.00.
  • Higher numbers indicate more reliable scores.

Types of Reliability

Overview

  • Reliability can be computed in various ways, commonly including:
    • Test-retest reliability: Assesses consistency over time.
    • Parallel forms reliability: Examines equivalency between different forms of the same test.
    • Internal consistency reliability: Evaluates if test items measure a single construct.
    • Interrater reliability: Determines if different raters give consistent ratings.

Test-Retest Reliability

Definition

  • Used to evaluate the reliability of a test over time.

Calculation

  • Correlates scores from the same test administered at two different times.
  • Example: For the Mastering Vocational Education Test (MVET), suppose scores from the two administrations correlate at r = .89.
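
A minimal sketch of this computation in Python, using hypothetical MVET scores (the values below are invented for illustration). Parallel forms reliability is computed the same way, substituting Form A and Form B scores for the two administrations.

    import numpy as np

    # Hypothetical scores for the same six examinees at two administrations
    time1 = np.array([82, 75, 90, 68, 77, 85])
    time2 = np.array([80, 78, 88, 70, 74, 86])

    # Test-retest reliability is the Pearson correlation between administrations
    r_test_retest = np.corrcoef(time1, time2)[0, 1]
    print(round(r_test_retest, 2))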

Problems

  • Potential issues include:
    • Changes in participants over time
    • Recall or practice effects.

Parallel Forms Reliability

Definition

  • Measures the equivalence of two different forms of the same test.

Calculation

  • Correlates scores from two different forms.
  • Example: For the Remembering Everything Test (RET), scores from the two forms could yield a correlation of r = .12.

Internal Consistency Reliability

Definition

  • Determines if test items consistently represent one construct across the test.

Example of Application

  • Consider the Attitude Toward Health Care Test (ATHCT), which has 20 items rated on a 5-point scale.
  • Sample items include:
    • “I like my HMO.”
    • “I don’t like anything other than private health insurance.”

Methods for Establishing Internal Consistency

  1. Split-Half Reliability: Correlate scores on two halves of the test, then apply the Spearman-Brown correction.
  2. Cronbach’s Alpha (α): Measures internal consistency across all items; takes the average of all possible split-half correlations.
  3. Kuder-Richardson 20 (KR20): Used for tests with binary score items (correct/incorrect).

Split-Half Reliability

Implementation

  • Split the test in half and correlate scores from each half.
  • Use odd/even item selection for the reliability coefficient calculation, for example, r = .2428.

Correction for Split-Half Coefficient

  • The Spearman-Brown formula is used to adjust the split-half reliability coefficient:
    r_SB = (2 × r) / (1 + r)
  • Example:
    If r = .73, then:
    r_SB = (2 × .73) / (1 + .73) = 1.46 / 1.73 ≈ .84
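
A minimal sketch of the whole split-half procedure in Python, assuming an odd/even split of a hypothetical item-score matrix (all values invented for illustration):

    import numpy as np

    # Hypothetical item scores: rows = examinees, columns = 10 test items
    items = np.array([
        [4, 5, 3, 4, 5, 4, 3, 4, 5, 4],
        [2, 3, 2, 3, 2, 3, 2, 2, 3, 2],
        [5, 5, 4, 5, 5, 4, 5, 5, 4, 5],
        [3, 2, 3, 3, 2, 3, 3, 2, 3, 3],
        [4, 4, 5, 4, 4, 5, 4, 4, 5, 4],
    ])

    # Total score on the odd-numbered items and on the even-numbered items
    odd_half = items[:, 0::2].sum(axis=1)
    even_half = items[:, 1::2].sum(axis=1)

    # Correlate the halves, then apply the Spearman-Brown correction
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    r_sb = (2 * r_half) / (1 + r_half)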

Considerations

  • Potential issues with splitting tests include:
    • The resultant halves may not be equally reliable or representative.

Cronbach's Alpha (α)

Definition

  • A common method for assessing internal consistency.

Calculation Example

  • Given the variance of each item and the variance of the total scores:
    • Conceptually, alpha is the mean of all possible split-half correlations, corrected by Spearman-Brown.
    • If total variance S² = 6.4, results might yield (1.11 − .76) / 1.11 = .24.
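
A minimal sketch of the usual computational formula for alpha, α = (k / (k − 1)) × (1 − Σ item variances / total-score variance), in Python with hypothetical rating data (all values invented for illustration):

    import numpy as np

    # Hypothetical ratings: rows = respondents, columns = k = 4 items on a 5-point scale
    items = np.array([
        [3, 4, 3, 4],
        [5, 5, 4, 5],
        [2, 3, 2, 2],
        [4, 4, 5, 4],
        [1, 2, 1, 2],
    ])

    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the total scores

    alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)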

Kuder-Richardson 20 (KR20)

Definition

  • A measure of internal consistency for tests scoring items as correct/incorrect.

Calculation Example

  • KR20 is computed from the number of items (k), the proportion of test takers answering each item correctly (p) and incorrectly (q = 1 − p), and the variance of the total scores (s²):
    KR20 = (k / (k − 1)) × (1 − Σpq / s²)
  • The worked example uses k = 5 items and a variance term of 1.11.
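
A minimal Python sketch of this formula with hypothetical right/wrong data (all values invented for illustration):

    import numpy as np

    # Hypothetical 1 = correct / 0 = incorrect responses: rows = examinees, columns = k = 5 items
    items = np.array([
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 0, 1],
        [1, 1, 1, 0, 1],
    ])

    k = items.shape[1]
    p = items.mean(axis=0)              # proportion of examinees answering each item correctly
    q = 1 - p                           # proportion answering incorrectly
    s2 = items.sum(axis=1).var(ddof=1)  # variance of the total scores

    kr20 = (k / (k - 1)) * (1 - (p * q).sum() / s2)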

Interrater Reliability

Definition

  • Assesses the agreement between different raters on a judgment.

Calculation Example

  • If two raters evaluate the same performance, interrater reliability could be calculated as:
    Interrater reliability = Number of agreements / Number of possible agreements = 10 / 12 ≈ .83
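
A minimal Python sketch of this percentage-of-agreement approach, using hypothetical ratings from two raters (all values invented for illustration):

    # Whether the target behavior occurred (1) or not (0) in each of 12 observation periods
    rater_1 = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
    rater_2 = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0]

    # Interrater reliability = number of agreements / number of possible agreements
    agreements = sum(a == b for a, b in zip(rater_1, rater_2))
    interrater = agreements / len(rater_1)  # 10 / 12 ≈ .83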

Interpreting Reliability Coefficients

Interpretation Guidelines

  • Ensure reliability coefficients are positive and close to 1.0.
  • A coefficient of 0.70 or above is acceptable; 0.80 or greater is preferable.

Specific Examples

  1. Test-Retest Reliability: An example correlation of r = .89 suggests reasonable consistency over time.
  2. Parallel Forms Reliability: An example correlation of r = .12 indicates low consistency across different test forms.
  3. Internal Consistency: An example of α = .24 indicates that the items do not consistently measure a single construct.

Common Considerations and Final Thoughts

Reliability Assessment

  • Be cautious when reliability is not mentioned in research.
  • The instrument's reliability may be so well established that it goes unstated, or the omission may signal a poorly designed test.

Standard Error of Measurement (SEM)

  • The amount by which an individual's observed scores are expected to vary around his or her true score across repeated testings.
  • Higher reliability correlates with lower SEM.
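  • Illustration (hypothetical values): using the standard relationship SEM = s × √(1 − r), where s is the test's standard deviation and r its reliability, a test with s = 10 and r = .91 gives SEM = 10 × √(.09) = 3, so observed scores would typically fall within about 3 points of the true score.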

Improving Reliability

  • Key strategies to enhance reliability include:
    • Standardizing instructions
    • Increasing item quantity
    • Deleting unclear items
    • Adjusting difficulty levels
    • Minimizing the impact of external events.

Importance of Reliability

  • Establishing reliability is crucial for any measurement instrument; without reliability, conclusions drawn from the data are suspect.
  • Reliable instruments are essential for conducting quality research and for making sound empirical determinations about the relationships between variables in scientific inquiries.