PSYC 3377 CHAPTER 3 RELIABILITY

Chapter 3: Getting it Right Every Time: Reliability and Its Importance

  • Content referenced from Salkind, "Tests and Measurement, 3e," SAGE Publishing (2018).

Reliability

Definition

  • Reliability refers to the consistency of a test or measurement tool.

Key Points

  • Reliability is all about consistency and can be evaluated in several ways.
  • Types of reliability include:
    • Consistency of scores
    • Consistency among raters
    • Consistency across time

Test Scores: Components of Reliability

Components of a Test Score

  1. Observed Score (Os): The actual score obtained on the test.

  2. True Score (Ts): The true reflection of what the test taker knows.

  3. Error Score (Es): The discrepancy between the Observed Score and the True Score.

    Os = Ts + Es
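    For example (hypothetical values): if a test taker's true score is 80 and test-day distractions introduce an error of −3 points, the observed score is Os = 80 + (−3) = 77.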

Importance of Error in Measurement

  • The reliability of a test hinges on how accurately the True Score is measured.
  • Key Insights:
    • Observed score equals True score only if there is no error in measurement.
    • As error increases, reliability decreases; as error decreases, reliability increases.

Sources of Error in Reliability

Types of Errors

  1. Trait Error: Errors related to the individual test taker, such as:
    • Lack of preparation
    • Distractions during the test.
  2. Method Error: Errors associated with the testing environment, such as:
    • Poor instructions
    • Room temperature issues.

Impact of Error on Reliability

  • Reliability can be conceptually illustrated using:
    • Reliability = True Score / (True Score + Error Score)
  • As the error value decreases, the reliability value increases.
  • Perfect reliability (1.00) is achieved only when there is no error.
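  • Illustration (hypothetical values): if the true score component is 90 and the error component is 10, reliability = 90 / (90 + 10) = .90; cutting the error component to 5 raises reliability to 90 / 95 ≈ .95.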

The Reliability Coefficient

Definition and Range

  • Reliability Coefficient: A correlation coefficient used to quantify the reliability of a test.
  • Ranges from 0.00 to 1.00.
  • Higher numbers indicate more reliable scores.

Types of Reliability

Overview

  • Reliability can be computed in various ways, commonly including:
    • Test-retest reliability: Assesses consistency over time.
    • Parallel forms reliability: Examines equivalency between different forms of the same test.
    • Internal consistency reliability: Evaluates if test items measure a single construct.
    • Interrater reliability: Determines if different raters give consistent ratings.

Test-Retest Reliability

Definition

  • Used to evaluate the reliability of a test over time.

Calculation

  • Correlates scores from the same test administered at two different times.
  • Example: For the Mastering Vocational Education Test (MVET), suppose scores from the two administrations correlate at r = .89.
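
A minimal sketch of this computation in Python, using hypothetical MVET scores (the values below are invented for illustration). Parallel forms reliability is computed the same way, substituting Form A and Form B scores for the two administrations.

    import numpy as np

    # Hypothetical scores for the same six examinees at two administrations
    time1 = np.array([82, 75, 90, 68, 77, 85])
    time2 = np.array([80, 78, 88, 70, 74, 86])

    # Test-retest reliability is the Pearson correlation between administrations
    r_test_retest = np.corrcoef(time1, time2)[0, 1]
    print(round(r_test_retest, 2))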

Problems

  • Potential issues include:
    • Changes in participants over time
    • Recall or practice effects.

Parallel Forms Reliability

Definition

  • Measures the equivalence of two different forms of the same test.

Calculation

  • Correlates scores from two different forms.
  • Example: For the Remembering Everything Test (RET), scores from the two forms could yield a correlation of r = .12.

Internal Consistency Reliability

Definition

  • Determines if test items consistently represent one construct across the test.

Example of Application

  • Consider the Attitude Toward Health Care Test (ATHCT), which has 20 items rated on a 5-point scale.
  • Sample items include:
    • “I like my HMO.”
    • “I don’t like anything other than private health insurance.”

Methods for Establishing Internal Consistency

  1. Split-Half Reliability: Correlate scores on two halves of the test, then apply the Spearman-Brown correction.
  2. Cronbach’s Alpha (α): Measures internal consistency across all items; takes the average of all possible split-half correlations.
  3. Kuder-Richardson 20 (KR20): Used for tests with binary score items (correct/incorrect).

Split-Half Reliability

Implementation

  • Split the test in half and correlate scores from each half.
  • Use odd/even item selection for the reliability coefficient calculation, for example, r = .2428.

Correction for Split-Half Coefficient

  • The Spearman-Brown formula is used to adjust the split-half reliability coefficient:
    r_SB = (2 × r) / (1 + r)
  • Example:
    If r = .73, then:
    r_SB = (2 × .73) / (1 + .73) = 1.46 / 1.73 ≈ .84
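
A minimal sketch of the whole split-half procedure in Python, assuming an odd/even split of a hypothetical item-score matrix (all values invented for illustration):

    import numpy as np

    # Hypothetical item scores: rows = examinees, columns = 10 test items
    items = np.array([
        [4, 5, 3, 4, 5, 4, 3, 4, 5, 4],
        [2, 3, 2, 3, 2, 3, 2, 2, 3, 2],
        [5, 5, 4, 5, 5, 4, 5, 5, 4, 5],
        [3, 2, 3, 3, 2, 3, 3, 2, 3, 3],
        [4, 4, 5, 4, 4, 5, 4, 4, 5, 4],
    ])

    # Total score on the odd-numbered items and on the even-numbered items
    odd_half = items[:, 0::2].sum(axis=1)
    even_half = items[:, 1::2].sum(axis=1)

    # Correlate the halves, then apply the Spearman-Brown correction
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    r_sb = (2 * r_half) / (1 + r_half)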

Considerations

  • Potential issues with splitting tests include:
    • The resultant halves may not be equally reliable or representative.

Cronbach's Alpha (α)

Definition

  • A common method for assessing internal consistency.

Calculation Example

  • Given the variance of each item and the variance of the total scores:
    • Conceptually, alpha is the mean of all possible split-half correlations, corrected by Spearman-Brown.
    • If total variance S² = 6.4, results might yield (1.11 − .76) / 1.11 = .24.
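
A minimal sketch of the usual computational formula for alpha, α = (k / (k − 1)) × (1 − Σ item variances / total-score variance), in Python with hypothetical rating data (all values invented for illustration):

    import numpy as np

    # Hypothetical ratings: rows = respondents, columns = k = 4 items on a 5-point scale
    items = np.array([
        [3, 4, 3, 4],
        [5, 5, 4, 5],
        [2, 3, 2, 2],
        [4, 4, 5, 4],
        [1, 2, 1, 2],
    ])

    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the total scores

    alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)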

Kuder-Richardson 20 (KR20)

Definition

  • A measure of internal consistency for tests scoring items as correct/incorrect.

Calculation Example

  • KR20 is computed from the number of items (k), the proportion of test takers answering each item correctly (p) and incorrectly (q = 1 − p), and the variance of the total scores (s²):
    KR20 = (k / (k − 1)) × (1 − Σpq / s²)
  • The worked example uses k = 5 items and a variance term of 1.11.
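
A minimal Python sketch of this formula with hypothetical right/wrong data (all values invented for illustration):

    import numpy as np

    # Hypothetical 1 = correct / 0 = incorrect responses: rows = examinees, columns = k = 5 items
    items = np.array([
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 0, 1],
        [1, 1, 1, 0, 1],
    ])

    k = items.shape[1]
    p = items.mean(axis=0)              # proportion of examinees answering each item correctly
    q = 1 - p                           # proportion answering incorrectly
    s2 = items.sum(axis=1).var(ddof=1)  # variance of the total scores

    kr20 = (k / (k - 1)) * (1 - (p * q).sum() / s2)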

Interrater Reliability

Definition

  • Assesses the agreement between different raters on a judgment.

Calculation Example

  • If two raters evaluate the same performance, interrater reliability could be calculated as:
    Interrater reliability = Number of agreements / Number of possible agreements = 10 / 12 ≈ .83
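
A minimal Python sketch of this percentage-of-agreement approach, using hypothetical ratings from two raters (all values invented for illustration):

    # Whether the target behavior occurred (1) or not (0) in each of 12 observation periods
    rater_1 = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
    rater_2 = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0]

    # Interrater reliability = number of agreements / number of possible agreements
    agreements = sum(a == b for a, b in zip(rater_1, rater_2))
    interrater = agreements / len(rater_1)  # 10 / 12 ≈ .83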

Interpreting Reliability Coefficients

Interpretation Guidelines

  • Ensure reliability coefficients are positive and close to 1.0.
  • A coefficient of 0.70 or above is acceptable; 0.80 or greater is preferable.

Specific Examples

  1. Test-Retest Reliability: An example correlation of r = .89 suggests reasonable consistency over time.
  2. Parallel Forms Reliability: An example correlation of r = .12 indicates low consistency across different test forms.
  3. Internal Consistency: An example of α = .24 indicates that the items do not consistently measure a single construct.

Common Considerations and Final Thoughts

Reliability Assessment

  • Be cautious when reliability is not mentioned in research.
  • The instrument's reliability may be so well established that it goes unstated, or the omission may signal a poorly designed test.

Standard Error of Measurement (SEM)

  • The amount by which an individual's observed scores are expected to vary around his or her true score across repeated testings.
  • Higher reliability correlates with lower SEM.
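  • Illustration (hypothetical values): using the standard relationship SEM = s × √(1 − r), where s is the test's standard deviation and r its reliability, a test with s = 10 and r = .91 gives SEM = 10 × √(.09) = 3, so observed scores would typically fall within about 3 points of the true score.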

Improving Reliability

  • Key strategies to enhance reliability include:
    • Standardizing instructions
    • Increasing item quantity
    • Deleting unclear items
    • Adjusting difficulty levels
    • Minimizing the impact of external events.

Importance of Reliability

  • Establishing reliability is crucial for any measurement instrument; without reliability, conclusions drawn from the data are suspect.
  • Reliable instruments are essential for conducting quality research and for making sound empirical determinations about the relationships between variables in scientific inquiries.