Slide 4: Reliability
Reliability of Measures
The Concept of Reliability
Definition: Reliability of measurement refers to the stability or consistency of the measurement outcome.
Identical Results: A measurement procedure is reliable if it yields nearly identical results when measuring the same individual under similar conditions repeatedly.
Example: If an IQ test measures a person's intelligence today and again next week under similar conditions, the scores should be nearly the same.
Sources of Measurement Inconsistency
Error: Inconsistency in measurement arises from several sources of error:
Observer Error: Human error by the individual conducting the measurements.
Environmental Changes: Variations in factors like time of day, temperature, or lighting can affect measurements.
Participant Changes: Variations in the participant's state (e.g., focus or attentiveness) can lead to inconsistent results; for example, reaction times may differ depending on whether the participant is hungry.
Conclusion: Every measurement involves some element of error, which limits its reliability: the greater the error, the lower the reliability, and vice versa.
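As a brief formal aside (this is the standard classical test theory framing, added here rather than stated on the slide): each observed score can be modeled as a true score plus an error component, and reliability is the proportion of observed-score variance that comes from true scores.

```latex
X = T + E, \qquad
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

When error variance is zero, reliability equals 1; as error variance grows, reliability falls toward 0, which is the point made above.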
Types of Reliability
Inter-rater Reliability
Test-retest Reliability
Parallel Form Reliability
Internal Consistency
Methods to Check Reliability of Tests
1. Inter-rater Reliability
Definition: Degree of agreement between two or more observers measuring the same behavior.
Example: Two psychologists observe the same preschool children's social behaviors and independently record their findings; the consistency between their measurements is the inter-rater reliability.
Measurement: Can be assessed by correlating the scores of both observers or calculating the percentage of agreement.
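A minimal sketch of both options, using hypothetical ratings and NumPy (the data and variable names are illustrative, not from the slide):

```python
import numpy as np

# Hypothetical counts of social behaviors recorded by two observers
# watching the same eight preschool children.
rater_a = np.array([4, 7, 2, 5, 6, 3, 8, 5])
rater_b = np.array([5, 7, 2, 4, 6, 3, 7, 5])

# Option 1: correlate the two observers' scores.
r = np.corrcoef(rater_a, rater_b)[0, 1]

# Option 2: percentage of exact agreement between the observers.
agreement = np.mean(rater_a == rater_b) * 100

print(f"Inter-rater correlation: {r:.2f}")
print(f"Exact agreement: {agreement:.0f}%")
```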
2. Test-retest Reliability
Definition: Reliability obtained by comparing scores from two successive measurements.
Procedure: The same test is administered to the same group on two separate occasions; reliability is the correlation between the two sets of scores (see the sketch below).
Remarks: A low test-retest correlation does not necessarily invalidate the test itself; the characteristic being measured may simply have changed between administrations.
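A minimal sketch of the procedure, assuming hypothetical scores for the same ten people tested twice:

```python
import numpy as np

# Hypothetical scores for the same ten participants, tested with the
# same instrument on two occasions two weeks apart.
time1 = np.array([98, 105, 110, 92, 121, 99, 104, 116, 88, 130])
time2 = np.array([101, 103, 112, 95, 118, 97, 106, 119, 90, 127])

# Test-retest reliability is the correlation between the two administrations.
r_retest = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: {r_retest:.2f}")
```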
Limitations of Test-retest Reliability
Carry-Over Effect: The first session influences the second, e.g., participants remember their answers from the first administration.
Practice Effects: Skills improve with practice, inflating scores on later administrations and distorting the reliability estimate.
Time Interval Considerations: Choosing the appropriate time interval between tests is critical to minimize errors.
Motivation Level: Changes in motivation between sessions can also influence results.
3. Parallel Form Reliability
Definition: Reliability measured by comparing scores from different but equivalent forms of a test.
Procedure: Two forms of a measurement instrument are used, yielding two score sets from the same participants at different times. The correlation between these scores indicates reliability.
Example of Procedure: In a counterbalanced design, Group A completes Form A first and then Form B, while Group B completes Form B first and then Form A; each participant's scores on the two forms are then correlated.
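A minimal sketch with hypothetical data, assuming each participant completes both forms:

```python
import numpy as np

# Hypothetical scores for the same eight participants on two equivalent
# forms of a test (order of forms counterbalanced across participants).
form_a = np.array([42, 37, 50, 45, 33, 48, 40, 36])
form_b = np.array([40, 39, 48, 46, 35, 47, 41, 34])

# Parallel-form reliability is the correlation between the two forms.
r_parallel = np.corrcoef(form_a, form_b)[0, 1]
print(f"Parallel-form reliability: {r_parallel:.2f}")
```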
Limitations of Parallel Form Reliability
Influence of External Factors: Factors like motivation and fatigue can alter test consistency.
Resource Intensive: Developing equivalent test forms is both time-consuming and costly.
4. Internal Consistency Measures
Methods:
Split Half Reliability
KR20 Formula
Cronbach Alpha
1. Split Half Reliability
Definition: Reliability estimated by splitting the items of a single test into two halves and measuring the consistency between them.
Procedure: Divide test into two halves (methods include random, odd-even, or by content/difficulty) and correlate the scores from both halves.
Single Administration: Requires only a single administration of the test.
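A minimal sketch of an odd-even split on hypothetical 0/1 item data; the Spearman-Brown step-up at the end is the usual correction for halving the test, although it is not named on the slide:

```python
import numpy as np

# Hypothetical item scores: rows = 6 participants, columns = 10 items (1 = correct, 0 = incorrect).
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
])

# Odd-even split: total score on odd-numbered items vs. even-numbered items.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

# Correlation between the two half-test scores.
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown step-up: the half-test correlation underestimates the
# reliability of the full-length test.
r_full = (2 * r_half) / (1 + r_half)

print(f"Half-test correlation: {r_half:.2f}")
print(f"Split-half reliability (corrected): {r_full:.2f}")
```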
2. KR20 Formula and Cronbach Alpha
KR20: A measure of internal consistency for items scored dichotomously (e.g., correct/incorrect); it is not suitable for items with more than two score points, such as rating scales.
Cronbach Alpha: Assesses the correlation among all items on a test to estimate internal consistency; it can be used with multi-point items and works for both homogeneous and heterogeneous tests.
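A minimal sketch of both coefficients, using the standard textbook formulas (function names and data are illustrative):

```python
import numpy as np

def kr20(items):
    """KR-20 for dichotomously scored (0/1) items."""
    k = items.shape[1]                          # number of items
    p = items.mean(axis=0)                      # proportion correct per item
    q = 1 - p
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

def cronbach_alpha(items):
    """Cronbach's alpha for items scored on any scale (e.g., Likert ratings)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: 5 participants answering 4 Likert items (scored 1-5).
likert = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
])
print(f"Cronbach's alpha: {cronbach_alpha(likert):.2f}")
```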
3. How to Improve Reliability
Aim for an alpha coefficient between .70 and .90.
Increase Items: Adding more items generally enhances reliability (see the Spearman-Brown prophecy formula after this list).
Sample Size Impact: The reliability estimate depends on the sample; larger samples give more stable estimates.
Factor and Item Analysis: Correlating each item's score with the total score helps identify ineffective items that can be revised or removed.
Correction for Attenuation: Low reliability weakens (attenuates) observed correlations with other variables, reducing the chance of significant findings and making unreliable tests less valuable; the correction estimates what a correlation would be if both measures were perfectly reliable (formula below).
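The "Increase Items" and "Correction for Attenuation" points rest on standard classical test theory formulas that the slide does not spell out; they are added here for reference:

```latex
% Spearman-Brown prophecy: reliability of a test lengthened by a factor n
r_{\text{new}} = \frac{n\, r_{xx}}{1 + (n - 1)\, r_{xx}}

% Correction for attenuation: estimated correlation between x and y
% if both were measured with perfect reliability
r_{x'y'} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
```

For example, doubling the length of a test (n = 2) with reliability .60 would be expected to raise reliability to about .75.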