Reliability and Validity
Reliability in Measurement
Definition of Reliability
Reliability refers to the degree of consistency and stability of a measuring instrument or procedure.
It indicates the extent to which participants' observed scores are free of measurement error.
Observed Score Equation
Observed Score = True Score + Measurement Error
Measurement error can arise from various sources.
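A minimal sketch of this relationship (all values invented; the error is drawn as random noise):

```python
import random

# Hypothetical illustration: each observed score is the unknown true score
# plus random measurement error. All numbers here are invented.
true_score = 150.0                     # e.g., a person's actual weight
error_sd = 2.0                         # spread of the measurement error

observed = [true_score + random.gauss(0, error_sd) for _ in range(5)]
print(observed)                        # five readings scattered around 150
```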
Sources of Measurement Error
Measurement Error Overview
Measurement error is an inherent aspect of data collection affecting the reliability of scores.
True Score
The true score refers to the score an instrument would yield without any measurement error.
Observed Score
This is the score recorded during measurement, incorporating both the true score and any errors.
Measurement Instrument Example
Example of using a bathroom scale:
Weighing oneself in different locations yields inconsistent results (e.g., carpet vs. concrete).
Changes in time of day, positioning, or disturbances can cause differing weights.
Factors contributing to measurement error include variations in environment and participant states.
Types of Measurement Errors
Trait Error
Attributes of the participant influencing results, including:
Physical, sensory, or cognitive limitations (e.g., vision problems) affecting task performance.
Language proficiency differences for non-native speakers affecting comprehension.
Participant's mood, attitude, or motivation toward the study impacting performance.
Method Error
Characteristics of testing conditions influencing outcomes:
Environmental factors (e.g., ambient noise, lighting).
Poorly designed survey questions, such as double-barreled queries.
Researcher errors (e.g., misreading participants' responses) or the researcher's mood and demeanor affecting participants' performance.
Understanding Reliability
Consistency and Variability
Changes in measured scores can be systematic or unsystematic (random):
Systematic: consistent and predictable shifts in outcomes.
Unsystematic: random, with no predictable patterns, often due to noise.
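One way to picture the distinction, as a small sketch with invented numbers:

```python
import random

true_score = 100.0

# Systematic error: a consistent, predictable shift (e.g., a scale that always
# reads 5 units high); every reading is biased in the same direction.
systematic = [true_score + 5.0 for _ in range(5)]

# Unsystematic error: random noise with no predictable pattern; readings
# scatter above and below the true score.
unsystematic = [true_score + random.gauss(0, 5.0) for _ in range(5)]

print(systematic)
print(unsystematic)
```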
Confounding Variables
Confounding factors can provide alternative explanations for observed outcomes, complicating interpretations of causation.
Forms of Reliability
Inter-Rater Reliability
This assesses the degree to which different observers give consistent ratings or judgments.
Common in areas such as observational studies, competitions (e.g., gymnastics, diving).
Raters should operate independently following a stringent coding scheme, with training to ensure clarity in definitions.
Reliability can be expressed as a correlation coefficient or calculated as a percentage agreement.
Example for calculating percentage agreement:
If two judges evaluate five contestants:
Agreement on 3 out of 5 gives a reliability of 60%.
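A minimal sketch of that percentage-agreement calculation, using hypothetical ratings for the five contestants:

```python
# Hypothetical ratings from two judges for five contestants (invented values).
judge_a = ["pass", "fail", "pass", "pass", "fail"]
judge_b = ["pass", "fail", "fail", "pass", "pass"]

agreements = sum(a == b for a, b in zip(judge_a, judge_b))
percent_agreement = agreements / len(judge_a) * 100

print(f"{agreements} of {len(judge_a)} agree -> {percent_agreement:.0f}%")  # 3 of 5 -> 60%
```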
Project Work
Participants observe behaviors, categorize them, and compare counts.
Importance of agreed-upon definitions to ensure inter-rater reliability.
Recommended level for acceptable inter-rater reliability: at least 70% (0.70).
Internal Consistency Reliability
Focuses on the consistency of responses across multiple items measuring the same construct.
Example: on a self-esteem inventory, responses to each item should correlate positively with responses to the other items.
Reliability evaluation:
Items that correlate strongly suggest good internal consistency.
Assessment methodologies:
Average inter-item correlations calculated to analyze reliability as a whole.
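A minimal sketch of an average inter-item correlation, assuming an invented 5-respondent by 3-item response matrix:

```python
import numpy as np
from itertools import combinations

# Invented responses: rows = 5 respondents, columns = 3 items on one scale.
# Real inventories would have more of both.
responses = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 2],
    [1, 2, 1],
])

# Correlate every pair of items and average the coefficients.
pairs = combinations(range(responses.shape[1]), 2)
corrs = [np.corrcoef(responses[:, i], responses[:, j])[0, 1] for i, j in pairs]
print("average inter-item correlation:", round(float(np.mean(corrs)), 2))
```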
Statistical Evaluation of Reliability
Split-Half Reliability
Divides a measurement tool into halves to determine if they yield similar results.
If the items all measure the same construct, scores on the two halves should correlate strongly.
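A minimal sketch of a split-half check, assuming an invented six-item scale split into odd- and even-numbered items (the Spearman-Brown step-up often reported alongside it is included):

```python
import numpy as np

# Invented responses: rows = respondents, columns = 6 items on one scale.
responses = np.array([
    [4, 5, 4, 5, 4, 5],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 4],
    [3, 3, 2, 3, 3, 2],
    [1, 2, 1, 2, 1, 1],
])

# Sum the odd-numbered and even-numbered items into two half scores.
half_a = responses[:, ::2].sum(axis=1)
half_b = responses[:, 1::2].sum(axis=1)

r_half = np.corrcoef(half_a, half_b)[0, 1]
spearman_brown = 2 * r_half / (1 + r_half)   # estimated full-length reliability
print(round(r_half, 2), round(spearman_brown, 2))
```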
Test-Retest Reliability
Evaluates whether individuals maintain similar scores over time on the same measure.
Crucial conditions: ensure adequate time intervals to avoid recall bias and practice effects.
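A minimal sketch of a test-retest coefficient computed as a Pearson correlation between two administrations, using invented scores:

```python
import numpy as np

# Invented scores for six participants measured twice, several weeks apart.
time_1 = np.array([10, 14, 9, 17, 12, 15])
time_2 = np.array([11, 13, 10, 16, 12, 14])

r_test_retest = np.corrcoef(time_1, time_2)[0, 1]
print("test-retest reliability:", round(r_test_retest, 2))
```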
Interpreting reliability coefficients:
Cronbach's alpha (α) is commonly used, where:
0.7 - 0.8 is acceptable,
0.8 - 0.9 is good, and
above 0.9 indicates excellent reliability.
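A minimal sketch of computing Cronbach's alpha from a respondents-by-items matrix (invented data), whose result can be compared against the benchmarks above:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Invented responses: 5 respondents x 4 items.
responses = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
])

print(f"alpha = {cronbach_alpha(responses):.2f}")  # compare with 0.7 / 0.8 / 0.9
```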
Factors Impacting Reliability
Environmental conditions (e.g., noise, testing setting) may skew results and lower reliability.
Attrition Concerns
Participant dropout can jeopardize data integrity, especially when attrition differs between study groups (differential attrition).
Addressing potential confounds and ensuring robust operational definitions aids in capturing reliable data outcomes.
Conclusion
Reliable measurements are critical for valid conclusions in research.
Understanding and managing measurement errors, assessing various forms of reliability, and carefully considering the environments in which measurements are taken are all pivotal in improving research outcomes and robustness in findings.