Psychological Assessment: Reliability, Validity, and Utility

Reliability

  • Definition: Reliability refers to the extent to which measurements in research are consistent and repeatable.

    • Key Quotes:
      • Nunnally: Research requires dependable measurement.
      • Gay: Any random influences causing measurement variation are sources of measurement error.
  • Random Errors vs Systematic Errors:

    • Random Errors: Unpredictable fluctuations in measurement; they reduce reliability.
    • Systematic Errors: Consistent biases in measurement; they reduce validity.
  • Types of Reliability:

    • Test-Retest Reliability:
      • Indicates consistency of scores over time.
      • Issues: Memory effects, maturation, and learning can influence scores across sessions.
    • Equivalent-Forms Reliability:
      • Compares scores from two different forms of a test that measure the same construct.
      • The two forms must be constructed carefully to ensure they are equivalent.
      • Administered in the same session, this yields the coefficient of equivalence; administered across sessions, the coefficient of stability and equivalence.
    • Split-Half Reliability:
      • Assesses reliability within a single test administration, particularly useful for long tests.
      • The test is split into two halves (often odd- versus even-numbered items), and the half scores are correlated.
      • Because each half is only half the full test's length, the correlation must be corrected upward, usually with the Spearman-Brown prophecy formula.
    • Rationale Equivalence Reliability:
      • Estimates internal consistency from the interrelationships among items rather than by correlating half-test scores.
    • Internal Consistency Reliability:
      • Measures how consistently all items on a test correlate with one another.
      • Kuder-Richardson (KR-20): Approximates the average of all possible split-half coefficients; applies to items scored dichotomously (correct/incorrect).
      • Cronbach's Alpha: A generalization of the Kuder-Richardson approach, widely used for continuous or Likert-scale responses where items have no single correct answer.
    • Standard Error of Measurement:
      • Expresses reliability in terms of how often errors of a given size are expected to occur; computed as the test's standard deviation times the square root of (1 − reliability).
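The statistics above can be sketched in a few lines of Python. This is an illustrative sketch, not from the source: the function names and example values are hypothetical, but the formulas (Spearman-Brown correction, coefficient alpha, SEM) are the standard ones.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_brown(half_r):
    """Step up a split-half correlation to estimate full-length reliability."""
    return 2 * half_r / (1 + half_r)

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding every examinee's score."""
    k = len(item_scores)
    sum_item_var = sum(pstdev(item) ** 2 for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # per-examinee totals
    total_var = pstdev(totals) ** 2
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * (1 - reliability) ** 0.5

# A split-half correlation of .60 corrected to full length:
print(round(spearman_brown(0.60), 3))  # 0.75
# SEM for a test with SD = 10 and reliability = .84:
print(round(sem(10, 0.84), 1))         # 4.0
```

Note how the Spearman-Brown example makes the correction concrete: a half-test correlation of .60 implies an estimated full-test reliability of .75.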

Validity

  • Definition: Validity refers to the extent to which a test measures what it is intended to measure.
    • Validity assessments depend on context: the test form, purpose, and target population.
  • Types of Validity:
    • Content Validity:
      • Ensures all content areas of the construct are represented on the test.
      • Not empirically derived; assessed through logical and expert judgment.
      • Example: A geography test whose questions focus mostly on New England lacks content validity as a test of American geography.
    • Face Validity:
      • Refers to whether the test appears, on its surface, to measure what it claims; appearance alone is not evidence of actual validity.
    • Criterion-Oriented/Predictive Validity:
      • Evaluates how well current test scores predict future performance on a criterion.
      • Assessed by correlating current test scores with criterion measures collected later.
    • Concurrent Validity:
      • Measures how well test scores correspond to another established measure administered at the same time.
      • Example: Assessing a new test against an older, established one given simultaneously.
    • Construct Validity:
      • Evaluates the degree to which a test measures a theoretical construct.
      • Often involves experiments correlating test scores with behaviors the theory links to the construct.
      • Example: Validating an anxiety measure by confirming that scores rise under stress conditions, as the theory predicts.
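Predictive and concurrent validity are both typically reported as a validity coefficient: a correlation between test scores and a criterion measure. A minimal sketch, with hypothetical scores and a hypothetical criterion (later job-performance ratings):

```python
from statistics import mean

def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure."""
    mt, mc = mean(test_scores), mean(criterion_scores)
    cov = sum((t - mt) * (c - mc) for t, c in zip(test_scores, criterion_scores))
    st = sum((t - mt) ** 2 for t in test_scores) ** 0.5
    sc = sum((c - mc) ** 2 for c in criterion_scores) ** 0.5
    return cov / (st * sc)

# Hypothetical data: aptitude test scores vs. performance ratings
# collected later (predictive validity); the same computation serves
# concurrent validity when the criterion is measured at the same time.
test = [55, 62, 70, 48, 66, 59]
performance = [3.1, 3.4, 4.2, 2.8, 3.9, 3.3]
print(round(validity_coefficient(test, performance), 2))  # 0.97
```

The only difference between the two validity types here is timing: for predictive validity the criterion is gathered in the future; for concurrent validity it is gathered alongside the test.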

Utility

  • Definition: In psychometrics, utility refers to the practical value of a test for improving decision-making.
  • Factors Contributing to Utility:
    • Cost efficiency.
    • Time savings.
    • Comparative utility: A measure of one test’s usefulness relative to another.
    • Clinical utility: Utility for diagnostic assessment/treatment purposes.
    • Diagnostic utility: Classification effectiveness relative to other tests.
  • Psychometric Soundness:
    • A test is considered psychometrically sound if its reliability and validity coefficients are acceptably high for its intended use.
    • Utility indices quantify how much test scores improve decision-making, particularly the cost-effectiveness of the resulting outcomes.
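Diagnostic utility is often summarized by classification statistics at a given cutoff score. The sketch below is illustrative: the scores, diagnostic statuses, and cutoff are hypothetical, but hit rate, sensitivity, and specificity are the standard classification measures.

```python
def classification_stats(scores, has_condition, cutoff):
    """Classify each examinee as positive if score >= cutoff,
    then compare predictions against known condition status."""
    tp = fp = tn = fn = 0
    for score, actual in zip(scores, has_condition):
        predicted = score >= cutoff
        if predicted and actual:
            tp += 1          # true positive
        elif predicted and not actual:
            fp += 1          # false positive
        elif not predicted and not actual:
            tn += 1          # true negative
        else:
            fn += 1          # false negative
    total = tp + fp + tn + fn
    return {
        "hit_rate": (tp + tn) / total,   # proportion correctly classified
        "sensitivity": tp / (tp + fn),   # cases the test catches
        "specificity": tn / (tn + fp),   # non-cases the test clears
    }

# Hypothetical screening scores and actual diagnostic status:
scores = [12, 18, 25, 30, 9, 22, 28, 15]
status = [False, True, True, True, False, True, True, False]
print(classification_stats(scores, status, cutoff=20))
```

Comparing these statistics across cutoffs, or across competing tests on the same sample, is one concrete way to express comparative and diagnostic utility.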