Lecture 8 - Psychometric Methods

Validity in Testing

  • Validity Definition

    • A valid test measures what it claims to measure, which ensures that the inferences drawn from the test results are appropriate, meaningful, and useful (AERA, APA, NCME, 2014). Validity is context-dependent and requires a range of evidence from multiple sources to support claims about the accuracy of the test's interpretations.

  • Reliability vs Validity

    • Reliability: Refers to the consistency of test results across different administrations. A reliable test yields similar results under consistent conditions, so each individual's performance is measured consistently over time. Factors affecting reliability include test conditions, test format, and scoring methods.

    • Validity: Centers on whether a test accurately measures what it is intended to measure. A key point to remember is that a test can be reliable (consistently producing the same results) yet fail to be valid if it consistently measures the wrong construct; thus, reliability is a necessary but not sufficient condition for validity.

Types of Validity

  • Content Validity

    • Content validity addresses the extent to which a test represents the entire domain of the construct it purports to measure. To establish strong content validity, the construct must be well defined and the test items must sample observable behaviors from that domain. This type of validity is especially crucial for assessments of constructs such as attitudes, professional skills, or knowledge in specific domains.

  • Criterion Validity

    • Criterion validity evaluates how well scores on a test predict or correlate with an outcome measured by a separate criterion. It is subdivided into two primary types:

      • Predictive Validity: Measures how well test scores can predict future behavior or performance, often evaluated through longitudinal studies.

      • Concurrent Validity: Assesses how well test scores correlate with other measures taken at the same time, providing insight into the test’s performance against established benchmarks or criteria.

  • Construct Validity

    • Construct validity measures how well a test embodies the theoretical construct it intends to examine. This involves scrutinizing the relationships of the test with other variables, ensuring the test accurately reflects the larger domain of the theoretical framework.

Determining Validity

  • Establishing Validity: Validity is not a static entity but an ongoing process that resides on a continuum. Establishing validity requires gathering various types of evidence:

    1. Test Content (Content Validity)

    2. Response Processes: Analyzing how test-takers respond to items, which can unveil insights into the validity of their interpretations.

    3. Internal Structure (Construct Validity): Ensuring that the test items accurately reflect the underlying theoretical constructs.

    4. Relations with Other Variables (Criterion-related Validity): Examining how the test correlates with other, established measures.

    5. Consequences of Testing: Evaluating the broader implications of test results and their impact on societal or educational outcomes.

Evidence of Content Validity

  • Content validity serves as a foundation for tests focused on skills or academic knowledge. Common methods for collecting evidence include:

    1. During Test Development: Involves the careful definition of the construct and selection of representative test items.

    2. Post Hoc Validity Assessment: Employing expert evaluations after a test’s formulation to assess whether the content aligns with the construct’s definition.

Content Validity Ratio (CVR)

  • CVR is a quantitative index of the extent to which Subject Matter Experts agree that a test item is essential. The formula for calculating CVR is:

    • CVR = (ne - N/2) / (N/2)

      • ne = number of Subject Matter Experts (SMEs) who rate an item as essential

      • N = total number of SMEs providing input.

  • The CVR can range from -1 (none of the experts rate the item as essential) to +1 (all of the experts rate the item as essential); a CVR of 0 means exactly half of the panel rates the item essential.
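  • A minimal Python sketch of the CVR calculation above; the item labels and expert counts are hypothetical:

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """CVR = (ne - N/2) / (N/2)."""
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 SMEs; counts of "essential" ratings per item
essential_counts = {"item_1": 9, "item_2": 5, "item_3": 2}
for item, ne in essential_counts.items():
    print(item, round(content_validity_ratio(ne, n_experts=10), 2))
# item_1 0.8   item_2 0.0   item_3 -0.6
```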

Criterion-Related Validity

  • Objective vs. Subjective Criteria:

    • Objective Criteria: These are measurable outcomes, such as GPA or performance ratings.

    • Subjective Criteria: Based on personal judgment, such as letters of recommendation or self-reports, which may introduce bias.

Validity Coefficient

  • The validity coefficient quantifies the strength of the correlation between a test and a criterion measure, which is critical for assessing how well the test predicts actual performance or behavior. This involves correlation analysis, along with significance tests that evaluate whether the observed correlation is statistically reliable and meaningful.
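  • A minimal sketch of computing a validity coefficient in Python; the test scores and criterion ratings below are hypothetical:

```python
import numpy as np

# Hypothetical data: selection-test scores and later job-performance ratings
test_scores = np.array([52, 61, 47, 70, 66, 58, 73, 49, 64, 55])
performance = np.array([3.1, 3.8, 2.9, 4.3, 4.0, 3.4, 4.5, 3.0, 3.9, 3.2])

# Validity coefficient: Pearson correlation between test and criterion
r = np.corrcoef(test_scores, performance)[0, 1]
print(f"validity coefficient r = {r:.2f}")
```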

Hypothesis Testing Steps

  1. State the null hypothesis (H0) and an alternative hypothesis (H1).

  2. Specify sample size (N) and establish the alpha level, commonly set at .05.

  3. Compute the test statistic (e.g., a t statistic for the correlation coefficient).

  4. Compare the obtained statistic to the critical value and decide whether to reject H0, i.e., whether the validity coefficient is statistically significant.
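  • A minimal sketch of these steps applied to a validity coefficient; the sample size, alpha level, and observed r are hypothetical:

```python
from scipy import stats

# Step 1: H0: the population correlation is zero; H1: it is not (two-tailed)
# Step 2: specify N and the alpha level
n, alpha = 30, 0.05
r = 0.45                       # observed validity coefficient (hypothetical)

# Step 3: t statistic for a Pearson r with N - 2 degrees of freedom
df = n - 2
t_obtained = r * (df / (1 - r**2)) ** 0.5

# Step 4: compare to the critical value and draw a conclusion
t_critical = stats.t.ppf(1 - alpha / 2, df)
print(f"t = {t_obtained:.2f}, critical t = {t_critical:.2f}")
print("Reject H0" if abs(t_obtained) > t_critical else "Fail to reject H0")
```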

Using Validity to Make Predictions

  • Linear Regression:

    • The formula used is: Ŷ = bX + a

      • Ŷ = predicted score

      • b = slope, representing the effect of X on Y

      • a = intercept, the predicted value of Y when the predictor (X) is zero.

  • Multiple Regression:

    • The formula used is: Ŷ = a + b1X1 + b2X2 + … + bmXm

    • This method assesses the effects of multiple predictor variables simultaneously, providing a fuller picture of their joint contribution to the predicted criterion score and to decisions based on it (a sketch of both regression models follows below).
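  • A minimal sketch of both regression models using NumPy least squares; the predictor and criterion values are hypothetical:

```python
import numpy as np

# Hypothetical data: test score (X1), interview rating (X2), first-year GPA (Y)
x1 = np.array([50, 62, 58, 71, 66, 47, 75, 54], dtype=float)
x2 = np.array([3.0, 4.0, 3.5, 4.5, 4.0, 2.5, 5.0, 3.0])
y  = np.array([2.4, 3.1, 2.9, 3.7, 3.3, 2.2, 3.9, 2.6])

# Simple linear regression: Y-hat = b*X + a
b, a = np.polyfit(x1, y, deg=1)
print(f"Y-hat = {b:.3f}*X + {a:.3f}")

# Multiple regression: Y-hat = a + b1*X1 + b2*X2, fit by least squares
X = np.column_stack([np.ones_like(x1), x1, x2])
(a_m, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"Y-hat = {a_m:.3f} + {b1:.3f}*X1 + {b2:.3f}*X2")

# Predicted criterion score for a new (hypothetical) applicant
print("predicted GPA:", round(a_m + b1 * 60 + b2 * 3.5, 2))
```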