Reliability and Validity
Note 1: The main purpose of this reading is to learn key concepts in psychometrics (i.e., metrics, or measurement, of psychological constructs). In the study of personality, or more broadly, individual differences, two key concepts within this area of psychometrics are reliability and validity.
Note 2: The authors describe a type of validity called discriminant validity. They mention that a measure “should be able to discriminate between different people being measured” (p. 3242). Just to be clear, this is not referring to discrimination (such as racial discrimination), but rather the notion that a measure of “Extraversion” should yield a high score for someone who is more extraverted, and a low score for someone who is more introverted. If this is true, then the measure is able to distinguish between these two individuals, as it should be able to if it is truly measuring extraversion.
How does the measurement of a psychological construct differ from measurement of something in the ‘physical world’?
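A psychological construct is not a tangible entity that can be observed directly, so it cannot be measured the way length or weight can. Instead, it is measured indirectly through a collection of proposed criteria (e.g., the items on a scale), and that measurement then has to be shown to be reliable and valid.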
What is reliability?
Reliability: how consistently or dependably a measurement scale measures what it is supposed to be measuring
Describe/define each of the following specific types of reliability, including a description of how it can be assessed:
Test-retest reliability: a commonly used indicator of the reliability of a measurement scale; the scale under development is administered to the same sample on two different occasions and the two sets of scores are compared (e.g., correlated), with closely matching scores indicating high reliability
Internal consistency: how well the different items of a scale measure the same characteristic; it can be assessed by splitting the items into two groups and comparing the scores, with the expectation that both groups of items will give similar scores
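The two reliability checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made up, the helper names (pearson_r, cronbach_alpha) are hypothetical, and Cronbach's alpha is used here as one common index of internal consistency, which the reading may or may not use.

```python
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])                      # number of items on the scale
    columns = list(zip(*rows))            # regroup scores item by item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: the same six people tested on two occasions.
time1 = [12, 18, 25, 30, 22, 15]
time2 = [13, 17, 26, 29, 21, 16]
test_retest = pearson_r(time1, time2)

# Hypothetical data: five respondents answering four items (1-5 ratings).
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)

print(f"test-retest r = {test_retest:.2f}")
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values close to 1 on either index would suggest consistent measurement; what counts as "high enough" depends on the field and the purpose of the scale.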
What is validity?
Validity: whether or not a measurement scale is actually measuring what you want it to measure
Describe/define each of the following specific types of validity, including a description of how one would evaluate each component of validity:
Face validity: whether a measurement scale looks reasonable and whether the items included in the scale appear relevant; evaluated by having experts/researchers review the scale
Content validity: whether a scale includes all the relevant issues and excludes irrelevant ones in terms of its content; assessed by an expert panel, by comparison with the literature, or both
Concurrent validity: the extent to which a measurement scale under development correlates with the “gold standard”; assessed by administering the developing scale alongside an established scale that is currently in use and comparing the results
Predictive validity: how well a scale predicts expected future observations; assessed by correlating the results of the scale with the results of a second scale that is administered later
Construct validity: used when there is no gold standard or current measure; tests the extent to which the measurement scale under development reflects the construct under investigation, by defining the construct, stating hypotheses about how the scale should behave, and testing whether those hypotheses are supported or refuted
Convergent validity: assesses the sensitivity of the scale, by testing the hypothesis that the scale will correlate with other measures it is expected to be related to
Divergent validity: assesses the specificity of the scale, by testing the hypothesis that the scale will correlate only weakly with measures it is not expected to be related to
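Several of the validity checks above come down to correlating scores from the scale under development with scores from another measure. The sketch below illustrates that idea with made-up numbers. It assumes the common reading of convergent validity (a strong correlation with an established measure of the same construct) and divergent validity (a weak correlation with a measure of an unrelated construct); the variable names and data are hypothetical.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for the same six people on three measures.
new_scale = [10, 14, 19, 23, 27, 31]          # scale under development
established_scale = [11, 15, 18, 24, 26, 32]  # existing measure, same construct
unrelated_scale = [5, 9, 4, 8, 5, 8]          # measure of a different construct

r_convergent = pearson_r(new_scale, established_scale)
r_divergent = pearson_r(new_scale, unrelated_scale)

print(f"correlation with established scale: {r_convergent:.2f}")
print(f"correlation with unrelated scale:   {r_divergent:.2f}")
```

The same correlation would serve for concurrent validity if the established scale is treated as the gold standard, and for predictive validity if the second set of scores is collected later.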
If a personality measure is reliable, does that mean it is valid too?
No. A reliable measure gives consistent results, but it can be consistently measuring the wrong thing. Reliability is necessary for validity, but it does not guarantee it.
What about if a measure is valid, does that mean it must be reliable?
Yes. A measure cannot be valid without being reliable: if its scores are inconsistent, they cannot be accurately reflecting the construct. Note, though, that a measure shown to be valid in one circumstance is not automatically valid in other situations.