PSYC 3377 CHAPTER 4 VALIDITY
Chapter 4: The Truth, the Whole Truth, and Nothing but the Truth: Validity and its Importance
- All lectures are being audio recorded.
- Content from Salkind, Tests and Measurement 3e. SAGE Publishing (2018), unless noted otherwise.
Validity
- Definition of Validity: Validity refers to whether a tool measures what it claims to measure.
- General Definition from APA and National Council on Measurement in Education: Validity is the extent to which inferences made from a test are appropriate, meaningful, and useful (Salkind, Tests and Measurement 3e. SAGE Publishing, 2018).
Establishing Validity
- Establishing validity is markedly different from establishing reliability.
- Reliability can be quantified through objective measures.
- Validity cannot be quantified in the same manner.
- There is no “universally valid” test or measurement technique; validity is conditional and context-dependent (Cohen et al., 2013).
- Validity exists within reasonable boundaries of potential use.
- Validation: The process of gathering and evaluating evidence regarding the validity of a test (Cohen et al., 2013).
Focus on Types of External Evidence
- The discussion of validity will emphasize the type of external evidence used in the validation process.
- Validity can be established through several external evidence methods:
- Feedback from experts in a given field.
- Relations to other tests that have been validated.
- Correlation of test findings to other criteria.
Relationship Between Reliability and Validity
- Reliability and validity are intricately linked.
- A test cannot be valid unless it is first shown to be reliable; reliability is necessary but not sufficient for validity.
- A test must perform consistently (reliability) to be deemed effective at what it claims to measure (validity).
Types of Validity
- Question Prompt: What types of validity are you familiar with? (Salkind, Tests and Measurement 3e. SAGE Publishing, 2018).
- The major types of validity are:
- Face Validity
- Content Validity
- Criterion Validity
- Predictive Validity
- Concurrent Validity
- Construct Validity
Face Validity
- Definition: Face validity refers to the degree to which a test appears, on the surface, to measure what it claims to measure, typically as judged informally by test-takers or other observers rather than through formal expert review.
- It is an informal measure based on the general impression that the test adequately addresses what it aims to assess.
- No formal external criteria are applied; the judgment rests on appearance alone.
- Implication of Poor Face Validity: If the test is perceived as irrelevant by stakeholders (test-takers, legislators, parents), negative consequences may arise, such as poor test-taker attitudes or even legal challenges (Cohen et al., 2013).
- Face validity is sometimes incorrectly used interchangeably with content validity; however, they are distinct concepts (as detailed below).
Content Validity
- Definition: Content validity measures how well a sample of test items represents the entire universe of items related to a specific topic.
- This type of validity is most commonly used for achievement tests.
- Assessment: Requires a thorough examination of the content to ensure an accurate sample is being tested.
- Content Validity Ratio (CVR): A method to support content validity by providing numerical values that reflect the relevance of test items.
Content Validity Ratio (CVR) Computation
- Judges/Raters: Each item is categorized as “essential,” “useful but not essential,” or “not necessary.”
- CVR Formula:
  CVR = (E - N/2) / (N/2)
  where:
  - E = number of judges rating the item as essential
  - N = total number of judges
Examples of CVR Computation
- If 5 of 10 judges rate an item as essential:
  CVR = (5 - 10/2) / (10/2) = 0
- Negative CVR: occurs when fewer than half the judges rate an item as essential.
  - Example: 4 of 10 judges rate it essential:
    CVR = (4 - 10/2) / (10/2) = -0.2
- Positive CVR: occurs when more than half the judges rate an item as essential.
  - Example: 9 of 10 judges rate it essential:
    CVR = (9 - 10/2) / (10/2) = 0.80
- As defined by C. H. Lawshe, items have some content validity if more than half of judges indicate the item as essential (Cohen, Swerdlik, & Sturman, 2013).
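The CVR computation above can be sketched in a few lines of Python (a minimal illustration of Lawshe's formula; the function and variable names are my own, not from the text):

```python
def content_validity_ratio(essential, total_judges):
    """Lawshe's CVR: (E - N/2) / (N/2), where E is the number of
    judges rating an item 'essential' and N is the total judges."""
    half = total_judges / 2
    return (essential - half) / half

# The three worked examples from the text:
print(content_validity_ratio(5, 10))   # 0.0
print(content_validity_ratio(4, 10))   # -0.2
print(content_validity_ratio(9, 10))   # 0.8
```

A CVR above 0 means more than half the judges rated the item essential, which per Lawshe indicates at least some content validity.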
Criterion Validity
- Definition: Criterion validity assesses the extent to which test results correlate with a relevant criterion reflecting abilities in current or future settings.
- It is often used in achievement tests and for certification or licensing assessments.
- Types of criterion validity include:
- Concurrent Validity: Evaluates how well test scores correlate with outcomes measured by another similar test in the present.
- Predictive Validity: Assesses how well a test predicts abilities or outcomes in the future.
Concurrent Validity
- Example: The DOC scale measures specific doctor skills; scores are validated against a criterion consisting of expert judges' rankings of the same doctors.
- A high correlation (validity coefficient) with judges’ rankings validates the test (Salkind, Tests and Measurement 3e. SAGE Publishing, 2018).
- Study Example: Ambrosini et al. (1991) studied the concurrent validity of the Beck Depression Inventory (BDI) for adolescents against other measures validated for that age group. It effectively distinguished between depressed and non-depressed adolescents (Cohen, Swerdlik, & Sturman, 2013).
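A validity coefficient of this kind is simply a Pearson correlation between test scores and the criterion. A minimal Python sketch (the DOC scores and judges' rankings below are hypothetical, invented for illustration):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical data: DOC scale scores and judges' rankings of the same doctors
doc_scores = [72, 85, 60, 90, 78]
judge_ranks = [70, 88, 65, 92, 75]
validity_coefficient = pearson_r(doc_scores, judge_ranks)  # about 0.96
```

A coefficient this high would support concurrent validity; values near zero would not.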
Predictive Validity
- Definition: Predictive validity evaluates how well a test reflects a set of abilities or outcomes that are relevant in the future.
- Common in entrance exams and employment tests.
- Example: Using the DOC scale, correlate students' earlier scores with whether they are successfully practicing ten years later (successful practice = 1, not practicing = 0). A high correlation indicates predictive validity (Salkind, Tests and Measurement 3e. SAGE Publishing, 2018).
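With a dichotomous criterion like this, the validity coefficient is a point-biserial correlation, which is computed exactly like a Pearson r with one 0/1 variable. A minimal sketch (the scores and outcomes below are hypothetical):

```python
from statistics import mean

def point_biserial(scores, outcomes):
    """Correlation between continuous test scores and a 0/1 criterion
    (numerically identical to Pearson r with a dichotomous variable)."""
    mx, my = mean(scores), mean(outcomes)
    num = sum((a - mx) * (b - my) for a, b in zip(scores, outcomes))
    den = (sum((a - mx) ** 2 for a in scores)
           * sum((b - my) ** 2 for b in outcomes)) ** 0.5
    return num / den

# Hypothetical: earlier DOC scores vs. practicing ten years later (1 = yes)
doc_scores = [55, 60, 70, 80, 85, 90]
practicing = [0, 0, 1, 0, 1, 1]
r = point_biserial(doc_scores, practicing)  # about 0.65
```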
Criteria Quality and Establishing Validity
- Establishing criterion validity hinges on the quality of the chosen criterion:
- Criteria must accurately reflect the necessary set of skills.
- Relevant literature on traits, abilities, or performance is crucial.
- The reliability of the criterion is paramount, and the criterion must not itself be based on, or influenced by, the test being validated (a problem known as criterion contamination).
Construct Validity
- Definition: Construct validity evaluates whether a test accurately measures an underlying psychological construct.
- Constructs consist of interrelated variables often derived from theoretical perspectives.
- For example, aggression may be defined through various related behaviors like violence or poor social skills.
- It is considered one of the most challenging types of validity to establish due to the complexity in defining constructs precisely.
Establishing Construct Validity - Example (FIGHT Scale)
- The FIGHT scale is a self-report tool with items developed from the literature regarding aggressive behaviors.
- The test should correlate positively with anticipated related behaviors (e.g., criminal activity) and negatively with unrelated behaviors (e.g., educational achievements), thus supporting its validity (Salkind, Tests and Measurement 3e. SAGE Publishing, 2018).
Summary of Different Types of Validity
| Type of Validity | When You Use It | How You Use It | Example |
|---|---|---|---|
| Content Validity | When wanting to affirm a universe of items reflects the topic under examination. | Closely examine content to ensure an accurate sample for testing. | "My weekly quiz in my stat class adequately represents the chapter's content." |
| Criterion Validity | When correlating test scores to other measures affirming competency in a domain. | Correlate test scores with measures that are validated and assess the same abilities. | "The EATS test of culinary skills correlates with success in culinary school." |
| Construct Validity | When determining if a test measures an underlying psychological construct. | Correlate test scores with outcomes that reflect the theoretical construct. | "Men involved in contact sports score higher on the TEST(osterone) test of aggression." |
Multitrait-Multimethod Matrix
- An approach to establish construct validity, which involves measuring multiple traits through various methods while examining relationships that align with theoretical expectations.
- The process is resource-intensive, but it represents a high standard for validation in the field (Salkind, Tests and Measurement 3e. SAGE Publishing, 2018).
Example of Multitrait-Multimethod Matrix (FIGHT Scale)
- Three measurement methods employed:
- Observational tool
- Self-report via FIGHT scale
- Teacher evaluations
- Three traits being assessed:
- Aggression
- Intelligence
- Emotional stability
- Correlations among all traits and methods will be evaluated to provide construct validity.
Validity Coefficients Interpretation
- Low correlations: different methods measuring different traits, evidence of discriminant validity.
- Moderate correlations: the same method measuring different traits (shared method variance).
- High correlations: different methods measuring the same trait, evidence of convergent validity.
- Very high correlations: the same method measuring the same trait (essentially the test's reliability).
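The interpretation pattern for an MTMM matrix can be summarized in a small helper function (a sketch of the four cases described above; the labels are my own wording, not a standard library):

```python
def mtmm_expectation(same_trait, same_method):
    """Expected size of a correlation in a multitrait-multimethod
    matrix, following the four interpretation rules above."""
    if same_trait and same_method:
        return "very high: same trait, same method (reliability)"
    if same_trait:
        return "high: convergent validity"
    if same_method:
        return "moderate: same method, different traits"
    return "low: discriminant validity"

# e.g., FIGHT self-report vs. observer ratings of aggression:
# different methods, same trait, so convergent validity is expected
label = mtmm_expectation(same_trait=True, same_method=False)
```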
Validity Assurance and Adjustments
- If validity issues arise:
- For Content Validity: Revise items so they better represent the content domain, guided by expert judgment.
- For Criterion Validity: Reassess the nature of test items and their relation to the selected criterion, evaluating both relevance and usefulness.
- For Construct Validity: Analyze the foundational theoretical rationale that guides the test.
Conclusion: Designing Instruments
- A cautionary note for students: Designing measurement instruments for academic projects is exciting but also complex, requiring extensive work for reliability and validity assurance.
Practice Question
- A classmate reports a reliability coefficient of .49 and a validity coefficient of .74, claiming these results indicate adequate validity. What is wrong with this statement?
- (Salkind, Tests and Measurement 3e. SAGE Publishing, 2018).