Measurement in Scientific Research
Exam Performance and Its Relation to Measurement
Individuals often experience discrepancies between their preparation for exams and their actual performance.
It's common to perform well on an exam without complete preparation if the questions align with known material.
Conversely, one may struggle even with diligent preparation if the exam covers unfamiliar topics.
These experiences highlight complexities of measurement that are not limited to education but extend to scientific research as well.
Ensuring accurate measurements is critical in research to avoid misleading results.
Measurement Concepts: Reliability and Validity
Reliability: Refers to consistency in measurements, ensuring repeatability of scores.
Validity: Refers to accuracy; whether the measure is assessing what it intends to measure.
Analogy of Reliability and Validity Using Archery
An analogy of shooting arrows at a bull's eye helps differentiate the two concepts.
The green center of the bull's eye symbolizes the construct being measured.
Closeness to the bull's eye reflects validity; the tightness of the grouping (low variability) reflects reliability.
Target 1: High Reliability, Low Validity
Example: Shooting five arrows that land in the upper left but far from the bull's eye.
Conclusion: Reliable (consistency in scores) but not valid (not measuring the intended construct).
Target 2: Low Reliability, Improved Validity
Shooting five arrows that are more spread out but closer to the bull's eye.
Result: Validity improves because the arrows land nearer the bull's eye, but reliability decreases because of the broader spread.
Target 3: Poor Reliability and Validity
Shooting five arrows, with only one hitting the bull's eye and the others scattered widely.
Conclusion: Neither valid nor reliable measurement.
Target 4: High Reliability and Validity
Shooting five arrows, all hitting the bull's eye consistently.
Conclusion: Both valid and reliable results achieved.
Relationship Between Reliability and Validity
If a measure is valid, it must also be reliable: reliability is a necessary (but not sufficient) condition for validity.
Conversely, a measure can be reliable without being valid (e.g., Target 1).
Importance of ensuring both validity and reliability in research for credible conclusions.
Types of Reliability
Three types of reliability are essential to consider when evaluating measurements.
1. Inter-rater Reliability
Defined as the degree of agreement among judges or raters.
Multiple judges rating performances need to agree for high reliability.
Example: Four judges score a singer as 8, 8.5, 8.5, and 8, leading to high inter-rater reliability.
Lack of training for judges can lead to variability in scoring.
Factors Affecting Inter-rater Reliability
Training judges on shared criteria ensures consistency in measuring the various aspects of performance; discussing practice scores during training helps reduce idiosyncratic bias.
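A minimal sketch of how agreement might be quantified, continuing the judges example (all data are made up, and averaging pairwise correlations is just one simple index; formal work often uses statistics such as the intraclass correlation):

```python
# Inter-rater reliability sketch: rows are contestants, columns are judges.
import numpy as np

scores = np.array([
    [8.0, 8.5, 8.5, 8.0],   # the lecture's example scores
    [6.0, 6.5, 6.0, 6.5],   # remaining rows are hypothetical
    [9.0, 9.0, 8.5, 9.0],
    [5.0, 5.5, 5.0, 5.5],
])

# Correlate each judge's scores with every other judge's across contestants;
# a high average pairwise correlation means the judges agree.
r = np.corrcoef(scores.T)                      # judge-by-judge matrix
pairwise = r[np.triu_indices_from(r, k=1)]     # unique judge pairs
print(f"mean inter-judge correlation: {pairwise.mean():.2f}")
```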
2. Test-Retest Reliability
Refers to the measure's stability over time, ensuring consistent results when tested repeatedly.
Example: Contestants sing multiple times so that their talent can be scored consistently.
Good stability indicated when scores are consistent across performances (e.g., scores of 7, 7.5, and 6.5).
Implications on Constructs
Constructs like mood may yield poorer test-retest reliability due to natural fluctuations, while stable constructs like intelligence should correlate well over time.
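A minimal sketch of a test-retest check, assuming hypothetical scores for the same ten people on two occasions:

```python
# Test-retest reliability sketch: correlate scores from two occasions.
import numpy as np

time1 = np.array([7.0, 7.5, 6.5, 8.0, 5.5, 9.0, 6.0, 7.0, 8.5, 5.0])
time2 = np.array([7.5, 7.0, 6.5, 8.5, 5.0, 8.5, 6.5, 7.5, 8.0, 5.5])

# A stable measure should produce a strong positive correlation.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest correlation: r = {r:.2f}")
```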
3. Internal Consistency
Describes consistency across items within a measure that are meant to assess the same construct.
Internal consistency can be shown with multiple items examining social anxiety or aggression in different ways.
Strong internal consistency is indicated when the items yield similar scores; it is commonly quantified with statistics such as Cronbach's alpha.
Assessing Internal Consistency
Split-half technique involves correlating the first half of the items with the second half.
Cronbach's alpha is equivalent to the average across all possible split-half estimates; a value of 0.80 or higher is typically taken to indicate good reliability.
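A minimal sketch of both techniques on a hypothetical six-item questionnaire (the data are invented; note the Spearman-Brown correction applied to the split-half correlation, since each half is only half as long as the full scale):

```python
# Internal consistency sketch: rows are respondents, columns are items.
import numpy as np

items = np.array([
    [4, 5, 4, 5, 4, 5],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 2, 1],
    [3, 3, 4, 3, 3, 4],
    [4, 4, 4, 5, 5, 4],
])

# Split-half: correlate totals from the first and second halves, then
# apply the Spearman-Brown correction for the shortened length.
half1 = items[:, :3].sum(axis=1)
half2 = items[:, 3:].sum(axis=1)
r_half = np.corrcoef(half1, half2)[0, 1]
split_half = 2 * r_half / (1 + r_half)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                       / items.sum(axis=1).var(ddof=1))

print(f"split-half (Spearman-Brown corrected): {split_half:.2f}")
print(f"Cronbach's alpha: {alpha:.2f}")   # 0.80 or higher is usually 'good'
```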
Improving Measurement Reliability
Strategies for enhancing reliability in research tools:
Inter-rater Reliability
Ensuring clear and comprehensive evaluation protocols for raters.
Conducting mock sessions for training helps refine consistency among judges.
Test Stability and Internal Consistency
Increasing the number of items improves reliability, since more questions lessen the impact of any single response (see the Spearman-Brown sketch below).
Clarity in questions aids understanding, ultimately supporting reliability across measures.
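How much adding items helps can be approximated with the Spearman-Brown prophecy formula; a brief sketch under the assumption that the new items are comparable to the existing ones:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test is
# lengthened by a factor n, assuming the added items behave like the rest.
def predicted_reliability(r_current: float, n: float) -> float:
    return n * r_current / (1 + (n - 1) * r_current)

# Illustrative example: a scale with reliability .70, doubled in length.
print(f"{predicted_reliability(0.70, 2):.2f}")   # about 0.82
```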
Sample Size Impact
A larger sample yields more stable reliability estimates, though recruitment must be handled thoughtfully to avoid introducing bias.
Ideally, the sample stays consistent with the target demographic of interest.
Validity of Measures
Validity is a critical aspect of measurement accuracy: how well a measure reflects the variable it is intended to capture.
Types of Validity
Five types generally examined:
1. Face Validity
The degree to which a measure appears to assess its intended construct.
Example: A self-esteem questionnaire whose items clearly concern self-esteem shows face validity.
2. Content Validity
The completeness of coverage across the construct of interest.
Example: A measure of athletic ability should sample a range of athletic behaviors rather than a single skill.
3. Predictive Validity
Evaluates how well scores on a measure correlate with a future outcome.
SAT scores forecasting college GPAs illustrate predictive validity.
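A minimal sketch of how that coefficient would be computed, using invented SAT and GPA values:

```python
# Predictive validity sketch: correlate a measure taken now (SAT) with a
# later outcome (college GPA). All values are hypothetical.
import numpy as np

sat = np.array([1100, 1250, 1400, 980, 1320, 1180, 1500, 1050])
gpa = np.array([2.9, 3.2, 3.6, 2.5, 3.4, 3.0, 3.8, 2.8])

r = np.corrcoef(sat, gpa)[0, 1]
print(f"predictive validity coefficient: r = {r:.2f}")
```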
4. Convergent Validity
Ensures correlation with other measures that assess similar constructs, like measuring anxiety across various methods.
5. Discriminant Validity
Demonstrated by low correlations with measures of constructs the instrument should not relate to, establishing its specificity.
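A minimal sketch contrasting the two, with invented scores: a new anxiety questionnaire should track an existing anxiety measure (convergent) while being nearly unrelated to something like height (discriminant):

```python
# Convergent vs. discriminant validity sketch (all data hypothetical).
import numpy as np

new_anxiety   = np.array([12, 30, 22, 8, 27, 18, 25, 10])
other_anxiety = np.array([14, 28, 20, 9, 29, 17, 24, 12])
height_cm     = np.array([175, 177, 167, 174, 170, 176, 171, 169])

# Expect a high correlation with the related measure and a near-zero
# correlation with the unrelated one.
print(f"convergent:   r = {np.corrcoef(new_anxiety, other_anxiety)[0, 1]:.2f}")
print(f"discriminant: r = {np.corrcoef(new_anxiety, height_cm)[0, 1]:.2f}")
```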
Validity Assessment Techniques
As with reliability, validity is usually assessed with correlations, most commonly Pearson's product-moment correlation, across the different types of validity; face and content validity are the exceptions, typically evaluated qualitatively.
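For reference, a from-scratch sketch of Pearson's r, matching the textbook definition r = cov(x, y) / (s_x * s_y); the sample-size terms cancel, so sums of deviations suffice:

```python
# Pearson's product-moment correlation computed from its definition.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    return cov / sqrt(ss_x * ss_y)

print(f"{pearson_r([1, 2, 3, 4], [2, 4, 5, 9]):.2f}")   # strong positive r
```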