Title: Item Analysis Lecture
Institution: Aston University, Birmingham, UK
Course Code: PY2501 Research Methods & Data Analysis
Instructor: Dr. Ryan
Psychologists use questionnaires to assess psychological constructs, such as:
Personality
Anxiety
Self-esteem
Internal motivation
Questionnaires provide indirect measures of these constructs, relying primarily on self-report.
The challenge for psychologists is to ensure that these questionnaires are reliable.
(Very) Unreliable Questionnaire: Produces inconsistent responses and fails to measure the construct accurately.
Reliable Questionnaire: Produces consistent responses and accurately measures psychological constructs.
Reliability vs. Validity:
Reliability refers to the consistency of the measurement scale.
Validity refers to whether the scale measures what it is intended to measure.
Types of Reliability:
Internal Reliability (focus of this lecture)
Test-Retest Reliability
Importance of measuring reliability for questionnaire items.
Identify a reverse scored item related to extraversion.
Define a questionnaire with good reliability.
Define a questionnaire with good validity.
Define a questionnaire that exhibits both good validity and reliability.
Correlation Coefficient Ranges:
Negative relationships:
-0.1 = weak
-0.3 = moderate
-0.5 = strong
-1 = perfect
Positive relationships:
0.1 = weak
0.3 = moderate
0.5 = strong
1 = perfect
Correlational analysis measures the relationship between two continuous variables.
Correlation coefficient (r) values range from -1 to 1, indicating strength based on absolute size.
The strength is determined by the degree of scatter rather than slope.
More scatter indicates a smaller correlation coefficient.
Perfect Positive: 1
Strong Positive: 0.5
Moderate Positive: 0.3
No Correlation: 0
Moderate Negative: -0.3
Strong Negative: -0.5
Perfect Negative: -1
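The coefficient summarised above can be computed directly from its definition. A minimal Python sketch (function name and example values are illustrative, not from the lecture):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two continuous variables."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()          # centre both variables
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfect positive: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfect negative: -1.0
```

In practice `np.corrcoef(x, y)[0, 1]` gives the same value; the explicit version just makes the formula visible.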
Internal Reliability (recap)
Item Analysis: Improving Internal Reliability
Item-total correlation
Cronbach’s alpha if deleted
Additional checks:
Test-Retest Reliability
Validity Evidence
Individual items should reflect the same construct.
High correlation among items allows scores to be summed for a total variable score.
Example: Beck's Depression Inventory (BDI)
Participants scoring high on one item should score high on related items.
Split-half Reliability:
Randomly split the questionnaire items into two halves and check the correlation between the half scores.
Correct for reversed questions to ensure accuracy.
Issue: Potential variability based on split method.
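The split-half procedure can be sketched as follows, assuming a participants × items score matrix with reverse-scored items already recoded. The Spearman-Brown step is a standard correction for half-length scales, not something covered in the notes above; the example scores are invented:

```python
import numpy as np

def split_half(data, rng):
    """Split-half reliability: randomly split items, correlate half scores."""
    n_items = data.shape[1]
    order = rng.permutation(n_items)             # random item split
    half_a = data[:, order[: n_items // 2]].sum(axis=1)
    half_b = data[:, order[n_items // 2 :]].sum(axis=1)
    r = np.corrcoef(half_a, half_b)[0, 1]
    # Spearman-Brown correction: scales the half-length correlation up to
    # an estimate for the full-length scale
    return 2 * r / (1 + r)

# Hypothetical 5 participants x 6 items; different rng seeds (i.e. different
# splits) can give different estimates -- the issue noted above
scores = np.array([[4, 5, 4, 5, 3, 4],
                   [2, 1, 2, 1, 2, 2],
                   [3, 3, 4, 3, 3, 3],
                   [5, 4, 5, 5, 4, 5],
                   [1, 2, 1, 2, 1, 1]])
print(split_half(scores, np.random.default_rng(0)))
```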
Cronbach's Alpha:
Overcomes the split-half problem.
Roughly equivalent to the average correlation coefficient across all possible splits.
Cronbach's alpha (α) typically falls between 0 and 1; negative values can occur and usually signal a problem (e.g., reverse-scored items not recoded).
Internal consistency interpretation:
α ≥ .9: Excellent
.9 > α ≥ .8: Good
.8 > α ≥ .7: Acceptable
.7 > α ≥ .6: Questionable
.6 > α ≥ .5: Poor
.5 > α: Unacceptable
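Cronbach's alpha can be computed from the item variances and the variance of the total score. A minimal sketch (the function name is illustrative; Jamovi and most statistics packages compute this for you):

```python
import numpy as np

def cronbach_alpha(data):
    """Cronbach's alpha for a participants x items score matrix."""
    data = np.asarray(data, dtype=float)
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1)        # variance of each item
    total_var = data.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5 participants x 4 items
scores = np.array([[4, 5, 4, 5],
                   [2, 1, 2, 1],
                   [3, 3, 4, 3],
                   [5, 4, 5, 5],
                   [1, 2, 1, 2]])
print(round(cronbach_alpha(scores), 2))
```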
Best statistic to assess internal reliability?
Ideal value for optimal internal reliability?
What is indicated by calculating the average of item scores in a ten-item scale?
Goal: identify and remove items that reduce reliability.
Item Total Correlation:
Check if item score correlates with total score excluding that item.
Good items correlate at least .4 with the total score.
Cronbach’s Alpha if Deleted:
Check how alpha changes if an item is removed.
Goal is to achieve α ~.9.
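Both statistics can be sketched together, assuming a participants × items score matrix (function names are illustrative; Jamovi reports the same two columns):

```python
import numpy as np

def cronbach_alpha(data):
    """Cronbach's alpha for a participants x items score matrix."""
    k = data.shape[1]
    return (k / (k - 1)) * (1 - data.var(axis=0, ddof=1).sum()
                            / data.sum(axis=1).var(ddof=1))

def item_analysis(data):
    """Per item: corrected item-total correlation (item vs. total score
    excluding that item) and Cronbach's alpha if the item is deleted."""
    data = np.asarray(data, dtype=float)
    total = data.sum(axis=1)
    rows = []
    for i in range(data.shape[1]):
        rest = total - data[:, i]                     # total without item i
        r = np.corrcoef(data[:, i], rest)[0, 1]
        a = cronbach_alpha(np.delete(data, i, axis=1))
        rows.append((i + 1, r, a))
    return rows
```

Items with an item-total correlation below .4, or whose removal raises alpha, are candidates for deletion; remove them one at a time and re-run the analysis, as described below.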
New 8-item depression scale developed.
The shorter scale improves efficiency but needs reliability checks.
Jamovi provides item-total correlations and Cronbach’s α if deleted.
Each column represents a question; each row represents an individual participant's scores.
Example output shows:
Cronbach’s α at .78 with all items included.
Identify problematic items with item-total correlation < .4.
Assess effectiveness of removing items one at a time.
Start with the worst-performing item for analysis.
After removing Q3, α increased to .81.
Further examination suggests Q7 should also be removed to improve α to .85.
Final changes yield Cronbach’s α at .85 after removing Q3 and Q7.
Q6 could also be considered for removal, but it correlates reasonably with the other items, so it is retained.
Recommendation regarding an item with high correlation and good internal reliability?
Decision-making on another item with high alpha if deleted score.
Transition to Part 3: Overview of Test-Retest Reliability and Validity.
Measures reliability over time.
Correlation between Time 1 and Time 2 should exceed .7 for consistency.
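The test-retest check is a straightforward correlation between the two administrations. A short sketch with invented scores for six participants:

```python
import numpy as np

# Hypothetical total scores for the same participants at two time points
time1 = np.array([20, 15, 30, 25, 18, 22])
time2 = np.array([21, 14, 28, 26, 17, 23])

r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")
print("acceptable stability" if r > .7 else "unstable over time")
```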
Validity assesses if the questionnaire measures the intended constructs.
No absolute way to demonstrate validity; evidence is accumulated against several criteria.
Content Validity: Experts judge whether the items cover the full range of the construct.
Face Validity: Whether the questionnaire appears, on the face of it, to measure the intended construct.
Criterion Validity:
Concurrent: Correlation with existing measures.
Predictive: Ability to make future predictions based on the measure.
Thank you for participation!
Questions can be directed to: r.blything@aston.ac.uk