Instructor: Sarah Sramota (s.sramota@vu.nl)
Lecture Time: Tue 11 Feb, 09:00
Modules Covered: 4-6
Week 1: Association between Quantitative Variables
Week 2: Association between Categorical Variables
Topics: Probabilities, Relative Risk, Odds, Odds Ratio, Perception of Risk
Module 5: Comparing groups
Risks of comparing groups
Controlling for confounding variables
Module 6: Reliability analysis
The ‘Diederik Stapel’ case
Cronbach’s Alpha
Scale construction
Categorical Variables
Presentation Methods:
Contingency table / Cross-tabulation
Bar chart
Calculation Methods:
Joint and marginal proportions
Conditional proportions
(Relative) risk
Quantitative Variables
Visualization: Scatterplot
Statistical Measure: Correlation
Association Visualization:
Definition: Association between two categorical variables visualized in a contingency table.
Example Table:
Beverage | Low Productivity | Medium Productivity | High Productivity | Total |
---|---|---|---|---|
Espresso | 5 | 20 | 50 | 75 |
Cafe Latte | 10 | 35 | 30 | 75 |
Tea | 15 | 25 | 10 | 50 |
Total | 30 | 80 | 90 | 200 |
Conditional Proportions
Calculation: Dividing cells by row totals gives conditional proportions for each beverage type.
Examples:
Espresso: 5/75 ≈ 0.067 (Low Productivity), 20/75 ≈ 0.267 (Medium), 50/75 ≈ 0.667 (High).
Total: the conditional proportions in each row sum to 1.00 (up to rounding).
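The row-wise division described above can be sketched in Python. This is a minimal illustration using the counts from the beverage table; the names `counts` and `conditional_proportions` are made up for this sketch and are not from the lecture.

```python
# Counts from the example table.
# Rows: beverage; columns: low, medium, high productivity.
counts = {
    "Espresso": [5, 20, 50],
    "Cafe Latte": [10, 35, 30],
    "Tea": [15, 25, 10],
}


def conditional_proportions(table):
    """Divide every cell by its row total, so each row sums to 1."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells)
        result[row_label] = [cell / row_total for cell in cells]
    return result


cond = conditional_proportions(counts)
for beverage, proportions in cond.items():
    print(beverage, [round(p, 3) for p in proportions])
```

Because each row is divided by its own total, the proportions are comparable across beverages even though the groups differ in size (75, 75 and 50).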
Examples: Voting Behavior
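Analysis: Compare conditional proportions relating candidate choice and ethnicity/race. Note that in the table below each candidate column sums to 100%, so the cells give the ethnic/racial composition of each candidate's voters, not the share of each group that voted for that candidate.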
Considerations for Interpretations:
Conditional Probability: Understanding demographic influences on voting behavior
Example Table:
Race/Ethnicity | Trump | Harris | Total |
---|---|---|---|
White | 84% | 66% | ?% |
Black | 3% | 18% | ?% |
Latino | 9% | 11% | ?% |
Other | 4% | 5% | ?% |
Total | 100% | 100% | 100% |
Calculation method: Joint probabilities are computed by dividing each cell count by the grand total (200). Dividing the row and column totals by 200 gives the marginal probabilities.
Example of Probability Calculations:
Beverage | Low Productivity | Medium Productivity | High Productivity | Total |
---|---|---|---|---|
Espresso | 0.025 | 0.1 | 0.25 | 0.375 |
Cafe Latte | 0.05 | 0.175 | 0.15 | 0.375 |
Tea | 0.075 | 0.125 | 0.05 | 0.25 |
Total | 0.15 | 0.4 | 0.45 | 1 |
Definitions:
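Joint probabilities: the probability of a specific combination of categories of both variables (the interior cells, e.g. espresso and high productivity = 0.25). The cells are mutually exclusive, so all joint probabilities together sum to 1.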
Marginal probabilities: consider one variable only
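As a sketch of the two definitions, the snippet below recomputes the joint and marginal probabilities from the raw beverage counts. The variable names are illustrative only.

```python
# Counts from the example table.
# Rows: beverage; columns: low, medium, high productivity.
counts = {
    "Espresso": [5, 20, 50],
    "Cafe Latte": [10, 35, 30],
    "Tea": [15, 25, 10],
}

# Grand total of all observations (200 in the example).
grand_total = sum(sum(cells) for cells in counts.values())

# Joint probabilities: each cell divided by the grand total.
joint = {
    beverage: [cell / grand_total for cell in cells]
    for beverage, cells in counts.items()
}

# Marginal probabilities: one variable only (row totals or column totals).
row_marginals = {
    beverage: sum(cells) / grand_total for beverage, cells in counts.items()
}
col_marginals = [
    sum(cells[j] for cells in counts.values()) / grand_total for j in range(3)
]

print(joint["Espresso"])   # joint probabilities for espresso
print(row_marginals)       # P(beverage)
print(col_marginals)       # P(productivity level)
```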
Definition (relative risk): The ratio of two proportions: the probability of an outcome in one group (p1) divided by the probability of the same outcome in another group (p2).
Calculation example (original version, based on joint probabilities): 0.25 (espresso) / 0.05 (tea) = 5, read as "espresso drinkers are 5x more likely to be highly productive than tea drinkers."
Important Update: Relative risk was previously calculated from joint probabilities. It should be calculated from conditional probabilities, because it measures the likelihood of an outcome in one group relative to another. Joint probabilities can give misleading results because they do not account for how common each group is in the population. With conditional probabilities: (50/75) / (10/50) = 0.667 / 0.20 ≈ 3.33, so espresso drinkers are about 3.3x more likely to be highly productive than tea drinkers.
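To make the difference between the two approaches concrete, here is a minimal sketch that computes the relative risk of high productivity for espresso versus tea drinkers, once from conditional probabilities (as the update recommends) and once from joint probabilities. The helper `relative_risk` is a hypothetical name for this illustration.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Risk in group A divided by risk in group B.

    Each risk is a conditional probability: events within the group
    divided by the size of that group.
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    return risk_a / risk_b


# Conditional version: high productivity among espresso drinkers (50 of 75)
# versus among tea drinkers (10 of 50).
rr_conditional = relative_risk(50, 75, 10, 50)

# Joint version (the earlier, misleading calculation): both cells divided
# by the grand total of 200, which ignores the different group sizes.
rr_joint = (50 / 200) / (10 / 200)

print(round(rr_conditional, 2))
print(round(rr_joint, 2))
```

The joint version overstates the ratio here because there are more espresso drinkers (75) than tea drinkers (50) in the sample.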
Reference to additional explainer: ADDENDUM 20/02: Probabilities, relative risk, odds (ratio).
Absolute Risk: Actual probability of an event occurring
Example: "10% of smokers get lung cancer."
Relative Risk: Comparison of risk between two groups
Example: "Smokers are 5 times more likely to get lung cancer than non-smokers."
Example: Shark Encounters
Absolute Risk Calculation: 10/1000 = 1% are approached by sharks.
With Shark Repellent: 5/1000 = 0.5% encounter rate.
Absolute Risk Reduction: ARR = 1% - 0.5% = 0.5 percentage points.
Relative Risk Reduction: RRR = ARR / baseline risk = 0.5% / 1% = 50%.
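The shark example can be reproduced with a short sketch. The function name `risk_reduction` and its parameter names are assumptions made for this illustration; the figures are the ones given above.

```python
def risk_reduction(events_control, n_control, events_treated, n_treated):
    """Return baseline risk, treated risk, absolute and relative risk reduction."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    arr = risk_control - risk_treated   # absolute risk reduction
    rrr = arr / risk_control            # relative risk reduction
    return risk_control, risk_treated, arr, rrr


# 10 of 1000 approached without repellent, 5 of 1000 with repellent.
baseline, with_repellent, arr, rrr = risk_reduction(10, 1000, 5, 1000)

print(f"Baseline risk:  {baseline:.1%}")
print(f"With repellent: {with_repellent:.1%}")
print(f"ARR:            {arr:.1%}")
print(f"RRR:            {rrr:.0%}")
```

The same halving of risk sounds large as a relative figure (50%) and small as an absolute one (0.5 percentage points), which is why both are worth reporting.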
Key Considerations: Ensure comparisons between groups are valid and consistent. Watch for misleading statistics, and use the same definitions for every group being compared.
Experimental Control
Definition: Maintaining variables constant (e.g., temperature) to eliminate their influence on results.
Statistical Control
Definition: Including other explanatory factors in the analysis (e.g., age, gender).
Importance
Objective: Confirm that a scale consistently measures the same concept (e.g., trust items).
Internal consistency validation via Cronbach's Alpha value.
Example: Constructing a Scale
Situation: Political trust and engagement study using World Values Survey items.
Function: Measures reliability and internal consistency of survey items.
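As an illustration of how Cronbach's Alpha is computed, the sketch below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The response data are invented for this example and do not come from the World Values Survey; the function name `cronbach_alpha` is likewise an assumption.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item; position i in every list
    belongs to the same respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_variances = [variance(scores) for scores in items]
    # Total scale score per respondent: sum across items.
    totals = [sum(responses) for responses in zip(*items)]
    return k / (k - 1) * (1 - sum(item_variances) / variance(totals))


# Hypothetical answers of five respondents to three trust items (1-5 scale).
trust_items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 3],
]

alpha = cronbach_alpha(trust_items)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, which is what one expects if they measure the same concept.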