stats midterm 2
1. Categorical vs. Quantitative Variables & Comparing Groups
Understanding Variable Types
You must know how to classify variables, because this determines which statistical method you use.
Categorical variables
Describe groups or categories
Examples: gender, previous quit attempt (yes/no), treatment (AZT vs placebo)
Typically summarized using: counts, proportions, bar graphs, segmented bar graphs
Quantitative variables
Numerical values where arithmetic makes sense
Examples: age, weight, number of cigarettes, number of words memorized
Summarized with: mean, median, SD, boxplots, histograms
What dictates a comparison?
Compare proportions → when the response variable is categorical
Compare means → when the response variable is quantitative
Statistical Goals in Experimental Design
When comparing background variables between treatment groups:
You hope to fail to reject the null hypotheses (i.e., no evidence that the groups differ on background variables)
Because well-balanced groups make the comparison fair and reduce confounding
---
2. Randomization Tests & Two-Way Tables (Shift Study Example)
Observational Units
Know how to identify what “one row of data” represents — here it was one shift.
Explanatory vs. Response Variables
Explanatory: “Gilbert on shift?” (yes/no)
Response: “At least one death?” (yes/no)
Two-way tables
Used when both variables are categorical.
Statistics Commonly Used
Difference in proportions
Risk ratio
Odds ratio
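A minimal Python sketch of these three statistics for a 2x2 table. The counts are made up for illustration (they are not the shift study data), and the function name and cell labels a, b, c, d are assumptions:

```python
def two_by_two_stats(a, b, c, d):
    """Summaries for a 2x2 table laid out as:

                   outcome yes   outcome no
        group 1         a            b
        group 2         c            d
    """
    p1 = a / (a + b)  # proportion with the outcome in group 1
    p2 = c / (c + d)  # proportion with the outcome in group 2
    return {
        "p1": p1,
        "p2": p2,
        "diff_in_proportions": p1 - p2,
        "risk_ratio": p1 / p2,            # how many times larger the risk is in group 1
        "odds_ratio": (a / b) / (c / d),  # odds in group 1 divided by odds in group 2
    }

# Hypothetical counts: 30 of 100 in group 1 and 10 of 100 in group 2 had the outcome
summary = two_by_two_stats(a=30, b=70, c=10, d=90)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Note that the risk ratio and odds ratio differ here even though they come from the same table; they only get close when the outcome is rare in both groups.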
Randomization Test Logic
A randomization test:
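1. Assumes the null is true: whether Gilbert was on shift has no association with whether a death occurred
2. Reshuffles the “Gilbert on shift?” labels across shifts many times, recomputing the statistic each time
3. Measures how often the simulated statistic is at least as extreme as the observed one (the p-value)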
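The three steps can be sketched in Python roughly as follows. This is an illustration under assumptions, not the applet's actual code: the data are hypothetical 0/1 responses, the statistic is the difference in proportions, and the test is one-sided (upper tail).

```python
import random

def randomization_test(group1, group2, reps=5000, seed=1):
    """One-sided randomization test for a difference in proportions.

    group1, group2: lists of 0/1 responses (1 = outcome occurred).
    Returns (observed difference, list of simulated differences, p-value).
    """
    rng = random.Random(seed)
    n1 = len(group1)
    pooled = list(group1) + list(group2)
    observed = sum(group1) / n1 - sum(group2) / len(group2)

    sims = []
    for _ in range(reps):
        rng.shuffle(pooled)                    # reassign group labels at random
        g1, g2 = pooled[:n1], pooled[n1:]
        sims.append(sum(g1) / len(g1) - sum(g2) / len(g2))

    # share of shuffles at least as extreme as the observed difference
    p_value = sum(s >= observed for s in sims) / reps
    return observed, sims, p_value

# Hypothetical data: 12 of 20 "yes" in group 1, 4 of 20 "yes" in group 2
group1 = [1] * 12 + [0] * 8
group2 = [1] * 4 + [0] * 16
observed, sims, p_value = randomization_test(group1, group2)
print(f"observed difference = {observed:.2f}, simulated p-value = {p_value:.4f}")
```

The list `sims` is the simulated null distribution; a histogram of it is what the dot plot in an applet shows.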
Interpreting Simulated Null Distributions
If observed statistic is far in the tail → reject H₀
If it’s common → fail to reject H₀
“Lawyer interpretation” Skills
Be able to argue against causation:
Observational study?
Confounding variables?
Imbalanced shifts?
Patterns may appear by chance
Alternate Measures of Extremeness
Examples:
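standardized statistic (z-score): (observed statistic − mean of null distribution) / SD of null distribution
tail proportion in the permutation distribution (the simulated p-value)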
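Both measures can be computed directly from a list of simulated statistics. A small sketch, assuming an upper-tail test and a made-up null distribution of only 10 values (a real simulation would use thousands); the function names are illustrative:

```python
from statistics import mean, stdev

def standardized_statistic(observed, null_values):
    """(observed - mean of null distribution) / SD of null distribution."""
    return (observed - mean(null_values)) / stdev(null_values)

def tail_proportion(observed, null_values):
    """Share of simulated values at least as large as the observed one."""
    return sum(v >= observed for v in null_values) / len(null_values)

# Hypothetical simulated statistics under the null
null_values = [-0.2, -0.1, -0.1, 0.0, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3]
observed_stat = 0.3

z = standardized_statistic(observed_stat, null_values)
tail = tail_proportion(observed_stat, null_values)
print(f"standardized statistic = {z:.2f}, tail proportion = {tail:.2f}")
```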
---
3. Observational Studies vs. Experiments & Confounding
(Positive/Negative Emotion and Colds Study)
Identifying explanatory & response variables
Explanatory: emotional state score (quantitative → categorized into thirds)
Response: did the person catch a cold? (binary categorical)
Study type matters
Experiment → researcher assigns explanatory variable
Observational study → just observes, no assignment
This example: observational study.
Implication for causal conclusions
Cannot conclude causation
Must consider confounding variables
Common Confounders
Examples:
stress levels
sleep
income
underlying health
exposure to virus outside study
You need to be able to:
1. Name a potential confounder
2. Explain how it is related to both the explanatory & response variables
---
4. Two-Sample Z Tests, Segmented Bar Graphs, and Random Assignment
(AZT vs. placebo example)
Segmented Bar Graphs
Understand how they visualize:
Proportion infected within each treatment group
Validity conditions for a two-sample z test
For comparing two proportions, each group must have:
At least 10 successes and
At least 10 failures
Check this using the observed counts or the counts expected under the null.
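A sketch of the validity check and the two-sample z statistic in Python. The counts are hypothetical (not the AZT data), the check uses observed counts, the standard error uses the pooled proportion, and the p-value is two-sided; all names are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def counts_ok(successes1, n1, successes2, n2, minimum=10):
    """True if each group has at least `minimum` successes and failures."""
    counts = (successes1, n1 - successes1, successes2, n2 - successes2)
    return all(c >= minimum for c in counts)

def two_proportion_z(successes1, n1, successes2, n2):
    """Two-sample z statistic and two-sided p-value for H0: pi1 = pi2."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)   # estimate of the common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 15 of 100 infected with treatment, 30 of 100 with placebo
valid = counts_ok(15, 100, 30, 100)
z, p_value = two_proportion_z(15, 100, 30, 100)
print(f"valid = {valid}, z = {z:.2f}, p-value = {p_value:.4f}")
```

If `counts_ok` returns False, a simulation-based test is the safer choice.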
Study Design
Important distinction:
Random assignment → supports causal inference
Random sampling → supports generalization to population
In this example: random assignment only
→ a causal conclusion is justified (if the result is significant), but generalization beyond the study participants is limited
---
5. Interpreting Two-Sample t-test Outputs & Effects on p-value
Understanding Software Output
Key pieces to interpret:
Sample means
Standard deviations
SE of difference
Test statistic (t-value)
p-value
Confidence interval
Predicting how changes affect p-values
You need conceptual understanding of how p-values behave:
1. Adding 1 to every observation
Increases both means equally → difference stays identical; SDs and SE are also unchanged
t-statistic and p-value unchanged
2. Increasing sample standard deviations
SE increases
t-statistic moves closer to 0
p-value increases (less significant)
3. Increasing sample sizes
SE decreases
t-statistic moves farther from 0
p-value decreases (more significant)
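The three scenarios can be checked numerically from summary statistics. A sketch assuming the unpooled two-sample t statistic, t = (mean1 - mean2) / sqrt(sd1²/n1 + sd2²/n2), with made-up numbers; it computes only t, since a t farther from 0 corresponds to a smaller p-value:

```python
from math import sqrt

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic from summary statistics."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)   # SE of the difference in means
    return (mean1 - mean2) / se

# Hypothetical summary statistics
base    = two_sample_t(12.0, 4.0, 25, 10.0, 4.0, 25)
shifted = two_sample_t(13.0, 4.0, 25, 11.0, 4.0, 25)    # 1 added to every observation
noisier = two_sample_t(12.0, 8.0, 25, 10.0, 8.0, 25)    # SDs doubled
larger  = two_sample_t(12.0, 4.0, 100, 10.0, 4.0, 100)  # sample sizes quadrupled

print(f"base t    = {base:.3f}")
print(f"shifted t = {shifted:.3f}  (same as base)")
print(f"noisier t = {noisier:.3f}  (closer to 0)")
print(f"larger t  = {larger:.3f}  (farther from 0)")
```

Doubling the SDs halves t, and quadrupling both sample sizes doubles it, because the SE scales with SD / √n.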
---
6. Matched Pairs Designs, Simulation Based Tests, Paired t-tests
(Jumping jacks & memory example)
Identifying Explanatory & Response Variables
Explanatory: condition (exercise vs. not) — categorical
Response: number of words memorized — quantitative
Null & Alternative Hypotheses
Know symbolic forms:
μ_d = 0, μ_d > 0, μ_E - μ_N = 0, etc.
Simulation Plots (Red, Blue, Black)
You need to know:
Black plot = null distribution
Each dot = the mean of the paired differences from one re-randomization, simulated under the null
Randomization mechanism
Understand how an applet reassigns values for matched pairs:
Randomly swap the two condition labels for each subject
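Swapping the two labels within a pair is the same as flipping the sign of that pair's difference, which gives a compact way to simulate. A sketch with hypothetical word counts (not the class data), assuming a one-sided test with the mean difference as the statistic:

```python
import random
from statistics import mean

def paired_randomization_test(cond_a, cond_b, reps=5000, seed=1):
    """One-sided randomization test for matched pairs.

    cond_a[i] and cond_b[i] are the two responses for subject i.
    Returns (observed mean difference, simulated mean differences, p-value).
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    observed = mean(diffs)

    sims = []
    for _ in range(reps):
        # a coin flip per subject: swap labels (negate the difference) or keep them
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        sims.append(mean(flipped))

    p_value = sum(s >= observed for s in sims) / reps
    return observed, sims, p_value

# Hypothetical words memorized by 8 subjects under each condition
exercise = [12, 15, 9, 14, 11, 13, 16, 10]
rest     = [10, 13, 9, 11, 10, 12, 13, 9]
observed, sims, p_value = paired_randomization_test(exercise, rest)
print(f"observed mean difference = {observed}, simulated p-value = {p_value:.4f}")
```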
Conclusion logic in randomization tests
Reject H₀ if:
Observed difference is in extreme tail of null distribution
Validity conditions for paired t-test
Need:
Differences are approximately normal (no strong skew)
No extreme outliers
Sample size n ≥ 15 makes the t-test more robust to mild departures from normality
Paired t-test & CI Concepts
You must know formulas conceptually:
t = (mean diff) / (SD(diff)/√n)
CI = mean diff ± t* × SE(diff)
Interpretation:
A CI describes plausible values for the true mean difference.
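A worked sketch of both formulas in Python, using made-up differences for 8 subjects. The multiplier t* is passed in rather than computed; 2.365 is the t-table value for 95% confidence with 7 degrees of freedom. The function name is an assumption for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_summary(diffs, t_star):
    """Paired t statistic and confidence interval for the mean difference."""
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)            # SE(diff) = SD(diff) / sqrt(n)
    t = d_bar / se                         # tests H0: mean difference = 0
    ci = (d_bar - t_star * se, d_bar + t_star * se)
    return t, ci

# Hypothetical paired differences (condition 1 minus condition 2)
diffs = [2, 2, 0, 3, 1, 1, 3, 1]
t, ci = paired_t_summary(diffs, t_star=2.365)   # t* for 95% confidence, df = 7
print(f"t = {t:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

For these numbers the mean difference is 1.625 and SE(diff) is 0.375, so the interval lies entirely above 0.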
---
7. Independent Samples vs Matched Pairs Design
(Milking methods example)
Know how to classify designs:
Independent samples → different cows in each group (Design A)
Matched pairs → same cow measured twice OR paired based on similarity (Designs B & C)
Key idea:
Matched pairs controls for cow-to-cow variability → reduces noise → increases power.
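The gain in precision shows up in the standard error. A sketch comparing the two analyses on the same made-up yields for 6 cows (hypothetical numbers, chosen so that cows differ a lot from each other but each cow's two measurements are close):

```python
from math import sqrt
from statistics import stdev

def se_independent(x, y):
    """SE of the difference in means, treating x and y as independent samples."""
    return sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))

def se_paired(x, y):
    """SE of the mean difference, treating (x[i], y[i]) as matched pairs."""
    diffs = [a - b for a, b in zip(x, y)]
    return stdev(diffs) / sqrt(len(diffs))

# Hypothetical milk yields for the same 6 cows under two methods
method_1 = [20, 35, 28, 42, 31, 25]
method_2 = [18, 34, 25, 40, 30, 22]

se_ind = se_independent(method_1, method_2)
se_pair = se_paired(method_1, method_2)
print(f"SE treating groups as independent = {se_ind:.3f}")
print(f"SE using paired differences       = {se_pair:.3f}")
```

The paired SE is much smaller because subtracting within each cow removes the cow-to-cow variability. This benefit depends on the paired values being positively correlated; if they are not, pairing does not help.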
---
8. One-Sample Paired t-test: Violation of Expectation Study
(Helper vs Hinderer looking-time example)
Parameter of Interest
Always:
Mean difference in population (μ_d)
Hypotheses
H₀: μ_d = 0
Hₐ: μ_d > 0 (longer looking at hinderer)
Appropriate test
One-sample t-test on the paired differences (paired t-test)
Validity conditions
Differences are approximately normal in shape OR
sample size ≥ 15
Here n = 16, so t-test is appropriate.
Interpretation of p-value
Always in context:
Probability of observing a sample mean difference at least as large as the one observed, assuming infants in the population truly have no preference.
Interpretation of Confidence Interval
CI gives range of plausible mean differences
If interval does not include 0 → 0 is not a plausible mean difference, consistent with a significant result