Course Title: MATH1041 Statistics for Life and Social Sciences
Authors: P. Lafaye De Micheaux, L. Helme-Guizon, J. Stocklosa, D. Warton, and past lecturers
Term: 1, 2025
Probability, Discrete Random Variables and the Binomial Distribution
Lecture 1: Probability
Lecture 2: Random Variables
Lecture 3: Means & Variances for Discrete Random Variables
Lecture 4: The Binomial Distribution and Other Probability Models
Aim: Introduce the science of collecting, analyzing, and interpreting data.
Previous Topics: Least-squares regression, residuals, and the r-squared value.
Current Focus: Probability and random variables.
Linear Least Squares Regression: Method to find the line of best fit; allows prediction of y values based on x.
R-squared (r²) Value: Measures how well the model explains variation in the dependent variable.
Example: If r = -0.78, r² = 0.61 indicates 61% of variation in y is explained by x.
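As a quick illustration, the fit and the r-squared value can be obtained in R with lm(); the data below are invented for this sketch, not course data.
  x <- c(1, 2, 3, 4, 5, 6, 7, 8)
  y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
  fit <- lm(y ~ x)                             # least-squares line of best fit
  coef(fit)                                    # intercept and slope
  predict(fit, newdata = data.frame(x = 4.5))  # predict y at a new x value
  cor(x, y)^2                                  # r-squared = squared correlation
  summary(fit)$r.squared                       # same value, from the fitted model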
Residuals: Used to assess whether the fitted regression line is appropriate for the data.
Definition: Residuals = Observed y - Fitted y
Ideal: No pattern should be present (random scatter around 0).
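A minimal sketch of checking the residuals in R, reusing the hypothetical fit above:
  res <- residuals(fit)            # observed y minus fitted y, i.e. y - fitted(fit)
  plot(x, res, ylab = "Residual")  # residual plot against x
  abline(h = 0, lty = 2)           # ideally: random scatter around 0, no pattern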
Heteroscedasticity: The spread of the residuals changes with x (non-constant variance); it appears as a trumpet- or fan-shaped pattern in the residual plot.
Bonus examples of heteroscedasticity:
Income vs Meal Expenditure: Variation in spending is greater for higher incomes.
Suburban Population vs Retail: Larger suburbs show more variation in type and number of local shops.
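A toy simulation (all numbers invented) showing how spread that grows with x produces the trumpet shape in a residual plot:
  set.seed(1)
  income <- runif(200, min = 20, max = 200)                    # hypothetical incomes
  spend  <- 5 + 0.3 * income + rnorm(200, sd = 0.05 * income)  # noise grows with income
  fit2   <- lm(spend ~ income)
  plot(income, residuals(fit2))   # residuals fan out as income increases
  abline(h = 0, lty = 2)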
Defining Random Variables: A random variable X assigns a numerical value to each possible outcome of a random experiment (the quantity measured).
Sample space: Set of all potential outcomes from a random experiment.
Discrete vs Continuous Random Variables:
Discrete: Takes a countable set of values (e.g., the number of heads in a series of coin flips).
Continuous: Can take any value within an interval (e.g., height).
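For intuition, a small R sketch (parameters assumed, not from the notes) contrasting the two types:
  set.seed(2)
  heads  <- rbinom(5, size = 10, prob = 0.5)  # discrete: number of heads in 10 coin flips
  height <- rnorm(5, mean = 170, sd = 8)      # continuous: heights in cm (assumed mean and sd)
  heads    # whole numbers between 0 and 10
  height   # can take any value in an interval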
Probability Distribution (discrete case): A function assigning a probability to each possible value of a discrete random variable; the probabilities sum to 1.
Mean of Random Variable X:
μ_X = E(X) = Σ x · P(X = x), where the sum runs over all possible values x.
Example: For a fair die, each value 1 to 6 has probability 1/6, so E(X) = (1 + 2 + ... + 6)/6 = 3.5.
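The die example can be checked directly in R, a minimal sketch of the formula above:
  x  <- 1:6              # possible values of a fair die
  p  <- rep(1/6, 6)      # P(X = x) = 1/6 for each value
  mu <- sum(x * p)       # E(X) = sum of x * P(X = x)
  mu                     # 3.5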
Variance of X:
Var(X) = σ²_X = E[(X − μ_X)²] = Σ (x − μ_X)² · P(X = x), which measures the spread of X about its mean.
Example: Compare the variance of a fair die roll with the variance of the number of heads in repeated coin flips.
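Continuing the die sketch, the variance can be computed from the same formula:
  x <- 1:6; p <- rep(1/6, 6); mu <- sum(x * p)  # fair die, as above
  sigma2 <- sum((x - mu)^2 * p)                 # Var(X) = E[(X - mu_X)^2]
  sigma2                                        # 35/12, about 2.92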
Independence of Random Variables: X and Y are independent if P(X = x and Y = y) = P(X = x) * P(Y = y) for all values x and y.
Expected Value: The long-run average value of a random variable over many repetitions of the experiment.
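The long-run-average interpretation can be checked by simulation (a rough sketch; the number of rolls is arbitrary):
  set.seed(3)
  rolls <- sample(1:6, size = 10000, replace = TRUE)  # many fair die rolls
  mean(rolls)                                         # settles near E(X) = 3.5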
Binomial Random Variable: The number of successes in n independent Bernoulli trials, each with the same success probability p.
Notation: X ∼ B(n, p).
Example: Finding the number of left-handed individuals in a group can be modeled as binomial where success = being left-handed.
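A small simulation sketch; the 10% left-handedness rate is an assumed figure for illustration only:
  set.seed(4)
  lefties <- rbinom(1000, size = 3, prob = 0.10)  # left-handers in each of 1000 samples of 3
  table(lefties) / 1000                           # observed proportions of 0, 1, 2, 3 successes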
Probability Distribution Formula:
P(X = k) = (n choose k) p^k (1-p)^(n-k), for k = 0, 1, ..., n.
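For example, with n = 3 and p = 0.1 (an assumed value), the formula can be evaluated directly and checked against R's built-in binomial function:
  n <- 3; p <- 0.1; k <- 1
  choose(n, k) * p^k * (1 - p)^(n - k)  # P(X = 1) from the formula: 0.243
  dbinom(k, size = n, prob = p)         # the same value from R's binomial pmf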
Notable examples: Left-handedness in a random sample of 3 people or tossing a coin multiple times.
Calculating binomial probabilities: Tools like R can help simplify calculations.
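A brief sketch of the relevant R functions, with the same assumed n = 3 and p = 0.1:
  dbinom(0:3, size = 3, prob = 0.1)       # P(X = k) for k = 0, 1, 2, 3
  pbinom(1, size = 3, prob = 0.1)         # cumulative: P(X <= 1)
  sum(dbinom(0:1, size = 3, prob = 0.1))  # same value, summing the pmf
  1 - pbinom(0, size = 3, prob = 0.1)     # P(X >= 1) = 1 - P(X = 0)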
Event Probability: The chance that an event occurs, expressed as a number between 0 and 1.
Independent Random Variables: Random variables whose values do not influence one another; knowing one tells you nothing about the other.
Mean (μ): The probability-weighted average of a random variable's possible values; its expected value.
Variance (σ²): A measure of how far outcomes typically deviate from the mean, in squared units.