Lecture 2

Course Title: MATH1041 Statistics for Life and Social Sciences
Authors: P. Lafaye De Micheaux, L. Helme-Guizon, J. Stocklosa, D. Warton, and past lecturers
Term: 1, 2025

Probability, Discrete Random Variables and the Binomial Distribution
- Lecture 1: Probability
- Lecture 2: Random Variables
- Lecture 3: Means & Variances for Discrete Random Variables
- Lecture 4: The Binomial Distribution and Other Probability Models

Aim: Introduce the science of collecting, analyzing, and interpreting data.
Previous Topics: Least-squares regression, residuals, and the r-squared value.
Current Focus: Probability and random variables.

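Linear Least-Squares Regression: Method that finds the line of best fit, i.e. the line minimizing the sum of squared vertical distances between the observed y values and the line; it allows y to be predicted from x.
R-squared (r²) Value: The proportion of the variation in the response y that is explained by the linear regression on x.
- Example: If r = -0.78, then r² = (-0.78)² ≈ 0.61, so about 61% of the variation in y is explained by the linear relationship with x.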
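A minimal Python sketch of the least-squares fit and the r² calculation. The x and y values are hypothetical, made up purely for illustration; the sketch also checks that r² equals 1 - SSE/SST, the "share of variation explained" reading.

```python
import math

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [9.5, 8.1, 8.4, 6.2, 5.9, 4.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Correlation r and r-squared
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# r-squared also equals 1 - SSE/SST (SST = syy)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(f"slope = {b1:.3f}, intercept = {b0:.3f}")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, 1 - SSE/SST = {1 - sse / syy:.3f}")

# The example from the notes: r = -0.78 gives r^2 of about 0.61
print(round((-0.78) ** 2, 2))  # 0.61
```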
Residuals: Used to check whether a regression line is an appropriate model for the data.
- Definition: Residual = observed y - fitted (predicted) y.
- Ideal: The residual plot shows no pattern, just random scatter around 0.
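A short Python sketch computing residuals as observed y minus fitted y, reusing a small hypothetical data set (the values are invented for illustration). In practice the residuals would be plotted against x to look for patterns; here they are just printed, along with their sum, which is zero (up to rounding) for any least-squares line with an intercept.

```python
# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [9.5, 8.1, 8.4, 6.2, 5.9, 4.0]

# Least-squares line
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed y - fitted y
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, e in zip(x, residuals):
    print(f"x = {xi:.0f}: residual = {e:+.3f}")

# With an intercept, least-squares residuals sum to zero (up to rounding error)
print(f"sum of residuals = {sum(residuals):.2e}")
```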

Heteroscedasticity: The spread of the residuals changes with x (the variance is not constant); in a residual plot this often shows up as a trumpet- or fan-shaped pattern.
Bonus examples of heteroscedasticity:
- Income vs Meal Expenditure: Variation in spending is greater for higher incomes.
- Suburban Population vs Retail: Larger suburbs show more variation in type and number of local shops.
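A simulated illustration of heteroscedasticity in Python. The model is an assumption chosen for illustration (not course data): the error standard deviation is 0.3·x, so the scatter grows with x, much like the income vs meal expenditure example. After a least-squares fit, the residual standard deviation is compared for small and large x.

```python
import random
import statistics

random.seed(2025)  # fixed seed so the run is reproducible

# Hypothetical model: error spread grows in proportion to x
x = [random.uniform(1, 10) for _ in range(500)]
y = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]

# Least-squares fit
x_bar = statistics.fmean(x)
y_bar = statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Compare residual spread for small x and large x
low = [e for xi, e in zip(x, residuals) if xi < 5.5]
high = [e for xi, e in zip(x, residuals) if xi >= 5.5]
print(f"SD of residuals, x < 5.5:  {statistics.stdev(low):.2f}")
print(f"SD of residuals, x >= 5.5: {statistics.stdev(high):.2f}")
```

The larger spread for large x is what produces the trumpet shape when the residuals are plotted against x.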

Random Variable: A variable X whose value is a numerical outcome of a random experiment; formally, it assigns a number to each outcome in the sample space.
Sample space: The set of all possible outcomes of a random experiment.
Discrete vs Continuous Random Variables:
- Discrete: Takes a countable number of values (e.g., the number of heads in 10 coin flips).
- Continuous: Can take any value within an interval (e.g., height).
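A small Python sketch of a random variable as a function on the sample space, assuming three fair coin flips: each of the 8 equally likely outcomes is mapped to its number of heads (the function name `count_heads` is chosen here for illustration), which gives the distribution of this discrete random variable.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three fair coin flips: 8 sequences such as ('H', 'T', 'H')
sample_space = list(product("HT", repeat=3))

# The random variable X: assigns to each outcome its number of heads
def count_heads(outcome):
    return outcome.count("H")

# All outcomes are equally likely, so P(X = x) = (outcomes with X = x) / 8
counts = Counter(count_heads(o) for o in sample_space)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```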

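Probability Distribution of a Discrete Random Variable: A function assigning a probability P(X = x) to every possible value x of X; each probability is between 0 and 1, and the probabilities sum to 1.
Mean of Random Variable X:
- μ_X = E(X) = Σ x · P(X = x), summing over all possible values x.
- Example: For a fair die, each value from 1 to 6 has probability 1/6, so E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.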
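The mean formula applied to a fair die, as a short Python sketch. Exact fractions are used so the result comes out as 7/2 rather than a rounded decimal; the helper name `expected_value` is illustrative.

```python
from fractions import Fraction

# Distribution of a fair die: each face 1..6 has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid discrete distribution: probabilities sum to 1
assert sum(die.values()) == 1

# mu_X = E(X) = sum over x of x * P(X = x)
def expected_value(dist):
    return sum(x * p for x, p in dist.items())

mu = expected_value(die)
print(mu, float(mu))  # 7/2 3.5
```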

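Variance of X:
- Var(X) = σ²_X = E[(X - μ_X)²] = Σ (x - μ_X)² · P(X = x); it measures how spread out X is around its mean.
- Examples: The value shown by a fair die has variance 35/12 ≈ 2.92; this can be compared with the variance of the number of heads in several coin flips.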
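A Python sketch of the variance formula for the two examples: the value on a fair die, and (as an assumed concrete case of "several flips") the number of heads in 3 fair coin flips, whose probabilities are C(3, k)/8. The helper names `mean` and `variance` are illustrative.

```python
from fractions import Fraction
from math import comb

def mean(dist):
    # mu_X = sum over x of x * P(X = x)
    return sum(x * p for x, p in dist.items())

def variance(dist):
    # Var(X) = sum over x of (x - mu_X)^2 * P(X = x)
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Value shown by a fair die
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Number of heads in 3 fair flips: C(3, k) of the 8 sequences have k heads
heads3 = {k: Fraction(comb(3, k), 8) for k in range(4)}

print("die:", variance(die), float(variance(die)))
print("heads in 3 flips:", variance(heads3))
```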
Independence of Random Variables: X and Y are independent if P(X = x and Y = y) = P(X = x) · P(Y = y) for all values x and y.
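A Python check of the independence condition, assuming two fair dice rolled separately (X = first die, Y = second die). Every joint probability factorizes, while X and the sum S of both dice fail the condition, so they are dependent.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely pairs (x, y)
pairs = list(product(range(1, 7), repeat=2))
p_pair = Fraction(1, len(pairs))

def p_joint(x, y):
    # P(X = x and Y = y)
    return sum(p_pair for a, b in pairs if a == x and b == y)

def p_x(x):
    return sum(p_pair for a, _ in pairs if a == x)

def p_y(y):
    return sum(p_pair for _, b in pairs if b == y)

independent = all(
    p_joint(x, y) == p_x(x) * p_y(y) for x in range(1, 7) for y in range(1, 7)
)
print(independent)  # True

# Counter-example: X = first die and S = sum of both dice are NOT independent
p_x1 = Fraction(sum(1 for a, b in pairs if a == 1), 36)
p_s12 = Fraction(sum(1 for a, b in pairs if a + b == 12), 36)
p_both = Fraction(sum(1 for a, b in pairs if a == 1 and a + b == 12), 36)
print(p_both, p_x1 * p_s12)  # 0 1/216
```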
Expected Value: E(X) is the long-run average value of X over many independent repetitions of the random experiment.
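A quick simulation of the long-run average idea in Python: roll a simulated fair die many times and print the running average, which settles near E(X) = 3.5. The seed and the numbers of rolls are arbitrary choices for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Roll a fair die 100,000 times
rolls = [random.randint(1, 6) for _ in range(100_000)]

# Average of the first n rolls for increasing n
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"average of first {n:>6} rolls: {sum(rolls[:n]) / n:.3f}")
# The averages settle near E(X) = 3.5 as n grows
```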

Binomial Random Variable: X = the number of successes in n independent trials (Bernoulli trials), each with the same success probability p.
- Notation: X ∼ B(n, p).
- Example: The number of left-handed people in a random sample can be modeled as binomial, with success = being left-handed.
Probability Distribution Formula:
- P(X = k) = (n choose k) p^k (1 - p)^(n - k), for k = 0, 1, ..., n, where (n choose k) = n! / (k! (n - k)!).
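The binomial formula translated directly into Python, using `math.comb` for (n choose k). The values n = 10 and p = 0.5 (e.g. heads in 10 fair coin tosses) are illustrative; the sketch also checks that the probabilities sum to 1 and that their mean is n·p.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # illustrative values
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(probs))

print(f"P(X = 5) = {probs[5]:.4f}")
print(f"sum of all probabilities = {sum(probs):.4f}")
print(f"mean = {mean:.3f}")  # equals n * p = 5
```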

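Notable examples: The number of left-handed people in a random sample of 3, or the number of heads when a coin is tossed several times.
Calculating binomial probabilities: Software such as R simplifies the calculations; for example, dbinom(k, n, p) gives P(X = k) and pbinom(k, n, p) gives P(X ≤ k).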
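A Python sketch of the left-handedness example with a sample of 3 people. The proportion of left-handers p = 0.1 is an assumed value for illustration only, not a figure from the course. The printed values correspond to what R's dbinom(k, 3, 0.1) returns, and the last line uses the complement rule, like 1 - pbinom(0, 3, 0.1).

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 3    # sample of 3 people
p = 0.1  # ASSUMED proportion of left-handers, for illustration only

for k in range(n + 1):
    print(f"P(X = {k}) = {binom_pmf(k, n, p):.3f}")

# At least one left-hander, via the complement rule: 1 - P(X = 0)
print(f"P(X >= 1) = {1 - binom_pmf(0, n, p):.3f}")
```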