lecture 2

Faculty of Science School of Mathematics and Statistics

  • Course Title: MATH1041 Statistics for Life and Social Sciences

  • Authors: P. Lafaye De Micheaux, L. Helme-Guizon, J. Stocklosa, D. Warton, and past lecturers

  • Term: 1, 2025

Table of Contents

  • Probability, Discrete Random Variables and the Binomial Distribution

    • Lecture 1: Probability

    • Lecture 2: Random Variables

    • Lecture 3: Means & Variances for Discrete Random Variables

    • Lecture 4: The Binomial Distribution and Other Probability Models

Lecture 1: Probability

  • Aim: Introduce the science of collecting, analyzing, and interpreting data.

  • Previous Topics: Least-squares regression, residuals, and the r-squared value.

  • Current Focus: Probability and random variables.

Introduction to Probability

  • Linear Least Squares Regression: Method to find the line of best fit; allows prediction of y values based on x.

  • R-squared (r²) Value: Measures how well the model explains variation in the dependent variable.

    • Example: If r = -0.78, then r² ≈ 0.61, so about 61% of the variation in y is explained by the linear relationship with x.

  • Residuals: Used to assess whether a straight-line model is appropriate.

    • Definition: Residuals = Observed y - Fitted y

    • Ideal: No pattern should be present (random scatter around 0).
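
The residual definition and the r² interpretation above can be checked numerically. The following is a minimal sketch in plain Python. The data values and the function names (fit_line, residuals, r_squared) are made up for illustration and do not come from the course materials.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(xs, ys):
    """Residual = observed y - fitted y, one value per data point."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys):
    """Proportion of the variation in y explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative data (an assumption, not course data)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

print(residuals(xs, ys))
print(r_squared(xs, ys))
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), so what matters in a residual plot is whether they show a pattern, not their overall average.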

Heteroscedasticity vs Homoscedasticity

  • Heteroscedasticity: The spread of the residuals changes with x; a residual plot shows a trumpet (fan) shape.

  • Homoscedasticity: The spread of the residuals is roughly constant across all values of x.

  • Bonus examples of heteroscedasticity:

    • Income vs Meal Expenditure: Variation in spending is greater for higher incomes.

    • Suburban Population vs Retail: Larger suburbs show more variation in type and number of local shops.
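
The income example can be imitated with a small simulation. This is only a sketch under a hypothetical model: spending is taken to be 0.1 times income plus random noise whose standard deviation is proportional to income. The coefficients 0.1 and 0.02, the income range, and the cut-off of 110 are arbitrary choices for illustration.

```python
import random


def simulate_spending(incomes, rng):
    """Hypothetical model: noise standard deviation grows with income."""
    return [0.1 * x + rng.gauss(0, 0.02 * x) for x in incomes]


def spread(values):
    """Standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5


rng = random.Random(1)  # fixed seed so the run is reproducible
incomes = [rng.uniform(20, 200) for _ in range(2000)]
spending = simulate_spending(incomes, rng)

# Deviations from the true line, which is known here because the data are simulated
resid = [s - 0.1 * x for x, s in zip(incomes, spending)]

low = [e for x, e in zip(incomes, resid) if x < 110]
high = [e for x, e in zip(incomes, resid) if x >= 110]

print(spread(low), spread(high))
```

The spread of the deviations is larger in the high-income half than in the low-income half, which is the trumpet shape seen in a residual plot.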

Random Variables and Data Collection

  • Defining Random Variables: A random variable X assigns a numerical value to each outcome of a random experiment.

  • Sample space: Set of all potential outcomes from a random experiment.

  • Discrete vs Continuous Random Variables:

    • Discrete: Can assume a countable number of values (e.g., the number of heads in a series of coin flips).

    • Continuous: Can take any value within an interval (e.g., height).
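
To make the link between a sample space and a discrete random variable concrete, the sketch below lists the sample space for three coin flips and takes X to be the number of heads. It assumes a fair coin, so all eight outcomes are equally likely. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every sequence of H and T of length 3
sample_space = list(product("HT", repeat=3))


def number_of_heads(outcome):
    """The random variable X: maps an outcome to a number."""
    return outcome.count("H")


# Count how many outcomes give each value of X, then convert to probabilities
counts = Counter(number_of_heads(o) for o in sample_space)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```

X takes only the values 0, 1, 2 and 3, so it is discrete, and the probabilities of those four values add up to 1.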

Probability Distribution

  • Definition: A function assigning a probability to each possible value of a discrete random variable; the probabilities lie between 0 and 1 and sum to 1.

  • Mean of Random Variable X:

    • µ_X = E(X) = Σ x * P(X = x), summed over all possible values x.

    • Example: Rolling a fair die gives each value from 1 to 6 the same probability 1/6, so E(X) = (1 + 2 + ... + 6)/6 = 3.5.
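
The mean formula can be applied directly to a probability distribution stored as a table of values and probabilities. A minimal sketch for one roll of a fair die follows. The name expected_value is illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def expected_value(dist):
    """E(X): sum of x * P(X = x) over all values x in the distribution."""
    return sum(x * p for x, p in dist.items())


# Fair die: each face 1..6 has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(die))         # exact fraction
print(float(expected_value(die)))  # decimal form
```

Note that the mean, 3.5, is not itself a value the die can show; it is a weighted average of the possible values.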

Properties of Variance

  • Variance of X:

    • Var(X) = σ²_X = E[(X - µ_X)²], which measures how spread out the values of X are around the mean.

    • Example: For one roll of a fair die, Var(X) = 35/12 ≈ 2.92; for the number of heads in n flips of a fair coin, Var(X) = n/4.

  • Independence of Random Variables: X and Y are independent if P(X = x and Y = y) = P(X = x) * P(Y = y) for all values x and y.

  • Expected Value: The long-run average of X over many independent repetitions of the experiment.
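
The three ideas in this list (variance, independence, and the expected value as a long-run average) can each be checked with a fair die. The sketch below is illustrative only: the helper names, the seed, and the number of simulated rolls are arbitrary choices.

```python
import random
from fractions import Fraction
from itertools import product

die = {face: Fraction(1, 6) for face in range(1, 7)}


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E[(X - mu)^2], computed from the distribution table."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def is_independent(outcomes, f, g):
    """Check P(f = a and g = b) == P(f = a) * P(g = b) for all a, b,
    assuming every outcome in `outcomes` is equally likely."""
    n = len(outcomes)

    def prob(pred):
        return Fraction(sum(1 for o in outcomes if pred(o)), n)

    for a in {f(o) for o in outcomes}:
        for b in {g(o) for o in outcomes}:
            joint = prob(lambda o: f(o) == a and g(o) == b)
            if joint != prob(lambda o: f(o) == a) * prob(lambda o: g(o) == b):
                return False
    return True


# Two fair dice: 36 equally likely outcomes
two_dice = list(product(range(1, 7), repeat=2))
first_vs_second = is_independent(two_dice, lambda o: o[0], lambda o: o[1])
first_vs_total = is_independent(two_dice, lambda o: o[0], lambda o: o[0] + o[1])

# Long-run average of many simulated rolls
rng = random.Random(2025)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
long_run_average = sum(rolls) / len(rolls)

print(variance(die), first_vs_second, first_vs_total, long_run_average)
```

The two dice are independent of each other, but the first die and the total are not: knowing the first die shows 1 rules out a total of 12.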

Binomial Distribution

  • Definition: Number of successes in n independent Bernoulli trials with a success probability p.

    • Notation: X ∼ B(n, p).

    • Example: Finding the number of left-handed individuals in a group can be modeled as binomial where success = being left-handed.

  • Probability Distribution Formula:

    • P(X = k) = (n choose k) p^k (1-p)^(n-k), for k = 0, 1, ..., n
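
The formula translates directly into code. This is a minimal sketch using math.comb from the Python standard library (Python 3.8 or later) for the "n choose k" term. The function name binomial_pmf is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Example: probability of exactly 2 heads in 3 flips of a fair coin
print(binomial_pmf(2, 3, 0.5))

# The probabilities over k = 0, ..., n add up to 1 (up to rounding)
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
```

The term (n choose k) counts the number of ways to place k successes among n trials, and p^k (1-p)^(n-k) is the probability of any one such arrangement.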

Genes and the Binomial Distribution

  • Notable examples: Left-handedness in a random sample of 3 people or tossing a coin multiple times.

  • Calculating binomial probabilities: Tools like R can help simplify calculations.
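
As a worked version of the left-handedness example, the sketch below tabulates the distribution of the number of left-handed people in a random sample of 3. The value p = 0.1 is an assumed proportion chosen only for illustration; the notes do not state a value. In R, the built-in dbinom function computes the same probabilities.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n = 3    # sample size from the example
p = 0.1  # ASSUMED proportion of left-handed people (illustrative only)

# Full distribution of X = number of left-handed people in the sample
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(probs):
    print(k, round(prob, 3))

# P(at least one left-handed person) = 1 - P(X = 0)
at_least_one = 1 - probs[0]
print(round(at_least_one, 3))
```

Using the complement, 1 - P(X = 0), is usually quicker than adding P(X = 1), P(X = 2) and P(X = 3).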

Summary of Key Concepts and Definitions

  • Event Probability: A number between 0 and 1 measuring how likely an event is to occur.

  • Independent Random Variables: Random variables for which knowing the value of one does not change the probabilities for the other.

  • Mean (µ): The probability-weighted average of the possible values.

  • Variance (σ²): The expected squared deviation of outcomes from the mean.
