Study Notes on Probability and Probability Distributions

Introduction

  • Probability as a Measure of Uncertainty

    • Introduced basic probability rules

    • Theoretical vs. empirical probability

    • Empirical probability calculated from relative frequencies

  • Transition to exploring probabilities of all possible outcomes

Probability Distribution

  • Definition: describes all possible values of a random variable and how probability is distributed across them

  • Random variable:

    • Its value is determined by chance

    • Cannot predict which value will occur in the short run, but shows a pattern in the long run

    • In statistics, a random variable takes numerical values

  • Types of Random Variables:

    • Discrete Random Variable

      • Can take only distinct, separate values

      • Values are countable: there are finitely many or countably infinitely many possible outcomes

      • Example: number of phone calls received in a day (0, 1, 2, 3, …)

        • Non-integer values such as 2.5 phone calls are not possible

    • Continuous Random Variable

      • Can take any value within an interval

      • Examples: measurements such as height, weight, and commuting time

      • Possible values cannot be listed, because any interval contains infinitely many values (e.g., there are infinitely many values between 0 and 0.1)

Displaying Probability Distributions

  • Discrete Variables

    • Can be displayed in a table, similar to a frequency distribution

    • The table lists each possible outcome with its probability

  • Continuous Variables

    • More complex, because there are infinitely many possible outcomes

    • Probabilities are represented by areas under a curve rather than assigned to individual values

Features of Probability Distributions

  • Outcomes must derive from a random process

  • Each possible outcome has an associated probability

    • Probabilities must satisfy the axioms of probability:

      • Each probability is between 0 and 1

      • The probabilities of all outcomes sum to 1 (the outcomes cover the sample space)
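The two axioms above can be checked mechanically. The sketch below assumes a discrete distribution is stored as a Python dict mapping each outcome to its probability. That representation is chosen only for illustration and does not come from the notes. `Fraction` keeps the arithmetic exact, so rounding cannot throw off the sum-to-1 test.

```python
from fractions import Fraction

def is_valid_distribution(pmf):
    """Return True if every probability is in [0, 1] and they sum to exactly 1.

    pmf: dict mapping each outcome to its probability (illustrative representation).
    """
    all_in_range = all(0 <= p <= 1 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return all_in_range and sums_to_one

# A fair six-sided die: each face has probability 1/6
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(is_valid_distribution(fair_die))  # True

# Not valid: the probabilities sum to 7/6
too_much = {0: Fraction(1, 2), 1: Fraction(2, 3)}
print(is_valid_distribution(too_much))  # False
```

With floating-point probabilities, an exact `== 1` test can fail on rounding alone. In that case, `math.isclose(sum(...), 1)` is the safer comparison.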

Discrete Probability Distribution Example

  • Example: Number of times a student changes major

    • Define random variable X for number of changes (0 to 8)

    • Display probability distribution with associated probabilities

    • Check that it is a valid probability model:

      • All probabilities are between 0 and 1

      • The probabilities sum to 1

  • Visualize as a probability histogram

    • Horizontal axis: Range of X (0 to 8)

    • Vertical axis: Probabilities corresponding to X values

    • Bar heights equal the probabilities

Calculating Probabilities from Distribution

  • Example questions:

    • Probability that a student changes major at most once

      • Denoted P(X ≤ 1)

      • Calculate by summing: P(X = 0) + P(X = 1) = 0.406

      • Interpretation: a 40.6% chance

    • Probability that a student changes major at least twice

      • Denoted P(X ≥ 2)

      • Alternatively, use the complement rule instead of summing P(X = 2) through P(X = 8):

        • P(X ≥ 2) = 1 - [P(X = 0) + P(X = 1)] = 1 - 0.406 = 0.594
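The "at most" and "at least" calculations above can be sketched as two small functions. They assume the distribution is a dict of outcome to probability and that X takes integer values; the function names are hypothetical. The notes give only the combined value P(X ≤ 1) = 0.406 for the major-change example, not the individual probabilities. For that reason, the sketch applies the complement rule to that one number and then exercises the functions on a fair die.

```python
from fractions import Fraction

def prob_at_most(pmf, k):
    """P(X <= k): add the probabilities of all outcomes no larger than k."""
    return sum(p for x, p in pmf.items() if x <= k)

def prob_at_least(pmf, k):
    """P(X >= k) by the complement rule, assuming X takes integer values:
    P(X >= k) = 1 - P(X <= k - 1)."""
    return 1 - prob_at_most(pmf, k - 1)

# Complement rule with the value from the notes: P(X <= 1) = 0.406
p_at_least_2 = 1 - 0.406
print(round(p_at_least_2, 3))  # 0.594

# The same functions on a fair die (faces 1-6, each with probability 1/6)
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(prob_at_most(fair_die, 2))   # 1/3
print(prob_at_least(fair_die, 5))  # 1/3
```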

Games and Probability Distribution

  • Example: a game played by rolling a fair die

    • Let X be the points scored; possible values are -1, 0, or 5

    • Probabilities of the outcomes:

      • P(X = -1) = 1/6

      • P(X = 0) = 3/6

      • P(X = 5) = 2/6

    • The probabilities sum to 1: 1/6 + 3/6 + 2/6 = 6/6 = 1
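A short simulation connects this distribution back to the contrast between theoretical and empirical probability. Over many plays, the relative frequency of each outcome should settle near its theoretical probability. The notes do not say which die faces map to which point values. This sketch therefore draws outcomes directly from the distribution with `random.choices` rather than simulating individual faces. The seed and the number of plays are arbitrary choices.

```python
import random
from collections import Counter
from fractions import Fraction

# The game's distribution from the notes: points -> probability
game = {-1: Fraction(1, 6), 0: Fraction(3, 6), 5: Fraction(2, 6)}
assert sum(game.values()) == 1  # valid probability model

def simulate(pmf, n_plays, seed=0):
    """Draw n_plays outcomes from pmf and return their relative frequencies."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    outcomes = list(pmf)
    weights = [float(p) for p in pmf.values()]
    draws = rng.choices(outcomes, weights=weights, k=n_plays)
    counts = Counter(draws)
    return {x: counts[x] / n_plays for x in outcomes}

freqs = simulate(game, 60_000)
for x, p in game.items():
    # Empirical relative frequency vs. theoretical probability
    print(f"X = {x:>2}: empirical {freqs[x]:.3f}, theoretical {float(p):.3f}")
```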

Infinite Discrete Random Variables

  • Experiment: Roll die until a six appears

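    • Define X as the number of rolls needed to get the first six

    • P(X = 1), P(X = 2), etc., follow from the multiplication rule for independent rolls (e.g., P(X = 2) = (5/6)(1/6))

    • A complete table or graph is not possible, because X has infinitely many possible values (1, 2, 3, …)

    • General formula (x - 1 non-sixes followed by a six):

      • \(P(X = x) = \left(\frac{5}{6}\right)^{x-1} \cdot \frac{1}{6}\), for \(x = 1, 2, 3, \ldots\)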
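The formula above can be evaluated exactly with fractions. The sketch lists the first few probabilities and checks that the partial sums approach 1, even though the list of outcomes never ends. Summing the geometric series gives P(X ≤ n) = 1 - (5/6)^n. The function name is hypothetical.

```python
from fractions import Fraction

def p_first_six_on(x):
    """P(X = x): x - 1 non-sixes (5/6 each) followed by a six (1/6),
    multiplied together because the rolls are independent."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return Fraction(5, 6) ** (x - 1) * Fraction(1, 6)

for x in range(1, 5):
    print(x, p_first_six_on(x))
# 1 1/6
# 2 5/36
# 3 25/216
# 4 125/1296

# The probabilities never stop, but their partial sums approach 1:
# P(X <= n) = 1 - (5/6)**n
for n in (5, 10, 50):
    partial = sum(p_first_six_on(x) for x in range(1, n + 1))
    print(n, float(partial))
```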

Transition to Continuous Random Variables

  • Difference from discrete: a continuous random variable takes values anywhere in an interval, not only at separate values

  • Probabilities are represented by a probability density curve rather than by bars

  • The area under the curve over an interval is the probability that the variable falls in that interval

  • Probabilities are computed for intervals only; any single exact value has probability 0

Properties of Probability Density Curves

  • Total area under the curve = 1

  • Probability of any specific outcome = 0

  • Areas over intervals give probabilities, e.g., P(a ≤ X ≤ b) is the area under the curve between a and b
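The "probability = area" idea can be checked numerically. The sketch approximates areas under a density curve with the trapezoidal rule, a simple numerical integration method used here in place of exact integrals. The example density is the standard normal (covered in the next section), taken from Python's `statistics.NormalDist`. The function name `area_under` and the number of slices are assumptions made for illustration.

```python
from statistics import NormalDist

def area_under(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with the trapezoidal rule."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

density = NormalDist(0, 1).pdf  # example density curve: standard normal

# Total area is (essentially) 1; the range -10..10 covers nearly all of it
print(round(area_under(density, -10, 10), 6))

# Area between 0 and 1 = P(0 <= X <= 1); compare with the exact CDF difference
approx = area_under(density, 0, 1)
exact = NormalDist(0, 1).cdf(1) - NormalDist(0, 1).cdf(0)
print(round(approx, 6), round(exact, 6))

# A single point has zero width, so its "area" (probability) is 0
print(area_under(density, 0.5, 0.5))  # 0.0
```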

Normal Distribution

  • Properties:

    • Symmetric, bell-shaped curve

    • Used to model many natural phenomena (height, weight, etc.)

    • Associated with Gauss, who used it to model measurement errors

    • Notation: \(N(\mu, \sigma^2)\), with mean \(\mu\) and variance \(\sigma^2\)

    • Determined by its mean and standard deviation:

      • The mean \(\mu\) sets the location of the center

      • The standard deviation \(\sigma\) controls the spread

  • Empirical rule: approximately 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3
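The empirical rule's percentages can be reproduced from the standard normal CDF. The sketch uses Python's `statistics.NormalDist` and computes P(-k < Z < k) as a difference of two left-tail probabilities. Because any normal variable can be standardized, the same figures hold for every \(N(\mu, \sigma^2)\).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    # P(-k < Z < k) = P(Z < k) - P(Z < -k)
    within = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD: {within:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The 68/95/99.7 figures in the rule are these values rounded.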

Working with Normal Distributions

  • Probabilities can be calculated using:

    • Integrals of the density function

    • Computational tools like R

    • Standard Normal Table (Z-table)

Standard Normal Transformation

  • Z transformation:

    • Converts a normal variable \(X \sim N(\mu, \sigma^2)\) to the standard normal distribution (mean 0, standard deviation 1) via \(Z = \frac{X - \mu}{\sigma}\)

  • After standardizing, use the Z-table to find probabilities

    • The table gives left-tail probabilities \(P(Z \le z)\) for each standard score z
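The standardize-then-look-up procedure can be sketched with `statistics.NormalDist`. The standard normal's `cdf` plays the role of the Z-table's left-tail column. The mean 39 and standard deviation 2 come from the children's-heights example later in these notes, and x = 42 is a hypothetical value chosen for illustration. The sketch also confirms that standardizing gives the same answer as working on the original scale directly.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma = 39, 2   # the children's-heights example later in these notes
x = 42              # hypothetical value chosen for illustration

z = z_score(x, mu, sigma)
left_tail = NormalDist().cdf(z)               # what a Z-table row gives: P(Z <= z)
direct = NormalDist(mu, sigma).cdf(x)         # P(X <= x) without standardizing
print(z, round(left_tail, 4), round(direct, 4))  # 1.5 0.9332 0.9332

# Right tail via the complement: P(X > 42) = 1 - P(Z <= 1.5)
print(round(1 - left_tail, 4))  # 0.0668
```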

Problems Involving Standardization

  • Working backwards: finding a value when a probability (area) is given

    • In the Z-table, find the area closest to the given probability and read off the corresponding z-score

    • Convert the z-score back to the original scale with \(X = \mu + Z\sigma\)
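The reverse lookup can be imitated in code. The sketch builds a small table of z-scores to two decimals with their left-tail areas, picks the entry whose area is closest to the target, and then converts back with \(X = \mu + Z\sigma\). The table range (-3.49 to 3.49) mirrors common printed Z-tables but is an assumption, as are the helper names.

```python
from statistics import NormalDist

# Build a small two-decimal Z-table: z from -3.49 to 3.49 with its left-tail area
Z_TABLE = {round(i / 100, 2): NormalDist().cdf(i / 100) for i in range(-349, 350)}

def z_from_area(area):
    """Return the table z-score whose left-tail area is closest to `area`."""
    return min(Z_TABLE, key=lambda z: abs(Z_TABLE[z] - area))

def value_from_area(area, mu, sigma):
    """Convert back to the original scale: X = mu + Z * sigma."""
    return mu + z_from_area(area) * sigma

print(z_from_area(0.975))                           # 1.96
print(round(value_from_area(0.975, 39, 2), 2))      # 42.92
```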

Percentiles and Inverses

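  • The value at a given percentile is found by looking up the z-score for that percentile and converting it back to the original scale

  • Using R for calculations:

    • pnorm(): left-tail probability \(P(X \le x)\)

    • qnorm(): the inverse of pnorm(); returns the value with a given left-tail probability (percentile)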
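Readers working in Python rather than R can get comparable functions from the standard library's `statistics.NormalDist`. `NormalDist(mean, sd).cdf(x)` corresponds to R's `pnorm(x, mean, sd)`, and `.inv_cdf(p)` corresponds to `qnorm(p, mean, sd)`. The wrappers below are hypothetical and are named after the R functions only for readability. They cover only the default lower-tail case, not R's other arguments.

```python
from statistics import NormalDist

def pnorm(q, mean=0.0, sd=1.0):
    """Hypothetical Python stand-in for R's pnorm: left-tail probability P(X <= q)."""
    return NormalDist(mean, sd).cdf(q)

def qnorm(p, mean=0.0, sd=1.0):
    """Hypothetical Python stand-in for R's qnorm: the value with left-tail probability p."""
    return NormalDist(mean, sd).inv_cdf(p)

print(round(pnorm(1.96), 4))   # 0.975
print(round(qnorm(0.975), 2))  # 1.96

# qnorm undoes pnorm (and vice versa)
print(round(qnorm(pnorm(41, 39, 2), 39, 2), 6))  # 41.0
```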

Examples of Percentile Calculation

  • Heights of children, assumed normally distributed with mean 39 and standard deviation 2

    • Find the 90th percentile using the Z transformation

  • Observed measurements are interpreted as health indicators based on their percentile rankings
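The 90th-percentile height calculation follows the two steps described earlier. First, find the z-score with 90% of the area to its left. Second, convert it back to the height scale with \(X = \mu + Z\sigma\). The sketch does this with `statistics.NormalDist` and then checks the result against a single `inv_cdf` call on the original scale. A Z-table lookup gives z ≈ 1.28, which yields the same height to two decimals.

```python
from statistics import NormalDist

mu, sigma = 39, 2  # children's heights: mean 39, standard deviation 2

# Step 1: z-score with 90% of the area to its left
z90 = NormalDist().inv_cdf(0.90)
# Step 2: undo the standardization, X = mu + Z * sigma
x90 = mu + z90 * sigma
print(round(z90, 4), round(x90, 2))  # 1.2816 41.56

# Same answer in one call on the original scale (like R's qnorm(0.90, 39, 2))
print(round(NormalDist(mu, sigma).inv_cdf(0.90), 2))  # 41.56
```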

Next Steps in Learning

  • Transition to sampling distributions in the next lecture