Study Notes on Probability and Probability Distributions
Introduction
Probability as a Measure of Uncertainty
Introduced basic probability rules
Theoretical vs. empirical probability
Empirical probability calculated from relative frequencies
Transition to exploring probabilities of all possible outcomes
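To illustrate how an empirical probability is computed from relative frequencies, here is a minimal Python sketch that simulates rolls of a fair six-sided die and compares the relative frequency of a six with the theoretical value 1/6. The die, the seed, the numbers of rolls, and the function name relative_frequency_of_six are assumptions chosen for illustration, not part of the notes.

```python
import random

def relative_frequency_of_six(num_rolls, seed=0):
    """Simulate num_rolls rolls of a fair die; return the share that show a six."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == 6)
    return sixes / num_rolls

theoretical = 1 / 6
for n in (100, 10_000, 100_000):
    empirical = relative_frequency_of_six(n)
    print(f"{n:>7} rolls: empirical = {empirical:.4f}, theoretical = {theoretical:.4f}")
```

With more rolls the empirical value tends to settle near the theoretical one, although any single run can differ.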
Probability Distribution
Definition: A tool to describe the behavior of all possible outcomes of a random variable
Random variable:
Possible values are determined randomly by chance
The value of any single outcome cannot be predicted, but a regular pattern emerges over many repetitions
In statistics, refers to numerical variables
Types of Random Variables:
Discrete Random Variable
Can take certain distinct or separate values
Examples:
Number of phone calls received in a day (0, 1, 2, 3…)
Countable values with finite or countably infinite outcomes
Note: Non-integer values like 2.5 phone calls don’t make sense
Continuous Random Variable
Can take any value within an interval
Examples include measurements like height, weight, commuting time, etc.
Possible values can’t be listed: any interval, however small (e.g., 0 to 0.1), contains infinitely many values
Displaying Probability Distributions
Discrete Variables
Can be displayed in a table or a graph, much like a frequency distribution
Probability distributions show associated probabilities of outcomes
Continuous Variables
More complex, since there are infinitely many possible outcomes
Probability represented through areas under curves rather than distinct values
Features of Probability Distributions
Outcomes must derive from a random process
Each possible outcome has an associated probability
Must agree with axioms of probability:
Every probability is between 0 and 1, inclusive
Sum of all probabilities equals 1 (covers the sample space)
Discrete Probability Distribution Example
Example: Number of times a student changes major
Define random variable X for number of changes (0 to 8)
Display probability distribution with associated probabilities
Valid Probability Model check:
Ensure all probabilities between 0 and 1
Sum of probabilities equals 1
Visualization as probability histogram
Horizontal axis: Range of X (0 to 8)
Vertical axis: Probabilities corresponding to X values
Heights of bars equal to probabilities
Calculating Probabilities from Distribution
Example Questions:
Probability student will change major at most once
Denoted as P(X ≤ 1)
Calculate by summing P(X = 0) + P(X = 1) = 0.406
Interpretation: 40.6% chance
Probability student will change major at least twice
Denoted as P(X ≥ 2)
Alternative calculation via complement:
P(X ≥ 2) = 1 - (P(X = 0) + P(X = 1)) = 1 - 0.406 = 0.594
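The complement calculation can be written out as a short Python sketch. Only the value P(X ≤ 1) = 0.406 comes from the notes; the variable names are illustrative.

```python
# P(X <= 1) = P(X = 0) + P(X = 1), the value given in the notes
p_at_most_one = 0.406

# Complement rule: "at least twice" is everything except "at most once"
p_at_least_two = 1 - p_at_most_one

print(f"P(X >= 2) = {p_at_least_two:.3f}")
```

The complement is convenient here because it needs two probabilities rather than the seven for X = 2 through X = 8.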
Games and Probability Distribution
Example: a game scored by rolling a fair die
Let X be the points scored; possible values are -1, 0, or 5 points
Probability of specific outcomes calculated:
P(X = -1) = 1/6
P(X = 0) = 3/6
P(X = 5) = 2/6
Sum of probabilities must equal 1
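The two validity conditions (every probability between 0 and 1, and a total of 1) can be checked mechanically. The sketch below applies them to the die-game distribution listed above; the function name is_valid_pmf and the use of exact fractions are illustrative choices, not something the notes prescribe.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Check both conditions: each probability in [0, 1] and the total equal to 1."""
    each_in_range = all(0 <= p <= 1 for p in pmf.values())
    total_is_one = sum(pmf.values()) == 1
    return each_in_range and total_is_one

# Distribution of X for the die game, as listed in the notes
die_game = {-1: Fraction(1, 6), 0: Fraction(3, 6), 5: Fraction(2, 6)}

print(is_valid_pmf(die_game))
```

Exact fractions avoid rounding issues in the sum; with decimal probabilities, a tolerance-based comparison such as math.isclose would be the safer check.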
Infinite Discrete Random Variables
Experiment: Roll die until a six appears
Define X as the number of rolls needed to get the first six (counting the roll that shows the six)
P(X = 1), P(X = 2), etc., derive from the multiplication rule
Not feasible for a table or graph due to infinite possible outcomes
Equation pattern:
\(P(X = x) = \left(\frac{5}{6}\right)^{x-1} \cdot \frac{1}{6}\), for x = 1, 2, 3, …
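As a rough check of this pattern, the following Python sketch evaluates the formula for the first few values of x and shows that the probabilities add up toward 1 even though the list of outcomes never ends. The function name and the cutoff of 50 rolls are assumptions made for illustration.

```python
from fractions import Fraction

P_SIX = Fraction(1, 6)  # chance of a six on any one roll of a fair die

def prob_first_six_on_roll(x):
    """P(X = x): the first x - 1 rolls are not six, then roll x is a six."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (1 - P_SIX) ** (x - 1) * P_SIX

for x in range(1, 5):
    print(f"P(X = {x}) = {prob_first_six_on_roll(x)}")

# Partial sums approach 1 as more outcomes are included
partial_sum = sum(prob_first_six_on_roll(x) for x in range(1, 51))
print(f"P(X <= 50) = {float(partial_sum):.6f}")
```

Each factor of 5/6 comes from one failed roll, which is the multiplication rule for independent rolls applied repeatedly.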
Transition to Continuous Random Variables
Difference from discrete: continuous variables take values over intervals, not at distinct points
Probabilities are represented by probability density curves rather than bars
The area under the curve over an interval is the probability that the variable falls in that interval
Probabilities are therefore defined for intervals; any single exact value has probability 0
Properties of Probability Density Curves
Total area under the curve = 1
Probability of any specific outcome = 0
Areas correspond to probabilities in continuous variables
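These properties can be checked numerically. The sketch below approximates areas under a density curve with the trapezoid rule; it uses the standard normal curve (introduced in the next part of these notes) purely as an example density, and the function name, step count, and integration limits are assumptions.

```python
from statistics import NormalDist

def area_under_curve(density, a, b, steps=10_000):
    """Approximate the area under density between a and b (trapezoid rule)."""
    if a == b:
        return 0.0  # a single point has zero width, hence zero area
    width = (b - a) / steps
    total = 0.5 * (density(a) + density(b))
    total += sum(density(a + i * width) for i in range(1, steps))
    return total * width

bell = NormalDist(mu=0, sigma=1).pdf  # example density curve (an assumption)

print(f"Total area:     {area_under_curve(bell, -8, 8):.4f}")
print(f"P(0 <= X <= 1): {area_under_curve(bell, 0, 1):.4f}")
print(f"P(X = 1):       {area_under_curve(bell, 1, 1):.4f}")
```

The limits -8 and 8 stand in for the whole real line, since the curve is negligibly small beyond them.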
Normal Distribution
Properties:
Symmetric, bell-shaped curve
Used for many natural phenomena (height, weight, etc.)
Associated with Gauss, who used it to model measurement errors
Described with \(N(\mu, \sigma^2)\), where \(\mu\) is the mean and \(\sigma^2\) is the variance
Determined by the mean and standard deviation:
Mean sets the location of the center
Standard deviation controls the spread
Empirical rule: about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3
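The empirical rule can be verified with a few lines of Python. This sketch uses NormalDist from the standard library's statistics module; the helper name prob_within is an illustrative choice.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The computed values are roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%. The same figures hold for any mean and standard deviation, because distances are measured in standard deviations.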
Working with Normal Distributions
Calculating probabilities can be achieved via:
Density functions and integrals
Computational tools like R
Standard Normal Table (Z-table)
Standard Normal Transformation
Z transformation:
Converts a normal variable to the standard normal distribution (mean = 0, standard deviation = 1) via \(Z = \frac{X - \mu}{\sigma}\)
After standardizing, the Z-table can be used to look up probabilities
The table gives left-tail probabilities, P(Z ≤ z), for each standard score z
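The two steps (standardize, then look up the left-tail probability) can be sketched in Python. The mean of 39 and standard deviation of 2 are taken from the heights example later in these notes; the value x = 41 is a hypothetical input chosen for illustration, and NormalDist stands in for the printed Z-table.

```python
from statistics import NormalDist

mu, sigma = 39, 2  # mean and standard deviation from the heights example in these notes
x = 41             # hypothetical value of interest (an assumption)

# Step 1: standardize
z = (x - mu) / sigma

# Step 2: left-tail probability of the standard score, as a Z-table would give
p_from_z = NormalDist(0, 1).cdf(z)

# Cross-check: the same probability computed directly on the original scale
p_direct = NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.2f}")
print(f"P(X <= {x}) = {p_from_z:.4f}")
```

Both routes give the same probability, which is why a single standard normal table is enough for every normal distribution.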
Problems Involving Standardization
What to do when a probability is given instead of a value?
Treat the given probability or area as a left-tail area
From the Z-table, find the Z-score whose area is closest to it
Convert back to the original scale with \(X = \mu + Z\sigma\)
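A reverse table lookup can be imitated in code. The sketch below scans two-decimal z-scores from -3.49 to 3.49, a typical but assumed table range, and returns the one whose left-tail area is closest to the target. The function names and the target area 0.975 are illustrative; the mean 39 and standard deviation 2 come from the heights example later in these notes.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0, 1)

def closest_table_z(left_area):
    """Return the z-score (two decimals, -3.49 to 3.49) whose left-tail area is closest."""
    candidates = (i / 100 for i in range(-349, 350))
    return min(candidates, key=lambda z: abs(STANDARD_NORMAL.cdf(z) - left_area))

def value_from_area(left_area, mu, sigma):
    """Undo the Z transformation: x = mu + z * sigma."""
    return mu + closest_table_z(left_area) * sigma

print(closest_table_z(0.975))
print(value_from_area(0.975, mu=39, sigma=2))
```

Because the table is rounded to two decimals, the result is an approximation; software that computes the inverse directly will usually give a slightly different value.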
Percentiles and Inverses
Finding values for given percentiles using Z-scores
Utilizing R for calculations:
pnorm() for the left-tail probability at a given value
qnorm() for the inverse: the value at a given percentile
Examples of Percentile Calculation
Heights of children following a normal distribution (mean = 39, standard deviation = 2)
Calculating the 90th percentile using the Z transformation
In practice, health metrics are often interpreted through percentile rankings
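The 90th percentile for the heights example can be worked through as follows. This is a sketch in Python rather than R: NormalDist.cdf and NormalDist.inv_cdf from the standard library play the roles that the notes assign to pnorm() and qnorm(). The variable names are illustrative, and the units of height are not stated in the notes.

```python
from statistics import NormalDist

heights = NormalDist(mu=39, sigma=2)  # parameters from the notes (units not stated)

# Route 1: Z transformation, then convert back with x = mu + z * sigma
z_90 = NormalDist(0, 1).inv_cdf(0.90)
x_via_z = heights.mean + z_90 * heights.stdev

# Route 2: ask for the percentile directly (the role qnorm() plays in R)
x_direct = heights.inv_cdf(0.90)

print(f"z for the 90th percentile: {z_90:.4f}")
print(f"90th percentile height:    {x_via_z:.2f}")

# pnorm()-style check: the left-tail probability at that height is 0.90 again
print(f"P(X <= {x_direct:.2f}) = {heights.cdf(x_direct):.2f}")
```

The result is a height of about 41.56, meaning roughly 90% of children in this model are shorter than that value.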
Next Steps in Learning
Transition to sampling distributions in the next lecture