Study Notes on Probability and Probability Distributions

Probability as a Measure of Uncertainty
- Introduced basic probability rules
- Theoretical vs. empirical probability
- Empirical probability calculated from relative frequencies
Transition to exploring probabilities of all possible outcomes

Definition: A probability distribution is a tool that describes the behavior of all possible outcomes of a random variable
Random variable:
- Possible values are determined randomly by chance
- The value on any single trial cannot be predicted, but a regular pattern emerges over many trials
- In statistics, refers to numerical variables
Types of Random Variables:
- Discrete Random Variable
  - Can take only certain distinct, separate values
  - Countable: the set of possible values is finite or countably infinite
  - Example: number of phone calls received in a day (0, 1, 2, 3, …)
  - Note: Non-integer values like 2.5 phone calls don’t make sense
- Continuous Random Variable
  - Can take any value within an interval
  - Examples: measurements such as height, weight, and commuting time
  - Possible values can’t be listed, because any interval contains infinitely many of them (e.g., 0 to 0.1)

Discrete Variables
- Can be displayed in tables, much like frequency distributions
- Probability distributions show associated probabilities of outcomes
Continuous Variables
- More complex, because they involve infinitely many possible outcomes
- Probability represented through areas under curves rather than distinct values

Outcomes must derive from a random process
Each possible outcome has an associated probability
- Probabilities must agree with the axioms of probability:
  - Each probability is between 0 and 1 (inclusive)
  - The probabilities sum to 1 (together the outcomes cover the sample space)

Example: Number of times a student changes major
- Define random variable X for number of changes (0 to 8)
- Display probability distribution with associated probabilities
- Valid probability model check:
  - All probabilities are between 0 and 1
  - The probabilities sum to 1
Visualization as probability histogram
- Horizontal axis: Range of X (0 to 8)
- Vertical axis: Probabilities corresponding to X values
- Heights of bars equal to probabilities
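
The validity check and the histogram idea can be sketched in a few lines of Python. The notes do not list the actual probabilities for the change-of-major example, so the numbers below are hypothetical placeholders chosen only so that the check passes; `is_valid_pmf` and `major_changes` are illustrative names, not part of the original material.

```python
import math


def is_valid_pmf(pmf: dict[int, float]) -> bool:
    """Return True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    # Compare with a tolerance because floating-point sums are rarely exact.
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=1e-9)
    return in_range and sums_to_one


# HYPOTHETICAL probabilities for X = 0..8 (not taken from the notes).
major_changes = {0: 0.135, 1: 0.271, 2: 0.271, 3: 0.180, 4: 0.090,
                 5: 0.036, 6: 0.012, 7: 0.003, 8: 0.002}

print(is_valid_pmf(major_changes))

# Text version of a probability histogram: bar length tracks P(X = x).
for x, p in major_changes.items():
    print(f"{x} | {'#' * round(p * 100)} {p:.3f}")
```

A table that fails either condition (a negative entry, or a total other than 1) is not a valid probability model.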

Difference from discrete: continuous random variables take values over intervals, not at distinct points
Probabilities are represented by a probability density curve rather than by bars
The area under the curve over an interval is the probability that the variable falls in that interval
Because probabilities come from intervals, the probability of any single exact value is 0
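
In symbols, the area statements above can be written as follows. Here f denotes the density curve of X and a, b are any two values with a ≤ b; these symbols are introduced only for illustration.

```latex
P(a \le X \le b) = \int_a^b f(x)\,dx,
\qquad
\int_{-\infty}^{\infty} f(x)\,dx = 1,
\qquad
P(X = a) = \int_a^a f(x)\,dx = 0
```

The last identity is why P(a ≤ X ≤ b) and P(a < X < b) are equal for a continuous random variable.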

Properties of the normal distribution:
- Symmetric, bell-shaped curve
- Used for many natural phenomena (height, weight, etc.)
- Introduced by Gauss in measurement errors context
- Denoted \(N(\mu, \sigma^2)\)
- Determined by mean and standard deviation:
  - The mean locates the center
  - The standard deviation controls the spread
Empirical rule: about 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3
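
The empirical rule can be checked numerically. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the helper name `prob_within` is an assumption made for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd: {prob_within(k):.4f}")
```

The three values come out near 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.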

Calculating probabilities can be achieved via:
- Density functions and integrals
- Computational tools like R
- Standard Normal Table (Z-table)

Z transformation:
- Converts a normal variable to the standard normal distribution (mean = 0, standard deviation = 1) via \(Z = \frac{X - \mu}{\sigma}\)
After standardizing, the Z-table can be used to look up probabilities
- The table gives left-tail probabilities, P(Z ≤ z), for each standard score z
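
As a rough sketch of the standardize-then-look-up procedure, the snippet below computes a Z-score and uses the standard normal CDF in place of a printed Z-table. The mean, standard deviation, and observed value are hypothetical numbers chosen for illustration, and `z_score` and `left_tail` are made-up helper names.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def left_tail(z: float) -> float:
    """Left-tail probability P(Z <= z), the quantity a Z-table lists."""
    return NormalDist().cdf(z)  # NormalDist() defaults to mean 0, sd 1


# HYPOTHETICAL example values (not from the notes).
mu, sigma = 100.0, 15.0
x = 115.0

z = z_score(x, mu, sigma)
print(z, round(left_tail(z), 4))
```

Standardizing first and then using the standard normal gives the same probability as working with the original distribution directly, which is why a single table is enough.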

What if a probability is given instead?
- Work backward: use the given probability or area to find the Z-score, then convert to a value with \(X = \mu + Z\sigma\)
- In the Z-table, find the Z-score whose probability is closest to the given one

Finding values for given percentiles using Z-scores
Using R for calculations:
- pnorm() for the cumulative (left-tail) probability at a given value
- qnorm() for the inverse: the value at a given percentile
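
For readers working outside R, the same two operations can be sketched in Python. This assumes that `NormalDist.cdf` and `NormalDist.inv_cdf` from the standard `statistics` module play the roles of R's `pnorm()` and `qnorm()`; the inputs 1.96 and 0.975 are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Analogue of R's pnorm(1.96): left-tail probability at z = 1.96.
p = std_normal.cdf(1.96)

# Analogue of R's qnorm(0.975): z-score with 97.5% of the area to its left.
z = std_normal.inv_cdf(0.975)

print(round(p, 4), round(z, 4))
```

The two calls undo each other: feeding the probability from the first into the second returns (approximately) the original Z-score.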

Heights of children following a normal distribution (mean = 39, standard deviation = 2)
- Calculating the 90th percentile using the Z transformation
In practice, percentile rankings like this are used as indicators in health assessments
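
The 90th-percentile calculation for the heights example can be sketched as follows, using the mean of 39 and standard deviation of 2 given above. The variable names are illustrative, and the notes do not state the units, so none are assumed here.

```python
from statistics import NormalDist

mu, sigma = 39.0, 2.0  # values from the heights example

# Step 1: Z-score with 90% of the area to its left (a Z-table gives about 1.28).
z90 = NormalDist().inv_cdf(0.90)

# Step 2: undo the Z transformation, X = mu + Z * sigma.
x90 = mu + z90 * sigma

# Cross-check: ask the N(39, 2^2) distribution for its 90th percentile directly.
direct = NormalDist(mu, sigma).inv_cdf(0.90)

print(round(z90, 4), round(x90, 2), round(direct, 2))
```

Both routes give a height of roughly 41.56, meaning about 90% of children in this model are shorter than that. In R the direct route would be `qnorm(0.90, mean = 39, sd = 2)`.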