Normal & Binomial Distributions Study Notes

Introduction to Statistical Methods

Course: STAT 2331
Institution: SMU

Types of Random Variables

Random variables can be classified into two main types:
- Discrete Random Variables: Probabilities are computed by adding and subtracting the probabilities of individual outcomes.
- Continuous Random Variables: Probabilities are calculated by finding the area under a density curve.

Focus of the Module

This module primarily addresses two significant types of probability distributions:
- Normal Distributions
- Binomial Distributions

Objectives for Normal & Binomial Distributions

Determine if a distribution is approximately normal.
Apply the Empirical Rule to normally distributed data.
Transform normally distributed values to the standard normal distribution.
Calculate probabilities for normal random variables.
Perform inverse normal calculations to find percentiles.
Assess when a count can be modeled using the binomial distribution.
Calculate probabilities for a binomial random variable.
Use the normal approximation for the binomial distribution when suitable.

Normal Distributions

Characteristics and Properties

Normal distributions are bell-shaped: they are symmetric and unimodal.
Notation for a normal distribution: N(\mu, \sigma) where
- \mu = mean
- \sigma = standard deviation
The area under the entire curve of a normal distribution equals 1.

Normal Q-Q Plots

It is essential to assess visually whether data are approximately normally distributed. A normal quantile-quantile (Q-Q) plot can be used.
- Points that closely follow the reference line indicate approximate normality.
- Systematic deviations from the line suggest the distribution is not normal.
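
The coordinates behind a normal Q-Q plot can be sketched in a few lines of Python. This is only an illustration: the sample values are hypothetical, `qq_points` is a made-up helper name, and the plotting positions (i - 0.5)/n are one common convention assumed here (software packages may use slightly different ones).

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    # Reference normal distribution fitted with the sample mean and standard deviation.
    ref = NormalDist(mean(xs), stdev(xs))
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    return [(ref.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

# Hypothetical sample, not taken from the course materials.
sample = [4.1, 3.9, 4.0, 4.3, 3.7, 4.2, 3.8, 4.0]
for theoretical, observed in qq_points(sample):
    print(f"{theoretical:6.3f}  {observed:6.3f}")
```

If the data are roughly normal, the two columns should be close to each other, which is the same as the plotted points lying near the reference line.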

The Empirical Rule (68-95-99.7 Rule)

For approximately normal distributions:
- About 68% of observations fall within 1 standard deviation of the mean (\mu \pm \sigma).
- About 95% of observations fall within 2 standard deviations of the mean (\mu \pm 2\sigma).
- About 99.7% of observations fall within 3 standard deviations of the mean (\mu \pm 3\sigma).
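
As a quick check of where these percentages come from, the areas can be computed from the standard normal curve. The sketch below uses Python's standard-library `statistics.NormalDist`, which is an assumption of this illustration and not a tool named in the course; `prob_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def prob_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k."""
    z = NormalDist(mu=0.0, sigma=1.0)
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The results round to 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.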

Examples for Normal Distributions

Example 1

Exam scores are normally distributed with:
- Mean \mu = 80
- Standard deviation \sigma = 5
Apply the Empirical Rule to find the following intervals:
- 68% of test scores are between: [80 - 5, 80 + 5] = [75, 85]
- 95% of test scores are between: [80 - 2(5), 80 + 2(5)] = [70, 90]
- 99.7% of test scores are between: [80 - 3(5), 80 + 3(5)] = [65, 95]
What proportion of scores is higher than 75? (Use Z-scores and normal distribution properties.)
What proportion of scores is between 70 and 85?
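
One way to check answers to the two questions above is sketched below, assuming Python's standard-library `statistics.NormalDist` (not a tool from the course). Since 75 is one standard deviation below the mean, 70 is two below, and 85 is one above, the Empirical Rule gives roughly 84% and 81.5%; the code computes the areas directly.

```python
from statistics import NormalDist

scores = NormalDist(mu=80, sigma=5)

# P(X > 75): everything to the right of one standard deviation below the mean.
p_above_75 = 1 - scores.cdf(75)

# P(70 < X < 85): area between two standard deviations below and one above the mean.
p_70_to_85 = scores.cdf(85) - scores.cdf(70)

print(f"P(X > 75)      = {p_above_75:.4f}")
print(f"P(70 < X < 85) = {p_70_to_85:.4f}")
```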

Example 2

Heights for adult females (normal distribution):
- \mu = 65 inches, \sigma = 3.5 inches
Heights for adult males (normal distribution):
- \mu = 70 inches, \sigma = 4.0 inches
For a randomly selected female and male:
- Graph both distributions.
- Find what female height is one standard deviation above the mean: \mu + \sigma = 65 + 3.5 = 68.5 inches.
- Find what male height is three standard deviations below the mean: \mu - 3\sigma = 70 - 3(4) = 58 inches.
- Calculate the number of standard deviations from the mean for a female height of 72 inches by solving \mu_X + z\sigma_X = 72 for z:
  65 + 3.5z = 72, so z = 2.
- Similarly, for a male height of 62 inches: 70 + 4z = 62, so z = -2.

Z-scores

The formula for calculating a Z-score is:
z = \frac{x - \mu_X}{\sigma_X}
Where:
- X is any random variable
- x is a specific value of X
- z indicates how many standard deviations x is from the mean \mu
Z-scores allow us to standardize observations.
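
The formula translates directly into code. The sketch below is a minimal illustration in Python (`z_score` is a hypothetical helper name), applied to the heights from Example 2.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

z_female = z_score(72, mu=65, sigma=3.5)  # female height of 72 inches
z_male = z_score(62, mu=70, sigma=4.0)    # male height of 62 inches

print(z_female)  # 2.0
print(z_male)    # -2.0
```

A positive Z-score means the value is above the mean and a negative Z-score means it is below.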

Standard Normal Distribution

If a variable is normally distributed:
X \sim N(\mu, \sigma)
Z-scores are also normally distributed with mean \mu_Z = 0 and standard deviation \sigma_Z = 1:
Z \sim N(0, 1)
Probabilities for X are equivalent to those for Z: P(X < x) = P(Z < z), where z = \frac{x - \mu}{\sigma}.
- This notation represents the probability to the left of x or its equivalent Z-score.

Examples of Calculating Probabilities

Example 4

The distribution of female heights is X \sim N(65, 3.5).
Probability that a randomly selected adult female is shorter than 68 inches:
- State in mathematical notation and calculate the Z-score.
- Draw a graph aligning both distribution curves.
- Use the Z-score to find:
  P(X < 68) = P(Z < z), where z = \frac{68 - 65}{3.5} \approx 0.857142, so P(X < 68) \approx 0.804.
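
The equivalence P(X < x) = P(Z < z) in this example can be checked numerically. The sketch assumes Python's standard-library `statistics.NormalDist`, which is not part of the course materials.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=3.5)  # X ~ N(65, 3.5)
standard = NormalDist()                 # Z ~ N(0, 1)

z = (68 - 65) / 3.5
p_x = heights.cdf(68)   # P(X < 68), computed on the original scale
p_z = standard.cdf(z)   # P(Z < z), computed on the standardized scale

print(f"z = {z:.6f}")
print(f"P(X < 68) = {p_x:.4f}")
print(f"P(Z < z)  = {p_z:.4f}")
```

Both probabilities agree, which is the point of standardizing.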

Calculating Normal Probabilities Using Software and Calculators

In Excel, use:
- For cumulative probabilities P(X \leq x): =NORM.DIST(x,\mu,\sigma,TRUE)
- For the height of the density curve at x (not a probability): =NORM.DIST(x,\mu,\sigma,FALSE)
On a TI calculator, use:
- normalcdf(lower,upper,\mu,\sigma)
Example calculation with a dragonfly wingspan:

Example 5.1

Distribution: X \sim N(4, 0.25).
Find the probabilities that a wingspan is less than, or greater than, a specific length.
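
For readers without Excel or a TI calculator, the same kind of calculation can be sketched in Python with the standard-library `statistics.NormalDist`. The cutoff lengths 3.8 and 4.3 are hypothetical values chosen for illustration; the notes do not state which lengths the example uses.

```python
from statistics import NormalDist

wingspan = NormalDist(mu=4, sigma=0.25)  # X ~ N(4, 0.25)

# Hypothetical cutoffs, not taken from the original example.
p_less = wingspan.cdf(3.8)          # P(X < 3.8), area to the left
p_greater = 1 - wingspan.cdf(4.3)   # P(X > 4.3), area to the right

print(f"P(X < 3.8) = {p_less:.4f}")
print(f"P(X > 4.3) = {p_greater:.4f}")
```

The "greater than" probability uses the complement because the total area under the curve is 1.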

Inverse Normal Calculations

Used when a probability (the area to the left of a value) is given and the corresponding value of the variable is sought.
In Excel, use:
=NORM.INV(probability, \mu, \sigma)
On a TI calculator, use:
invNorm(probability, \mu, \sigma)
Example related to dragonfly wingspan percentiles.
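
An inverse normal calculation can be sketched in Python with `statistics.NormalDist.inv_cdf`, assuming the wingspan distribution X \sim N(4, 0.25) from Example 5.1. The 90th percentile is a hypothetical choice; the notes do not say which percentiles the example asks for.

```python
from statistics import NormalDist

wingspan = NormalDist(mu=4, sigma=0.25)  # X ~ N(4, 0.25)

# Hypothetical target: the 90th percentile (area 0.90 to the left).
x90 = wingspan.inv_cdf(0.90)
print(f"90th percentile: {x90:.4f}")

# Round trip: the area to the left of x90 should be 0.90 again.
print(f"check: {wingspan.cdf(x90):.4f}")
```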

Important Points for Normal Distributions

Notation: X \sim N(\mu, \sigma)
Use Q-Q plots to assess normality.
Empirical Rule: about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean.
Z-scores transform X-values to Z-values, standardizing the distribution.

Transitioning from Quantitative to Categorical Data

The previous examples used quantitative data.
The focus now shifts to categorical data, modeled with binomial distributions.

Binomial Distributions

A binomial setting must satisfy four conditions:
- B – Binary: Each trial results in a success or failure.
- I – Independent: The outcome of each trial is not affected by the others.
- N – Number: A fixed number of trials, denoted as n.
- S – Success: A constant probability of success, denoted as p.

Binomial Random Variable

Notation for the binomial distribution: X \sim B(n,p)
The random variable X counts successes in n trials.

Binomial Probability Formula

The formula is: P(X = k) = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}
Where:
- X = number of successes in n trials
- k = 0,1,…,n
- p = probability of success
- 1 - p = probability of failure
- n! = n(n-1)(n-2)…(3)(2)(1)
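
The formula can be evaluated directly. The sketch below is a minimal Python version using `math.comb` for the coefficient \frac{n!}{k!(n-k)!}; the helper name `binomial_pmf` and the values n = 10, p = 0.3 are hypothetical and chosen only for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), using the binomial probability formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # hypothetical values
print(f"P(X = 3) = {binomial_pmf(3, n, p):.4f}")

# The probabilities over k = 0, 1, ..., n should add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(f"sum over all k = {total:.4f}")
```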

Application Examples for Binomial Distributions

Example 7

Use a weighted-coin scenario to check the conditions of a binomial setting and find the probabilities of its outcomes.

Example 8

Apply a binomial distribution when sampling from a small population, taking the success rate into account.

Software for Calculating Binomial Probabilities

Binomial probabilities are easily calculated with software.
On a TI-83 calculator, use:
binompdf(n,p,x) for P(X = x),
binomcdf(n,p,x) for the cumulative probability P(X \leq x).
In Excel, use:
=BINOM.DIST(x,n,p,cumulative), where cumulative is FALSE for P(X = x) and TRUE for P(X \leq x).
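
To make the difference between the two calculator functions concrete, here is a hedged Python sketch that mirrors their argument order. The functions below are illustrative stand-ins written for these notes, not the calculator's own implementation, and n = 10, p = 0.5, x = 3 are hypothetical values.

```python
from math import comb

def binompdf(n: int, p: float, x: int) -> float:
    """P(X = x): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomcdf(n: int, p: float, x: int) -> float:
    """P(X <= x): sum of the probabilities for 0, 1, ..., x successes."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

print(f"P(X = 3)  = {binompdf(10, 0.5, 3):.6f}")
print(f"P(X <= 3) = {binomcdf(10, 0.5, 3):.6f}")
```

The cumulative value is always at least as large as the exact-count value because it includes it.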

Mean, Standard Deviation, and Normal Approximation for Binomial Distributions

For a binomially distributed variable:
\mu = np and \sigma = \sqrt{np(1-p)}
When n is large enough that np \geq 10 and n(1-p) \geq 10, the binomial distribution can be approximated by the normal distribution N(np, \sqrt{np(1-p)}).
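
The sketch below illustrates the mean, standard deviation, and normal approximation in Python. The values n = 100, p = 0.2 and the cutoff 25 are hypothetical, `binom_cdf` is a made-up helper name, and the comparison uses the plain normal approximation with no further adjustment.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.2  # hypothetical values

# Conditions for the normal approximation.
conditions_met = n * p >= 10 and n * (1 - p) >= 10

mu = n * p                      # mean of X ~ B(n, p)
sigma = sqrt(n * p * (1 - p))   # standard deviation of X

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) from the binomial probability formula."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

exact = binom_cdf(25, n, p)
approx = NormalDist(mu, sigma).cdf(25)  # normal approximation of P(X <= 25)

print(f"conditions met: {conditions_met}")
print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two probabilities are close but not identical, which is what "approximated" means here.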

Final Example and Calculation

Example 10

Assess the probability of defective items in a shipment using binomial distribution principles and normal approximation.

Important Points for Binomial Distributions

Notational convention: X \sim B(n,p) for binomial distributions.
The variable X signifies the count of successes over n trials.
Remember the four conditions embodied by the acronym BINS: Binary, Independent, Number, Success.
Calculate binomial probabilities either by hand with the formula or with software.