Study Notes for STAT 1000: Density Curves & Normal Distributions

Density Curves

Definition: Density curves provide a model for the distribution of a continuous random variable.
- They offer an overall picture without considering small irregularities or outliers.
- Smooth curves are easier to work with than histograms.

Histogram Example

A histogram of test scores for $5,000$ high school students taking a provincial math exam illustrates the distribution of scores.
Observations:
- Scores exhibit a fairly regular, symmetric distribution.
- Pattern descends smoothly from the center; no gaps or outliers present.
Fitting a Curve: A smooth curve can be fitted to approximate the histogram's distribution, representing a mathematical model of the data.

Area Representation

The area under a density curve over an interval represents the proportion of observations falling in that interval.
Like histograms, the area underneath the density curve can be scaled to equal one, indicating relative frequencies or proportions within intervals.
- For example, if $1,600$ of the $5,000$ students scored less than $60$ , the proportion is:
  - $\frac{1600}{5000} = 0.3200$ .

Properties of Density Curves

All density curves have three key properties:
- They lie on or above the $x$ -axis (the curve is never negative).
- The area under the curve is equal to one.
- They represent a proper function: each value of $x$ corresponds to a unique value of $y$ .
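The first two properties can be checked numerically for a candidate curve. The following is a minimal sketch, not a method from the course: the helper `is_valid_density`, the two example functions, and the midpoint-rule integration are all illustrative assumptions.

```python
def is_valid_density(f, lo, hi, n=10_000, tol=1e-3):
    """Check that f is non-negative on [lo, hi] and has area close to 1.

    The area is approximated with the midpoint rule on n equal strips.
    """
    width = (hi - lo) / n
    area = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * width   # midpoint of strip i
        y = f(x)
        if y < 0:                    # property 1: never below the x-axis
            return False
        area += y * width
    return abs(area - 1.0) < tol     # property 2: total area equals one


def uniform01(x):
    """Hypothetical example: height 1 on [0, 1], so the area is 1."""
    return 1.0


def too_tall(x):
    """Hypothetical example: height 2 on [0, 1], so the area is 2."""
    return 2.0


print(is_valid_density(uniform01, 0.0, 1.0))
print(is_valid_density(too_tall, 0.0, 1.0))
```

The third property holds automatically here because each curve is written as a Python function, which returns exactly one value of $y$ for each $x$ .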

Large Populations

When dealing with large populations (e.g., heights, incomes, GPAs):
- Continuous populations lead to smoother histograms approximating density curves.
- For infinitely large populations, histograms can have as many intervals as needed, producing a density curve.

Uniform Distribution

Definition: A uniform distribution is one in which all values in a range are equally likely.
Mathematical Properties:
- Area under the curve equals the area of a rectangle defined by base and height.
- For example, in the interval $[0, 1]$ : Area $= (1-0) \cdot \text{height} = 1$ , confirming it's a valid density curve.
- For the interval $[0, 0.45]$ : Proportion $= \text{Area} = (0.45 - 0) \cdot 1 = 0.45$ .

Finding Proportions in a Uniform Distribution

Calculating the proportion of observations within an interval is straightforward because a uniform density curve is a rectangle:
- For any range $[a, b]$ , the calculation is $\text{Area} = (b - a) \cdot \text{height}$ .
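The rectangle-area calculation can be sketched in a few lines. The function name `uniform_proportion` and its parameters are assumptions made for illustration; the height is taken to be $1 / (\text{hi} - \text{lo})$ so that the total area is one.

```python
def uniform_proportion(a, b, lo, hi):
    """Proportion of a Uniform(lo, hi) population that falls in [a, b]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    height = 1.0 / (hi - lo)          # makes the total area equal to one
    left = max(a, lo)                 # clip the interval to the support
    right = min(b, hi)
    return max(right - left, 0.0) * height


# Proportion below 0.45 for a uniform distribution on [0, 1].
print(uniform_proportion(0.0, 0.45, 0.0, 1.0))
```

Clipping to `[lo, hi]` means an interval that extends past the ends of the distribution only counts the part where the density is non-zero.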

Example Problems

Area between values: Calculate the area (proportion of observations) between $1.6$ and $3.3$ .
- Using the height of the uniform distribution: $P(1.6 < X < 3.3) = (3.3 - 1.6) \cdot \text{height} = 1.7 \cdot 0.25 = 0.425$ .
Solving Percentiles: Find values corresponding to a proportion (e.g., $10\%$ of observations in a uniform distribution).
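Both example problems can be sketched in code. The notes give only the height $0.25$ , which implies a range of width $4$ ; the endpoints $0$ and $4$ used below are an assumption, as is the helper name `uniform_percentile`.

```python
def uniform_percentile(p, lo, hi):
    """Value with proportion p of a Uniform(lo, hi) population below it."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return lo + p * (hi - lo)


LO, HI = 0.0, 4.0                    # assumed range; only the height 0.25 is given
height = 1.0 / (HI - LO)             # 0.25

between = (3.3 - 1.6) * height       # area between 1.6 and 3.3
tenth = uniform_percentile(0.10, LO, HI)   # value with 10% of observations below it

print(round(between, 3))
print(round(tenth, 3))
```

Under the assumed range, finding a percentile is the area calculation run in reverse: the required base is the proportion divided by the height.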

Triangular Distribution

This type of distribution can be defined with its maximum height $h$ by ensuring that the area sums to $1$ . For a triangle:
- Area $= \frac{1}{2} \cdot \text{base} \cdot h$ .
- For a triangle with a base of $5$ , setting $\frac{1}{2} \cdot 5 \cdot h = 1$ gives $h = \frac{2}{5}$ .
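The normalization step generalizes to any base. A minimal sketch, with the function name `triangle_height` chosen here for illustration:

```python
def triangle_height(base):
    """Peak height that gives a triangular density a total area of one."""
    if base <= 0:
        raise ValueError("base must be positive")
    return 2.0 / base                 # solves (1/2) * base * h = 1 for h


h = triangle_height(5.0)
area = 0.5 * 5.0 * h                  # check that the area is one

print(h)
print(area)
```

The result does not depend on where the peak sits along the base, since the area of a triangle depends only on its base and height.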

Parameters & Statistics

Today’s statistics class defined key terms:
- Sample mean $\bar{x}$ vs. population mean $\mu$ .
- Sample standard deviation $s$ vs. population standard deviation $\sigma$ .
- Distinction of parameters (describes populations) vs. statistics (from samples).
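The distinction can be illustrated with Python's `statistics` module, which provides separate functions for the population and sample standard deviations. The data below are made up for illustration; the eight values are treated as a complete population and the first four as a sample drawn from it.

```python
import statistics

# Hypothetical data: pretend these eight values are the entire population.
population = [4, 8, 6, 5, 3, 7, 9, 6]
# Hypothetical sample taken from that population.
sample = population[:4]

# Parameters describe the population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # divides by N

# Statistics are computed from a sample.
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)            # divides by n - 1

print("mu =", mu, "sigma =", round(sigma, 4))
print("x_bar =", x_bar, "s =", round(s, 4))
```

A different sample would give different values of $\bar{x}$ and $s$ , while $\mu$ and $\sigma$ stay fixed for the population.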

Example Problem

The MGD Beer example presents sample statistics (the average content of cans) and asks whether each reported value is a parameter or a statistic.

The Normal Distribution

A specific type of density curve, known for its bell shape and symmetric distribution. Key characteristics include:
- Defined by parameters: population mean $\mu$ and standard deviation $\sigma$ .
- The distribution is symmetric and has a total area under the curve equal to one.
Notation: The normal distribution is denoted as $X \sim N(\mu, \sigma)$ .

Properties of Normal Distribution

Two main parameters:
- Mean ( $\mu$ ): Location of the center of the distribution.
- Standard deviation ( $\sigma$ ): Measure of spread; must be positive.

Relationship Between Parameters

The mean indicates central location, while the standard deviation reflects the spread:
- For example, with $\mu = 100$ and $\sigma = 10$ , the distribution is written $X \sim N(100, 10)$ : it is centered at $100$ with a standard deviation of $10$ .

68-95-99.7 Rule

Key statistical rule for normal distributions:
- Approximately:
  - $68\%$ of values fall within one standard deviation of the mean ( $\mu \pm \sigma$ ).
  - $95\%$ of values fall within two standard deviations ( $\mu \pm 2\sigma$ ).
  - $99.7\%$ of values fall within three standard deviations ( $\mu \pm 3\sigma$ ).
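The rule's percentages are rounded. As a rough check, the sketch below uses `NormalDist` from Python's standard library to compute the proportions; the helper name `proportion_within` is an assumption made for this example.

```python
from statistics import NormalDist


def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of N(mu, sigma) values within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The computed proportions are about $0.6827$ , $0.9545$ , and $0.9973$ , and they are the same for every choice of $\mu$ and $\sigma$ , which is why one rule covers all normal distributions.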

Example Application

If $X \sim N(150, 20)$ , the proportion of values between $130$ and $170$ is approximately $68\%$ , since these values lie one standard deviation on either side of the mean.
The proportion of values between $110$ and $190$ is approximately $95\%$ , since these values lie $2$ standard deviations on either side of the mean.

Standard Normal Distribution

A special type of normal distribution with:
- Mean $\mu = 0$ , standard deviation $\sigma = 1$ .
- Denoted by the variable $Z$ , where $Z \sim N(0, 1)$ .

Transforming To Standard Normal

To convert a normal variable $X$ into a standard normal variable $Z$ , calculate the z-score as follows:
- $Z = \frac{x - \mu}{\sigma}$ .

Example Calculations

For a height of $187$ cm when $X \sim N(178, 6)$ :
- $Z = \frac{187 - 178}{6} = 1.5$ . This means the height is $1.5$ standard deviations above the population mean.
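The standardization formula is a one-line function. A minimal sketch, with `z_score` as an illustrative name:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


z = z_score(187, 178, 6)      # the height example: X ~ N(178, 6)
x_back = 178 + z * 6          # reversing the transformation recovers x

print(z)
print(x_back)
```

Reversing the formula, $x = \mu + z\sigma$ , converts a z-score back to the original scale.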

Proportion Calculations with Standard Normal Distribution

Techniques for finding proportions using the $Z$ transformation process:
- Sketch the normal curve, shading the area required for the computation. Use properties and known values from the standard normal table to derive answers.

Example problems

For $P(-1 < Z < 1)$ , the probability is approximately $0.68$ by the 68-95-99.7 rule.
For $P(Z > 2)$ , symmetry gives $P(Z > 2) = P(Z < -2)$ . By the 68-95-99.7 rule, about $5\%$ of values lie more than two standard deviations from the mean, so each tail holds approximately $0.025$ .
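These two proportions can be compared with the values a standard normal table would give. The sketch below uses `NormalDist` from Python's standard library in place of a printed table; this substitution is an assumption of the example, not the method used in the notes.

```python
from statistics import NormalDist

Z = NormalDist()                     # standard normal: mean 0, sd 1

p_middle = Z.cdf(1) - Z.cdf(-1)      # P(-1 < Z < 1)
p_upper = 1 - Z.cdf(2)               # P(Z > 2), using the complement
p_lower = Z.cdf(-2)                  # P(Z < -2), equal to p_upper by symmetry

print(round(p_middle, 4))
print(round(p_upper, 4))
print(round(p_lower, 4))
```

The tail area comes out near $0.0228$ rather than $0.025$ , because the $95\%$ in the rule is itself a rounded figure.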

Backward Normal: Finding z-values

To find the value $z$ corresponding to a known proportion, locate the proportion in the body of the standard normal table and read off the matching $z$ , using symmetry or the complement when the required area is not listed directly.

Finding Percentiles & Quantile Ranges

To find the value at a given percentile, look up the percentile's proportion in the standard normal table to obtain $z$ , using symmetry where needed, and convert back to the original scale if required.
- Example: Determine the interquartile range of a standard normal distribution by finding $Q1$ and $Q3$ .
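The backward lookup can be sketched with `NormalDist.inv_cdf` from Python's standard library, which plays the role of reading the table in reverse. The helper `value_at_percentile` is a hypothetical name introduced for this example.

```python
from statistics import NormalDist

Z = NormalDist()                     # standard normal: mean 0, sd 1

q1 = Z.inv_cdf(0.25)                 # z with 25% of the area to its left
q3 = Z.inv_cdf(0.75)                 # z with 75% of the area to its left
iqr = q3 - q1

print(round(q1, 3), round(q3, 3), round(iqr, 3))


def value_at_percentile(p, mu, sigma):
    """Value of X ~ N(mu, sigma) at proportion p, via x = mu + z * sigma."""
    return mu + Z.inv_cdf(p) * sigma
```

By symmetry $Q1 = -Q3$ , so only one of the two quartiles needs to be looked up; each is about $0.67$ standard deviations from the mean.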

Conclusion

These notes cover how to find areas under the normal curve, how to standardize values using the mean and standard deviation, and how to work backward from a target proportion to the corresponding value. The examples illustrate these techniques in several contexts.

Next Steps: Unit 05 - Probability & Sampling Distribution

Probability (Unit 05)

Definition: Probability quantifies the likelihood of an event occurring. It is a value between $0$ and $1$ (or $0\%$ and $100\%$ ).
- When all outcomes are equally likely: $P(\text{event}) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}$
Key Concepts:
- Experiment: A process that leads to well-defined outcomes.
- Outcome: A single possible result of an experiment.
- Sample Space ( $S$ ): The set of all possible outcomes of an experiment.
- Event: A subset of the sample space; a collection of one or more outcomes.

Rules of Probability

Rule 1: Probability Range: For any event $A$ , $0 \le P(A) \le 1$ .
Rule 2: Sum of Probabilities: The sum of probabilities of all possible outcomes in a sample space is $1$ . $P(S) = 1$ .
Rule 3: Complement Rule: The probability that an event $A$ does not occur is $P(A^c) = 1 - P(A)$ .
Rule 4: Addition Rule for Disjoint Events: If two events $A$ and $B$ are disjoint (mutually exclusive), meaning they cannot occur at the same time, then $P(A \text{ or } B) = P(A) + P(B)$ .
- Example: Rolling a $1$ or a $6$ on a single die roll.
Rule 5: General Addition Rule: For any two events $A$ and $B$ (not necessarily disjoint), $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$ .
- $P(A \text{ and } B)$ is the probability that both $A$ and $B$ occur.
Rule 6: Multiplication Rule for Independent Events: If two events $A$ and $B$ are independent, meaning the occurrence of one does not affect the occurrence of the other, then $P(A \text{ and } B) = P(A) \cdot P(B)$ .
- Example: Flipping a coin twice and getting heads both times.
Rule 7: Conditional Probability: The probability of event $B$ occurring given that event $A$ has already occurred is denoted as $P(B|A)$ .
- $P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$ , provided $P(A) > 0$ .
Rule 8: General Multiplication Rule: For any two events $A$ and $B$ (not necessarily independent), $P(A \text{ and } B) = P(A) \cdot P(B|A)$ .
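Several of these rules can be checked on a single roll of a fair die. The sketch below is illustrative only: the events `even` and `high` and the helper `prob` are assumptions, and all six outcomes are assumed equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}     # one roll of a fair die (assumed)


def prob(event):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


even = {2, 4, 6}
high = {4, 5, 6}

# Rule 3 (complement): P(not even) = 1 - P(even)
p_not_even = prob(sample_space - even)

# Rule 4 (disjoint events): P(1 or 6) = P(1) + P(6)
p_one_or_six = prob({1} | {6})

# Rule 5 (general addition): P(even or high)
p_even_or_high = prob(even) + prob(high) - prob(even & high)

# Rule 7 (conditional): P(high | even)
p_high_given_even = prob(even & high) / prob(even)

# Rule 8 (general multiplication): P(even and high)
p_even_and_high = prob(even) * p_high_given_even

print(p_not_even, p_one_or_six, p_even_or_high)
print(p_high_given_even, p_even_and_high)
```

Using `Fraction` keeps the probabilities exact, so each rule can be compared against a direct count of outcomes without rounding error.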

Random Variables

Definition: A random variable is a numerical outcome of a random phenomenon.
Types of Random Variables:
- Discrete Random Variable: A random variable that can take on a finite or countably infinite number of values. These values are often integers.
  - Examples: Number of heads in 3 coin flips ( $0, 1, 2, 3$ ), number of cars passing a certain point in an hour.
- Continuous Random Variable: A random variable that can take on any value within a given range.
  - Examples: Height, weight, temperature, time. (As discussed with density curves).
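The coin-flip example of a discrete random variable can be worked out by listing the sample space. A minimal sketch, assuming a fair coin so that all eight sequences of three flips are equally likely:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every sequence of three flips, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# The random variable X assigns each outcome its number of heads.
counts = Counter(flips.count("H") for flips in outcomes)

# Probability of each value of X, assuming equally likely outcomes.
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```

The variable takes only the four values $0, 1, 2, 3$ , which is what makes it discrete, and the four probabilities add up to one.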

Probability Distributions for Discrete Random Variables

A probability distribution for a discrete random variable lists all possible values the variable can take and their corresponding probabilities.
Properties:
- $0 \le P(X=x) \le 1$ for each possible value $x$ .
- $\sum P(X=x) = 1$ (The sum of