Lecture 18 - Comprehensive Study Notes on Normal Distribution, Empirical Rules, and Statistical Inference

Review of Normal Distribution and Deductive Logic

Conceptual Framework: Deductive logic in statistics involves taking general knowledge about a population (parameters) to determine the probability of specific events (observations).
Scenario Background: Observations of year nine javelin throws follow a normal distribution with the following parameters: * Population Mean ( $\mu$ ): $24\,m$ * Standard Deviation ( $\sigma$ ): $4\,m$
The Quantitative Algorithm: There is a three-step instruction set for solving normal distribution problems:
1. Draw it: Sketch the normal distribution, mark the center ( $\mu = 24$ ), define the spread ( $\sigma = 4$ ), and identify the observation ( $x = 33$ ).
2. Calculate the Z-score: Determine the standardized score to see how many standard deviations the observation is from the mean.
3. Use the Probability Table: Look up the Z-score in the statistical tables to find the percentile or probability.

Standardized Calculations for Specific Javelin Throws

Point Observation ( $33\,m$ ):
* Task: Determine the percentile for a throw of $33\,m$ (the probability of throwing $33\,m$ or less).
* Z-score Calculation: $Z = \frac{x - \mu}{\sigma} = \frac{33 - 24}{4} = \frac{9}{4} = 2.25$
* Interpretation: The throw is exactly $2.25$ standard deviations above the population mean.
* Table Lookup: In the positive Z-value table, the row for $2.2$ and the column for $0.05$ give a probability of $0.9878$ .
* Result: This throw is in the $98.78^{th}$ percentile.
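Point Observation ( $22\,m$ ):
* The speaker's nephew, Nico, throws consistently at $22\,m$ .
* The process remains the same: draw the curve, compute the (now negative) Z-score, and find the lower percentile in the negative Z-value table.
* Z-score Calculation: $Z = \frac{22 - 24}{4} = -0.50$ , which the table maps to a probability of $0.3085$ , so Nico is at roughly the $31^{st}$ percentile.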
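The two point observations above can be checked without a printed table. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for the table lookup; the helper names `z_score` and `percentile` are made up for illustration and are not from the lecture.

```python
from statistics import NormalDist

# Javelin model from the notes: mean 24 m, standard deviation 4 m.
throws = NormalDist(mu=24, sigma=4)

def z_score(x, mu=24.0, sigma=4.0):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def percentile(x):
    """P(X <= x) under the javelin model (replaces the table lookup)."""
    return throws.cdf(x)

z_33, p_33 = z_score(33), percentile(33)
z_22, p_22 = z_score(22), percentile(22)

print(f"33 m: z = {z_33:.2f}, percentile = {p_33:.4f}")
print(f"22 m: z = {z_22:.2f}, percentile = {p_22:.4f}")
```

Rounded to four decimals, the computed probabilities should agree with the table entries for $Z = 2.25$ and $Z = -0.50$ .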
Tail Probability (Greater than Case):
* Task: Find the probability that a student throws further than $28\,m$ .
* Logic: Since the total area under the normal curve equals $1$ , the probability of a value being higher than an observation is $1 - \text{percentile}$ .
* Z-score Calculation: $Z = \frac{28 - 24}{4} = 1.00$
* Probability: The table value for $Z = 1.00$ is $0.8413$ . The final answer is $1 - 0.8413 = 0.1587$ .
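The "greater than" case is the complement of the percentile. A small sketch, again assuming `statistics.NormalDist` in place of the table (the function name `prob_further_than` is hypothetical):

```python
from statistics import NormalDist

throws = NormalDist(mu=24, sigma=4)  # javelin model from the notes

def prob_further_than(x):
    """P(X > x): total area (1) minus the area to the left of x."""
    return 1.0 - throws.cdf(x)

p_over_28 = prob_further_than(28)
print(f"P(X > 28) = {p_over_28:.4f}")
```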

Probability Between Two Observations

Problem Statement: What is the probability that a randomly selected student throws between $20\,m$ and $28\,m$ ?
Strategic Approach: The target is $P(20 < X < 28)$ , the area under the curve between the two points. It is found by taking the probability of the larger value (everything to the left of $28$ ) and subtracting the probability of the smaller value (everything to the left of $20$ ): $P(20 < X < 28) = P(X < 28) - P(X < 20)$ .
Step-by-Step Execution:
1. Z-score for 28: $Z_1 = \frac{28 - 24}{4} = 1.00$ . Corresponding probability ( $P_1$ ): $0.8413$
2. Z-score for 20: $Z_2 = \frac{20 - 24}{4} = -1.00$ . Corresponding probability ( $P_2$ ) from the negative table: $0.1587$
3. Subtraction: $P_1 - P_2 = 0.8413 - 0.1587 = 0.6826$
Conclusion: There is a $68.26\%$ probability that a throw falls between $20\,m$ and $28\,m$ .
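The subtraction strategy can be sketched in a few lines. This assumes `statistics.NormalDist` as a substitute for the two table lookups; `prob_between` is an illustrative name only.

```python
from statistics import NormalDist

throws = NormalDist(mu=24, sigma=4)  # javelin model from the notes

def prob_between(lower, upper):
    """P(lower < X < upper) = P(X < upper) - P(X < lower)."""
    return throws.cdf(upper) - throws.cdf(lower)

p_20_28 = prob_between(20, 28)
print(f"P(20 < X < 28) = {p_20_28:.4f}")
```

The full-precision result is about $0.6827$ ; the $0.6826$ in the notes comes from subtracting two table entries that were each rounded to four decimals.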

The Inverse Normal Problem

Definition: An inverse problem occurs when the percentile/probability is provided, and the task is to find the corresponding value of the observation ( $x$ ).
Example Scenario: If a student is in the $67^{th}$ percentile, how far did they throw the javelin?
Algorithm:
1. Draw it: Identify the $50\%$ mark (the mean) and shade from the left until reaching approximately $0.67$ .
2. Table Search: Look into the body of the probability tables for the value closest to $0.6700$ . In the positive table, a probability of approximately $0.67$ corresponds to a Z-score of $0.44$ .
3. Algebraic Rearrangement: Use the Z-score formula solved for $x$ : $x = (Z \times \sigma) + \mu$
4. Calculation: $x = (0.44 \times 4) + 24 = 1.76 + 24 = 25.76\,m$
Note on Algebra: The formula rearrangement involves multiplying both sides by $\sigma$ and adding $\mu$ to isolate $x$ .
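The inverse lookup can also be sketched in code. Here `NormalDist.inv_cdf` plays the role of searching the body of the table; this is an assumed stand-in, not the method used in the lecture, and the variable names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 24, 4                    # javelin model from the notes
standard = NormalDist()              # standard normal: mean 0, sd 1

# Step 2: find the Z-score whose left-hand area is 0.67.
z_67 = standard.inv_cdf(0.67)

# Steps 3-4: undo the standardization, x = Z * sigma + mu.
x_67 = z_67 * SIGMA + MU

# Same answer in one step, asking the javelin model directly.
x_direct = NormalDist(MU, SIGMA).inv_cdf(0.67)

print(f"z = {z_67:.2f}, x = {x_67:.2f} m")
```

The computed Z-score is not rounded to two decimals the way a table entry is, so the distance can differ from the hand calculation in the third decimal place.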

The Empirical Rule (68-95-99.7 Rule)

Core Principle: Because all normal distributions share the same shape, they can all be transformed into a standard normal distribution ( $\mu = 0$ , $\sigma = 1$ ). This allows for a constant set of probabilities:
* Within $1$ Standard Deviation: Approximately $68\%$ ( $68.26\%$ ) of all values fall within $\mu \pm 1\sigma$ .
* Within $2$ Standard Deviations: Approximately $95\%$ of all values fall within $\mu \pm 2\sigma$ .
* Within $3$ Standard Deviations: Approximately $99.7\%$ of all values fall within $\mu \pm 3\sigma$ .
Outlier Definition: A measurement is considered an "extreme outlier" if it falls more than three standard deviations above or below the mean (a probability of roughly $3$ in $1,000$ ).
Extreme Ranges: About $99.99\%$ of the distribution lies within four standard deviations of the mean (more precisely $99.994\%$ ), so a value beyond that range is rarer than $1$ in $10,000$ .
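The percentages in this section can be reproduced from the standard normal distribution. A minimal sketch, assuming `statistics.NormalDist` and an illustrative helper `prob_within`:

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, sd 1

def prob_within(k):
    """Probability of landing within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3, 4):
    inside = prob_within(k)
    print(f"within {k} sd: {inside:.4%}   outside: {1 - inside:.4%}")
```

Because the calculation is done on Z-scores, the same figures apply to any normal distribution, including the javelin model with $\mu = 24$ and $\sigma = 4$ .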

Introduction to Statistical Inference

The Key Goal: Moving from the world of statistics (sample data) to estimating population parameters. This is the "money" in statistical data analysis.
Sampling Error: Error that arises inherently because a sample is measured instead of the entire population. It is an unavoidable uncertainty that statistics seeks to quantify.
Inference Definition: Methods and procedures for forming judgments about, and estimating, population parameters using sample statistics calculated from random samples.
Confidence and Credibility: Statistics does not give a single certain answer; it provides an estimate with a quantified level of confidence or uncertainty.

Fundamental Terminology and Taxonomy

Population: The entire set of units for which we measure a trait. * Example: All New Zealand citizens eligible to vote; all stars in the Milky Way galaxy.
Parameter: A characteristic of the population. * Assumptions: Parameters are considered fixed and unknown (unless a full census is taken). * Example: The actual proportion of all voters choosing the Labour Party.
Sample: A collection of units or a subset taken from the population of interest. * Example: A Colmar Brunton poll of $1,000$ New Zealand voters; stars visible via binoculars.
Statistic: A characteristic or feature of a sample. * Nature: Statistics are not fixed; they vary from sample to sample. * Example: The percentage in a specific sample of $1,000$ people who vote Labour.
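The contrast between a fixed parameter and a varying statistic can be illustrated with a small simulation. This is a sketch under stated assumptions: the true proportion of $0.40$ is a made-up value chosen for illustration (it is not a real polling figure), and `poll` is a hypothetical helper that mimics drawing a random sample of $1,000$ voters.

```python
import random

rng = random.Random(18)    # fixed seed so the run is repeatable
TRUE_PROPORTION = 0.40     # hypothetical parameter, made up for illustration

def poll(n=1000):
    """Simulate one random sample of n voters; return the sample proportion."""
    votes = [1 if rng.random() < TRUE_PROPORTION else 0 for _ in range(n)]
    return sum(votes) / n

# Five polls of the same population: one parameter, five statistics.
sample_proportions = [poll() for _ in range(5)]
print(sample_proportions)
```

The parameter never changes between polls, while each sample proportion lands somewhere near it. That spread is the sampling error that inference tries to quantify.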

Population Parameters: Means and Proportions

Notation Convention: Greek letters are used for population parameters (to look "fancy and smart") while Roman letters/hats are used for statistics.
Population Mean ( $\mu$ ): * Formula for $N$ units: $\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$ * Calculable only if every measurement in the population is known.
Sample Mean ( $\bar{x}$ ): * Formula for $n$ units: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} X_i$ * Used to estimate $\mu$ .
Population Proportion ( $\pi$ ): * Symbol used is the Greek letter pi ( $\pi$ ). Note: This is notation, not the numeric constant $3.14159$ . * Structurally identical to a mean, where responses are coded as $1$ (yes/success) or $0$ (no/failure).
Sample Proportion ( $\hat{p}$ ): * Referred to as "P-hat." Used when dealing with categorical data in a sample.
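The formulas in this section can be sketched on toy data. All numbers below are invented for illustration: a tiny "population" of eight throws, a sample of four of them, and five yes/no responses coded as $1$ and $0$ to show that a proportion is just a mean.

```python
# Hypothetical population of N = 8 javelin throws (metres), made up here.
population = [21.0, 24.5, 19.0, 28.0, 26.5, 23.0, 25.0, 25.0]

# Population mean: (1/N) * sum of every measurement in the population.
mu = sum(population) / len(population)

# Sample mean: the same formula applied to a subset of n = 4 units.
sample = population[:4]
x_bar = sum(sample) / len(sample)

# A proportion is a mean of 0/1 codes: 1 = yes/success, 0 = no/failure.
responses = ["yes", "no", "yes", "yes", "no"]
coded = [1 if r == "yes" else 0 for r in responses]
p_hat = sum(coded) / len(coded)

print(f"mu = {mu}, x_bar = {x_bar}, p_hat = {p_hat}")
```

Taking the first four values is only for brevity; a real sample would be drawn at random. Here the sample mean differs from the population mean, which is the usual situation when estimating $\mu$ from $\bar{x}$ .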

Questions & Discussion

Question: How can I capture the area between two observations on the graph?
Response: The strategy is to find the probability of the larger value and subtract the probability of the smaller value. This "takes out" the unshaded chunk you don't need.
Question: What is the Z-score for the throw of $28\,m$ ?
Response: $1.00$ . The observation ( $28$ ) minus the mean ( $24$ ) equals $4$ . $4$ divided by the standard deviation ( $4$ ) equals $1$ . It is important to express this as $1.00$ for table lookup purposes.
Question: How do we rearrange the Z-score formula to find an observation?
Response: Multiply both sides by the denominator ( $\sigma$ ) to get $Z \times \sigma = x - \mu$ . Then add the mean ( $\mu$ ) to both sides to solve for $x$ .