Lecture 18 - Comprehensive Study Notes on Normal Distribution, Empirical Rules, and Statistical Inference
Review of Normal Distribution and Deductive Logic
Conceptual Framework: Deductive logic in statistics involves taking general knowledge about a population (parameters) to determine the probability of specific events (observations).
Scenario Background: Observations of year nine javelin throws follow a normal distribution with the following parameters: * Population Mean ($\mu$): * Standard Deviation ($\sigma$):
The Quantitative Algorithm: There is a three-step instruction set for solving normal distribution problems: 1. Draw it: Sketch the normal distribution, mark the center ($\mu$), define the spread ($\sigma$), and identify the observation ($x$). 2. Calculate the Z-score: Compute the standardized score $z = (x - \mu)/\sigma$ to see how many standard deviations the observation is from the mean. 3. Use the Probability Table: Look up the Z-score in the statistical tables to find the percentile or probability.
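The last two steps of the algorithm can be sketched in Python, with the standard library's `statistics.NormalDist` standing in for the printed probability table. The mean, standard deviation, and throw distance below are hypothetical values chosen only for illustration; they are not the figures from the lecture.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def percentile(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma): the 'table lookup' step."""
    return NormalDist().cdf(z_score(x, mu, sigma))


# Hypothetical parameters and observation (assumptions, not lecture values).
MU = 25.0
SIGMA = 4.0
throw = 28.0

z = z_score(throw, MU, SIGMA)
p = percentile(throw, MU, SIGMA)
print(f"z = {z:.2f}, P(X <= {throw}) = {p:.4f}")
```

A printed table rounds the Z-score to two decimal places before the lookup, so a table-based answer can differ from this one in the third or fourth decimal.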
Standardized Calculations for Specific Javelin Throws
Point Observation (): * Determining the percentile for a throw of (the probability of throwing or less). * Z-score Calculation: * Interpretation: The throw is exactly standard deviations above the population mean. * Table Lookup: Using the positive Z-value table, locating the row for and column for yields a probability of . * Result: This throw is in the percentile.
Point Observation (): * The speaker's nephew, Nico, throws consistently at . * The process remains the same: Draw the curve, compute the negative Z-score, and find the lower percentile in the tables.
Tail Probability (Greater than Case): * Task: Find the probability that a student throws further than . * Logic: Since the total area under the normal curve equals 1, the probability of a value being higher than an observation $x$ is calculated as $P(X > x) = 1 - P(X \le x)$. * Z-score Calculation: * Probability: The table value for is . The final answer is .
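The "greater than" case can be sketched the same way: compute the area to the left and subtract it from the total area of 1. The parameters and threshold here are hypothetical, picked only to make the example runnable.

```python
from statistics import NormalDist

# Hypothetical parameters and threshold (assumptions, not lecture values).
MU = 25.0
SIGMA = 4.0
threshold = 30.0

throws = NormalDist(mu=MU, sigma=SIGMA)

left_area = throws.cdf(threshold)   # P(X <= threshold), what the table gives
right_tail = 1.0 - left_area        # P(X > threshold), the upper tail

print(f"P(X <= {threshold}) = {left_area:.4f}")
print(f"P(X >  {threshold}) = {right_tail:.4f}")
```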
Probability Between Two Observations
Problem Statement: What is the probability that a randomly selected student throws between 20 and 28?
Strategic Approach: Captured by the formula $P(20 < X < 28)$. * The goal is to find the area under the curve between two points. This is achieved by taking the probability of the larger value (everything to the left of 28) and subtracting the probability of the smaller value (everything to the left of 20).
Step-by-Step Execution: 1. Z-score for 28: Corresponding Probability (): 2. Z-score for 20: Corresponding Probability () from the negative table: 3. Subtraction: $P(20 < X < 28) = P(X < 28) - P(X < 20)$
Conclusion: There is a probability that a throw falls between 20 and 28.
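A minimal sketch of the subtraction strategy for the interval from 20 to 28. The endpoints come from the problem statement; the mean and standard deviation are hypothetical stand-ins, so the printed probability is illustrative and is not the lecture's answer.

```python
from statistics import NormalDist

# Hypothetical parameters (assumptions, not lecture values).
MU = 25.0
SIGMA = 4.0

throws = NormalDist(mu=MU, sigma=SIGMA)

lower, upper = 20.0, 28.0           # endpoints from the problem statement
p_upper = throws.cdf(upper)         # everything to the left of 28
p_lower = throws.cdf(lower)         # everything to the left of 20
p_between = p_upper - p_lower       # the area between the two points

print(f"P(X < {upper}) = {p_upper:.4f}")
print(f"P(X < {lower}) = {p_lower:.4f}")
print(f"P({lower} < X < {upper}) = {p_between:.4f}")
```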
The Inverse Normal Problem
Definition: An inverse problem occurs when the percentile/probability is provided, and the task is to find the corresponding value of the observation ($x$).
Example Scenario: If a student is in the percentile, how far did they throw the javelin?
Algorithm: 1. Draw it: Identify the mark (the mean) and shade from the left until reaching approximately . 2. Table Search: Look into the body of the probability tables for the value closest to . In the positive table, a probability of approximately corresponds to a Z-score of . 3. Algebraic Rearrangement: Use the Z-score formula solved for $x$: $x = \mu + z\sigma$ 4. Calculation:
Note on Algebra: Starting from $z = (x - \mu)/\sigma$, the rearrangement involves multiplying both sides by $\sigma$ and adding $\mu$ to isolate $x$.
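The inverse problem can be sketched with `NormalDist.inv_cdf`, which plays the role of searching the body of the table for a probability and reading off the Z-score. The percentile, mean, and standard deviation below are hypothetical values for illustration only.

```python
from statistics import NormalDist

# Hypothetical parameters and percentile (assumptions, not lecture values).
MU = 25.0
SIGMA = 4.0
target_percentile = 0.90

# Step 2: find the Z-score whose left-hand area equals the percentile.
z = NormalDist().inv_cdf(target_percentile)

# Steps 3 and 4: rearranged Z-score formula, x = mu + z * sigma.
x = MU + z * SIGMA

print(f"z = {z:.4f}, x = {x:.2f}")

# Sanity check: the forward calculation should return the same percentile.
round_trip = NormalDist(mu=MU, sigma=SIGMA).cdf(x)
print(f"P(X <= {x:.2f}) = {round_trip:.4f}")
```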
The Empirical Rule (68-95-99.7 Rule)
Core Principle: Because all normal distributions share the same shape, they can all be transformed into a standard normal distribution ($\mu = 0$, $\sigma = 1$). This allows for a constant set of probabilities: * Within 1 Standard Deviation: Approximately 68% of all values fall within $\mu \pm 1\sigma$. * Within 2 Standard Deviations: Approximately 95% of all values fall within $\mu \pm 2\sigma$. * Within 3 Standard Deviations: Approximately 99.7% of all values fall within $\mu \pm 3\sigma$.
Outlier Definition: A measurement is considered an "extreme outlier" if it falls more than three standard deviations above or below the mean (a probability of roughly in ).
Extreme Ranges: Four standard deviations cover of the distribution, making such an occurrence a in chance.
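The empirical-rule coverages can be checked numerically against the standard normal distribution. This is a small verification sketch; the helper name `coverage_within` is illustrative and not from the lecture.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3, 4):
    inside = coverage_within(k)
    outside = 1.0 - inside
    print(f"within {k} SD: {inside:.5f}   outside: {outside:.5f}")
```

Because every normal distribution standardizes to the same curve, these coverages hold for any mean and standard deviation.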
Introduction to Statistical Inference
The Key Goal: Moving from the world of statistics (sample data) to estimating population parameters. This is the "money" in statistical data analysis.
Sampling Error: Error that arises inherently because a sample is measured instead of the entire population. It is an unavoidable uncertainty that statistics seeks to quantify.
Inference Definition: Methods and procedures for forming judgments about, and estimating, population parameters using sample statistics calculated from random samples.
Confidence and Credibility: Statistics does not give a single certain answer; it provides an estimate with a quantified level of confidence or uncertainty.
Fundamental Terminology and Taxonomy
Population: The entire set of units for which we measure a trait. * Example: All New Zealand citizens eligible to vote; all stars in the Milky Way galaxy.
Parameter: A characteristic of the population. * Assumptions: Parameters are considered fixed and unknown (unless a full census is taken). * Example: The actual proportion of all voters choosing the Labour Party.
Sample: A collection of units or a subset taken from the population of interest. * Example: A Colmar Brunton poll of New Zealand voters; stars visible via binoculars.
Statistic: A characteristic or feature of a sample. * Nature: Statistics are not fixed; they vary from sample to sample. * Example: The percentage of people in a specific sample who vote Labour.
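The contrast between a fixed parameter and a varying statistic can be illustrated with a small simulation. The population below is entirely hypothetical (10,000 voters with 40% support), as are the sample size and the seed; the point is only that repeated random samples give different estimates of the same fixed value.

```python
import random

# Hypothetical population: 1 = supports the party, 0 = does not.
population = [1] * 4000 + [0] * 6000
true_pi = sum(population) / len(population)  # the parameter: fixed

rng = random.Random(18)  # seeded so the run is repeatable
estimates = []
for _ in range(5):
    sample = rng.sample(population, 500)           # simple random sample
    estimates.append(sum(sample) / len(sample))    # the statistic: varies

print(f"parameter pi = {true_pi:.3f}")
for i, p_hat in enumerate(estimates, start=1):
    error = p_hat - true_pi
    print(f"sample {i}: p-hat = {p_hat:.3f}, sampling error = {error:+.3f}")
```

In practice the parameter is unknown, so the sampling error cannot be computed directly like this; inference is about quantifying how large it is likely to be.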
Population Parameters: Means and Proportions
Notation Convention: Greek letters are used for population parameters (to look "fancy and smart") while Roman letters/hats are used for statistics.
Population Mean ($\mu$): * Formula for $N$ units: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ * Calculable only if every measurement in the population is known.
Sample Mean ($\bar{x}$): * Formula for $n$ units: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ * Used to estimate $\mu$.
Population Proportion ($\pi$): * Symbol used is the Greek letter pi ($\pi$). Note: This is notation, not the numeric constant $3.14159\ldots$ * Structurally identical to a mean, where responses are coded as 1 (yes/success) or 0 (no/failure).
Sample Proportion ($\hat{p}$): * Referred to as "P-hat." Used when dealing with categorical data in a sample.
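A short sketch of why a proportion is structurally a mean: once responses are coded as 1 and 0, the same averaging formula gives the sample proportion. The data values are made up for illustration.

```python
def mean(values):
    """Sum of the measurements divided by how many there are."""
    return sum(values) / len(values)


# Hypothetical sample of throw distances (numeric data).
throws = [22.0, 26.0, 30.0]
x_bar = mean(throws)

# Hypothetical survey responses coded 1 = yes/success, 0 = no/failure.
responses = [1, 0, 1, 1, 0]
p_hat = mean(responses)

print(f"sample mean x-bar = {x_bar}")
print(f"sample proportion p-hat = {p_hat}")
```

Applied to every unit in a population rather than a sample, the same function would give the parameters $\mu$ and $\pi$ instead of the statistics $\bar{x}$ and $\hat{p}$.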
Questions & Discussion
Question: How can I capture the area between two observations on the graph?
Response: The strategy is to find the probability of the larger value and subtract the probability of the smaller value. This "takes out" the unshaded chunk you don't need.
Question: What is the Z-score for the throw of ?
Response: . The observation () minus the mean () equals . divided by the standard deviation () equals . It is important to express this as for table lookup purposes.
Question: How do we rearrange the Z-score formula to find an observation?
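Response: Start from $z = (x - \mu)/\sigma$. Multiply both sides by the denominator ($\sigma$) to get $z\sigma = x - \mu$. Then add the mean ($\mu$) to both sides to solve for $x$: $x = \mu + z\sigma$.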