A sampling distribution was created by taking all possible samples of size n=2.
The average of all the averages from these samples was 75.
Sampling Distribution
The sampling distribution of the means is the histogram created from the means of all possible samples.
The question posed is: what would the histogram of all the sample proportions look like?
Do Now: Cereal Experiment
Students are instructed to take a small cup and scoop a handful of cereal.
They are to sample 5 pieces of cereal from their cup and record the proportion of marshmallows.
This is repeated for a sample of size 20, and if time permits, for a sample of size 50.
Simulating with Lucky Charms
General Mills claims that about 30% of Lucky Charms cereal pieces are marshmallows.
The question is whether the student samples match this claim.
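One way to see what to expect before pooling class data is a quick simulation. The sketch below assumes each cereal piece is independently a marshmallow with probability 0.30 (the General Mills claim); the function names, seed, and trial count are illustrative choices, not part of the activity.

```python
import random

def sample_proportion(n, p, rng):
    """Proportion of marshmallows in one simulated scoop of n pieces."""
    marshmallows = sum(1 for _ in range(n) if rng.random() < p)
    return marshmallows / n

def simulate(n, p=0.30, trials=10_000, seed=1):
    """Return `trials` simulated sample proportions for scoops of size n."""
    rng = random.Random(seed)
    return [sample_proportion(n, p, rng) for _ in range(trials)]

# Same sample sizes as the Do Now activity.
for n in (5, 20, 50):
    props = simulate(n)
    center = sum(props) / len(props)
    print(f"n={n:2d}: mean of sample proportions = {center:.3f}")
```

If the claim holds, the simulated proportions should center near 0.30 for every sample size, with less spread as n grows.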
Sampling Distribution of Proportions
The sampling distribution of the proportions is the histogram that would result from looking at all the proportions from all possible samples.
The question is reiterated: what would this histogram look like?
Modeling the Distribution of Sample Proportions
A sampling distribution model allows quantification of variation in sample proportions from sample to sample.
It helps determine how likely it is to observe a sample proportion in a particular interval.
To use a Normal model, the mean (\mu) is set at p.
Binomial Probability Model
Review of the Binomial Probability Model for Bernoulli Trials:
Binom(n, p)
n = number of trials
p = probability of success
q = 1 - p = probability of failure
X = number of successes in n trials
Mean: \mu = np
Standard Deviation: \sigma = \sqrt{npq}
Modeling the Distribution of Sample Proportions (cont.)
The standard deviation used is: \sqrt{\frac{pq}{n}}
The distribution of sample proportions is modeled with a probability model that is approximately Normal.
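The mean and standard deviation of the sample proportion follow from the Binomial model, because the sample proportion is the number of successes divided by n. A short derivation using the symbols already defined:
\hat{p} = \frac{X}{n}
\mu(\hat{p}) = \frac{np}{n} = p
SD(\hat{p}) = \frac{\sqrt{npq}}{n} = \sqrt{\frac{npq}{n^2}} = \sqrt{\frac{pq}{n}}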
Sampling Distribution Model for a Sample Proportion
Provided sampled values are independent and the sample size is large enough, the sampling distribution of sample proportion \hat{p} can be reasonably modeled by a Normal model.
Mean: \mu(\hat{p}) = p
Standard deviation: SD(\hat{p}) = \sqrt{\frac{pq}{n}}
Sampling Distribution for Cereal Pieces
Sampling distribution for n=5 cereal pieces is visualized (histogram).
Sampling distribution for n=20 cereal pieces is visualized (histogram).
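The two histograms can be approximated in text form. The sketch below computes the exact Binomial probability of every possible sample proportion, assuming p = 0.30 from the Lucky Charms claim; the function name and the bar scaling are illustrative.

```python
from math import comb

def proportion_distribution(n, p):
    """Exact Binomial probabilities for each possible sample proportion k/n."""
    q = 1 - p
    return {k / n: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}

for n in (5, 20):
    print(f"n = {n}")
    for p_hat, prob in proportion_distribution(n, 0.30).items():
        if prob >= 0.005:  # skip bars too short to draw
            print(f"  {p_hat:4.2f} {'#' * round(prob * 100)}")
```

With n = 5 only six proportions are possible and the bars are visibly skewed; with n = 20 the bars should look closer to a symmetric, bell-shaped curve.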
Central Limit Theorem
The sampling distribution of any sample mean becomes more nearly Normal as the sample size grows, given independent observations collected with randomization.
The shape of the original population does not matter.
Central Limit Theorem: The mean of a random sample is a random variable whose sampling distribution can be approximated by a normal model. The larger the sample, the better the approximation.
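A small simulation can illustrate the theorem. The sketch below assumes a right-skewed exponential population with mean 1 and standard deviation 1, chosen only for illustration; the sample sizes, seed, and trial count are also arbitrary.

```python
import random
import statistics

def sample_means(n, trials=5_000, seed=2):
    """Means of `trials` random samples of size n from a right-skewed population."""
    rng = random.Random(seed)
    # expovariate(1.0) draws from an exponential population with mean 1 and SD 1.
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

for n in (2, 10, 50):
    means = sample_means(n)
    print(f"n={n:2d}: mean={statistics.fmean(means):.3f}, "
          f"SD={statistics.stdev(means):.3f}, 1/sqrt(n)={n ** -0.5:.3f}")
```

The sample means should stay centered near the population mean while their spread shrinks roughly like 1/sqrt(n), even though the population itself is far from Normal.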
Comparing Sampling Distributions
Graphs I and II represent sampling distributions of the sample mean for the same random variable but with different sample sizes.
The sample size of I is less than the sample size of II.
How Good Is the Normal Model?
The Normal model improves as a model for the distribution of sample proportions as the sample size increases.
Conditions: np \geq 10 and nq \geq 10
Conditions and Assumptions
Randomization Condition: Data from an experiment should have subjects randomly assigned to treatments. Surveys should use simple random samples. The sampling method should be unbiased and representative of the population.
Independence Assumption: Individuals in the sample must be independent of each other.
10% Condition: If the sample is more than 10% of the population, the sampled values can no longer be treated as independent, and the Normal model may not be reasonable.
Sample Size Assumption: The sample size, n, must be large enough.
Success/Failure Condition: np \geq 10 and nq \geq 10
Conditions to Check
Randomization:
Random Condition: Respondents were randomly selected OR experimental treatments were randomly assigned OR the sample is representative of the population.
Independent:
Sampled Independently OR 10% Condition: Sample was less than 10% of the entire population.
Sample Size Condition:
np \geq 10 and nq \geq 10
If these are met, a Normal model can be used.
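The numeric conditions can be checked mechanically. The helper below is a hypothetical sketch (its name and the optional population_size parameter are assumptions); randomization cannot be verified in code and still has to be judged from how the data were collected.

```python
def normal_model_ok(n, p, population_size=None):
    """Check the Success/Failure and (optionally) 10% conditions."""
    q = 1 - p
    success_failure = n * p >= 10 and n * q >= 10
    # The 10% condition can only be checked when the population size is known.
    ten_percent = population_size is None or n <= 0.10 * population_size
    return success_failure and ten_percent

print(normal_model_ok(200, 0.87))  # np = 174, nq = 26
print(normal_model_ok(100, 0.04))  # np = 4
```

The first call passes both expected-count checks; the second fails because np is below 10.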
Empirical Rule
Because we have a Normal model, approximately 68% of Normally distributed values are within one standard deviation of the mean and approximately 95% are within two standard deviations of the mean.
Consequently, we would expect about 95% of polls to give results within two standard deviations of the mean.
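The 68% and 95% figures can be reproduced from the standard Normal model. This is a minimal sketch using Python's standard library; the helper name within is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal model: mean 0, SD 1

def within(k):
    """Probability of landing within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

print(f"within 1 SD: {within(1):.4f}")  # 0.6827
print(f"within 2 SD: {within(2):.4f}")  # 0.9545
```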
Sampling Error
Sampling error refers to the variability expected from one sample to another. A better term would be sampling variability.
Multiple Choice Practice
The question asks about the purpose of doubling the sample size in polls two weeks before an election. The main purpose is to decrease the standard deviation of the sampling distribution of the sample proportion.
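The size of the decrease follows from the standard deviation formula. Replacing n with 2n gives:
\sqrt{\frac{pq}{2n}} = \frac{1}{\sqrt{2}} \sqrt{\frac{pq}{n}} \approx 0.707 \sqrt{\frac{pq}{n}}
So doubling the sample size cuts the standard deviation by about 29%, not by half.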
Sampling Distribution Model for a Proportion
Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of \hat{p} is modeled by a Normal model with:
Mean: p
Standard deviation: SD = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}.
Example: High School Diploma
According to the US Census Bureau, 87% of Americans over 25 have a high school diploma.
Suppose a random sample of 200 Americans in this age group is taken to calculate the proportion with a high school diploma.
What is the probability that the proportion of people in the sample with a high school diploma is less than 85%?
Checking Conditions for Normal Model
Check if np \geq 10 and nq \geq 10.
Calculating Mean and Standard Deviation
\mu = p
\sigma = \sqrt{\frac{pq}{n}}
Calculating Probability
Calculate P(\hat{p} < 0.85) using normalcdf.
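The three steps for this example can be sketched in Python, with statistics.NormalDist standing in for the calculator's normalcdf. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.87, 200
q = 1 - p

# Success/Failure Condition: both expected counts should be at least 10.
print(f"np = {n * p:.0f}, nq = {n * q:.0f}")

sd = sqrt(p * q / n)                # SD(p-hat)
model = NormalDist(mu=p, sigma=sd)  # Normal model centered at p
prob = model.cdf(0.85)              # P(p-hat < 0.85)
print(f"SD = {sd:.4f}, P(p-hat < 0.85) = {prob:.3f}")
```

Under these assumptions the probability comes out to roughly 0.20, so a sample proportion below 85% would not be unusual.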
Reptilian Overlords Example
A 2013 poll found that 4% of American voters believe that shape-shifting reptilian people control the world.
Assume that 4% is the actual population proportion.
Use a Normal model to calculate the probability of finding a sample where 6 or more of 100 people believe this.
Checking Conditions for Normal Model
Check if np \geq 10 and nq \geq 10.
Calculating Mean and Standard Deviation
\mu = p
\sigma = \sqrt{\frac{pq}{n}}
Calculating Probability
Calculate P(\hat{p} \geq 0.06) using normalcdf.
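In this example the condition check matters: np = 100(0.04) = 4, which is below 10. The sketch below computes the Normal-model answer the exercise asks for and, as an added comparison that is not part of the original exercise, the exact Binomial probability of 6 or more successes.

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.04, 100
q = 1 - p
print(f"np = {n * p:.0f}, nq = {n * q:.0f}")  # np falls short of 10

# Normal-model answer, as the exercise requests.
sd = sqrt(p * q / n)
normal_prob = 1 - NormalDist(mu=p, sigma=sd).cdf(0.06)

# Exact Binomial answer for comparison: P(X >= 6) = 1 - P(X <= 5).
exact_prob = 1 - sum(comb(n, k) * p**k * q**(n - k) for k in range(6))

print(f"Normal model: {normal_prob:.3f}")
print(f"Binomial:     {exact_prob:.3f}")
```

The two answers differ noticeably, which is what the failed Success/Failure Condition warns about.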
JFK Conspiracy Example
The same poll found that 51% of American voters believe there was a larger conspiracy responsible for the assassination of President Kennedy.
Use a Normal model to calculate the probability that, in a random sample of 100 people, at least 57% believe in the JFK conspiracy theory.
Checking Conditions for Normal Model
Check if np \geq 10 and nq \geq 10.
Calculating Mean and Standard Deviation
\mu = p
\sigma = \sqrt{\frac{pq}{n}}
Calculating Probability
Calculate P(\hat{p} \geq 0.57) using normalcdf.
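Worked through with the given values (rounded, and assuming the random sample of 100 is well under 10% of all voters):
np = 100(0.51) = 51 and nq = 100(0.49) = 49, both at least 10
\sigma = \sqrt{\frac{(0.51)(0.49)}{100}} \approx 0.050
z = \frac{0.57 - 0.51}{0.050} = 1.20
P(\hat{p} \geq 0.57) = P(z \geq 1.20) \approx 0.115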
Extra Practice: BMI Example
Is the percentage of students with BMI 25 or more unusually low?
Checking Conditions for BMI Example
Randomization Condition: Random sample, so respondents should be independent and randomly selected.
10% Condition: 200 respondents is less than 10% of all female students.
Success/Failure Condition: np = 200(0.22) = 44 and nq = 200(0.78) = 156, both at least 10.
It's okay to use a Normal model.
Analyzing the BMI Proportion
The phys ed department observed \hat{p} = \frac{31}{200} = 0.155.
Department expected E(\hat{p}) = p = 0.22, with SD(\hat{p}) = \sqrt{\frac{(0.22)(0.78)}{200}} = 0.029.
z = \frac{0.155 - 0.22}{0.029} = -2.24.
Values more than 2 standard deviations below the mean of a Normal model show up less than 2.5% of the time.
This suggests women at this college may differ from the general population, or self-reporting may not provide accurate heights and weights.
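The z-score and the matching tail probability can be checked numerically. This sketch keeps the standard deviation unrounded, so z comes out slightly closer to zero than the -2.24 obtained from the rounded value 0.029; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n, observed = 0.22, 200, 31 / 200

sd = sqrt(p * (1 - p) / n)  # SD(p-hat) under the expected proportion
z = (observed - p) / sd     # standard deviations below the expected value
tail = NormalDist().cdf(z)  # P(Z <= z) under the standard Normal model
print(f"SD = {sd:.3f}, z = {z:.2f}, P(Z <= z) = {tail:.3f}")
```

Either way the observed proportion sits more than 2 standard deviations below the expected 0.22, and the tail probability is well under 2.5%.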