Exam Preparation Notes on Distribution of Sample Means
Announcements
- Exam scores will be visible on Canvas soon.
- Mean, median, and standard deviation (SD) will be announced (TBA).
- Exam results (missed questions) can be viewed during office hours or in discussion sections, not SI sessions.
- Overall course grade will be curved; a calculator will be posted after Exam 2.
Academic Achievement - Online Tutoring
- Online tutoring is available for PSYC 60.
- Schedule in-person or online tutoring sessions at the provided URL: https://aah.ucsd.edu/content-tutoring/index.html
Distribution of Sample Means (Chapter 7)
- Based on Chapter 7 in Gravetter & Wallnau.
Probability and Samples
- Initially, the discussion focuses on samples of size 1.
- Example: Given a normal distribution with µ=68 and σ=6, find the probability of selecting a person taller than 80 inches.
- Answer: Convert 80 inches to a z-score and find the proportion greater than that z-score in the unit normal table.
- P(X>80) = 0.0228 (for z = 2.0).
- Most research, however, involves samples with n>1.
- For n = 1, P(X>80) = 0.0228.
- For n = 2, P( (X1 + X2) / 2 > 80 ) = ???
- For n = 100, P( (X1 + X2 + ··· + X_{100}) / 100 > 80 ) = ???
- The probabilities are not equal; therefore, the single-score method must be modified for samples with n>1.
- Sampling error: Natural differences that exist by chance between a sample statistic and a population parameter.
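The single-score calculation from the height example can be checked numerically. This is a minimal sketch that uses Python's standard-library `statistics.NormalDist` in place of the unit normal table; the variable names are illustrative and not from the lecture.

```python
from statistics import NormalDist

# Population of heights from the example: mu = 68 inches, sigma = 6 inches.
mu, sigma = 68, 6
x = 80

# Step 1: convert the score to a z-score.
z = (x - mu) / sigma

# Step 2: find the proportion of the standard normal above z
# (this plays the role of the unit normal table).
p_taller = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P(X > {x}) = {p_taller:.4f}")
```

The same two steps (convert to z, look up the tail) are reused later for sample means; only the standard deviation in the denominator changes.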
Sampling Error: Review
- Population of UCSD students:
- Population Parameters:
- Average Age = 21.3 years
- Average IQ = 112.5
- 47% female, 48% male, 5% other
- Sample #1: Adam, Brad, Chelsea, Derrick, Elisa
- Sample Statistics:
- Average Age = 19.8
- Average IQ = 104.6
- 40% Female, 60% Male, 0% other
- Sampling Error for #1:
- 19.8 vs. 21.3 years
- 104.6 vs. 112.5 IQ
- 40 vs. 47% female
- 60 vs. 48% male
- 0 vs. 5% other
- Sample #2: Amy, Bryan, Chris, Deanna, Eric
- Sample Statistics:
- Average Age = 20.4
- Average IQ = 114.2
- 40% Female, 40% Male, 10% other
- Sampling Error for #2:
- 20.4 vs. 21.3 years
- 114.2 vs. 112.5 IQ
- 60 vs. 47% female
- 40 vs. 48% male
- 10 vs. 5% other
Probability and Samples
- Each independent sample from the population will exhibit some sampling error.
- Key questions:
- How well does a sample (on average) represent the population from which it was drawn?
- How likely is it that we draw a sample with particular characteristics?
Probability and Samples
- Detailed question: Given a population with a set µ and σ, how likely is it to obtain a certain sample mean (M) when we take a sample of size n?
- Many possible samples can be obtained from a given population, each with different individuals, scores, and means.
- These possible samples form an orderly pattern: The Distribution of Sample Means (DSM).
Sample Means
- The distribution of sample means is the collection of the means of all the possible random samples of a particular size (n) that can be obtained from a population.
- This distribution differs from individual score distributions because it is composed of statistics (sample means), not individual scores.
- Referred to as a sampling distribution (or “Sampling distribution of M”).
Sample Means
- Example: A population of 4 scores (X): 2, 4, 6, 8.
Sample Means
- Construct the distribution of sample means for samples of size n = 2.
- Population (N = 4): 2, 4, 6, 8
- Procedure:
- Write down all 16 possible samples and the sample mean (M) for each.
- Place all the obtained sample means in a frequency distribution and/or histogram.
Sample Values (n=2 from population 2, 4, 6, 8)
- Population: 2, 4, 6, 8
- Number of possible samples = N^n = 4^2 = 16
- Shows a table of all 16 samples, first score, second score, and sample mean.
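The table of 16 samples can be rebuilt by brute force. The sketch below assumes sampling with replacement where order matters, which is what the count N^n = 16 implies; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

population = [2, 4, 6, 8]
n = 2

# Sampling with replacement, order matters: N**n = 4**2 = 16 samples.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Frequency distribution of the sample means.
frequencies = Counter(sample_means)

# One row per sample: first score, second score, sample mean.
for first, second in samples:
    print(first, second, (first + second) / n)

# Frequency table that could be drawn as a histogram.
for m in sorted(frequencies):
    print(f"M = {m:g}: f = {frequencies[m]}")
```

The frequency table peaks at M = 5 and falls off symmetrically on both sides.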
Sample Means
- Things to note about the distribution:
- Mean of sample means = mean of population.
- Shape looks (sort of) normal.
- This distribution can be used to answer questions about probabilities of sample means.
- Here µ = 5, and the mean of the 16 sample means is also 5.
Sample Means
- We can use this distribution to answer questions about probabilities of sample means.
- If you take a sample of n = 2 scores from the original population, what is the probability of obtaining a sample mean greater than 6?
- In symbols: p(M > 6) = ?
- Probability = 3/16 = 0.1875 (3 of the 16 possible sample means are greater than 6)
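The count behind 3/16 can be verified by enumerating the samples again. This is a small self-contained sketch (exact fractions are used to avoid rounding); the names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
sample_means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]

# Count the sample means strictly greater than 6 (these are 7, 7 and 8).
favorable = sum(1 for m in sample_means if m > 6)
p_greater_than_6 = Fraction(favorable, len(sample_means))

print(p_greater_than_6, float(p_greater_than_6))
```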
Central Limit Theorem
- What about situations with larger populations and larger samples where calculating all possible sample means is unrealistic?
- Use the Central Limit Theorem:
- For any population with mean µ and standard deviation σ, the distribution of sample means for sample size n will have a mean of µ, a standard deviation of σ / √n, and will approach a normal distribution as n approaches infinity.
Central Limit Theorem
- In table form:
- Original Population (OP)
- Distribution of Sample Means (DSM)
- Sample Size (n)
- | | OP | DSM |
- | :---- | :----- | :----------- |
- | Mean | µ | µ_M = µ |
- | S.D. | σ | σ_M = σ / √n |
- | Shape | any | normal if n ≥ 30 or if OP is normal |
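The table can be turned into two small helper functions. This is only a sketch: the function names are hypothetical, and the example values (µ = 100, σ = 15, n = 25) are illustrative.

```python
import math

def dsm_parameters(mu, sigma, n):
    """Return (mu_M, sigma_M) for the distribution of sample means."""
    if n < 1:
        raise ValueError("n must be at least 1")
    mu_m = mu                        # expected value of M equals mu
    sigma_m = sigma / math.sqrt(n)   # standard error of M
    return mu_m, sigma_m

def dsm_is_approximately_normal(population_is_normal, n):
    """Rule of thumb: normal population, or n of 30 or more."""
    return population_is_normal or n >= 30

mu_m, sigma_m = dsm_parameters(mu=100, sigma=15, n=25)
print(mu_m, sigma_m)
```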
Central Limit Theorem: Mean of DSM
- Mean of the distribution of sample means is µ_M and always has a value equal to the mean of the population of scores, µ.
- Mean of the distribution of sample means (µ_M) is called the expected value of M.
- M is an unbiased statistic because µ_M, the expected value of M, is equal to the population mean, µ.
Central Limit Theorem: S.D. of DSM
- Variability of a distribution of scores is measured by the standard deviation (σ).
- Variability of a distribution of sample means is measured by the standard deviation of the sample means, and is called the standard error of M and written as σ_M.
- In journal articles or other textbooks, the standard error of M might be identified as “standard error,” “SE,” or “SEM”.
Central Limit Theorem
- Standard deviation: standard distance between a score X and the population mean µ.
- Standard error: standard distance between a sample mean M and the mean of the distribution of sample means µ_M.
Standard Error: σ_M = σ / √n
- Magnitude determined by two factors.
- Size of sample
- Law of large numbers: as the sample size increases, the error between the sample mean and the population mean should decrease.
- Population standard deviation:
- Standard deviation is “starting point” for standard error.
- n=1: σ_M = σ
- n>1: σ_M < σ
- The smaller the population variance (S.D.), the less error between M and µ.
“Law of Large Numbers”
- The larger a sample, the better its mean approximates the mean of the population (and thus the smaller σ_M will be).
- For a normal population with µ = 10, σ = 5, take 100 samples of size n, and plot the means of those 100 samples.
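The plotted demonstration can be approximated with a short simulation. This is a sketch under assumptions: the seed and the particular sample sizes are arbitrary choices, and the spread of the 100 sample means is summarized by its standard deviation instead of a plot.

```python
import random
import statistics

def spread_of_sample_means(n, num_samples=100, mu=10, sigma=5, seed=0):
    """Draw num_samples samples of size n; return the SD of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

for n in (1, 4, 25, 100):
    observed = spread_of_sample_means(n)
    print(f"n = {n:3d}: SD of 100 sample means = {observed:.2f}, "
          f"sigma / sqrt(n) = {5 / n ** 0.5:.2f}")
```

Because only 100 samples are drawn per sample size, the observed spread will be close to, but not exactly, σ / √n.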
Relationship between S.E. and n
- Standard Error as a function of sample size.
- The graph illustrates how standard error decreases as sample size (n) increases, given σ = 10.
- Standard distance between a sample mean and the population mean.
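The curve in the graph follows directly from σ_M = σ / √n. A minimal sketch with σ = 10 (the sample sizes listed are chosen for convenience, not taken from the graph):

```python
import math

sigma = 10  # population SD used in the graph

def standard_error(sigma, n):
    """Standard error of M for samples of size n."""
    return sigma / math.sqrt(n)

for n in (1, 4, 25, 100):
    print(f"n = {n:3d}  sigma_M = {standard_error(sigma, n):.2f}")
```

Note the diminishing returns: quadrupling the sample size only halves the standard error.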
Population variance
- The smaller the population variance, the better the sample mean (M) approximates the population mean (µ) (and thus the smaller σ_M will be).
Central Limit Theorem: Shape of DSM
- The DSM is almost perfectly normal if either of the following two conditions is satisfied:
- The population from which the samples are drawn is normal.
- OR
- The number of scores in each sample (n) is 30 or more.
Central Limit Theorem Proof: DSM Shape
- Illustrates how the shape of the distribution of sample means (DSM) approaches a normal distribution as the sample size (n) increases.
- Several graphs demonstrate the transition from non-normal to normal distributions as 'n' goes up.
Central Limit Theorem Proof: DSM Shape
- Illustrates how the shape of the distribution of sample means (DSM) approaches a normal distribution as the sample size (n) increases for different populations.
- Several graphs demonstrate the transition from non-normal to normal distributions as 'n' goes up for different populations and their corresponding DSM.
Central Limit Theorem Proof: DSM Shape
- The mean of each sampling distribution is equal to 0.94
- Population Parameters given by: μ = 0.94 and σ = 0.05
- The standard deviation of each DSM gets smaller as n gets larger
- When n = 2: µ_M = 0.94 and σ_M = 0.038
- When n = 4: µ_M = 0.94 and σ_M = 0.027
- When n = 16: µ_M = 0.94 and σ_M = 0.013
- When n = 64: µ_M = 0.94 and σ_M = 0.006
The Distribution Triad: Population, Single Sample, and DSM
- (a) Original population of IQ scores.
- μ = 100
- σ = 15
- (b) A sample of n = 25 IQ scores.
- M = 101.2
- S=11.5
- (c) The distribution of sample means. Sample means for all the possible random samples of n = 25 IQ scores.
Interactive learning!
- https://shiny.rit.albany.edu/stat/sampdist/
Learning Check
- Question 1: A population of unknown shape has a mean of μ = 60 with σ = 5. The mean of the distribution of sample means for samples of size n = 4 selected from this population would have a value of ____.
- A) 5
- B) 15
- C) 30
- D) 60
- Question 2: A population of unknown shape has a mean of μ = 60 with σ = 5. The distribution of sample means for samples of size n = 4 selected from this population would have a standard deviation of ____.
- A) 1.25
- B) 2.5
- C) 5
- D) 15
- Question 3: A population of unknown shape has a mean of μ = 60 with σ = 5. The shape of the distribution of sample means for samples of size n = 4 selected from this population would be ____.
- A) Normal
- B) Positively Skewed
- C) Negatively Skewed
- D) Cannot determine from the information given
- Use the password provided by Dr. Lowe to fill out the Canvas survey: Chapter 7 Question Set 1
- Password: quentin
Central Limit Theorem: Review
- Original Population = OP
- Distribution of Sample Means = DSM
- Sample Size = n
- | | OP | DSM |
- | :---- | :----- | :----------- |
- | Mean | µ | µ_M = µ |
- | S.D. | σ | σ_M = σ / √n |
- | Shape | any | normal if n ≥ 30 or if OP is normal |
- Put this information on your cheat sheet!!!!!
Probability and the Distribution of Sample Means
- We can use the distribution of sample means to find out probabilities (= proportions!).
- For example: Given a population, how likely is it to obtain a sample of size n with a certain M?
Probability and the Sample Means
- Single score vs Sample mean
- z = (X - µ) / σ
- X = score
- µ = population mean
- σ = Standard Dev.
- Interpretation: Given a population, how likely is it to obtain a score with a certain value X?
- Or…proportion of individuals within our population with a score of X.
- z = (M - µ_M) / σ_M
- M = sample mean
- µ_M = mean of DSM ( = µ)
- σ_M = Standard Error
- Interpretation: Given a population, how likely is it to obtain a sample of size n with a certain M?
- Or…proportion of samples (with size n) with that mean out of all the total possible samples (with size n) from our population.
Probability and the Sample Means
- Example:
- SAT-scores (normal, μ=500, σ=100).
- Take a sample n=25.
- What is p(M>540)?
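One way to work this example, using σ_M = σ / √n and the z-score formula for a sample mean (the final proportion is the unit normal table value for z = 2.00):

```latex
\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20
\qquad
z = \frac{M - \mu_M}{\sigma_M} = \frac{540 - 500}{20} = 2.00
\qquad
p(M > 540) = p(z > 2.00) = 0.0228
```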
Example:
- SAT scores are normally distributed and have µ = 500, σ =100
- Probability of drawing one score > 540?
- Probability of drawing a sample mean > 540 when n = 36?
- Probability of drawing a sample mean > 540 when n = 100?
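The three cases differ only in the standard error, so they can be compared with one function. This is a sketch: `p_mean_above` is a hypothetical helper name, and a single score is treated as a sample of size n = 1.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 500, 100   # SAT population from the example
cutoff = 540

def p_mean_above(cutoff, mu, sigma, n):
    """P(M > cutoff) for samples of size n from a normal population."""
    standard_error = sigma / sqrt(n)
    z = (cutoff - mu) / standard_error
    return 1 - NormalDist().cdf(z)

for n in (1, 36, 100):
    p = p_mean_above(cutoff, mu, sigma, n)
    print(f"n = {n:3d}: p(M > {cutoff}) = {p:.4f}")
```

The same 40-point difference from µ becomes far less likely as n grows, because the standard error shrinks from 100 to about 16.7 to 10.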
Probability and the Sample Means
- Another Example: SAT-scores (normal, µ=500, σ=100). Take sample n=25. What range of values for M can be expected 80% of the time (in other words, what are the boundaries of the middle 80%)?
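This question runs the calculation in reverse: start from a proportion, find z, then convert back to M. A minimal sketch, using `NormalDist.inv_cdf` in place of a reverse lookup in the unit normal table:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 100, 25
standard_error = sigma / sqrt(n)          # 100 / 5 = 20

# The middle 80% leaves 10% in each tail, so the upper boundary is the
# 90th percentile of the standard normal (about z = 1.28).
z = NormalDist().inv_cdf(0.90)

lower = mu - z * standard_error
upper = mu + z * standard_error
print(f"z = ±{z:.2f}, middle 80% of sample means: {lower:.1f} to {upper:.1f}")
```

Working by hand with z = ±1.28 from the table gives 500 ± 1.28(20), which is 474.4 to 525.6.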
Review: Sampling Error
- There will almost always be a discrepancy between a sample mean and the true population mean
- This discrepancy is called sampling error
- The amount of sampling error varies across samples
- The variability of sampling error is measured by the standard error of the mean
Review: Standard Error & n
- The standard error tells us how much error, on average, should exist between a sample mean and the population mean.
- As the sample size n increases, the standard error decreases.
Importance of large sample sizes
- A meme is presented to demonstrate the importance of sample size.
In the Literature
- Journals vary in how they refer to the standard error but frequently use:
- “SE”
- “SEM”
- Often reported in a table along with n and M for the different groups in the experiment
- Example table:
- | Group | n | Mean | SE |
- | :---- | :- | :---- | :--- |
- | A | 17 | 32.23 | 2.31 |
- | B | 15 | 45.17 | 2.78 |
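A reported SE and n are enough to recover the standard deviation behind them. The sketch below assumes the SE in the table was computed as SD / √n, which is the usual convention but is not stated in the table itself; `sd_from_se` is a hypothetical helper name.

```python
import math

# (group, n, mean, SE) rows copied from the example table.
rows = [("A", 17, 32.23, 2.31), ("B", 15, 45.17, 2.78)]

def sd_from_se(se, n):
    """Invert SE = SD / sqrt(n) to get the SD implied by a reported SE."""
    return se * math.sqrt(n)

for group, n, mean, se in rows:
    print(f"Group {group}: implied SD = {sd_from_se(se, n):.2f}")
```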
In the Literature
- Graphs often include error bars representing standard error.
- Example axis labels: M score (±SE), M number of mistakes (±SE)
Using the Standard Error
- The standard error can help us decide which of the two alternatives is more likely.
- Imagine an experiment:
- Difference between sample and population:
- due to treatment?
- due to sampling error?
Using the Standard Error
- 95% of all the possible sample means (from the untreated population!) for n = 25 fall between 392.16 and 407.84.
- Suppose we take a sample of n = 25 rats and obtain M = 404. Did the growth hormone work?
- M = 404
Using the Standard Error
- 95% of all the possible sample means (from the untreated population!) for n = 25 fall between 392.16 and 407.84.
- Suppose we take a sample of n = 25 rats and obtain M = 409. Did the growth hormone work?
- M = 409
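The 95% boundaries and the two decisions can be reproduced as follows. Note the assumption: the population values µ = 400 and σ = 20 are not stated in these notes; they are inferred here because, with n = 25 and z = ±1.96, they give exactly the interval 392.16 to 407.84.

```python
from math import sqrt
from statistics import NormalDist

# ASSUMPTION: mu = 400 and sigma = 20 are not given in the notes; they are
# chosen because, with n = 25, they reproduce the interval 392.16 to 407.84.
mu, sigma, n = 400, 20, 25
standard_error = sigma / sqrt(n)                 # 20 / 5 = 4

# Two-tailed 95% boundary, rounded to 1.96 as in the unit normal table.
z_crit = round(NormalDist().inv_cdf(0.975), 2)
lower = mu - z_crit * standard_error
upper = mu + z_crit * standard_error

def is_typical_without_treatment(m):
    """True if m falls inside the middle 95% of untreated sample means."""
    return lower <= m <= upper

for m in (404, 409):
    label = "inside" if is_typical_without_treatment(m) else "outside"
    print(f"M = {m}: {label} {lower:.2f} to {upper:.2f}")
```

Under these assumptions, M = 404 is the kind of mean sampling error alone would often produce, while M = 409 falls outside the range expected for 95% of untreated samples, which makes a treatment effect the more plausible explanation.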
Learning Check
- Question 1: A population forms a normal distribution with μ = 80 and σ = 20. If a single score is selected from this population, how much distance, on average, would you expect between the score and the population mean?
- A) 0
- B) 5
- C) 10
- D) 20
- E) 80
- Question 2: A population forms a normal distribution with μ = 80 and σ = 20. If a sample of n = 4 is selected from this population, how much distance, on average, would you expect between the sample mean and the population mean?
- A) 0
- B) 5
- C) 10
- D) 20
- E) 80
- Use the password provided by Dr. Lowe to fill out the Canvas survey: Chapter 7 Question Set 2
- Password: gone2
More Examples…
- For a population mean of µ = 70 and a standard deviation of σ = 20, how much error, on average, would you expect between the sample mean (M) and the population mean for each of the following sample sizes?
- n = 4
- n = 16
- n = 25
More Examples…
- For a population with σ = 12, how large a sample is necessary to have a standard error that is:
- less than 4 points?
- less than 3 points?
- less than 2 points?
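These questions invert the standard error formula: σ / √n < target is equivalent to n > (σ / target)². A minimal sketch, where `min_sample_size` is a hypothetical helper that returns the smallest whole-number n strictly meeting the target:

```python
import math

def min_sample_size(sigma, target_se):
    """Smallest integer n with sigma / sqrt(n) strictly below target_se."""
    # sigma / sqrt(n) < target  <=>  n > (sigma / target) ** 2
    return math.floor((sigma / target_se) ** 2) + 1

sigma = 12
for target in (4, 3, 2):
    n = min_sample_size(sigma, target)
    print(f"SE < {target}: n >= {n} (SE = {sigma / math.sqrt(n):.2f})")
```

The "+ 1" matters because the questions say "less than": n = 9 gives a standard error of exactly 4, so n must exceed 9.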
More Examples…
- A normal distribution has a mean of µ = 54 and a standard deviation of σ = 6.
- What is the probability of randomly selecting a score less than X = 51?
- What is the probability of selecting a sample of n = 4 scores with a mean less than M = 51?
- What is the probability of selecting a sample of n = 36 scores with a mean less than M = 51?