Exam Preparation Notes on Distribution of Sample Means

Announcements

Exam scores will be visible on Canvas soon.
Mean, median, and standard deviation (SD) will be announced (TBA).
Exam results (missed questions) can be viewed during office hours or in discussion sections, not SI sessions.
Overall course grade will be curved; a calculator will be posted after Exam 2.

Academic Achievement - Online Tutoring

Online tutoring is available for PSYC 60.
Schedule in-person or online tutoring sessions at the provided URL: https://aah.ucsd.edu/content-tutoring/index.html

Distribution of Sample Means (Chapter 7)

Based on Chapter 7 in Gravetter & Wallnau.

Probability and Samples

Initially, the discussion focuses on samples of size 1.
Example: Given a normal distribution with $µ=68$ and $σ=6$ , find the probability of selecting a person taller than 80 inches.
Answer: Convert 80 inches to a z-score and find the proportion greater than that z-score in the unit normal table.
P(X>80) = 0.0228 (for $z = 2.0$ ).
Most research, however, involves samples with n>1.
For $n = 1$ , P(X>80) = 0.0228.
For $n = 2$ , P( (X1 + X2) / 2 > 80 ) = ???
For $n = 100$ , P( (X1 + X2 + ··· + X_{100}) / 100 > 80 ) = ???
The probabilities are not equal; therefore, the single-score method must be modified for samples with n>1.
Sampling error: Natural differences that exist by chance between a sample statistic and a population parameter.
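
The single-score calculation above can be checked numerically. This is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` as a stand-in for the unit normal table; the variable names are illustrative only.

```python
from statistics import NormalDist

# Population of heights from the example: mu = 68, sigma = 6 (inches).
heights = NormalDist(mu=68, sigma=6)

x = 80
z = (x - heights.mean) / heights.stdev   # z = (X - mu) / sigma
p_taller = 1 - heights.cdf(x)            # area to the right of X = 80

print(f"z = {z:.2f}")
print(f"P(X > 80) = {p_taller:.4f}")
```

The same approach does not carry over directly to n>1, because a sample mean varies less than a single score does.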

Sampling Error: Review

Population of UCSD students:
- Population Parameters:
  - Average Age = 21.3 years
  - Average IQ = 112.5
  - 47% female, 48% male, 5% other
Sample #1: Adam, Brad, Chelsea, Derrick, Elisa
- Sample Statistics:
  - Average Age = 19.8
  - Average IQ = 104.6
  - 40% Female, 60% Male, 0% other
- Sampling Error for #1:
  - 19.8 vs. 21.3 years
  - 104.6 vs. 112.5 IQ
  - 40 vs. 47% female
  - 60 vs. 48% male
  - 0 vs. 5% other
Sample #2: Amy, Bryan, Chris, Deanna, Eric
- Sample Statistics:
  - Average Age = 20.4
  - Average IQ = 114.2
  - 40% Female, 40% Male, 10% other
- Sampling Error for #2:
  - 20.4 vs. 21.3 years
  - 114.2 vs. 112.5 IQ
  - 60 vs. 47% female
  - 40 vs. 48% male
  - 10 vs. 5% other

Probability and Samples

Each independent sample from the population will exhibit some sampling error.
Key questions:
- How well does a sample (on average) represent the population from which it was drawn?
- How likely is it that we draw a sample with particular characteristics?

Probability and Samples

Detailed question: Given a population with a set $µ$ and $σ$ , how likely is it to obtain a certain sample mean (M) when we take a sample of size n?
Many possible samples can be obtained from a given population, each with different individuals, scores, and means.
These possible samples form an orderly pattern: The Distribution of Sample Means (DSM).

Sample Means

The distribution of sample means is the collection of the means of all the possible random samples of a particular size (n) that can be obtained from a population.
This distribution differs from individual score distributions because it is composed of statistics (sample means), not individual scores.
Referred to as a sampling distribution (or “Sampling distribution of M”).

Sample Means

Example: A population of 4 scores: 2, 4, 6, 8.

Sample Means

Construct the distribution of sample means for samples of size $n = 2$ .
Population ( $N = 4$ ): 2, 4, 6, 8
Procedure:
1. Write down all 16 possible samples and the sample mean (M) for each.
2. Place all the obtained sample means in a frequency distribution and/or histogram.

Sample Values (n=2 from population 2, 4, 6, 8)

Population: 2, 4, 6, 8
Number of possible samples = $N^n = 4^2 = 16$
Shows a table of all 16 samples, first score, second score, and sample mean.
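
The table of samples can be rebuilt with a short script. This sketch assumes sampling with replacement where order matters (which is what $N^n = 16$ implies) and prints a rough text histogram in place of the slide's figure.

```python
from collections import Counter
from itertools import product

population = [2, 4, 6, 8]
n = 2

# Sampling with replacement, order matters: N**n = 16 ordered samples.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# One row per sample: first score, second score, sample mean.
for (first, second), m in zip(samples, sample_means):
    print(first, second, m)

# Frequency distribution of the sample means, drawn as a text histogram.
freq = Counter(sample_means)
for m in sorted(freq):
    print(f"M = {m}: {'*' * freq[m]}")

mean_of_means = sum(sample_means) / len(sample_means)
print("mean of sample means:", mean_of_means)
```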

Sample Means

Things to note about the distribution:
1. Mean of sample means = mean of population ($µ_M = µ = 5$).
2. Shape looks (sort of) normal.
3. This distribution can be used to answer questions about probabilities of sample means.

Sample Means

We can use this distribution to answer questions about probabilities of sample means.
If you take a sample of $n = 2$ scores from the original population, what is the probability of obtaining a sample mean greater than 6?
In symbols: p(M > 6) = ?
Probability = 3/16 = 0.1875 (3 of the 16 possible sample means are greater than 6)
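
The count behind this probability can be reproduced exactly. A minimal sketch, assuming the same 16 ordered samples (with replacement) used to build the distribution; `Fraction` keeps the answer exact rather than rounded.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]

# Sample mean of every ordered pair drawn with replacement (16 in total).
means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]

# Proportion of sample means strictly greater than 6.
p = Fraction(sum(1 for m in means if m > 6), len(means))
print(p, float(p))
```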

Central Limit Theorem

What about situations with larger populations and larger samples where calculating all possible sample means is unrealistic?
Use the Central Limit Theorem:
For any population with mean $µ$ and standard deviation $σ$, the distribution of sample means for sample size n will have a mean of $µ$, a standard deviation of $σ / √n$, and will approach a normal distribution as n approaches infinity.
Central Limit Theorem
- In table form:
 - Original Population (OP)
 - Distribution of Sample Means (DSM)
 - Sample Size (n)
 - | | OP | DSM |
 - | :---- | :----- | :----------- |
 - | Mean | $µ$ | $µ_M = µ$ |
 - | S.D. | $σ$ | $σ_M = σ / √n$ |
 - | Shape | any | normal if n ≥ 30, or always normal if the OP is normal |
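
The Mean and S.D. rows of the table can be checked by simulation. This is only a sketch: the skewed population (exponential with $µ = 1$ and $σ = 1$), the sample size, and the number of simulated samples are all assumptions chosen for illustration, not values from the lecture.

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population (an assumption for illustration):
# exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 30               # sample size
num_samples = 20000  # number of simulated samples

sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = sum(sample_means) / num_samples
sd_of_means = pstdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (CLT predicts {mu:.3f})")
print(f"SD of sample means:   {sd_of_means:.3f} (CLT predicts {sigma / sqrt(n):.3f})")
```

A simulation only approximates the full DSM, so the printed values should land near the predicted ones rather than match them exactly.
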
Central Limit Theorem: Mean of DSM
- Mean of the distribution of sample means is $µ_M$ and always has a value equal to the mean of the population of scores, $µ$ .
- Mean of the distribution of sample means ( $µ_M$ ) is called the expected value of M.
- M is an unbiased statistic because $µ_M$ , the expected value of M, is equal to the population mean, $µ$ .
Central Limit Theorem: S.D. of DSM
- Variability of a distribution of scores is measured by the standard deviation ( $σ$ ).
- Variability of a distribution of sample means is measured by the standard deviation of the sample means, and is called the standard error of M and written as $σ_M$ .
- In journal articles or other textbooks, the standard error of M might be identified as “standard error,” “SE,” or “SEM”.
Central Limit Theorem
- Standard deviation: standard distance between a score X and the population mean $µ$ .
- Standard error: standard distance between a sample mean M and the mean of the distribution of sample means $µ_M$ .
Standard Error: $σ_M = σ / √n$
- Magnitude determined by two factors.
 1. Size of sample
 - Law of large numbers: as the sample size increases, the error between the sample mean and the population mean should decrease.
 2. Population standard deviation:
 - Standard deviation is “starting point” for standard error.
 - $n=1: σ_M = σ$
 - $n>1: σ_M < σ$
 - The smaller the population variance (S.D.), the less error between M and $µ$ .
“Law of Large Numbers”
- The larger a sample, the better its mean approximates the mean of the population (and thus the smaller $σ_M$ will be).
- For a normal population with $µ = 10$ , $σ = 5$ , take 100 samples of size n, and plot the means of those 100 samples.
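
The exercise described above can be sketched in code. This version prints a text summary of the 100 sample means instead of plotting them; the particular sample sizes (1, 4, 25, 100) and the seed are assumptions, since the slide does not list which values of n were used.

```python
import random
from statistics import pstdev

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma = 10, 5
num_samples = 100

def simulate_means(n):
    """Return the means of num_samples random samples of size n."""
    return [
        sum(random.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(num_samples)
    ]

spread = {}
for n in (1, 4, 25, 100):
    means = simulate_means(n)
    spread[n] = pstdev(means)  # how widely the 100 means scatter
    print(f"n = {n:3d}: means range {min(means):6.2f} to {max(means):6.2f}, "
          f"SD of means = {spread[n]:.2f}")
```

As n grows, the sample means cluster more tightly around $µ = 10$.
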
Relationship between S.E. and n
- Standard Error as a function of sample size.
- The graph illustrates how standard error decreases as sample size (n) increases, given $σ = 10$ .
- Standard distance between a sample mean and the population mean.
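
The curve in the graph follows directly from $σ_M = σ / √n$. A minimal sketch with $σ = 10$; the helper name `standard_error` and the chosen values of n are illustrative.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of M: sigma_M = sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 10
for n in (1, 4, 9, 16, 25, 100):
    print(f"n = {n:3d}  sigma_M = {standard_error(sigma, n):.2f}")
```

Quadrupling n only halves the standard error, which is why the curve flattens out for large samples.
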
Population variance
- The smaller the population variance, the better the sample mean (M) approximates the population mean ( $µ$ ) (and thus the smaller $σ_M$ will be).
Central Limit Theorem: Shape of DSM
- The DSM is almost perfectly normal if either of the following two conditions is satisfied:
 - The population from which the samples are drawn is normal.
 - OR
 - The number of scores in each sample (n) is 30 or more.
Central Limit Theorem Proof: DSM Shape
- Illustrates how the shape of the distribution of sample means (DSM) approaches a normal distribution as the sample size (n) increases.
- Several graphs demonstrate the transition from non-normal to normal distributions as 'n' goes up.
Central Limit Theorem Proof: DSM Shape
- Illustrates how the shape of the distribution of sample means (DSM) approaches a normal distribution as the sample size (n) increases for different populations.
- Several graphs demonstrate the transition from non-normal to normal distributions as 'n' goes up for different populations and their corresponding DSM.
Central Limit Theorem Proof: DSM Shape
- The mean of each sampling distribution is equal to 0.94
- Population Parameters given by: $μ = 0.94$ and $σ = 0.05$
- The standard deviation of each DSM gets smaller as n gets larger
 - When n = 2 : $µ_M = 0.94$ and $σ_M = 0.038$
 - When n = 4 : $µ_M = 0.94$ and $σ_M = 0.027$
 - When n = 16 : $µ_M = 0.94$ and $σ_M = 0.013$
 - When n = 64 : $µ_M = 0.94$ and $σ_M = 0.006$
The Distribution Triad: Population, Single Sample, and DSM
- (a) Original population of IQ scores.
 - $μ = 100$
 - $σ = 15$
- (b) A sample of n = 25 IQ scores.
 - $M = 101.2$
 - $S=11.5$
- (c) The distribution of sample means. Sample means for all the possible random samples of n = 25 IQ scores.
Interactive learning!
- https://shiny.rit.albany.edu/stat/sampdist/
Learning Check
- Question 1: A population of unknown shape has a mean of $μ = 60$ with $σ = 5$ . The mean of the distribution of sample means for samples of size $n = 4$ selected from this population would have a value of ____.
 - A) 5
 - B) 15
 - C) 30
 - D) 60
- Question 2: A population of unknown shape has a mean of $μ = 60$ with $σ = 5$ . The distribution of sample means for samples of size $n = 4$ selected from this population would have a standard deviation of ____.
 - A) 1.25
 - B) 2.5
 - C) 5
 - D) 15
- Question 3: A population of unknown shape has a mean of $μ = 60$ with $σ = 5$ . The shape of the distribution of sample means for samples of size $n = 4$ selected from this population would be ____.
 - A) Normal
 - B) Positively Skewed
 - C) Negatively Skewed
 - D) Cannot determine from the information given
- Use the password provided by Dr. Lowe to fill out the Canvas survey: Chapter 7 Question Set 1
- Password: quentin
Central Limit Theorem: Review
- Original Population = OP
- Distribution of Sample Means = DSM
- Sample Size = n
- | | OP | DSM |
- | :---- | :----- | :----------- |
- | Mean | $µ$ | $µ_M = µ$ |
- | S.D. | $σ$ | $σ_M = σ / √n$ |
- | Shape | any | normal if n ≥ 30, or always normal if the OP is normal |
- Put this information on your cheat sheet!!!!!
Probability and the Distribution of Sample Means
- We can use the distribution of sample means to find out probabilities (= proportions!).
- For example: Given a population, how likely is it to obtain a sample of size n with a certain M?
Probability and the Sample Means
- Single score vs Sample mean
- $z = (X - µ) / σ$
 - X = score
 - µ = population mean
 - $σ$ = Standard Dev.
 - Interpretation: Given a population, how likely is it to obtain a score with a certain value X?
 - Or…proportion of individuals within our population with a score of X.
- $z = (M - µ_M) / σ_M$
 - M = sample mean
 - $µ_M$ = mean of DSM ( = µ)
 - $σ_M$ = Standard Error
 - Interpretation: Given a population, how likely is it to obtain a sample of size n with a certain M?
 - Or…proportion of samples (with size n) with that mean out of all the total possible samples (with size n) from our population.
Probability and the Sample Means
- Example:
 - SAT-scores (normal, $μ=500, σ=100$ ).
 - Take a sample n=25.
 - What is p(M>540)?
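
A worked sketch of this example, applying the sample-mean z-score formula and the unit normal table value for $z = 2.0$ quoted earlier in these notes:
 - $σ_M = σ / √n = 100 / √25 = 20$
 - $z = (M - µ_M) / σ_M = (540 - 500) / 20 = 2.00$
 - $p(M > 540) = p(z > 2.00) = 0.0228$
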
Example:
- SAT scores are normally distributed and have $µ = 500, σ =100$
- Probability of drawing one score > 540?
- Probability of drawing one sample mean > 540 when n = 36?
- Probability of drawing one sample mean > 540 when n = 100?
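
The three questions differ only in n, so one helper covers all of them (a single score is the case n = 1). This is a sketch using `statistics.NormalDist` in place of the unit normal table; the function name `p_mean_greater` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def p_mean_greater(m, mu, sigma, n):
    """P(M > m) for samples of size n from a normal population."""
    sigma_m = sigma / sqrt(n)   # standard error of M
    z = (m - mu) / sigma_m      # z-score of the sample mean
    return 1 - NormalDist().cdf(z)

for n in (1, 36, 100):
    print(f"n = {n:3d}: p(M > 540) = {p_mean_greater(540, 500, 100, n):.4f}")
```

The same 40-point difference becomes far less likely as n grows, because the standard error shrinks from 100 to about 16.67 to 10.
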
Probability and the Sample Means
- Another Example: SAT-scores (normal, $µ=500, σ=100$ ). Take sample $n=25$ . What range of values for M can be expected 80% of the time (in other words, what are the boundaries of the middle 80%)?
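
The middle 80% leaves 10% in each tail, so the boundaries are the 10th and 90th percentiles of the DSM. A minimal sketch, assuming `NormalDist.inv_cdf` as a replacement for looking up the z-score in the table; a table lookup gives roughly $z = ±1.28$, so hand-computed boundaries may differ in the second decimal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 100, 25
dsm = NormalDist(mu=mu, sigma=sigma / sqrt(n))  # DSM: mean 500, standard error 20

lower = dsm.inv_cdf(0.10)  # 10% of sample means fall below this value
upper = dsm.inv_cdf(0.90)  # 10% of sample means fall above this value

print(f"middle 80% of sample means: {lower:.2f} to {upper:.2f}")
```
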
Review: Sampling Error
- There will almost always be discrepancy between a sample mean and the true population mean
- This discrepancy is called sampling error
- The amount of sampling error varies across samples
- The variability of sampling error is measured by the standard error of the mean
Review: Standard Error & n
- The standard error tells us how much error, on average, should exist between a sample mean and the population mean.
- As the sample size n increases, the standard error decreases.
Importance of large sample sizes
- A meme is presented to demonstrate the importance of sample size.
In the Literature
- Journals vary in how they refer to the standard error but frequently use:
 - “SE”
 - “SEM”
- Often reported in a table along with n and M for the different groups in the experiment
- Example table:
 - | Group | n | Mean | SE |
 - | :---- | :- | :---- | :--- |
 - | A | 17 | 32.23 | 2.31 |
 - | B | 15 | 45.17 | 2.78 |
In the Literature
- Graphs often include error bars representing standard error.
- Example: M score (±SE), M number of mistakes (±SE)
Using the Standard Error
- The standard error can help us decide which of the two alternatives is more likely.
- Imagine an experiment:
 - Difference between sample and population:
 - due to treatment?
 - due to sampling error?
Using the Standard Error
- 95% of all the possible sample means (from the untreated population!) for n = 25 fall between 392.16 and 407.84.
- Suppose we take a sample of n = 25 rats and obtain M = 404. Did the growth hormone work?
- M = 404
Using the Standard Error
- 95% of all the possible sample means (from the untreated population!) for n = 25 fall between 392.16 and 407.84.
- Suppose we take a sample of n = 25 rats and obtain M = 409. Did the growth hormone work?
- M = 409
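
The comparison in these two slides can be sketched as a range check. The notes give only the interval, so the population values below ($µ = 400$, $σ = 20$) are an assumption inferred from it: 392.16 to 407.84 is 400 ± 1.96 × 4 when n = 25.

```python
from math import sqrt

# Assumed population values, inferred from the stated interval
# (392.16 to 407.84 = 400 +/- 1.96 * 4 when n = 25); the notes do not state them.
mu, sigma, n = 400, 20, 25

sigma_m = sigma / sqrt(n)   # standard error = 4
z_crit = 1.96               # cuts off the middle 95% of a normal distribution
lower = mu - z_crit * sigma_m
upper = mu + z_crit * sigma_m

def within_expected_range(m):
    """True if m falls inside the middle 95% of untreated sample means."""
    return lower <= m <= upper

for m in (404, 409):
    verdict = "within" if within_expected_range(m) else "outside"
    print(f"M = {m}: {verdict} [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions, M = 404 is the kind of mean that sampling error alone often produces, while M = 409 falls outside the range, which makes sampling error alone a less likely explanation.
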
Learning Check
- Question 1: A population forms a normal distribution with $μ = 80$ and $σ = 20$ . If a single score is selected from this population, how much distance, on average, would you expect between the score and the population mean?
 - A) 0
 - B) 5
 - C) 10
 - D) 20
 - E) 80
- Question 2: A population forms a normal distribution with $μ = 80$ and $σ = 20$ . If a sample of $n = 4$ is selected from this population, how much distance, on average, would you expect between the sample mean and the population mean?
 - A) 0
 - B) 5
 - C) 10
 - D) 20
 - E) 80
- Use the password provided by Dr. Lowe to fill out the Canvas survey: Chapter 7 Question Set 2
- Password: gone2
More Examples…
- For a population mean of $µ = 70$ and a standard deviation of $σ = 20$ , how much error, on average, would you expect between the sample mean (M) and the population mean for each of the following sample sizes?
 - n = 4
 - n = 16
 - n = 25
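
A worked sketch of these three cases, applying $σ_M = σ / √n$ with $σ = 20$:
 - n = 4: $σ_M = 20 / √4 = 10$
 - n = 16: $σ_M = 20 / √16 = 5$
 - n = 25: $σ_M = 20 / √25 = 4$
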
More Examples…
- For a population with $σ = 12$ , how large a sample is necessary to have a standard error that is:
 - less than 4 points?
 - less than 3 points?
 - less than 2 points?
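
Rearranging $σ_M = σ / √n$ shows that the standard error falls below a target when $n > (σ / target)^2$. The sketch below searches for the smallest such n directly; the helper name `min_n_for_se_below` is hypothetical, and "less than" is read as strictly less than.

```python
from math import sqrt

def min_n_for_se_below(sigma, target):
    """Smallest n with sigma / sqrt(n) strictly less than target."""
    n = 1
    while sigma / sqrt(n) >= target:
        n += 1
    return n

for target in (4, 3, 2):
    n = min_n_for_se_below(12, target)
    print(f"SE < {target}: n must be at least {n}")
```

With n = 9, 16, or 36 the standard error equals the target exactly, so one more score is needed in each case.
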
More Examples…
- A normal distribution has a mean of $µ = 54$ and a standard deviation of $σ = 6$ .
 - What is the probability of randomly selecting a score less than $X = 51$ ?
 - What is the probability of selecting a sample of $n = 4$ scores with a mean less than $M = 51$ ?
 - What is the probability of selecting a sample of $n = 36$ scores with a mean less than $M = 51$ ?
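
A worked sketch of the three parts, using $z = (X - µ) / σ$ for the single score and $z = (M - µ_M) / σ_M$ with $σ_M = σ / √n$ for the sample means (proportions taken from a standard unit normal table):
 - Single score: $z = (51 - 54) / 6 = -0.50$, so $p(X < 51) = 0.3085$
 - n = 4: $σ_M = 6 / √4 = 3$ and $z = (51 - 54) / 3 = -1.00$, so $p(M < 51) = 0.1587$
 - n = 36: $σ_M = 6 / √36 = 1$ and $z = (51 - 54) / 1 = -3.00$, so $p(M < 51) = 0.0013$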
