Exam Preparation Notes on Distribution of Sample Means

Announcements

  • Exam scores will be visible on Canvas soon.
  • Mean, median, and standard deviation (SD) are to be announced.
  • Exam results (missed questions) can be viewed during office hours or in discussion sections, not SI sessions.
  • Overall course grade will be curved; a calculator will be posted after Exam 2.

Academic Achievement - Online Tutoring

  • Online tutoring is available for PSYC 60.
  • Schedule in-person or online tutoring sessions at the provided URL: https://aah.ucsd.edu/content-tutoring/index.html

Distribution of Sample Means (Chapter 7)

  • Based on Chapter 7 in Gravetter & Wallnau.

Probability and Samples

  • Initially, the discussion focuses on samples of size 1.
  • Example: Given a normal distribution with µ=68 and σ=6, find the probability of selecting a person taller than 80 inches.
  • Answer: Convert 80 inches to a z-score and find the proportion greater than that z-score in the unit normal table.
  • P(X>80) = 0.0228 (for z = 2.0).
  • Most research, however, involves samples with n>1.
  • For n = 1, P(X>80) = 0.0228.
  • For n = 2, P( (X1 + X2) / 2 > 80 ) = ???
  • For n = 100, P( (X1 + X2 + ··· + X_{100}) / 100 > 80 ) = ???
  • The probabilities are not equal; therefore, the single-score method must be modified for samples with n>1.
  • Sampling error: Natural differences that exist by chance between a sample statistic and a population parameter.
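
The single-score calculation above can be checked numerically. This is a minimal sketch that uses Python's standard-library `NormalDist` in place of the unit normal table; the variable names are illustrative only.

```python
from statistics import NormalDist

# Population of heights from the example: mu = 68, sigma = 6 (inches).
mu, sigma = 68, 6
x = 80

z = (x - mu) / sigma                 # z-score for a single score
p_taller = 1 - NormalDist().cdf(z)   # upper-tail area of the unit normal

print(f"z = {z:.2f}, P(X > {x}) = {p_taller:.4f}")
```

This covers only the n = 1 case; the same z-score logic needs a different standard deviation once n > 1.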

Sampling Error: Review

  • Population of UCSD students:
    • Population Parameters:
      • Average Age = 21.3 years
      • Average IQ = 112.5
      • 47% female, 48% male, 5% other
  • Sample #1: Adam, Brad, Chelsea, Derrick, Elisa
    • Sample Statistics:
      • Average Age = 19.8
      • Average IQ = 104.6
      • 40% Female, 60% Male, 0% other
    • Sampling Error for #1:
      • 19.8 vs. 21.3 years
      • 104.6 vs. 112.5 IQ
      • 40 vs. 47% female
      • 60 vs. 48% male
      • 0 vs. 5% other
  • Sample #2: Amy, Bryan, Chris, Deanna, Eric
    • Sample Statistics:
      • Average Age = 20.4
      • Average IQ = 114.2
      • 40% Female, 40% Male, 10% other
    • Sampling Error for #2:
      • 20.4 vs. 21.3 years
      • 114.2 vs. 112.5 IQ
      • 60 vs. 47% female
      • 40 vs. 48% male
      • 10 vs. 5% other
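
As a small arithmetic sketch, the sampling error for age and IQ can be written as sample statistic minus population parameter. That sign convention is an assumption here; the notes only list the two values side by side.

```python
# Population parameters and sample statistics copied from the lists above.
population = {"age": 21.3, "iq": 112.5}
samples = {
    "Sample #1": {"age": 19.8, "iq": 104.6},
    "Sample #2": {"age": 20.4, "iq": 114.2},
}

# Sampling error is taken here as (sample statistic - population parameter).
errors = {
    name: {k: round(stats[k] - population[k], 1) for k in population}
    for name, stats in samples.items()
}
for name, err in errors.items():
    print(name, err)
```

The two samples miss the population values by different amounts and in different directions, which is the point of the review.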

Probability and Samples

  • Each independent sample from the population will exhibit some sampling error.
  • Key questions:
    • How well does a sample (on average) represent the population from which it was drawn?
    • How likely is it that we draw a sample with particular characteristics?

Probability and Samples

  • Detailed question: Given a population with a set µ and σ, how likely is it to obtain a certain sample mean (M) when we take a sample of size n?
  • Many possible samples can be obtained from a given population, each with different individuals, scores, and means.
  • These possible samples form an orderly pattern: The Distribution of Sample Means (DSM).

Sample Means

  • The distribution of sample means is the collection of the means of all the possible random samples of a particular size (n) that can be obtained from a population.
  • This distribution differs from individual score distributions because it is composed of statistics (sample means), not individual scores.
  • Referred to as a sampling distribution (or “Sampling distribution of M”).

Sample Means

  • Example: A population of 4 scores: X = 2, 4, 6, 8.

Sample Means

  • Construct the distribution of sample means for samples of size n = 2.
  • Population (N = 4): 2, 4, 6, 8
  • Procedure:
    1. Write down all 16 possible samples and the sample mean (M) for each.
    2. Place all the obtained sample means in a frequency distribution and/or histogram.

Sample Values (n=2 from population 2, 4, 6, 8)

  • Population: 2, 4, 6, 8
  • Number of possible samples = N^n = 4^2 = 16
  • Shows a table of all 16 samples, first score, second score, and sample mean.
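
The table of 16 samples can be regenerated with a short script. This sketch assumes sampling with replacement and treats order as meaningful, which is what N^n = 16 implies.

```python
from collections import Counter
from itertools import product

population = [2, 4, 6, 8]
n = 2

# All ordered samples of size n, sampling with replacement: N**n = 16.
samples = list(product(population, repeat=n))
means = [sum(s) / n for s in samples]

for (first, second), m in zip(samples, means):
    print(f"first = {first}, second = {second}, M = {m}")

# Frequency distribution of the 16 sample means, drawn as a text histogram.
freq = Counter(means)
for m in sorted(freq):
    print(f"M = {m}: {'*' * freq[m]}")

print("mean of the sample means:", sum(means) / len(means))
```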

Sample Means

  • Things to note about the distribution:
    1. Mean of sample means = mean of population.
    2. Shape looks (sort of) normal.
    3. This distribution can be used to answer questions about probabilities of sample means.
  • µ = 5

Sample Means

  • We can use this distribution to answer questions about probabilities of sample means.
  • If you take a sample of n = 2 scores from the original population, what is the probability of obtaining a sample mean greater than 6?
  • In symbols: p(M > 6) = ?
  • Probability = 3/16 = 0.1875 (3 of the 16 possible sample means are greater than 6)
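
The count behind 3/16 can be verified by enumerating the same 16 samples. This is a sketch using exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]

# Sample means for all 16 ordered samples of size n = 2 (with replacement).
means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]

favorable = [m for m in means if m > 6]        # the means 7, 7, and 8
p = Fraction(len(favorable), len(means))
print(p, float(p))   # 3/16 0.1875
```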

Central Limit Theorem

  • What about situations with larger populations and larger samples where calculating all possible sample means is unrealistic?
  • Use the Central Limit Theorem:
  • For any population with mean µ and standard deviation σ, the distribution of sample means for sample size n will have a mean of µ, a standard deviation of σ / √n, and will approach a normal distribution as n approaches infinity.
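
A quick simulation can make the theorem's claims about the mean and standard deviation concrete. The sketch below assumes an exponential population with µ = 1 and σ = 1, chosen only because it is clearly non-normal. The seed, the sample size, and the number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

# Assumed population: exponential with mean 1 and SD 1 (right-skewed).
mu, sigma = 1.0, 1.0
n = 30                # scores per sample
num_samples = 20_000  # how many sample means to simulate

sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_se = sigma / n ** 0.5

print("mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:  ", round(sd_of_means, 3))
print("sigma / sqrt(n):     ", round(predicted_se, 3))
```

The simulated mean and SD should land close to µ and σ / √n, though not exactly on them, because only a finite number of samples is drawn.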

Central Limit Theorem

  • In table form:
    • Original Population (OP)
    • Distribution of Sample Means (DSM)
    • Sample Size (n)
    • | | OP | DSM |
    • | :---- | :----- | :----------- |
    • | Mean | µ | µ_M = µ |
    • | S.D. | σ | σ_M = σ / √n |
    • | Shape | any | normal if n ≥ 30; always normal if the OP is normal |

Central Limit Theorem: Mean of DSM

  • Mean of the distribution of sample means is µ_M and always has a value equal to the mean of the population of scores, µ.
  • Mean of the distribution of sample means (µ_M) is called the expected value of M.
  • M is an unbiased statistic because µ_M, the expected value of M, is equal to the population mean, µ.

Central Limit Theorem: S.D. of DSM

  • Variability of a distribution of scores is measured by the standard deviation (σ).
  • Variability of a distribution of sample means is measured by the standard deviation of the sample means, and is called the standard error of M and written as σ_M.
  • In journal articles or other textbooks, the standard error of M might be identified as “standard error,” “SE,” or “SEM”.

Central Limit Theorem

  • Standard deviation: standard distance between a score X and the population mean µ.
  • Standard error: standard distance between a sample mean M and the mean of the distribution of sample means µ_M.

Standard Error: σ_M = σ / √n

  • Magnitude determined by two factors.
    1. Size of sample
      • Law of large numbers: as the sample size increases, the error between the sample mean and the population mean should decrease.
    2. Population standard deviation:
      • Standard deviation is “starting point” for standard error.
      • n=1: σ_M = σ
      • n>1: σ_M < σ
      • The smaller the population variance (S.D.), the less error between M and µ.

“Law of Large Numbers”

  • The larger a sample, the better its mean approximates the mean of the population (and thus the smaller σ_M will be).
  • For a normal population with µ = 10, σ = 5, take 100 samples of size n, and plot the means of those 100 samples.
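
The exercise described above can be sketched as a simulation. The sample sizes 4, 25, and 100 and the seed are assumptions; the notes do not say which values of n were plotted.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable
mu, sigma = 10, 5

spreads = {}
for n in (4, 25, 100):
    # 100 samples of size n, each reduced to its mean.
    means = [
        statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
        for _ in range(100)
    ]
    spreads[n] = statistics.stdev(means)
    print(f"n = {n:3d}: SD of the 100 means = {spreads[n]:.2f} "
          f"(sigma / sqrt(n) = {sigma / n ** 0.5:.2f})")
```

With larger n, the 100 sample means cluster more tightly around µ = 10.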

Relationship between S.E. and n

  • Standard Error as a function of sample size.
  • The graph illustrates how standard error decreases as sample size (n) increases, given σ = 10.
  • Standard distance between a sample mean and the population mean.
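
The curve in the graph follows directly from σ_M = σ / √n. A minimal sketch with σ = 10; the particular values of n are chosen for illustration and the helper name is hypothetical.

```python
import math

sigma = 10  # population SD used in the graph

def standard_error(sigma, n):
    """Standard error of M: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (1, 4, 9, 16, 25, 100):
    print(f"n = {n:3d}  ->  sigma_M = {standard_error(sigma, n):.2f}")
```

The drop is steep at first and then flattens: going from n = 1 to n = 4 halves the standard error, while halving it again requires n = 16.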

Population variance

  • The smaller the population variance, the better the sample mean (M) approximates the population mean (µ) (and thus the smaller σ_M will be).

Central Limit Theorem: Shape of DSM

  • The DSM is almost perfectly normal if either of the following two conditions is satisfied:
    • The population from which the samples are drawn is normal.
    • OR
    • The number of scores in each sample (n) is 30 or more.

Central Limit Theorem Proof: DSM Shape

  • Illustrates how the shape of the distribution of sample means (DSM) approaches a normal distribution as the sample size (n) increases.
  • Several graphs demonstrate the transition from non-normal to normal distributions as 'n' goes up.
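
The same transition can be seen numerically by tracking skewness, which is 0 for a symmetric shape such as the normal curve. This sketch assumes an exponential population (strongly right-skewed). The sample sizes, the seed, and the `skewness` helper are illustrative choices, not part of the course material.

```python
import random
import statistics

random.seed(2)  # arbitrary seed so the run is repeatable

def skewness(values):
    """Mean cubed z-score of the values (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

skew_by_n = {}
for n in (1, 5, 30):
    # 20,000 simulated sample means for samples of size n.
    means = [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(20_000)
    ]
    skew_by_n[n] = skewness(means)
    print(f"n = {n:2d}: skewness of the sample means = {skew_by_n[n]:.2f}")
```

The skewness shrinks toward 0 as n grows, which is the numerical counterpart of the histograms looking more and more normal.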

Central Limit Theorem Proof: DSM Shape

  • Illustrates how the shape of the distribution of sample means (DSM) approaches a normal distribution as the sample size (n) increases for different populations.
  • Several graphs demonstrate the transition from non-normal to normal distributions as 'n' goes up for different populations and their corresponding DSM.

Central Limit Theorem Proof: DSM Shape

  • The mean of each sampling distribution is equal to 0.94.
  • Population parameters: μ = 0.94 and σ = 0.05.
  • The standard deviation of each DSM gets smaller as n gets larger:
    • When n = 2: μ_M = 0.94 and σ_M = 0.038
    • When n = 4: μ_M = 0.94 and σ_M = 0.027
    • When n = 16: μ_M = 0.94 and σ_M = 0.013
    • When n = 64: μ_M = 0.94 and σ_M = 0.006
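
For comparison, the formula σ / √n can be applied to the stated σ = 0.05. The listed values for n = 2 and n = 4 come out slightly higher than the formula gives, which may mean they were read from simulated distributions or computed from an unrounded σ. The notes do not say which, so the sketch below is only a cross-check.

```python
import math

sigma = 0.05  # population SD as stated in the notes (possibly rounded)

theoretical = {n: sigma / math.sqrt(n) for n in (2, 4, 16, 64)}
for n, se in theoretical.items():
    print(f"n = {n:2d}: sigma / sqrt(n) = {se:.4f}")
```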

The Distribution Triad: Population, Single Sample, and DSM

  • (a) Original population of IQ scores.
    • μ = 100
    • σ = 15
  • (b) A sample of n = 25 IQ scores.
    • M = 101.2
    • s = 11.5
  • (c) The distribution of sample means. Sample means for all the possible random samples of n = 25 IQ scores.
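
The parameters of distribution (c) follow from the population values in (a). As a sketch, the code below also locates the one sample mean from (b) within that distribution; computing this z-score is an added illustration, not something the slide itself does.

```python
import math

mu, sigma, n = 100, 15, 25

mu_M = mu                        # expected value of M
sigma_M = sigma / math.sqrt(n)   # standard error of M: 15 / 5

M = 101.2                        # the single sample mean from panel (b)
z = (M - mu_M) / sigma_M         # position of M within the DSM

print(f"mu_M = {mu_M}, sigma_M = {sigma_M}, z = {z:.2f}")
```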

Interactive learning!

  • https://shiny.rit.albany.edu/stat/sampdist/

Learning Check

  • Question 1: A population of unknown shape has a mean of μ = 60 with σ = 5. The mean of the distribution of sample means for samples of size n = 4 selected from this population would have a value of ______.
    • A) 5
    • B) 15
    • C) 30
    • D) 60
  • Question 2: A population of unknown shape has a mean of μ = 60 with σ = 5. The distribution of sample means for samples of size n = 4 selected from this population would have a standard deviation of ______.
    • A) 1.25
    • B) 2.5
    • C) 5
    • D) 15
  • Question 3: A population of unknown shape has a mean of μ = 60 with σ = 5. The shape of the distribution of sample means for samples of size n = 4 selected from this population would be ______.
    • A) Normal
    • B) Positively Skewed
    • C) Negatively Skewed
    • D) Cannot determine from the information given
  • Use the password provided by Dr. Lowe to fill out the Canvas survey: Chapter 7 Question Set 1
  • Password: quentin

Central Limit Theorem: Review

  • Original Population = OP
  • Distribution of Sample Means = DSM
  • Sample Size = n
  • | | OP | DSM |
  • | :---- | :----- | :----------- |
  • | Mean | µ | µ_M = µ |
  • | S.D. | σ | σ_M = σ / √n |
  • | Shape | any | normal if n ≥ 30; always normal if the OP is normal |
  • Put this information on your cheat sheet!!!!!

Probability and the Distribution of Sample Means

  • We can use the distribution of sample means to find out probabilities (= proportions!).
  • For example: Given a population, how likely is it to obtain a sample of size n with a certain M?

Probability and the Sample Means

  • Single score vs Sample mean
  • z = (X - µ) / σ
    • X = score
    • µ = population mean
    • σ = Standard Dev.
    • Interpretation: Given a population, how likely is it to obtain a score with a certain value X?
    • Or…proportion of individuals within our population with a score of X.
  • z = (M - µ_M) / σ_M
    • M = sample mean
    • µ_M = mean of DSM ( = µ)
    • σ_M = Standard Error
    • Interpretation: Given a population, how likely is it to obtain a sample of size n with a certain M?
    • Or…proportion of samples (with size n) with that mean out of all the total possible samples (with size n) from our population.

Probability and the Sample Means

  • Example:
    • SAT-scores (normal, μ=500, σ=100).
    • Take a sample n=25.
    • What is p(M>540)?
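
One way to work this example is to compute the standard error first and then the z-score for the sample mean. The sketch uses the standard-library `NormalDist` instead of the unit normal table.

```python
import math
from statistics import NormalDist

mu, sigma, n = 500, 100, 25

sigma_M = sigma / math.sqrt(n)   # 100 / 5 = 20
z = (540 - mu) / sigma_M         # (540 - 500) / 20 = 2.0
p = 1 - NormalDist().cdf(z)      # upper-tail proportion

print(f"sigma_M = {sigma_M}, z = {z}, p(M > 540) = {p:.4f}")
```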

Example:

  • SAT scores are normally distributed and have µ = 500, σ =100
  • Probability of drawing one score > 540?
  • Probability of drawing one sample mean > 540 when n = 36?
  • Probability of drawing one sample mean > 540 when n = 100?
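
The three questions differ only in the standard error, so one helper can handle all of them by treating a single score as a sample of n = 1. The function name `p_mean_above` is hypothetical and the block is a sketch, not a prescribed method.

```python
import math
from statistics import NormalDist

mu, sigma, cutoff = 500, 100, 540

def p_mean_above(cutoff, mu, sigma, n):
    """P(M > cutoff) for samples of size n; n = 1 is the single-score case."""
    sigma_M = sigma / math.sqrt(n)
    z = (cutoff - mu) / sigma_M
    return 1 - NormalDist().cdf(z)

probs = {n: p_mean_above(cutoff, mu, sigma, n) for n in (1, 36, 100)}
for n, p in probs.items():
    print(f"n = {n:3d}: p = {p:.5f}")
```

The z-scores are 0.4, 2.4, and 4.0, so a mean above 540 becomes increasingly rare as n grows.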

Probability and the Sample Means

  • Another Example: SAT-scores (normal, µ=500, σ=100). Take sample n=25. What range of values for M can be expected 80% of the time (in other words, what are the boundaries of the middle 80%)?
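
The middle 80% leaves 10% in each tail, so the boundaries are the 10th and 90th percentiles of the distribution of sample means. The sketch below uses `NormalDist.inv_cdf` for those percentiles; a unit normal table gives z ≈ ±1.28, which leads to nearly the same boundaries.

```python
import math
from statistics import NormalDist

mu, sigma, n = 500, 100, 25
sigma_M = sigma / math.sqrt(n)   # 100 / 5 = 20

# Distribution of sample means: normal with mean mu and SD sigma_M.
dsm = NormalDist(mu, sigma_M)
lower, upper = dsm.inv_cdf(0.10), dsm.inv_cdf(0.90)

print(f"middle 80% of sample means: {lower:.1f} to {upper:.1f}")
```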

Review: Sampling Error

  • There will almost always be some discrepancy between a sample mean and the true population mean.
  • This discrepancy is called sampling error.
  • The amount of sampling error varies across samples.
  • The variability of sampling error is measured by the standard error of the mean.

Review: Standard Error & n

  • The standard error tells us how much error, on average, should exist between a sample mean and the population mean.
  • As the sample size n increases, the standard error decreases.

Importance of large sample sizes

  • A meme is presented to demonstrate the importance of sample size.

In the Literature

  • Journals vary in how they refer to the standard error but frequently use:
    • “SE”
    • “SEM”
  • Often reported in a table along with n and M for the different groups in the experiment
  • Example table:
    • | Group | n | Mean | SE |
    • | :---- | :- | :---- | :--- |
    • | A | 17 | 32.23 | 2.31 |
    • | B | 15 | 45.17 | 2.78 |

In the Literature

  • Graphs often include error bars representing standard error.
  • Example axis labels: M score (±SE), M number of mistakes (±SE)

Using the Standard Error

  • The standard error can help us decide which of two explanations for an observed difference is more likely.
  • Imagine an experiment:
    • Difference between sample and population:
      • due to treatment?
      • due to sampling error?

Using the Standard Error

  • 95% of all the possible sample means (from the untreated population!) for n = 25 fall between 392.16 and 407.84.
  • Suppose we take a sample of n = 25 rats and obtain M = 404. Did the growth hormone work?
  • M = 404

Using the Standard Error

  • 95% of all the possible sample means (from the untreated population!) for n = 25 fall between 392.16 and 407.84.
  • Suppose we take a sample of n = 25 rats and obtain M = 409. Did the growth hormone work?
  • M = 409
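
The interval and the two sample means can be checked with a few lines. The population values µ = 400 and σ = 20 are not stated in these notes; they are assumptions inferred from the interval itself (midpoint 400, half-width 7.84 = 1.96 × 4 with n = 25).

```python
import math

mu, sigma, n = 400, 20, 25       # assumed: inferred from the stated interval
sigma_M = sigma / math.sqrt(n)   # 20 / 5 = 4
z_crit = 1.96                    # bounds the middle 95% of a normal distribution

lower = mu - z_crit * sigma_M
upper = mu + z_crit * sigma_M

def within_middle_95(M):
    """True if M falls inside the middle 95% of untreated sample means."""
    return lower <= M <= upper

for M in (404, 409):
    verdict = "inside" if within_middle_95(M) else "outside"
    print(f"M = {M}: {verdict} the interval {lower:.2f} to {upper:.2f}")
```

Under these assumptions, M = 404 is the kind of mean that sampling error alone produces routinely, whereas M = 409 would occur in fewer than 5% of untreated samples, which makes a treatment effect the more plausible explanation.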

Learning Check

  • Question 1: A population forms a normal distribution with μ = 80 and σ = 20. If a single score is selected from this population, how much distance, on average, would you expect between the score and the population mean?
    • A) 0
    • B) 5
    • C) 10
    • D) 20
    • E) 80
  • Question 2: A population forms a normal distribution with μ = 80 and σ = 20. If a sample of n = 4 is selected from this population, how much distance, on average, would you expect between the sample mean and the population mean?
    • A) 0
    • B) 5
    • C) 10
    • D) 20
    • E) 80
  • Use the password provided by Dr. Lowe to fill out the Canvas survey: Chapter 7 Question Set 2
  • Password: gone2

More Examples…

  • For a population mean of µ = 70 and a standard deviation of σ = 20, how much error, on average, would you expect between the sample mean (M) and the population mean for each of the following sample sizes?
    • n = 4
    • n = 16
    • n = 25
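
The expected error here is the standard error, σ_M = σ / √n. A minimal sketch for checking your answers:

```python
import math

sigma = 20  # population SD from the problem (mu = 70 does not affect the SE)

standard_errors = {n: sigma / math.sqrt(n) for n in (4, 16, 25)}
for n, se in standard_errors.items():
    print(f"n = {n:2d}: sigma_M = {se:g}")
```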

More Examples…

  • For a population with σ = 12, how large a sample is necessary to have a standard error that is:
    • less than 4 points?
    • less than 3 points?
    • less than 2 points?
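
Solving σ / √n < target for n gives n > (σ / target)². The sketch below searches for the smallest whole-number n that satisfies the strict inequality. The function name is hypothetical.

```python
import math

sigma = 12

def smallest_n_below(sigma, target):
    """Smallest sample size whose standard error is strictly below target."""
    n = 1
    while sigma / math.sqrt(n) >= target:
        n += 1
    return n

needed = {target: smallest_n_below(sigma, target) for target in (4, 3, 2)}
for target, n in needed.items():
    print(f"SE < {target}: n must be at least {n}")
```

Because the question asks for "less than", the boundary cases n = 9, 16, and 36 (where the standard error equals the target exactly) do not qualify.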

More Examples…

  • A normal distribution has a mean of µ = 54 and a standard deviation of σ = 6.
    • What is the probability of randomly selecting a score less than X = 51?
    • What is the probability of selecting a sample of n = 4 scores with a mean less than M = 51?
    • What is the probability of selecting a sample of n = 36 scores with a mean less than M = 51?
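
All three parts use the same cutoff and differ only in n, so a single helper covers them by treating the single score as n = 1. This is a sketch for checking table-based answers; `p_mean_below` is a hypothetical name.

```python
import math
from statistics import NormalDist

mu, sigma, cutoff = 54, 6, 51

def p_mean_below(cutoff, mu, sigma, n):
    """P(M < cutoff) for samples of size n from a normal population."""
    sigma_M = sigma / math.sqrt(n)
    z = (cutoff - mu) / sigma_M
    return NormalDist().cdf(z)

results = {n: p_mean_below(cutoff, mu, sigma, n) for n in (1, 4, 36)}
for n, p in results.items():
    print(f"n = {n:2d}: p = {p:.4f}")
```

The corresponding z-scores are -0.5, -1.0, and -3.0.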