Exam Preparation Notes on Distribution of Sample Means

Announcements

  • Exam scores will be visible on Canvas soon.
  • Mean, median, and standard deviation (SD) will be announced (TBA).
  • Exam results (missed questions) can be viewed during office hours or in discussion sections, not SI sessions.
  • Overall course grade will be curved; a calculator will be posted after Exam 2.

Academic Achievement - Online Tutoring

  • Online tutoring is available for PSYC 60.
  • Schedule in-person or online tutoring sessions at the provided URL: https://aah.ucsd.edu/content-tutoring/index.html

Distribution of Sample Means (Chapter 7)

  • Based on Chapter 7 in Gravetter & Wallnau.

Probability and Samples

  • Initially, the discussion focuses on samples of size 1.
  • Example: Given a normal distribution with µ = 68 and σ = 6, find the probability of selecting a person taller than 80 inches.
  • Answer: Convert 80 inches to a z-score and find the proportion greater than that z-score in the unit normal table.
  • P(X > 80) = 0.0228 (for z = 2.0).
  • Most research, however, involves samples with n > 1.
  • For n = 1, P(X > 80) = 0.0228.
  • For n = 2, P( (X1 + X2) / 2 > 80 ) = ???
  • For n = 100, P( (X1 + X2 + ··· + X100) / 100 > 80 ) = ???
  • The probabilities are not equal; therefore, the single-score method must be modified for samples with n>1.
  • Sampling error: Natural differences that exist by chance between a sample statistic and a population parameter.
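
The single-score calculation in this list (µ = 68, σ = 6, X = 80) can be checked with a few lines of Python. This is only a sketch: it uses the standard library's `NormalDist` in place of the unit normal table, and the variable names are chosen for illustration.

```python
from statistics import NormalDist

# Values from the height example: mu = 68, sigma = 6, cutoff X = 80.
mu, sigma = 68, 6
x = 80

z = (x - mu) / sigma                 # z-score for a single score
p_taller = 1 - NormalDist().cdf(z)   # area in the tail above z

print(f"z = {z:.2f}, P(X > {x}) = {p_taller:.4f}")
```

The tail area rounds to the 0.0228 read from the table for z = 2.0.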

Sampling Error: Review

  • Population of UCSD students:
    • Population Parameters:
      • Average Age = 21.3 years
      • Average IQ = 112.5
      • 47% female, 48% male, 5% other
  • Sample #1: Adam, Brad, Chelsea, Derrick, Elisa
    • Sample Statistics:
      • Average Age = 19.8
      • Average IQ = 104.6
      • 40% Female, 60% Male, 0% other
    • Sampling Error for #1:
      • 19.8 vs. 21.3 years
      • 104.6 vs. 112.5 IQ
      • 40 vs. 47% female
      • 60 vs. 48% male
      • 0 vs. 5% other
  • Sample #2: Amy, Bryan, Chris, Deanna, Eric
    • Sample Statistics:
      • Average Age = 20.4
      • Average IQ = 114.2
      • 40% Female, 40% Male, 10% other
    • Sampling Error for #2:
      • 20.4 vs. 21.3 years
      • 114.2 vs. 112.5 IQ
      • 60 vs. 47% female
      • 40 vs. 48% male
      • 10 vs. 5% other

Probability and Samples

  • Each independent sample from the population will exhibit some sampling error.
  • Key questions:
    • How well does a sample (on average) represent the population from which it was drawn?
    • How likely is it that we draw a sample with particular characteristics?

Probability and Samples

  • Detailed question: Given a population with a set µ and σ, how likely is it to obtain a certain sample mean (M) when we take a sample of size n?
  • Many possible samples can be obtained from a given population, each with different individuals, scores, and means.
  • These possible samples form an orderly pattern: The Distribution of Sample Means (DSM).

Sample Means

  • The distribution of sample means is the collection of the means of all the possible random samples of a particular size (n) that can be obtained from a population.
  • This distribution differs from individual score distributions because it is composed of statistics (sample means), not individual scores.
  • Referred to as a sampling distribution (or “Sampling distribution of M”).

Sample Means

  • Example: A population of 4 scores: 2, 4, 6, 8. (X: 2, 4, 6, 8)

Sample Means

  • Construct the distribution of sample means for samples of size n = 2.
  • Population (N = 4): 2, 4, 6, 8
  • Procedure:
    1. Write down all 16 possible samples and the sample mean (M) for each.
    2. Place all the obtained sample means in a frequency distribution and/or histogram.

Sample Values (n=2 from population 2, 4, 6, 8)

  • Population: 2, 4, 6, 8
  • Number of possible samples = N^n = 4^2 = 16
  • Shows a table of all 16 samples, first score, second score, and sample mean.
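
The table of 16 samples can be regenerated rather than copied by hand. The sketch below assumes what the count N^n = 16 implies: sampling with replacement, with the order of the two scores counted. Names such as `samples` and `freq` are illustrative.

```python
from itertools import product
from collections import Counter

population = [2, 4, 6, 8]
n = 2

# Sampling with replacement, order matters: N^n = 4^2 = 16 samples.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

for s, m in zip(samples, sample_means):
    print(f"first = {s[0]}, second = {s[1]}, M = {m:g}")

# Frequency distribution of the sample means (a text histogram).
freq = Counter(sample_means)
for m in sorted(freq):
    print(f"M = {m:g}: {'*' * freq[m]}")
```

The text histogram piles up in the middle (M = 5) and thins out toward M = 2 and M = 8.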

Sample Means

  • Things to note about the distribution:
    1. Mean of sample means = mean of population.
    2. Shape looks (sort of) normal.
    3. This distribution can be used to answer questions about probabilities of sample means.
  • µ = 5

Sample Means

  • We can use this distribution to answer questions about probabilities of sample means.
  • If you take a sample of n = 2 scores from the original population, what is the probability of obtaining a sample mean greater than 6?
  • In symbols: p(M > 6) = ?
  • Probability = 3/16 = 0.1875 (3 of the 16 possible sample means are greater than 6)
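
As a check on the 3/16 figure, the proportion can be counted directly from the 16 equally likely samples. This sketch uses exact fractions to avoid rounding; it also confirms that the mean of the sample means equals the population mean. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
n = 2

sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# Every sample is equally likely, so probability = proportion of samples.
p_greater_than_6 = Fraction(sum(m > 6 for m in sample_means), len(sample_means))

# The mean of all sample means should equal the population mean.
mean_of_means = sum(sample_means) / len(sample_means)
population_mean = Fraction(sum(population), len(population))

print(p_greater_than_6, float(p_greater_than_6))
print(mean_of_means == population_mean)
```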

Central Limit Theorem

  • What about situations with larger populations and larger samples where calculating all possible sample means is unrealistic?
  • Use the Central Limit Theorem:
  • For any population with mean µ and standard deviation σ, the distribution of sample means for sample size n will have a mean of µ, a standard deviation of σ / √n, and will approach a normal distribution as n approaches infinity.

    Central Limit Theorem

    • In table form:
      • Original Population (OP)
      • Distribution of Sample Means (DSM)
      • Sample Size (n)
      • | | OP | DSM |
      • | :---- | :----- | :----------- |
      • | Mean | µ | µ_M = µ |
      • | S.D. | σ | σ_M = σ / √n |
      • | Shape | any | normal if n ≥ 30 |
      • | Shape | normal | always normal |

    Central Limit Theorem: Mean of DSM

    • Mean of the distribution of sample means is µ_M and always has a value equal to the mean of the population of scores, µ.
    • Mean of the distribution of sample means (µ_M) is called the expected value of M.
    • M is an unbiased statistic because µ_M, the expected value of M, is equal to the population mean, µ.

    Central Limit Theorem: S.D. of DSM

    • Variability of a distribution of scores is measured by the standard deviation (σ).
    • Variability of a distribution of sample means is measured by the standard deviation of the sample means, and is called the standard error of M and written as σ_M.
    • In journal articles or other textbooks, the standard error of M might be identified as “standard error,” “SE,” or “SEM”.

    Central Limit Theorem

    • Standard deviation: standard distance between a score X and the population mean µ.
    • Standard error: standard distance between a sample mean M and the mean of the distribution of sample means µ_M.

    Standard Error: σ_M = σ / √n

    • Magnitude determined by two factors.
      1. Size of sample
        • Law of large numbers: as the sample size increases, the error between the sample mean and the population mean should decrease.
      2. Population standard deviation:
        • Standard deviation is “starting point” for standard error.
        • n = 1: σ_M = σ
        • n > 1: σ_M < σ
        • The smaller the population variance (S.D.), the less error between M and µ.
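
Both factors can be seen by evaluating σ_M = σ / √n for a few cases. The values of σ and n below are assumed for illustration only, and `standard_error` is a helper name chosen for this sketch.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of M: sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Assumed illustrative values: two population SDs, four sample sizes.
for sigma in (10, 5):
    for n in (1, 4, 25, 100):
        se = standard_error(sigma, n)
        print(f"sigma = {sigma:>2}, n = {n:>3}: standard error = {se:.2f}")
```

With n = 1 the standard error equals σ. Raising n or lowering σ shrinks it.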

    “Law of Large Numbers”

    • The larger a sample, the better its mean approximates the mean of the population (and thus the smaller σ_M will be).
    • For a normal population with µ = 10, σ = 5, take 100 samples of size n, and plot the means of those 100 samples.
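
The exercise described here can be simulated instead of plotted. The sketch below draws 100 samples of size n from a normal population with µ = 10 and σ = 5 and reports how spread out the 100 sample means are. The sample sizes tried and the random seed are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import pstdev

rng = random.Random(1)        # fixed seed so reruns give the same numbers
mu, sigma = 10, 5
num_samples = 100

def spread_of_means(n):
    """Draw 100 samples of size n and return the SD of their means."""
    means = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(num_samples)
    ]
    return pstdev(means)

# Assumed sample sizes, chosen only to show the trend.
results = {n: spread_of_means(n) for n in (1, 4, 25, 100)}
for n, observed in results.items():
    print(f"n = {n:>3}: SD of sample means = {observed:.2f}, "
          f"sigma / sqrt(n) = {sigma / sqrt(n):.2f}")
```

The observed spread should track σ / √n closely but not exactly, since only 100 samples are drawn at each n.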

    Relationship between S.E. and n

    • Standard Error as a function of sample size.
    • The graph illustrates how standard error decreases as sample size (n) increases, given σ = 10.
    • Standard distance between a sample mean and the population mean.

    Population variance

    • The smaller the population variance, the better the sample mean (M) approximates the population mean (µ) (and thus the smaller σ_M will be).

    Central Limit Theorem: Shape of DSM

    • The DSM is almost perfectly normal if either of the following two conditions is satisfied:
      • The population from which the samples are drawn is normal.
      • OR
      • The number of scores in each sample (n) is 30 or more.
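
The second condition can be explored with a simulation. The sketch below assumes a flat (uniform on 0 to 1) population, which is clearly not normal, and draws many samples of n = 30. If the DSM is close to normal, about 68% of the sample means should fall within 1 standard error of µ and about 95% within 1.96 standard errors. The population, the number of samples, and the seed are illustrative assumptions.

```python
import random
from math import sqrt

rng = random.Random(0)            # fixed seed so reruns give the same numbers

# Assumed population for illustration: uniform on [0, 1] (flat, not normal).
mu = 0.5
sigma = sqrt(1 / 12)              # SD of a uniform(0, 1) population
n = 30
num_samples = 5000

se = sigma / sqrt(n)
sample_means = [
    sum(rng.random() for _ in range(n)) / n
    for _ in range(num_samples)
]

# In a normal distribution about 68% of values lie within 1 SD of the mean
# and about 95% lie within 1.96 SD.
within_1 = sum(abs(m - mu) <= se for m in sample_means) / num_samples
within_196 = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / num_samples

print(f"within 1 standard error:     {within_1:.3f}")
print(f"within 1.96 standard errors: {within_196:.3f}")
```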

    Central Limit Theorem Proof: DSM Shape

    • Illustrates how the shape of the distribution of sample means (DSM) approaches a normal distribution as the sample size (n) increases.
    • Several graphs demonstrate the transition from non-normal to normal distributions as 'n' goes up.

    Central Limit Theorem Proof: DSM Shape

    • Illustrates how the shape of the distribution of sample means (DSM) approaches a normal distribution as the sample size (n) increases for different populations.
    • Several graphs demonstrate the transition from non-normal to normal distributions as 'n' goes up for different populations and their corresponding DSM.

    Central Limit Theorem Proof: DSM Shape

    • The mean of each sampling distribution is equal to 0.94
    • Population Parameters given by: μ = 0.94 and σ = 0.05
    • The standard deviation of each DSM gets smaller as n gets larger
      • When n = 2 : μ_x = 0.94 and σ_x = 0.038
      • When n = 4 : μ_x = 0.94 and σ_x = 0.027
      • When n = 16 : μ_x = 0.94 and σ_x = 0.013
      • When n = 64 : μ_x = 0.94 and σ_x = 0.006

    The Distribution Triad: Population, Single Sample, and DSM

    • (a) Original population of IQ scores.
      • μ = 100
      • σ = 15
    • (b) A sample of n = 25 IQ scores.
      • M = 101.2
      • S = 11.5
    • (c) The distribution of sample means. Sample means for all the possible random samples of n = 25 IQ scores.

    Interactive learning!

    • https://shiny.rit.albany.edu/stat/sampdist/

    Learning Check

    • Question 1: A population of unknown shape has a mean of μ = 60 with σ = 5. The mean of the distribution of sample means for samples of size n = 4 selected from this population would have a value of _____.
      • A) 5
      • B) 15
      • C) 30
      • D) 60
    • Question 2: A population of unknown shape has a mean of μ = 60 with σ = 5. The distribution of sample means for samples of size n = 4 selected from this population would have a standard deviation of _____.
      • A) 1.25
      • B) 2.5
      • C) 5
      • D) 15
    • Question 3: A population of unknown shape has a mean of μ = 60 with σ = 5. The shape of the distribution of sample means for samples of size n = 4 selected from this population would be _____.
      • A) Normal
      • B) Positively Skewed
      • C) Negatively Skewed
      • D) Cannot determine from the information given
    • Use the password provided by Dr. Lowe to fill out the Canvas survey: Chapter 7 Question Set 1
    • Password: quentin

    Central Limit Theorem: Review

    • Original Population = OP
    • Distribution of Sample Means = DSM
    • Sample Size = n
    • | | OP | DSM |
    • | :---- | :----- | :----------- |
    • | Mean | µ | µ_M = µ |
    • | S.D. | σ | σ_M = σ / √n |
    • | Shape | any | normal if n ≥ 30 |
    • | Shape | normal | always normal |
    • Put this information on your cheat sheet!!!!!

    Probability and the Distribution of Sample Means

    • We can use the distribution of sample means to find out probabilities (= proportions!).
    • For example: Given a population, how likely is it to obtain a sample of size n with a certain M?

    Probability and the Sample Means

    • Single score vs Sample mean
    • z = (X - µ) / σ
      • X = score
      • µ = population mean
      • σ = Standard Dev.
      • Interpretation: Given a population, how likely is it to obtain a score with a certain value X?
      • Or…proportion of individuals within our population with a score of X.
    • z = (M - µ_M) / σ_M
      • M = sample mean
      • µ_M = mean of DSM ( = µ)
      • σ_M = Standard Error
      • Interpretation: Given a population, how likely is it to obtain a sample of size n with a certain M?
      • Or…proportion of samples (with size n) with that mean out of all the total possible samples (with size n) from our population.
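
The two z formulas differ only in what goes in the denominator. A minimal sketch, with hypothetical function names and hypothetical numbers, makes the contrast concrete:

```python
from math import sqrt

def z_for_score(x, mu, sigma):
    """z-score locating a single score X in the population."""
    return (x - mu) / sigma

def z_for_sample_mean(m, mu, sigma, n):
    """z-score locating a sample mean M in the distribution of sample means."""
    standard_error = sigma / sqrt(n)   # sigma_M
    return (m - mu) / standard_error   # mu_M equals mu

# Hypothetical numbers for illustration only.
print(z_for_score(110, mu=100, sigma=10))
print(z_for_sample_mean(110, mu=100, sigma=10, n=4))
```

The same value of 110 is more extreme as a sample mean than as a single score, because the standard error is smaller than σ.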

    Probability and the Sample Means

    • Example:
      • SAT-scores (normal, μ = 500, σ = 100).
      • Take a sample n=25.
      • What is p(M>540)?
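
One way to work this example, using the standard error and the z formula for sample means with the tail proportion read from the unit normal table:

```latex
\begin{align*}
\sigma_M &= \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20 \\
z &= \frac{M - \mu_M}{\sigma_M} = \frac{540 - 500}{20} = 2.00 \\
p(M > 540) &= p(z > 2.00) = 0.0228
\end{align*}
```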

    Example:

    • SAT scores are normally distributed and have µ = 500, σ = 100.
    • Probability of drawing one score > 540?
    • Probability of drawing one sample mean > 540 when n = 36?
    • Probability of drawing one sample mean > 540 when n = 100?
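
The three questions can be answered with one helper by treating a single score as a sample of n = 1. This is a sketch using the standard library's `NormalDist` instead of the unit normal table; `p_mean_above` is a name chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 500, 100
cutoff = 540
unit_normal = NormalDist()

def p_mean_above(cutoff, n):
    """Return (z, tail probability) for a sample mean above cutoff."""
    se = sigma / sqrt(n)              # standard error; equals sigma when n = 1
    z = (cutoff - mu) / se
    return z, 1 - unit_normal.cdf(z)

answers = {n: p_mean_above(cutoff, n) for n in (1, 36, 100)}
for n, (z, p) in answers.items():
    print(f"n = {n:>3}: z = {z:.2f}, p = {p:.5f}")
```

The same cutoff of 540 becomes far less likely as n grows, because the standard error shrinks.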

    Probability and the Sample Means

    • Another Example: SAT-scores (normal, µ = 500, σ = 100). Take sample n = 25. What range of values for M can be expected 80% of the time (in other words, what are the boundaries of the middle 80%)?
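
A sketch of this calculation: the middle 80% leaves 10% in each tail, so the boundaries sit at the z-scores that cut off the bottom and top 10%. The code uses `NormalDist.inv_cdf` in place of a reverse table lookup, so the z value carries more decimals than the table's 1.28.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 100, 25
se = sigma / sqrt(n)                       # 100 / 5 = 20

# The middle 80% leaves 10% in each tail, so find the z cutting off the top 10%.
z = NormalDist().inv_cdf(0.90)

lower = mu - z * se
upper = mu + z * se
print(f"z = ±{z:.2f}, middle 80% of sample means: {lower:.1f} to {upper:.1f}")
```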

    Review: Sampling Error

    • There will almost always be discrepancy between a sample mean and the true population mean
    • This discrepancy is called sampling error
    • The amount of sampling error varies across samples
    • The variability of sampling error is measured by the standard error of the mean

    Review: Standard Error & n

    • The standard error tells us how much error, on average, should exist between a sample mean and the population mean.
    • As the sample size n increases, the standard error decreases.

    Importance of large sample sizes

    • A meme is presented to demonstrate the importance of sample size.

    In the Literature

    • Journals vary in how they refer to the standard error but frequently use:
      • “SE”
      • “SEM”
    • Often reported in a table along with n and M for the different groups in the experiment
    • Example table:
      • | Group | n | Mean | SE |
      • | :---- | :- | :---- | :--- |
      • | A | 17 | 32.23 | 2.31 |
      • | B | 15 | 45.17 | 2.78 |

    In the Literature

    • Graphs often include error bars representing standard error.
    • Example: Mscore (±SE), M number of mistakes (±SE)

    Using the Standard Error

    • The standard error can help us decide which of the two alternatives is more likely.
    • Imagine an experiment:
      • Difference between sample and population:
        • due to treatment?
        • due to sampling error?

    Using the Standard Error

    • 95% of all the possible sample means (from the untreated population!) for n = 25 fall between 392.16 and 407.84.
    • Suppose we take a sample of n = 25 rats and obtain M = 404. Did the growth hormone work?
    • M = 404

    Using the Standard Error

    • 95% of all the possible sample means (from the untreated population!) for n = 25 fall between 392.16 and 407.84.
    • Suppose we take a sample of n = 25 rats and obtain M = 409. Did the growth hormone work?
    • M = 409
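
The decision for both rat samples can be sketched in code. The notes give only the 95% boundaries, so the population values here are an inference: 400 ± 1.96 × 4 reproduces 392.16 and 407.84, which suggests µ = 400 and a standard error of 4. Treat those two numbers, and the wording of the verdicts, as assumptions.

```python
# Inferred from the stated bounds, not given in the notes:
# 400 ± 1.96 * 4 reproduces 392.16 and 407.84, so assume mu = 400, SE = 4.
mu, se = 400, 4
z_crit = 1.96                      # cuts off the middle 95% of a normal curve

lower = mu - z_crit * se
upper = mu + z_crit * se

def verdict(m):
    """Say whether a sample mean falls inside the middle 95% of the DSM."""
    if lower <= m <= upper:
        return "inside the middle 95%: consistent with sampling error alone"
    return "outside the middle 95%: unlikely to be sampling error alone"

for m in (404, 409):
    print(f"M = {m}: {verdict(m)}")
```

M = 404 is the kind of mean untreated samples produce routinely, so it gives little evidence of an effect. M = 409 falls outside that range, which points toward the treatment.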

    Learning Check

    • Question 1: A population forms a normal distribution with μ = 80 and σ = 20. If a single score is selected from this population, how much distance, on average, would you expect between the score and the population mean?
      • A) 0
      • B) 5
      • C) 10
      • D) 20
      • E) 80
    • Question 2: A population forms a normal distribution with μ = 80 and σ = 20. If a sample of n = 4 is selected from this population, how much distance, on average, would you expect between the sample mean and the population mean?
      • A) 0
      • B) 5
      • C) 10
      • D) 20
      • E) 80
    • Use the password provided by Dr. Lowe to fill out the Canvas survey: Chapter 7 Question Set 2
    • Password: gone2

    More Examples…

    • For a population mean of µ = 70 and a standard deviation of σ = 20, how much error, on average, would you expect between the sample mean (M) and the population mean for each of the following sample sizes?
      • n = 4
      • n = 16
      • n = 25
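
The expected error is the standard error, so each case is one application of σ_M = σ / √n with σ = 20. A worked sketch:

```latex
\begin{align*}
n = 4:\quad  & \sigma_M = \frac{20}{\sqrt{4}}  = 10 \\
n = 16:\quad & \sigma_M = \frac{20}{\sqrt{16}} = 5 \\
n = 25:\quad & \sigma_M = \frac{20}{\sqrt{25}} = 4
\end{align*}
```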

    More Examples…

    • For a population with σ = 12, how large a sample is necessary to have a standard error that is:
      • less than 4 points?
      • less than 3 points?
      • less than 2 points?
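
Solving σ / √n < target for n gives n > (σ / target)². With σ = 12, that means n > 9, n > 16, and n > 36. The sketch below searches for the smallest whole-number n that meets each target strictly; `smallest_n` is a hypothetical helper name.

```python
from math import sqrt

sigma = 12

def smallest_n(sigma, target_se):
    """Smallest whole-number n with sigma / sqrt(n) strictly below target_se."""
    n = 1
    while sigma / sqrt(n) >= target_se:
        n += 1
    return n

for target in (4, 3, 2):
    n = smallest_n(sigma, target)
    print(f"standard error < {target}: n must be at least {n}")
```

At n = 9, 16, and 36 the standard error equals the target exactly, so "less than" requires one more score in each case.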

    More Examples…

    • A normal distribution has a mean of µ = 54 and a standard deviation of σ = 6.
      • What is the probability of randomly selecting a score less than X = 51?
      • What is the probability of selecting a sample of n = 4 scores with a mean less than M = 51?
      • What is the probability of selecting a sample of n = 36 scores with a mean less than M = 51?
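
A sketch for checking answers to these three questions. Instead of converting to z first, it describes each distribution of sample means directly as a normal curve with mean µ and standard deviation σ / √n, then asks for the area below 51. The single score is handled as n = 1.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 54, 6
cutoff = 51

probabilities = {}
for n in (1, 4, 36):
    se = sigma / sqrt(n)                       # 6, 3, and 1 respectively
    dsm = NormalDist(mu, se)                   # distribution of sample means
    probabilities[n] = dsm.cdf(cutoff)         # area below the cutoff
    print(f"n = {n:>2}: standard error = {se:g}, p = {probabilities[n]:.4f}")
```

The corresponding z-scores are -0.50, -1.00, and -3.00, so the results should match the unit normal table entries for those values.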