Lecture 2

Page 1:

  • Module 2: Standard Normal Distribution, Standardization, Probability

Page 2:

  • Descriptive Statistics:

    • Measures of central tendency (mean, median, mode)

    • Measures of variability (IQV, IQR, SIQ, Variance, Standard Deviation)

  • Inferential Statistics:

    • z-tests

    • ANOVAs

    • Regression

    • Etc.

  • Probability, Normal Curve, Z-scores, Sampling Distributions

Page 3:

  • Probability + Distributions + z-Scores

Page 4:

  • Probability

Page 5:

  • [Figure: weather forecast for Brussels, Wednesday and Thursday]

Page 7:

  • Probability Definition:

    • The chance that an event will occur

    • Frequency of times an outcome occurs divided by the total number of possible outcomes (symbolized as p)

  • Random Events:

    • Outcome can vary

    • Examples: getting a cell phone before a certain age, having a car accident

  • Fixed Events:

    • Observed outcome will always be the same

    • Examples: death

  • Probability can only be calculated with random events

Page 8:

  • Calculating Probability:

    • p(x) = f(x) / N

    • p = probability

    • x = outcome

    • f = frequency

    • N = total number of possible outcomes (sample space)

Page 9:

  • Example:

    • 34 students in Statistics 1

    • 2 are freshmen, 13 are sophomores, 15 are juniors, and 4 are seniors

    • Calculate the probability of selecting a student from each classification

    • p(freshman) = 2/34 = .06

    • p(sophomore) = 13/34 = .38

    • p(junior) = 15/34 = .44

    • p(senior) = 4/34 = .12
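
The calculation above can be sketched in a few lines of Python. This is only an illustration of p(x) = f(x) / N using the class counts from the example; the variable names are made up for this sketch.

```python
# Class-standing counts from the example (34 students in total).
counts = {"freshman": 2, "sophomore": 13, "junior": 15, "senior": 4}

total = sum(counts.values())  # N, the total number of possible outcomes

# p(x) = f(x) / N for each classification
probabilities = {name: f / total for name, f in counts.items()}

for name, p in probabilities.items():
    print(f"p({name}) = {counts[name]}/{total} = {p:.2f}")
```

Because every student belongs to exactly one classification, the four probabilities add up to 1.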

Page 11:

  • Normal Distribution

Page 12:

  • Types of Distributions:

    • Theoretical Distributions

    • Real Distributions

Page 13:

  • Theoretical Distributions (Mathematical Distributions):

    • Determined by an equation

    • Based on an infinite number of scores

    • Appear in a smooth curve

    • Normal Distribution is the most common theoretical distribution, but not the only one

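  • Inflection Point: Where the curvature of the distribution changes (from bending downward to bending upward); on a normal curve this happens one standard deviation above and below the mean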

Page 14:

  • Normal Distribution Characteristics:

    • Theoretical distribution based on an infinite number of scores

    • Defined by an equation

    • Family/group of distributions

    • Can have various means and standard deviations

    • Mean, median, and mode fall in the same place (50th percentile)

    • Symmetric

    • Area under the curve = 1

    • Tails are asymptotic (never reach the x-axis)

  • Formula for Normal Distribution

Page 15:

  • Empirical Rule:

    • 68.26% within 1 standard deviation

    • 95.44% within 2 standard deviations

    • 99.74% within 3 standard deviations
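
As a rough check on the empirical rule, the areas can be computed directly from the standard normal distribution. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name `area_within` is invented for illustration. Values read from a printed z-table are rounded to four decimals, so percentages built from the table can differ from these in the last digit.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```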

Page 16:

  • Real Distributions:

    • Based on a set of numbers, scores, or responses from individuals or objects that make up the population

    • Do not vary too drastically from theoretical distributions

Page 18:

  • Probability and Distributions:

    • Discrete Probability Distributions:

      • Computed when a random variable is a discrete variable

      • Example: predicting the likelihood of selecting a freshman from a set of 34 students

    • Continuous Probability Distributions:

      • Computed when a random variable is a continuous variable

      • When dealing with a true normal distribution, the assumption is that we have an infinite number of scores in the population

      • Therefore, we do NOT compute the probability of a particular outcome

      • We compute the probability associated with a range of possible outcomes

      • Example: The probability of selecting a woman who is between 62 and 65 inches

Page 21:

  • Z-scores

Page 22:

  • Relative Standing (Finding the Standard Score):

    • Z-scores: Number of standard deviations that a particular score is from the mean

    • Used to determine relative standing of a given data point within a set of data

    • Also called a z transformation or standardized score

    • By definition, z-scores are not associated with a unit of measurement

Page 23:

  • Formulas for Z-scores:

    • Population: z = (X - Mean) / Standard Deviation

    • Sample: z = (X - Sample Mean) / Sample Standard Deviation

  • Data Needed to Compute Z-scores:

    • X = Raw Score

    • Mean

    • Standard Deviation

Page 24:

  • Relative Standing (Finding the Standard Score):

    • Sign: Tells if the raw score is above or below the mean

    • (+) means above

    • (-) means below

    • Magnitude: Indicates how far the raw score is from the mean

Page 25:

  • Standardize the following scores:

    • Raw SAT Scores: 200, 400, 700, 800, 500, 300, 600

Page 26:

  • Standardize the following scores:

    • Raw SAT Scores: 200, 400, 700, 800, 500, 300, 600

    • The sum of a set of z-scores will always equal 0

    • The mean for a set of z-scores will always equal 0

    • The standard deviation for a set of z-scores will always equal 1

  • Compute the mean and standard deviation for the z-scores
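
A minimal sketch of this exercise, assuming the seven SAT scores are treated as a complete population (so the population standard deviation, `pstdev`, is used). If the sample formula with n - 1 were used instead, the z-scores would come out slightly different.

```python
from statistics import mean, pstdev

raw_scores = [200, 400, 700, 800, 500, 300, 600]

mu = mean(raw_scores)       # mean of the raw scores
sigma = pstdev(raw_scores)  # population standard deviation of the raw scores

# z = (X - mean) / standard deviation for every raw score
z_scores = [(x - mu) / sigma for x in raw_scores]

print("z-scores:", z_scores)
print("sum:", sum(z_scores))
print("mean:", mean(z_scores))
print("standard deviation:", pstdev(z_scores))
```

The printed sum and mean should be 0 and the standard deviation 1, matching the three properties listed above.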

Page 27:

  • Convert Z-scores into Raw Scores:

    • X = (Z * Standard Deviation) + Mean

  • Used with population data

  • Used with sample data

Page 28:

  • Examples (Raw scores to z-scores):

    • Convert the raw scores to z-scores

    • Average grade for all students is 65, standard deviation is 6

    • Raw scores: 62, 71, 50, 64

Page 29:

  • Examples (z-scores to raw scores):

    • Convert the z-scores to raw scores

    • Average grade for all students is 65, standard deviation is 6

    • Z-scores: -3, +1.7, -1.33, +2.33
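
The two conversions in these examples (raw score to z-score, and z-score back to raw score) can be sketched as a pair of small functions. The function names `to_z` and `to_raw` are invented for this illustration; the mean and standard deviation are the ones given in the examples.

```python
MU, SIGMA = 65, 6  # class mean and standard deviation from the examples

def to_z(x, mu=MU, sigma=SIGMA):
    """z = (X - mean) / standard deviation"""
    return (x - mu) / sigma

def to_raw(z, mu=MU, sigma=SIGMA):
    """X = (z * standard deviation) + mean"""
    return z * sigma + mu

for x in (62, 71, 50, 64):
    print(f"X = {x} -> z = {to_z(x):.2f}")

for z in (-3, 1.7, -1.33, 2.33):
    print(f"z = {z} -> X = {to_raw(z):.2f}")
```

Applying `to_raw` to the output of `to_z` returns the original raw score, which is a quick way to check work in either direction.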

Page 30:

  • Example:

    • High school student wants to compare academic standing in three courses: Psychology, Mathematics, and Geology

    • Convert current grade in each class to z-scores

    • Assume data in each course is normally distributed

    • X: Current grade

    • 𝜇: Mean

    • 𝜎: Standard Deviation

    • Z: Z-score

    • Psychology: X = 68, 𝜇 = 65, 𝜎 = 6, Z = .5

    • Mathematics: X = 77, 𝜇 = 77, 𝜎 = 9, Z = 0

    • Geology: X = 83, 𝜇 = 89, 𝜎 = 8, Z = -.75

Page 32:

  • Example:

    • Use the data from the previous example

    • What raw score is one half standard deviation above the mean in the Geology course?

    • 𝜇 = 89, 𝜎 = 8

Page 33:

  • Example:

    • Use the data from the previous example

    • A raw score of 65 is equal to what z-score in the Geology course?

    • 𝜇 = 89, 𝜎 = 8

Page 34:

  • Z-score of 0 is equivalent to what raw score in the Geology course?

  • Data needed: µ (mean) and σ (standard deviation) for Geology course

  • Geology course: µ = 89, σ = 8

Page 35:

  • Standard Normal Distribution:

    • Graph/distribution of z-scores

    • x-axis: z-scores

    • y-axis: frequency of z-scores

    • Area under the curve equals 1 (probability)

    • Distributed in z-score units along the x-axis

Page 36:

  • Example: Comparing academic standing in three courses (Psychology, Mathematics, Geology)

  • Convert current grades to z-scores

  • Assume data in each course is normally distributed

  • Psychology: X = 68, µ = 65, σ = 6, Z = 0.50

  • Mathematics: X = 77, µ = 77, σ = 9, Z = 0

  • Geology: X = 83, µ = 89, σ = 8, Z = -0.75

Page 37:

  • Plot the answers from the first 3 questions on a graph:

    • Psychology (z = 0.50)

    • Mathematics (z = 0)

    • Geology (z = -0.75)

Page 38:

  • Plot the answers from the first 3 questions on a graph:

    • Psychology (z = 0.50)

    • x-axis: -3, -2, -1, 0, 1, 2, 3

Page 40:

  • Plot the answers from the first 3 questions on a graph:

    • Psychology (z = 0.50)

    • Geology (z = -0.75)

Page 41:

  • Reminders:

    1. Relationship between raw scores and z-scores

      • Know how to go back and forth between raw scores and z-scores

    2. Location of area under the curve/proportion on the standard normal distribution

    3. Location of z-scores on the standard normal distribution

    4. Directionality:

      • Is the question asking about areas or values less than or below a score?

      • Is the question asking about areas or values greater than or above a score?

    • Proportion/Area under the curve (Entire area = 1)

    • Z-scores

Page 42:

  • Z-Table:

    • Column 1: z-scores

    • Column 2: Area under the curve between the mean and the z-score

    • Column 3: Area under the curve beyond the z-score

    • Column 4, Column 5, and Column 6 have the same information from Column 1, 2, and 3 with respect to a new set of z-scores

Page 43:

  • Areas under the curve:

    • What proportion of scores fall above or below a z-score

    1. Draw the normal distribution

    2. Locate the z-score(s)

    3. Determine whether the question is asking about above, below, or within a certain range

    4. Shade the appropriate region under the normal curve

    5. Look up the z-score in Table 1 (Appendix B)

    6. Find the area

    • Example: What proportion of scores fall above a z-score of 1.50?

Page 44:

  • Areas under the curve:

    • What proportion of scores fall above or below a z-score

    1. Draw the normal distribution

    2. Locate the z-score(s)

    3. Determine whether the question is asking about above, below, or within a certain range

    4. Shade the appropriate region under the normal curve

    5. Look up the z-score in Table 1 (Appendix B)

    6. Find the area

    • Example: What proportion of scores fall below a z-score of -2?

Page 45:

  • Areas under the curve:

    • What proportion of scores fall above or below a z-score

    1. Draw the normal distribution

    2. Locate the z-score(s)

    3. Determine whether the question is asking about above, below, or within a certain range

    4. Shade the appropriate region under the normal curve

    5. Look up the z-score in Table 1 (Appendix B)

    6. Find the area

    • Example: What proportion of scores fall below a z-score of 0?

Page 46:

  • Areas under the curve:

    • What proportion of scores fall above or below a z-score

    1. Draw the normal distribution

    2. Locate the z-score(s)

    3. Determine whether the question is asking about above, below, or within a certain range

    4. Shade the appropriate region under the normal curve

    5. Look up the z-score in Table 1 (Appendix B)

    6. Find the area

    • Example: What proportion of scores fall between z = 0 and z = 1?
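
The four example questions on Pages 43 to 46 can be checked numerically instead of with a printed table. This sketch assumes Python's standard-library `statistics.NormalDist`, whose `cdf` method gives the area below a value; areas above a z-score are obtained by subtracting from 1.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

above_1_50 = 1 - Z.cdf(1.50)            # area above z = 1.50
below_neg_2 = Z.cdf(-2)                 # area below z = -2
below_0 = Z.cdf(0)                      # area below z = 0
between_0_and_1 = Z.cdf(1) - Z.cdf(0)   # area between z = 0 and z = 1

print(f"above z = 1.50:      {above_1_50:.4f}")
print(f"below z = -2:        {below_neg_2:.4f}")
print(f"below z = 0:         {below_0:.4f}")
print(f"between z = 0 and 1: {between_0_and_1:.4f}")
```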

Page 47:

  • Starting with raw scores:

    • Sometimes given raw scores and asked to find certain areas under the curve

    • Follow the same steps as when given z-scores, but first convert the raw score to a z-score

    • Fill in the rest of the table to practice

    • Psychology: X = 68, µ = 65, σ = 6, Z = 0.50, p between X and µ = 0.1915, p of score < X = 0.6915, p of score > X = 0.3085

    • Mathematics: X = 77, µ = 77, σ = 9, Z = 0, p between X and µ = 0, p of score < X = 0.5000, p of score > X = 0.5000

    • Geology: X = 83, µ = 89, σ = 8, Z = -0.75, p between X and µ = 0.2734, p of score < X = 0.2266, p of score > X = 0.7734
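
The filled-in table can be reproduced with a short script. This is a sketch under the example's assumption that each course is normally distributed; the function name `table_row` is made up for illustration.

```python
from statistics import NormalDist

# (X, mu, sigma) for each course, taken from the table above
courses = {
    "Psychology": (68, 65, 6),
    "Mathematics": (77, 77, 9),
    "Geology": (83, 89, 8),
}

def table_row(x, mu, sigma):
    """Return (z, p between X and mu, p of score < X, p of score > X)."""
    dist = NormalDist(mu, sigma)
    z = (x - mu) / sigma
    below = dist.cdf(x)
    return z, abs(below - 0.5), below, 1 - below

for course, (x, mu, sigma) in courses.items():
    z, between, below, above = table_row(x, mu, sigma)
    print(f"{course}: z = {z:.2f}, between = {between:.4f}, "
          f"below = {below:.4f}, above = {above:.4f}")
```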

Page 49:

  • Example: Cut-off grade for determining whether or not a student needs to see their Geology professor during office hours

  • Given: Area under the curve (percentage or probability)

  • Steps:

    1. Draw the curve

    2. Shade the corresponding area under the curve

    3. Find the z-score associated with that area

    4. (If asked, convert that z-score to a raw score)

  • Geology: µ = 89, σ = 8

Page 50:

  • Example: Cut-off grade for determining an A in the Psychology class

  • Steps:

    1. Draw the curve

    2. Shade the corresponding area under the curve

    3. Find the z-score associated with that area

    4. (If asked, convert that z-score to a raw score)

  • Psychology: µ = 65, σ = 6

Page 51:

  • Example: Cut-off grade for determining whether or not a student needs to retake the last exam in the Geology class

  • Given: Area under the curve (percentage or probability)

  • Steps:

    1. Draw the curve

    2. Shade the corresponding area under the curve

    3. Find the z-score associated with that area

    4. (If asked, convert that z-score to a raw score)

  • Geology: µ = 89, σ = 8

Page 52:

  • Geology professor told students to retake the last exam if their grade was in the bottom 75.80% of the class

  • Calculate the cut-off grade for determining if a student needs to retake the exam

  • Steps to find the cut-off grade:

    • Draw the curve

    • Shade the corresponding area under the curve

    • Find the z-score associated with that area

    • Convert the z-score to a raw score
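
Working from an area back to a score is the reverse of a table lookup. The sketch below assumes Python's `statistics.NormalDist.inv_cdf`, which returns the value with a given area below it, and uses the Geology mean and standard deviation (µ = 89, σ = 8) from the earlier examples.

```python
from statistics import NormalDist

area_below = 0.7580  # bottom 75.80% of the class

# Steps 3 and 4: find the z-score for that area, then convert it to a raw score.
z_cutoff = NormalDist().inv_cdf(area_below)
cutoff_grade = z_cutoff * 8 + 89  # X = (z * sigma) + mu for Geology

print(f"z cut-off:     {z_cutoff:.2f}")
print(f"cut-off grade: {cutoff_grade:.1f}")
```

A printed z-table gives the same z-score to two decimals; the script simply avoids the manual lookup.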

Page 53:

  • Problems can be worded differently

Page 54:

  • Find the z-critical (c) given the probability of obtaining a z-score less than or greater than c

  • Steps to find the z-critical:

    • Draw the curve

    • Shade the corresponding area under the curve

    • Find the z-score associated with that area

    • Convert the z-score to a raw score

Page 55:

  • Example: Find c such that the probability that z is less than c is 0.1711

  • Steps to find c:

    • Draw the curve

    • Shade the corresponding area under the curve

    • Find the z-score associated with that area

    • Convert the z-score to a raw score

Page 56:

  • Example: Find c such that the probability that z is greater than c is 0.0250

  • Steps to find c:

    • Draw the curve

    • Shade the corresponding area under the curve

    • Find the z-score associated with that area

    • Convert the z-score to a raw score

Page 57:

  • Example: Find c such that the probability that z is greater than c is 0.5000

  • Steps to find c:

    • Draw the curve

    • Shade the corresponding area under the curve

    • Find the z-score associated with that area

    • Convert the z-score to a raw score
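
The three "find c" examples on Pages 55 to 57 can be sketched as follows, again assuming `statistics.NormalDist.inv_cdf`. Because `inv_cdf` works with the area below a value, a "greater than" probability is first converted to the area below (1 minus the given probability).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

c_less_1711 = Z.inv_cdf(0.1711)         # P(z < c) = 0.1711
c_greater_0250 = Z.inv_cdf(1 - 0.0250)  # P(z > c) = 0.0250
c_greater_5000 = Z.inv_cdf(1 - 0.5000)  # P(z > c) = 0.5000

print(f"P(z < c) = 0.1711 -> c = {c_less_1711:.2f}")
print(f"P(z > c) = 0.0250 -> c = {c_greater_0250:.2f}")
print(f"P(z > c) = 0.5000 -> c = {c_greater_5000:.2f}")
```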

Page 58:

  • Sampling distribution of the mean

Page 59:

  • Raw scores

  • Sample means (individuals)

  • Sample means (groups)

Page 60:

  • Population of 3 people (N = 3)

  • Given scores for Amber, Brittany, and Christina

Page 61:

  • Sampling distribution of the mean for samples where n = 2

  • Scores for Amber, Brittany, and Christina

Page 62:

  • Sampling distribution of the mean for samples where n = 2

  • Scores and sample means for each participant

Page 63:

  • Plotting the frequencies of sample means on a normal distribution

  • Characteristics of a normal distribution

  • Computing the probability of selecting another sample mean

Page 64:

  • Key points about the sampling distribution of the mean:

    • The average of sample means equals the population mean

    • Even if raw scores are not normally distributed, sample means will approximate a normal distribution when the sample size is large enough

    • The more samples drawn, the more the distribution looks like a normal distribution

Page 65:

  • Standard error of the mean

  • Given scores for Person A, Person B, and Person C

Page 66:

  • Standard error of the mean for samples where n = 2

  • Scores and sample means for each participant

Page 67:

  • Standard deviation of the raw scores is not equal to the standard error of the mean

  • Standard error of the mean is the standard deviation of the sampling distribution of the mean

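  • Formula for standard error of the mean: standard error = σ / √n (population standard deviation divided by the square root of the sample size)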

Page 68:

  • Key points about the standard error of the mean:

    • Standard error can increase or decrease depending on the population standard deviation and sample size

    • As sigma decreases, the standard error decreases

    • As sample size increases, the standard error decreases

    • Sample means cluster more closely around the population mean as sample size increases
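
To make these points concrete, here is a small sketch that builds a complete sampling distribution of the mean. The three scores are hypothetical (they are not the values from the slides), and the sketch assumes samples of n = 2 are drawn with replacement, so every ordered pair counts as one sample. It also assumes the usual formula, standard error = σ / √n.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6]  # hypothetical scores for a population of N = 3
n = 2                   # sample size

# Every possible sample of size n drawn with replacement (9 samples here).
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)
sigma = pstdev(population)
standard_error = sigma / sqrt(n)

print("sample means:", sample_means)
print("mean of sample means:", mean(sample_means), "| population mean:", mu)
print("SD of sample means:", pstdev(sample_means), "| sigma / sqrt(n):", standard_error)
```

The mean of the sample means matches the population mean, and the standard deviation of the sample means matches σ / √n rather than σ itself.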

Page 69:

  • In-class example

Page 70:

  • Reminders and observations:

    • Relationship between raw scores and z-scores

    • Location of area under the curve/proportion on the standard normal distribution

    • Location of z-scores on the standard normal distribution

    • Directionality of the question

    • As z-scores get larger, the area under the curve gets smaller

Page 71:

  • Comparing individuals to the population

  • Example: Where does Candace Parker, who is 76 inches tall, fall in relation to the average woman in terms of height?

Page 72:

  • Comparing individuals to the population

  • Example: What proportion of women in the population are taller than Serena Williams, who is 69 inches tall?

Page 73:

  • Comparing individuals to the population

  • Example: What percentage of women are shorter than Misty Copeland?

Page 74:

  • Comparing groups to the population

  • Example: Where do WNBA players fall in relation to the average woman in terms of height?

Page 75:

  • Comparing groups to the population

  • Example: What proportion of women in the population are taller than women tennis players?

Page 76:

  • Comparing groups to the population

  • Example: What percentage of women are shorter than professional ballet dancers?

Page 77:

  • Comparing groups to the population

  • Example: Probability of selecting a group of women with a mean greater than the average height for a group of women in the study

Page 78:

  • Null hypothesis testing

Page 79:

  • Descriptive Statistics:

    • Measures of central tendency (mean, median, mode)

    • Measures of Variability (IQV, IQR, SIQ, Variance, Standard Deviation)

    • Measures of Association (Cramer’s Phi, Point Biserial, Pearson’s r, Spearman’s rho)

  • Inferential Statistics:

    • z-tests

    • ANOVAs

    • Regression

    • Etc.

  • Normal Curve, Z-scores, Probability, Standard Error of the Mean

Page 80:

  • Goal of quantitative research is to describe the distribution of sample characteristics and make inferences about the population

  • Examples of situations where hypothesis testing is used

Page 81:

  • Hypothesis testing is a method for testing a claim or hypothesis about a parameter in a population using data from a sample

  • Steps in hypothesis testing:

    1. State the statistical hypotheses

    2. Select the statistical test and level of significance

    3. Select the sample and collect the data

    4. Find the region(s) of rejection

    5. Calculate the test statistic

    6. Make the statistical decision

    7. Interpret and report the findings

Page 82:

  • Key concepts in statistics

  • Research question, research hypothesis, independent variable, dependent variable, scales of measurement, symbols for descriptive statistics and population parameters, characteristics of the normal distribution, using the z-table, locating the z-score on the normal curve

Page 83:

  • Statistical hypotheses are determined based on the research question and research hypothesis

  • Two types of hypotheses: Null and Alternative

  • Null hypothesis states no relationship or no statistically significant difference among groups

  • Denoted as H0: 𝝁 = population parameter (null hypothesized value)

Page 84:

  • Alternative hypotheses predict statistically significant relationships or differences among groups

  • Denoted as Ha

Page 85:

  • Directional alternative hypotheses specify the type of effect or relationship between the independent and dependent variables

  • Examples of directional alternative hypotheses

Page 86:

  • Nondirectional alternative hypotheses recognize the relationship between the independent and dependent variables without specifying the type of relationship

Page 87:

  • Different types of statistical tests and their conceptual meanings

  • One-sample tests, two-sample tests, two or more sample tests

Page 88:

  • Level of significance is the criterion probability for rejecting the null hypothesis: the probability, when the null hypothesis is true, of obtaining a result extreme enough to reject it through chance or sampling error alone

  • Symbolized using the Greek symbol alpha (α)

  • Chosen by the researcher prior to data collection and analysis

Page 89:

  • Explanation of why the level of significance is often set at .05

  • The empirical rule and the probability of selecting a mean that is more than 2 standard deviations above or below the population mean (about 5%)

Page 90:

  • Selecting a representative sample from the population and collecting data

  • Caution against selection bias

Page 91:

  • Rejection region(s) in the sampling distribution that determine whether to reject or retain the null hypothesis

  • One-tailed hypotheses have one rejection region, two-tailed hypotheses have two rejection regions

Page 92:

  • Finding the rejection region for a one-tailed test

  • Finding the area under the curve and the corresponding z-score (z-critical)

Page 93:

  • Directional (one-tailed) hypothesis with one rejection region

Page 94:

  • Directional (one-tailed) hypothesis with one rejection region

Page 95:

  • Finding the rejection regions for a two-tailed test

  • Dividing alpha by 2 to create two rejection regions

  • Finding the area under the curve and the corresponding z-scores (z-criticals)

Page 96:

  • Nondirectional (two-tailed) hypothesis with two rejection regions

Page 97:

  • Calculating the test statistic by entering the data into the formula for the statistical test

Page 98:

  • P-value

    • Definition: probability of obtaining the observed sample mean, given that the value stated in the null hypothesis is true.

    • More precisely, the probability of obtaining a value as extreme as or more extreme than the calculated test statistic

    • Each test statistic (z, t, F, etc.) has an associated p-value

    • For z statistics, we can determine the exact p-value from the z-table

      • For one-tailed tests, use the area under the curve beyond the test statistic

      • For two-tailed tests, multiply this value by 2

    • For t-tests, we can determine only a range for the p-value from the t-table

      • This is because we have a family of t-distributions

    • Statistical software programs allow us to determine the exact p-value

    • Statistical software programs allow us to determine the exact p-value
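
For a z statistic, the p-value described above can be sketched as the area beyond the test statistic, doubled for a two-tailed test. The function name `z_p_value` is hypothetical, and the one-tailed case assumes the statistic falls in the direction the alternative hypothesis predicts.

```python
from statistics import NormalDist

def z_p_value(z, tails=1):
    """p-value for a z statistic; tails is 1 (one-tailed) or 2 (two-tailed)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    area_beyond = 1 - NormalDist().cdf(abs(z))  # area beyond the statistic
    return tails * area_beyond

print(round(z_p_value(1.645, tails=1), 4))
print(round(z_p_value(1.96, tails=2), 4))
```

Both calls return a value close to .05, which is why 1.645 and ±1.96 serve as the one-tailed and two-tailed critical values at that level of significance.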

Page 99, 100, 101, 102:

  • MAKE THE STATISTICAL DECISION

    • If the calculated test statistic falls in the rejection region, then we reject the null hypothesis.

    • If the absolute value of the test statistic is greater than the critical value, then reject the null hypothesis.

Page 103:

  • INTERPRET THE FINDINGS

    • The process of explaining the statistical decision in relation to the research hypothesis.

Page 104:

  • ONE SAMPLE TESTS

Page 105:

  • [Diagram labels: Population, Sample]

Page 106:

  • [Diagram labels: Population 1, Sample, Population 2]

Page 107:

  • ONE SAMPLE STATISTICAL TESTS

    • How many samples do I have? One

    • Do I know sigma (σ)?

      • Yes: one-sample z-test (normal deviate z-test)

      • No: Do I have a large sample? (n ≥ 30)

        • Yes: large-sample z-test

        • No: one-sample t-test
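
The decision chart for one-sample tests can be sketched as a small function. The function name and the `large_sample_cutoff` parameter are invented for illustration, and treating n = 30 as a large sample is an assumption based on the salary example later in these notes, which applies the large-sample z-test to 30 engineers.

```python
def choose_one_sample_test(sigma_known, n, large_sample_cutoff=30):
    """Pick a one-sample test by following the decision chart."""
    if sigma_known:
        # Population standard deviation is known.
        return "one-sample z-test"
    if n >= large_sample_cutoff:
        # Sigma unknown, but the sample is large: estimate sigma with s.
        return "large-sample z-test"
    # Sigma unknown and the sample is small.
    return "one-sample t-test"

print(choose_one_sample_test(sigma_known=True, n=50))
print(choose_one_sample_test(sigma_known=False, n=30))
print(choose_one_sample_test(sigma_known=False, n=6))
```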

Page 108:

  • TRIP DOWN MEMORY LANE….TO THE LAND OF Z-SCORES

    • Use these formulas, when we want to know how far a score is from the mean in standard deviation units.

    • Population: z = (X - μ) / σ

    • Sample: z = (X - X̄) / s

Page 109:

  • FORMULA FOR ONE-SAMPLE Z-TEST (NORMAL DEVIATE Z-TEST)

    • Notice this formula is very similar to the formula for individual raw scores.

    • Denominator is called the standard error of the mean

    • z = (X̄ - μ) / (σ / √n)

    • Data Bank: X̄ = sample mean, μ = population mean, σ = population standard deviation, n = sample size

Page 110:

  • FORMULA FOR LARGE SAMPLE Z-TEST

    • Notice this formula is very similar to the formula for individual raw scores.

    • Denominator is called the standard error of the mean

    • z = (X̄ - μ) / (s / √n)

    • Data Bank: X̄ = sample mean, μ = population mean, s = sample standard deviation, n = sample size

Page 111:

  • FORMULA FOR ONE-SAMPLE T-TEST

    • Notice this formula is very similar to the formula for individual raw scores.

    • Denominator is called the standard error of the mean

    • t = (X̄ - μ) / (s / √n)

    • Data Bank: X̄ = sample mean, μ = population mean, s = sample standard deviation, n = sample size

Page 112:

  • WHY T?

    • Reason 1: As sample size (n) gets smaller, the sample standard deviation (s) becomes a poorer estimate of the population standard deviation (σ).

    • Sample Size

    • Sample Standard Deviation

    • Population Standard Deviation

Page 113:

  • WHY T?

    • Data Set 1

    • Data Set 2

    • Data Set 3

    • Data Set 4

    • Standard Deviation

    • Reason 2: When the sample size stays the same, the standard deviation can fluctuate a lot.

Page 114:

  • WHY T?

    • Reason 3: As the sample size gets smaller, the sampling distribution becomes less normally distributed.

    • So we cannot assume that the sampling distribution of the mean is normally distributed

    • As a result, we no longer use the z-statistic and the z-distribution.

    • We use the t-statistic and one of the t-distributions instead.

Page 115:

  • T-DISTRIBUTIONS

    • Family of distributions

    • There are multiple t-distributions that are determined based on the sample size

    • As the sample size gets larger, the t-distributions look more like the normal distribution

    • Leptokurtic (peaked in the center)

    • The tails are slightly raised

    • Symmetric

    • Unimodal

Page 116:

  • [Figure: t-distributions for N = 5, N = 8, and N = 15 plotted against the normally distributed z; y-axis: f; curves labeled by sample size]

  • As the sample size increases, the t-distributions become more normally distributed.

Page 117:

  • INFORMATION ABOUT T

    • Small sample statistic

    • Used when we have small samples (n < 30)

    • We determine which t-distribution to use based on degrees of freedom

    • Definition: how many numbers are free to change in a calculation sequence.

Page 118:

  • [Table columns: X, Mean, Deviations (X - Mean), squared deviations]

  • In this example, one cell was free to vary.

  • Formula for degrees of freedom (df) for one-sample t-test is n -1

  • Note: Degrees of Freedom is not new! We divide by (n-1) in our unbiased variance and standard deviation formulas.
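
A short sketch of why only n - 1 values are free to vary. The scores below are hypothetical and chosen only for illustration; `statistics.variance` is Python's standard-library sample variance, which divides by n - 1.

```python
from statistics import mean, variance

scores = [4, 6, 8, 10]  # hypothetical scores, chosen only for illustration
m = mean(scores)
deviations = [x - m for x in scores]

# Deviations from the mean always sum to zero, so once n - 1 of them are
# known, the last one is fixed: only n - 1 values are free to vary.
last_deviation = -sum(deviations[:-1])

df = len(scores) - 1                       # degrees of freedom, n - 1
sum_of_squares = sum(d ** 2 for d in deviations)
unbiased_variance = sum_of_squares / df    # divide by n - 1, not n

print(last_deviation == deviations[-1])
print(unbiased_variance, variance(scores))
```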

Page 119:

  • INFORMATION ABOUT T-TABLE

    • We use degrees of freedom and alpha to tell us the critical values for t.

    • Critical values for t are found in a t-table.

    • Degrees of freedom (df) are in the first column.

    • Alphas are in the first few rows.

    • When you have a one-tailed hypothesis, alpha is in the first row (Level of significance for one-tailed test)

    • When you have a two-tailed hypothesis, alpha is in the second row (Level of significance for two-tailed test)

Page 120:

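  • Alpha = .05, N = 6, degrees of freedom = 5: t-critical for one-tailed test = 2.015, t-criticals for two-tailed test = +/- 2.571

  • Alpha = .01, N = 20, degrees of freedom = 19: t-critical for one-tailed test = 2.539, t-criticals for two-tailed test = +/- 2.861

  • Alpha = .05, N = 30, degrees of freedom = 29: t-critical for one-tailed test = 1.699, t-criticals for two-tailed test = +/- 2.045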

Page 121:

  • EXAMPLES

Page 122:

  • EXAMPLE

  • In 2013, the average (μ) Math SAT Score was 488 (σ = 114).

  • Let’s say we are educational psychologists and we want to develop a program to improve student performance on the math portion of the SAT.

  • We recruit 50 students from Petersburg High School and ask them to participate in our Advanced Math Program.

  • At the end of our 6-week program, the students take the math SAT.

  • We are proud of them because the average score for all 50 students was 524!

  • Is their average significantly higher than the average score on the Math SAT?

  • Research Question:

  • Research Hypothesis:

  • Independent Variable:

  • Dependent Variable:

  • Scale of Measurement (SOM), Independent Variable:

  • Scale of Measurement (SOM), Dependent Variable:

  • How many samples?

  • Population Mean = 488

  • Population Standard deviation = 114

  • Sample mean = 524

Page 123:

  • STATE THE NULL AND ALTERNATIVE HYPOTHESES (STATISTICAL HYPOTHESES)

  • Null hypothesis: There is no difference between the average math SAT score for our sample and the average math SAT score for the population.

  • Directional Alternative Hypothesis: The average math SAT score for the students in the Advanced Math Program (our sample) is significantly higher than the average math SAT score for the population.

Page 124:

  • Statistical hypotheses:

    • Null hypothesis (H0): μAdvancedMathProgram = μPopulation

    • Alternative hypothesis (Ha): μAdvancedMathProgram > μPopulation

  • Population mean (μ) = 488

Page 125:

  • Select the statistical test:

    • One sample tests

  • Determine the number of samples, knowledge of sigma, and sample size:

    • One sample

    • Sigma is known (σ = 114)

    • Sample size (n) > 30

  • Possible tests:

    • One sample z-test

    • Large sample z-test

    • One sample t-test

Page 126:

  • Select the level of significance (alpha level):

    • Convention in Psychology: alpha (𝝰) level of .05

Page 127:

  • Select the sample and collect data:

    • 50 students from Petersburg High School

Page 128:

  • Find the rejection regions:

    • Determined based on the level of significance and the alternative hypothesis

  • One-tailed test:

    • Find the area under the curve that equals the level of significance (e.g., .05)

    • Find the corresponding z-score (z-critical)

  • Two-tailed test:

    • Divide alpha into two to create two regions of rejection

    • Look for an area under the curve that equals half of the level of significance (e.g., .025)

  • The rejection regions change with the level of significance

Page 129:

  • Calculate the test statistic:

    • Data: μ = 488, σ = 114, sample mean = 524

    • Sample size (n) = 50

    • Plug the data into the formula

Page 130:

  • Make the statistical decision:

    • If the test statistic falls in the rejection region, reject the null hypothesis

    • If the test statistic does not fall in the rejection region, retain the null hypothesis

Page 131:

  • Statistical decision for directional research hypothesis:

    • Z-critical = 1.645; z-obtained = 2.23

    • Region of retention, region of rejection

    • Decision: Reject the null hypothesis
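
The calculation and decision for the Math SAT example can be sketched as follows. It assumes the one-sample z-test formula, z = (sample mean - μ) / (σ / √n), with the one-tailed critical value of 1.645 for α = .05; the variable names are made up for this sketch.

```python
from math import sqrt

mu = 488           # population mean
sigma = 114        # population standard deviation
sample_mean = 524  # mean of the 50 students in the program
n = 50             # sample size

standard_error = sigma / sqrt(n)                 # standard error of the mean
z_obtained = (sample_mean - mu) / standard_error

z_critical = 1.645  # one-tailed test, alpha = .05
reject_null = z_obtained > z_critical

print(f"standard error = {standard_error:.2f}")
print(f"z-obtained = {z_obtained:.2f}")
print("reject the null hypothesis" if reject_null else "retain the null hypothesis")
```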

Page 132:

  • Interpret the findings:

    • The average Math SAT score for students in the Advanced Math Program (524) was statistically significantly higher than the average math SAT score for the population (μ = 488)

Page 133:

  • Example 2: Researchers do not know the population standard deviation

  • Claim: Mean salary of the company's mechanical engineers is different than the national average ($68,000)

  • Sample data: 30 mechanical engineers, mean salary = $66,900, standard deviation = $5,500

  • Test the employees' claim at α = 0.05

Page 134:

  • State the statistical hypotheses:

    • Research hypothesis: Mean salary of the company's mechanical engineers is different than the national average

    • Statistical hypotheses:

      • Null hypothesis (H0): μCompanyMechanicalEngineers = 68,000

      • Alternative hypothesis (Ha): μCompanyMechanicalEngineers ≠ 68,000

Page 135:

  • Select the statistical test:

    • One sample tests

  • Determine the number of samples, knowledge of sigma, and sample size:

    • One sample

    • No knowledge of sigma

    • Sample size: n = 30, treated as a large sample (n ≥ 30)

  • Possible test: Large sample z-test

Page 136:

  • Select the statistical test and level of significance:

    • Large sample z-test

    • Use sample standard deviation as an estimate of sigma

Page 137:

  • Find regions of rejection:

    • Z-critical values: -1.960, 1.960

    • 𝝰/2 = .025

Page 138:

  • Calculate the test statistic:

    • Data: μ = $68,000, s = $5,500, sample mean = $66,900

    • Sample size (n) = 30

    • Calculate z

Page 139:

  • Make the statistical decision:

    • Z-critical values: -1.960, 1.960

    • Z-obt = -1.10

    • Decision: Retain the null hypothesis
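
The salary example can be sketched the same way. It assumes the large-sample z-test formula, z = (sample mean - μ) / (s / √n), where the sample standard deviation stands in for sigma, and a two-tailed test with critical values of ±1.960 at α = .05.

```python
from math import sqrt

mu = 68_000           # national average salary
s = 5_500             # sample standard deviation (estimate of sigma)
sample_mean = 66_900  # mean salary of the sampled engineers
n = 30                # sample size

estimated_standard_error = s / sqrt(n)
z_obtained = (sample_mean - mu) / estimated_standard_error

z_critical = 1.960  # two-tailed test, alpha = .05 (alpha / 2 = .025 per tail)
reject_null = abs(z_obtained) > z_critical

print(f"z-obtained = {z_obtained:.3f}")
print("reject the null hypothesis" if reject_null else "retain the null hypothesis")
```

Because the obtained z lies between -1.960 and 1.960, it falls in the region of retention.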

Page 140:

  • Interpret the findings:

    • There is not enough evidence to support the claim that the mean salary of the company's mechanical engineers is different than the national average

    • The average salary for the company's employees ($66,900) is not statistically significantly different than the national average ($68,000)

Page 141:

  • One-sample t-test

Page 142:

  • Select the statistical test:

    • One sample tests

  • Determine the number of samples, knowledge of sigma, and sample size:

    • One sample

    • No knowledge of sigma

    • Sample size (n) > 30

  • Possible tests:

    • One sample z-test

    • Large sample z-test

    • One sample t-test

Page 143:

  • Reasons for using t-test:

    • Sample size (n) gets smaller

    • Sample standard deviation (s) becomes a less reliable estimate of population standard deviation (σ)

    • Sample standard deviation can fluctuate from sample to sample, even with a constant sample size

Page 144:

  • Reasons for using t-test:

    • Sample size gets smaller

    • Sampling distribution becomes less normally distributed

    • Cannot assume normal distribution, use t-statistic and t-distributions instead

Page 145:

  • T-distributions:

    • Family of distributions

    • Multiple t-distributions based on sample size

    • As sample size increases, t-distributions resemble normal distribution

    • Leptokurtic, slightly raised tails, symmetric, unimodal
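To illustrate the raised tails and the convergence toward the normal curve, the sketch below compares the height of several t-distributions with the standard normal curve at a point far out in the tail (t = 3). The function `t_pdf` is written here from the standard t density formula; it is an illustrative helper, not something defined in the lecture.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


normal_height = NormalDist().pdf(3.0)
ratios = []
for df in (2, 5, 30, 100):
    # Ratio > 1 means the t curve sits above the normal curve at t = 3
    ratio = t_pdf(3.0, df) / normal_height
    ratios.append(ratio)
    print(df, round(ratio, 2))
```

The ratio shrinks toward 1 as the degrees of freedom grow, which is the sense in which larger samples make the t-distribution resemble the normal distribution.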

Page 146:

  • Information about t:

    • Used for small samples (n < 30)

    • Determine which t-distribution to use based on degrees of freedom

    • Degrees of freedom (df) = n - 1

Page 149:

  • Degrees of freedom (df) for one-sample t-test is n - 1

  • Degrees of freedom are not new: n - 1 is already the denominator in the unbiased variance and standard deviation formulas
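A short sketch of where n - 1 appears in the unbiased variance formula. The scores are hypothetical values chosen only to keep the arithmetic easy.

```python
import statistics

scores = [4, 7, 6, 3, 5]  # hypothetical data, for illustration only
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean
sum_of_squares = sum((x - mean) ** 2 for x in scores)

df = n - 1                             # degrees of freedom
sample_variance = sum_of_squares / df  # unbiased estimate divides by n - 1
sample_sd = sample_variance ** 0.5

# The standard library's sample variance uses the same n - 1 denominator
assert abs(sample_variance - statistics.variance(scores)) < 1e-12
print(df, sample_variance)  # 4 2.5
```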

Page 150:

  • Degrees of freedom (df) and alpha are used to determine critical values for t.

  • Critical values for t can be found in a t-table.

  • The first column of the t-table contains degrees of freedom (df).

  • The first few rows of the t-table contain alphas (area under the curve).

  • For a one-tailed hypothesis, alpha is in the first row.

  • For a two-tailed hypothesis, alpha is in the second row.

Page 151:

  • Examples of t-critical values for different alphas and degrees of freedom:

    • Alpha = 0.05, n = 6 (df = 5), t-critical for one-tailed test = 2.015, t-critical for two-tailed test = +/-2.571

    • Alpha = 0.01, n = 20 (df = 19), t-critical for one-tailed test = 2.539, t-critical for two-tailed test = +/-2.861

    • Alpha = 0.05, n = 30 (df = 29), t-critical for one-tailed test = 1.699, t-critical for two-tailed test = +/-2.045
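Critical values like these normally come from a t-table, but they can also be approximated numerically. The sketch below uses only the standard library: it integrates the t density with Simpson's rule and finds the cutoff by bisection. The helper names and the degrees of freedom used (5, 19, and 29, the values at which a standard t-table shows the listed cutoffs) are assumptions for illustration. A statistics package would be the usual choice in practice.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail_area(t, df, steps=1000):
    """Area to the right of t (t > 0), using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the area lies to the right of 0 because the curve is symmetric
    return 0.5 - total * h / 3


def t_critical(alpha, df, two_tailed):
    """Positive cutoff whose upper-tail area is alpha (or alpha / 2)."""
    target = alpha / 2 if two_tailed else alpha
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection on the tail area
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > target:
            lo = mid  # tail still too large, move the cutoff right
        else:
            hi = mid
    return (lo + hi) / 2


for alpha, df in [(0.05, 5), (0.01, 19), (0.05, 29)]:
    one = t_critical(alpha, df, two_tailed=False)
    two = t_critical(alpha, df, two_tailed=True)
    print(f"alpha={alpha}, df={df}: one-tailed {one:.3f}, two-tailed +/-{two:.3f}")
```

The printed cutoffs should agree with t-table entries to about three decimal places.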

Page 152:

  • Example 4: A schoolteacher wants to test the hypothesis that her students watch more TV than the average American child.

  • She records the number of hours of TV each of her 15 students watch per day.

  • The average number of hours was 5.98 and the standard deviation was 1.21.

  • She wants to test the hypothesis using a significance level of 0.01.
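A sketch of how the one-sample t-test would be set up for this example. The population mean μ (average hours of TV for American children) does not appear in these notes, so it is left symbolic. M and s_M are assumed notation for the sample mean and the estimated standard error.

```latex
\begin{aligned}
df  &= n - 1 = 15 - 1 = 14 \\
s_M &= \frac{s}{\sqrt{n}} = \frac{1.21}{\sqrt{15}} \approx 0.312 \\
t   &= \frac{M - \mu}{s_M} = \frac{5.98 - \mu}{0.312}
\end{aligned}
```

Because the hypothesis is directional ("more TV"), the test is one-tailed. A standard t-table gives a critical value of 2.624 for α = 0.01 and df = 14, so the null hypothesis would be rejected only if the obtained t exceeds 2.624.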

Page 154:

  • Differences vs. Relationships:

    • Are faculty in one department more satisfied than faculty in another?

    • What is the average in the population?

    • Do children consume more red cookies than blue cookies?

Page 155:

  • Differences vs. Relationships:

    • Is there a relationship between midterm scores and final exam scores?

    • Is caloric intake related to weight?

    • Is there a relationship between hours spent exercising and weight?

Page 156:

  • Correlation is a statistical procedure used to describe the strength and direction of the relationship between two factors.

  • Correlation measures the tendency for two variables to vary or change together.

  • Correlation can be used to describe the pattern of change in values of two factors and determine if the pattern is present in the population.

Page 158:

  • Correlation coefficient is a measure used to quantify the relationship between variables.

  • It measures the strength and direction of the relationship.

  • Correlation coefficient can be used to determine if the observed pattern in a sample is present in the population.

Page 159:

  • Examples of correlation coefficients:

    • Pearson's r product moment correlation coefficient

    • Spearman's rho correlation coefficient

    • Point bi-serial correlation coefficient

    • Phi correlation coefficient (based on Pearson's chi-square)

Page 160:

  • Correlation does not imply causation.

  • There may be a relationship between variables, but it does not mean that one variable caused a change in the other variable.

Page 161:

  • Pearson's r product moment correlation coefficient is a measure of the linear relationship between two factors.

  • It is used when the data for both factors are measured on an interval or ratio scale.

Page 162:

  • Assumptions for Pearson's r correlation coefficient:

    • Linearity: The best way to describe the pattern of data is using a straight line.

    • Normality: The data points in the population for both variables are normally distributed.

    • Bivariate normal distribution: When the data from both variables are plotted together, they form a normal distribution.

Page 163:

  • Pearson's r correlation coefficient is the ratio of how much the variables change together to how much they vary separately.

  • Covariance measures the extent to which the values of two factors vary together.
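The ratio described above can be sketched directly: the numerator measures how the two variables vary together (sum of cross-products), and the denominator measures how they vary separately (sums of squares). The function name and the data are hypothetical, chosen only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: shared variation divided by separate variation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # How much the variables change together (sum of cross-products)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own (sums of squares)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


hours_studied = [1, 2, 3, 4, 5]  # hypothetical data
exam_score = [2, 4, 5, 4, 5]     # hypothetical data
print(round(pearson_r(hours_studied, exam_score), 3))  # 0.775
```

Dividing both the numerator and the denominator by n - 1 turns them into the covariance and the product of the two standard deviations, which gives the same value of r.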

Page 164:

  • Interpretation of Pearson's r:

    • Positive (+): As one variable increases, the other increases. As one variable decreases, the other decreases.

    • Negative (-): As one variable increases, the other decreases. As one variable decreases, the other increases.

Page 165:

  • Interpretation of Pearson's r:

    • Stronger: The closer the value of r is to -1 or +1, the stronger the relationship.

    • Weaker: The closer the value of r is to 0, the weaker the relationship.

Page 166:

  • Interpretation of Pearson's r:

    • Magnitude interpretation:

      • 0.0 ≤ |r| < 0.10: Little if any relationship.

      • 0.10 ≤ |r| < 0.30: Weak relationship.

      • 0.30 ≤ |r| < 0.50: Moderate relationship.

      • 0.50 ≤ |r| ≤ 1.0: Strong relationship.
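The sign and magnitude guidelines can be combined into one small lookup. This is a sketch: the function name is hypothetical, and assigning each cutoff value (0.10, 0.30, 0.50) to the stronger category is an assumption, since the guideline itself is only a rule of thumb.

```python
def describe_r(r: float) -> str:
    """Describe the strength and direction of a correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    size = abs(r)  # strength depends only on the absolute value
    if size < 0.10:
        strength = "little if any"
    elif size < 0.30:
        strength = "weak"
    elif size < 0.50:
        strength = "moderate"
    else:
        strength = "strong"

    if r == 0:
        return "little if any relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"


print(describe_r(-0.42))  # moderate negative relationship
print(describe_r(0.85))   # strong positive relationship
```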

Page 168:

  • Testing the null hypothesis:

    • Null hypothesis: There is no linear relationship between the variables.

    • Nondirectional alternative hypothesis: There is a linear relationship between the variables.

    • Directional alternative hypothesis: The relationship is negative or positive.

Page 172:

  • Regions of rejection:

    • Information needed: Alpha, directional or non-directional hypothesis, degrees of freedom.

    • For a directional hypothesis, there is one region of rejection.

    • For a non-directional hypothesis, there are two regions of rejection.

Page 176:

  • Pearson's correlation coefficient is not the best correlation coefficient if the relationship between variables is not linear.

  • Curvilinear relationship: A relationship between variables that can be best described with a curved line.
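A small sketch of why Pearson's r can mislead for curvilinear data: below, y is completely determined by x (y = x²), yet r comes out as zero because the relationship is not linear. The data and function name are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r computed from sums of cross-products and squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]  # perfect U-shaped (curvilinear) pattern
print(pearson_r(x, y))  # 0.0
```

Plotting the data first (a scatterplot) is the usual way to catch this before relying on r.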

Page 178:

  • One sample tests:

    • Compare population with one independent variable (nominal scale) and one dependent variable (interval or ratio scale).

Page 179:

  • One sample tests:

    • Compare population 1 with one independent variable (nominal scale) and one dependent variable (interval or ratio scale).

Page 180:

  • Measures of association:

    • Pearson's r correlation coefficient for one sample population.

Page 182:

  • Example: A social scientist wants to study the relationship between computer use and daily exercise.

  • She asks 4 participants to record the number of hours they spend using a computer and the average amount of time per week they spend exercising.

  • The data is recorded in a table.

Page 184:

  • Normal distribution:

    • Theoretical distribution based on an infinite number of scores.

    • Defined by an equation.

    • Can have various means and standard deviations.

    • Mean, median, and mode fall in the same place (50th percentile).

    • Symmetric.

    • Area under the curve = 1.

    • Tails are asymptotic (never reach the x-axis).

Empirical Rule

Page 185:

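  • Empirical Rule percentages (area under the normal curve, by z-score interval):

    • Below -3: 0.13%

    • -3 to -2: 2.14%

    • -2 to -1: 13.59%

    • -1 to 0: 34.13%

    • 0 to 1: 34.13%

    • 1 to 2: 13.59%

    • 2 to 3: 2.14%

    • Above 3: 0.13%

  • Corresponding percentages:

    • Within 1 standard deviation of the mean (-1 to 1): 68.26%

    • Within 2 standard deviations of the mean (-2 to 2): 95.44%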
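These percentages can be checked against the standard normal curve. The sketch below uses Python's standard library; the helper `area_between` is an illustrative name. The exact areas within 1 and 2 standard deviations come out as 68.27% and 95.45%, which differ from the slide's 68.26% and 95.44% in the last digit because the slide adds up table entries that were already rounded.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def area_between(low: float, high: float) -> float:
    """Percentage of the area under the curve between two z-scores."""
    return 100 * (standard_normal.cdf(high) - standard_normal.cdf(low))


# Individual bands on the right half of the curve
for low, high in [(0, 1), (1, 2), (2, 3)]:
    print(f"{low} to {high}: {area_between(low, high):.2f}%")
print(f"above 3: {100 * (1 - standard_normal.cdf(3)):.2f}%")

# Areas centered on the mean
for k in (1, 2):
    print(f"within {k} SD: {area_between(-k, k):.2f}%")
```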