lecture 2
Page 1:
Module 2: Standard Normal Distribution, Standardization, Probability
Page 2:
Descriptive Statistics:
Measures of central tendency (mean, median, mode)
Measures of variability (IQV, IQR, SIQ, Variance, Standard Deviation)
Inferential Statistics:
z-tests
ANOVAS
Regression
Etc.
Probability, Normal Curve, Z-scores, Sampling Distributions
Page 3:
Probability + Distributions + z-Scores
Page 4:
Probability
Page 5:
Figure: weather forecast for Brussels on Wednesday and Thursday (values shown: 29°, 4 km/h, 18°, 76%, 95 km/h)
Page 7:
Probability Definition:
The chance that an event will occur
Frequency of times an outcome occurs divided by the total number of possible outcomes (symbolized as p)
Random Events:
Outcome can vary
Examples: getting a cell phone before a certain age, having a car accident
Fixed Events:
Observed outcome will always be the same
Examples: death
Probability can only be calculated with random events
Page 8:
Calculating Probability:
p(x) = f(x) / N
p(x) = p = probability of outcome x
x = outcome
f(x) = frequency of the outcome
N = Total possible outcomes (sample space)
Page 9:
Example:
34 students in Statistics 1
2 are freshmen, 13 are sophomores, 15 are juniors, and 4 are seniors
Calculate the probability of selecting a student from each classification
p(freshman) = 2/34 = .06
p(sophomore) = 13/34 = .38
p(junior) = 15/34 = .44
p(senior) = 4/34 = .12
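The class example above can be reproduced with a short Python sketch. It applies p(x) = f(x) / N to the counts given on the slide; the variable and function names are illustrative only.

```python
# Counts from the example: 34 students in Statistics 1.
counts = {"freshman": 2, "sophomore": 13, "junior": 15, "senior": 4}
N = sum(counts.values())  # total possible outcomes (sample space)


def probability(outcome):
    """Return p(x) = f(x) / N for one classification."""
    return counts[outcome] / N


# Round to two decimals, as on the slide.
probabilities = {name: round(probability(name), 2) for name in counts}
print(probabilities)
```

Because every student belongs to exactly one classification, the unrounded probabilities add up to 1.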
Page 11:
Normal Distribution
Page 12:
Types of Distributions:
Theoretical Distributions
Real Distributions
Page 13:
Theoretical Distributions (Mathematical Distributions):
Determined by an equation
Based on an infinite number of scores
Appear in a smooth curve
Normal Distribution is the most common theoretical distribution, but not the only one
Inflection Point: Where the shape of the distribution changes
Page 14:
Normal Distribution Characteristics:
Theoretical distribution based on an infinite number of scores
Defined by an equation
Family/group of distributions
Can have various means and standard deviations
Mean, median, and mode fall in the same place (50th percentile)
Symmetric
Area under the curve = 1
Tails are asymptotic (never reach the x-axis)
Formula for Normal Distribution
Page 15:
Empirical Rule:
68.26% within 1 standard deviation
95.44% within 2 standard deviations
99.72% within 3 standard deviations
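As a rough check on these percentages, the areas can be computed from the standard normal distribution with Python's statistics.NormalDist. This is only a sketch; the computed values (about 68.27%, 95.45%, and 99.73%) differ slightly from the slide because the slide figures come from rounded z-table entries.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k) * 100:.2f}%")
```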
Page 16:
Real Distributions:
Based on a set of numbers, scores, or responses from individuals or objects that make up the population
Do not vary too drastically from theoretical distributions
Page 18:
Probability and Distributions:
Discrete Probability Distributions:
Computed when a random variable is a discrete variable
Example: predicting the likelihood of selecting a freshman from a set of 34 students
Continuous Probability Distributions:
Computed when a random variable is a continuous variable
When dealing with a true normal distribution, the assumption is that we have an infinite number of scores in the population
Therefore, we do NOT compute the probability of a particular outcome
We compute the probability associated with a range of possible outcomes
Example: The probability of selecting a woman who is between 62 and 65 inches
Page 21:
Z-scores
Page 22:
Relative Standing (Finding the Standard Score):
Z-scores: Number of standard deviations that a particular score is from the mean
Used to determine relative standing of a given data point within a set of data
Also called a z transformation or standardized score
By definition, z-scores are not associated with a unit of measurement
Page 23:
Formulas for Z-scores:
Population: z = (X - Mean) / Standard Deviation
Sample: z = (X - Sample Mean) / Sample Standard Deviation
Data Needed to Compute Z-scores:
X = Raw Score
Mean
Standard Deviation
Page 24:
Relative Standing (Finding the Standard Score):
Sign: Tells if the raw score is above or below the mean
(+) means above
(-) means below
Magnitude: Indicates how far the raw score is from the mean
Page 25:
Standardize the following scores:
Raw SAT Scores: 200, 400, 700, 800, 500, 300, 600
Page 26:
The sum of a set of z-scores will always equal 0
The mean for a set of z-scores will always equal 0
The standard deviation for a set of z-scores will always equal 1
Compute the mean and standard deviation for the z-scores
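The SAT exercise can be worked through with the sketch below. It assumes the seven scores are treated as a whole population, so the population standard deviation (pstdev) is used; with the sample formula the numbers would differ.

```python
from statistics import mean, pstdev

raw_scores = [200, 400, 700, 800, 500, 300, 600]

mu = mean(raw_scores)       # mean of the raw scores
sigma = pstdev(raw_scores)  # population standard deviation (assumption)

# Standardize: z = (X - mean) / standard deviation
z_scores = [(x - mu) / sigma for x in raw_scores]

print(z_scores)
print("sum of z-scores:", sum(z_scores))
print("mean of z-scores:", mean(z_scores))
print("standard deviation of z-scores:", pstdev(z_scores))
```

The last three lines illustrate the properties listed above: the z-scores sum to 0, have a mean of 0, and have a standard deviation of 1.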
Page 27:
Convert Z-scores into Raw Scores:
X = (Z * Standard Deviation) + Mean
Used with population data (population mean and population standard deviation)
Used with sample data (sample mean and sample standard deviation)
Page 28:
Examples (Raw scores to z-scores):
Convert the raw scores to z-scores
Average grade for all students is 65, standard deviation is 6
Raw scores: 62, 71, 50, 64
Page 29:
Examples (z-scores to raw scores):
Convert the z-scores to raw scores
Average grade for all students is 65, standard deviation is 6
Z-scores: -3, +1.7, -1.33, +2.33
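Both conversion exercises (raw scores to z-scores, and z-scores back to raw scores) can be checked with two small helper functions. The function names are illustrative; the mean of 65 and standard deviation of 6 come from the slides.

```python
def to_z(x, mean, sd):
    """z = (X - mean) / standard deviation"""
    return (x - mean) / sd


def to_raw(z, mean, sd):
    """X = (z * standard deviation) + mean"""
    return z * sd + mean


mean_grade, sd_grade = 65, 6

# Raw scores to z-scores
z_from_raw = [round(to_z(x, mean_grade, sd_grade), 2) for x in (62, 71, 50, 64)]

# z-scores to raw scores
raw_from_z = [round(to_raw(z, mean_grade, sd_grade), 2) for z in (-3, 1.7, -1.33, 2.33)]

print(z_from_raw)
print(raw_from_z)
```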
Page 30:
Example:
High school student wants to compare academic standing in three courses: Psychology, Mathematics, and Geology
Convert current grade in each class to z-scores
Assume data in each course is normally distributed
X: Current grade
𝜇: Mean
𝜎: Standard Deviation
Z: Z-score
Psychology: X = 68, 𝜇 = 65, 𝜎 = 6, Z = .5
Mathematics: X = 77, 𝜇 = 77, 𝜎 = 9, Z = 0
Geology: X = 83, 𝜇 = 89, 𝜎 = 8, Z = -.75
Page 32:
Example:
Use the data from the previous example
What raw score is one half standard deviation above the mean in the Geology course?
𝜇 = 89, 𝜎 = 8
Page 33:
Example:
Use the data from the previous example
A raw score of 65 is equal to what z-score in the Geology course?
𝜇 = 89, 𝜎 = 8
Page 34:
Z-score of 0 is equivalent to what raw score in the Geology course?
Data needed: µ (mean) and σ (standard deviation) for Geology course
Geology course: µ = 89, σ = 8
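The three Geology questions can be answered with the sketch below. It models the course as a normal distribution with the given mean and standard deviation, and assumes Python 3.9 or later for NormalDist.zscore.

```python
from statistics import NormalDist

geology = NormalDist(mu=89, sigma=8)

# Raw score one half standard deviation above the mean: X = z * sd + mean
half_sd_above = 0.5 * geology.stdev + geology.mean

# z-score for a raw score of 65
z_for_65 = geology.zscore(65)

# Raw score for a z-score of 0 (the mean itself)
raw_for_z0 = 0 * geology.stdev + geology.mean

print(half_sd_above, z_for_65, raw_for_z0)
```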
Page 35:
Standard Normal Distribution:
Graph/distribution of z-scores
x-axis: z-scores
y-axis: frequency of z-scores
Area under the curve equals 1 (probability)
Distributed in z-score units along the x-axis
Page 36:
Example: Comparing academic standing in three courses (Psychology, Mathematics, Geology)
Convert current grades to z-scores
Assume data in each course is normally distributed
Psychology: X = 68, µ = 65, σ = 6, Z = 0.50
Mathematics: X = 77, µ = 77, σ = 9, Z = 0
Geology: X = 83, µ = 89, σ = 8, Z = -0.75
Page 37:
Plot the answers from the first 3 questions on a graph (x-axis in z-scores: -3, -2, -1, 0, 1, 2, 3):
Psychology (z = 0.50)
Mathematics (z = 0)
Geology (z = -0.75)
Page 41:
Reminders:
Relationship between raw scores and z-scores
Know how to go back and forth between raw scores and z-scores
Location of area under the curve/proportion on the standard normal distribution
Location of z-scores on the standard normal distribution
Directionality:
Is the question asking about areas or values less than or below a score?
Is the question asking about areas or values greater than or above a score?
Proportion/Area under the curve (Entire area = 1)
Z-scores
Page 42:
Z-Table:
Column 1: z-scores
Column 2: Area under the curve between the mean and the z-score
Column 3: Area under the curve beyond the z-score
Column 4, Column 5, and Column 6 have the same information from Column 1, 2, and 3 with respect to a new set of z-scores
Page 43:
Areas under the curve:
What proportion of scores fall above or below a z-score
Drawing the normal distribution
Locate the z-score(s)
Determine whether the question is asking about above, below, or within a certain range
Shade the appropriate region under the normal curve
Look up the z-score in Table 1 (Appendix B)
Find the area
Example: What proportion of scores fall above a z-score of 1.50?
Page 44:
Example: What proportion of scores fall below a z-score of -2?
Page 45:
Example: What proportion of scores fall below a z-score of 0?
Page 46:
Example: What proportion of scores fall between z = 0 and z = 1?
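The four area questions (above z = 1.50, below z = -2, below z = 0, and between z = 0 and z = 1) can be checked against the z-table with the sketch below. It uses the cumulative distribution function of the standard normal distribution in place of a table lookup, so results may differ from table values in the last decimal place.

```python
from statistics import NormalDist

z_dist = NormalDist()  # standard normal: mean 0, standard deviation 1

above_1_50 = 1 - z_dist.cdf(1.50)            # area beyond z = 1.50
below_neg_2 = z_dist.cdf(-2)                 # area below z = -2
below_0 = z_dist.cdf(0)                      # area below the mean
between_0_and_1 = z_dist.cdf(1) - z_dist.cdf(0)  # area between mean and z = 1

for label, area in [
    ("above z = 1.50", above_1_50),
    ("below z = -2", below_neg_2),
    ("below z = 0", below_0),
    ("between z = 0 and z = 1", between_0_and_1),
]:
    print(f"{label}: {area:.4f}")
```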
Page 47:
Starting with raw scores:
Sometimes given raw scores and asked to find certain areas under the curve
Follow the same steps as when given z-scores, but first convert the raw score to a z-score
Fill in the rest of the table to practice
Psychology: X = 68, µ = 65, σ = 6, Z = 0.50, p between X and µ = 0.1915, p of score < X = 0.6915, p of score > X = 0.3085
Mathematics: X = 77, µ = 77, σ = 9, Z = 0, p between X and µ = 0, p of score < X = 0.5000, p of score > X = 0.5000
Geology: X = 83, µ = 89, σ = 8, Z = -0.75, p between X and µ = 0.2734, p of score < X = 0.2266, p of score > X = 0.7734
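The table above can be regenerated from the raw scores with the following sketch: each raw score is converted to a z-score first, and the three areas are then read from the standard normal distribution. The dictionary layout and function name are illustrative choices.

```python
from statistics import NormalDist

# (X, mean, standard deviation) for each course, from the table above
courses = {
    "Psychology": (68, 65, 6),
    "Mathematics": (77, 77, 9),
    "Geology": (83, 89, 8),
}


def areas_for(x, mu, sigma):
    """Convert a raw score to z, then find the areas under the curve."""
    z = (x - mu) / sigma
    below = NormalDist().cdf(z)
    return {
        "z": z,
        "between X and mean": abs(below - 0.5),
        "below X": below,
        "above X": 1 - below,
    }


table = {name: areas_for(*values) for name, values in courses.items()}
for name, row in table.items():
    print(name, {key: round(value, 4) for key, value in row.items()})
```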
Page 49:
Example: Cut-off grade for determining whether or not a student needs to see their Geology professor during office hours
Given: Area under the curve (percentage or probability)
Steps:
Draw the curve
Shade the corresponding area under the curve
Find the z-score associated with that area
(If asked, convert that z-score to a raw score)
Geology: µ = 89, σ = 8
Page 50:
Example: Cut-off grade for determining an A in the Psychology class
Steps:
Draw the curve
Shade the corresponding area under the curve
Find the z-score associated with that area
(If asked, convert that z-score to a raw score)
Psychology: µ = 65, σ = 6
Page 51:
Example: Cut-off grade for determining whether or not a student needs to retake the last exam in the Geology class
Given: Area under the curve (percentage or probability)
Steps:
Draw the curve
Shade the corresponding area under the curve
Find the z-score associated with that area
(If asked, convert that z-score to a raw score)
Geology: µ = 89, σ = 8
Page 52:
Geology professor told students to retake the last exam if their grade was in the bottom 75.80% of the class
Calculate the cut-off grade for determining if a student needs to retake the exam
Steps to find the cut-off grade:
Draw the curve
Shade the corresponding area under the curve
Find the z-score associated with that area
Convert the z-score to a raw score
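The retake cut-off can be sketched in Python by working backward from the area to a z-score and then to a raw score. The Geology mean (89) and standard deviation (8) are taken from the earlier slides; inv_cdf stands in for the reverse z-table lookup.

```python
from statistics import NormalDist

geology_mu, geology_sigma = 89, 8
bottom_area = 0.7580  # bottom 75.80% of the class

# Find the z-score with 75.80% of the area below it
z_cutoff = NormalDist().inv_cdf(bottom_area)

# Convert the z-score to a raw score: X = z * sd + mean
cutoff_grade = geology_mu + z_cutoff * geology_sigma

print(round(z_cutoff, 2), round(cutoff_grade, 1))
```

With a z-table, the same area corresponds to z = 0.70, giving a cut-off grade of about 94.6.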
Page 53:
Problems can be worded differently
Page 54:
Find the z-critical (c) given the probability of obtaining a z-score less than or greater than c
Steps to find the z-critical:
Draw the curve
Shade the corresponding area under the curve
Find the z-score associated with that area
Convert the z-score to a raw score
Page 55:
Example: Find c such that the probability that z is less than c is 0.1711
Steps to find c:
Draw the curve
Shade the corresponding area under the curve
Find the z-score associated with that area
Convert the z-score to a raw score
Page 56:
Example: Find c such that the probability that z is greater than c is 0.0250
Steps to find c:
Draw the curve
Shade the corresponding area under the curve
Find the z-score associated with that area
Convert the z-score to a raw score
Page 57:
Example: Find c such that the probability that z is greater than c is 0.5000
Steps to find c:
Draw the curve
Shade the corresponding area under the curve
Find the z-score associated with that area
Convert the z-score to a raw score
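The three z-critical examples (probability below c of 0.1711, probability above c of 0.0250, and probability above c of 0.5000) can be checked with the sketch below. It uses inv_cdf in place of a reverse z-table lookup; "greater than" questions are converted to the area below c first.

```python
from statistics import NormalDist

z_dist = NormalDist()

# P(z < c) = 0.1711: the area below c is given directly
c_less_1711 = z_dist.inv_cdf(0.1711)

# P(z > c) = 0.0250: the area below c is 1 - 0.0250
c_greater_0250 = z_dist.inv_cdf(1 - 0.0250)

# P(z > c) = 0.5000: the area below c is 1 - 0.5000
c_greater_5000 = z_dist.inv_cdf(1 - 0.5000)

print(round(c_less_1711, 2), round(c_greater_0250, 2), round(c_greater_5000, 2))
```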
Page 58:
Sampling distribution of the mean
Page 59:
Raw scores
Sample means (individuals)
Sample means (groups)
Page 60:
Population of 3 people (N = 3)
Given scores for Amber, Brittany, and Christina
Page 61:
Sampling distribution of the mean for samples where n = 2
Scores for Amber, Brittany, and Christina
Page 62:
Sampling distribution of the mean for samples where n = 2
Scores and sample means for each participant
Page 63:
Plotting the frequencies of sample means on a normal distribution
Characteristics of a normal distribution
Computing the probability of selecting another sample mean
Page 64:
Key points about the sampling distribution of the mean:
The average of sample means equals the population mean
Even if raw scores are not normally distributed, sample means will approximate a normal distribution as the sample size increases
The more samples drawn, the more the distribution looks like a normal distribution
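The first key point can be illustrated with a tiny population like the three-person example. The scores below are hypothetical (the slide values are not reproduced here), and the sketch assumes samples of n = 2 are drawn with replacement, which gives 9 possible samples.

```python
from itertools import product
from statistics import mean

# Hypothetical scores; the actual slide values are not available here.
population = {"Amber": 2, "Brittany": 4, "Christina": 6}
scores = list(population.values())

# Every possible sample of size 2, drawn with replacement (assumption)
samples = list(product(scores, repeat=2))
sample_means = [mean(sample) for sample in samples]

population_mean = mean(scores)
mean_of_sample_means = mean(sample_means)

print(samples)
print(sample_means)
print(population_mean, mean_of_sample_means)
```

The average of the nine sample means matches the population mean, as stated above.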
Page 65:
Standard error of the mean
Given scores for Person A, Person B, and Person C
Page 66:
Standard error of the mean for samples where n = 2
Scores and sample means for each participant
Page 67:
Standard deviation of the raw scores is not equal to the standard error of the mean
Standard error of the mean is the standard deviation of the sampling distribution of the mean
Formula for standard error of the mean: σ / √n (population standard deviation divided by the square root of the sample size)
Page 68:
Key points about the standard error of the mean:
Standard error can increase or decrease depending on the population standard deviation and sample size
As sigma decreases, the standard error decreases
As sample size increases, the standard error decreases
Sample means cluster more closely around the population mean as sample size increases
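These key points follow from the standard error formula, sigma divided by the square root of n. The sketch below uses sigma = 114 (borrowed from the Math SAT example later in the lecture) purely as an illustrative value.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


# Larger samples give a smaller standard error (sigma held at 114)
for n in (10, 50, 100):
    print(f"n = {n}: {standard_error(114, n):.2f}")

# A smaller sigma gives a smaller standard error (n held at 50)
for sigma in (114, 57):
    print(f"sigma = {sigma}: {standard_error(sigma, 50):.2f}")
```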
Page 69:
In-class example
Page 70:
Reminders and observations:
Relationship between raw scores and z-scores
Location of area under the curve/proportion on the standard normal distribution
Location of z-scores on the standard normal distribution
Directionality of the question
As z-scores get larger in absolute value, the area under the curve beyond the z-score gets smaller
Page 71:
Comparing individuals to the population
Example: Where does Candace Parker, who is 76 inches tall, fall in relation to the average woman in terms of height?
Page 72:
Comparing individuals to the population
Example: What proportion of women in the population are taller than Serena Williams, who is 69 inches tall?
Page 73:
Comparing individuals to the population
Example: What percentage of women are shorter than Misty Copeland?
Page 74:
Comparing groups to the population
Example: Where do WNBA players fall in relation to the average woman in terms of height?
Page 75:
Comparing groups to the population
Example: What proportion of women in the population are taller than women tennis players?
Page 76:
Comparing groups to the population
Example: What percentage of women are shorter than professional ballet dancers?
Page 77:
Comparing groups to the population
Example: Probability of selecting a group of women with a mean greater than the average height for a group of women in the study
Page 78:
Null hypothesis testing
Page 79:
Descriptive Statistics:
Measures of central tendency (mean, median, mode)
Measures of Variability (IQV, IQR, SIQ, Variance, Standard Deviation)
Measures of Association (Cramer’s Phi, Point Biserial, Pearson’s r, Spearman’s rho)
Inferential Statistics:
z-tests
ANOVAS
Regression
Etc.
Normal Curve, Z-scores, Probability, Standard Error of the Mean
Page 80:
Goal of quantitative research is to describe the distribution of sample characteristics and make inferences about the population
Examples of situations where hypothesis testing is used
Page 81:
Hypothesis testing is a method for testing a claim or hypothesis about a parameter in a population using data from a sample
Steps in hypothesis testing:
State the statistical hypotheses
Select the statistical test and level of significance
Select the sample and collect the data
Find the region(s) of rejection
Calculate the test statistic
Make the statistical decision
Interpret and report the findings
Page 82:
Key concepts in statistics
Research question, research hypothesis, independent variable, dependent variable, scales of measurement, symbols for descriptive statistics and population parameters, characteristics of the normal distribution, using the z-table, locating the z-score on the normal curve
Page 83:
Statistical hypotheses are determined based on the research question and research hypothesis
Two types of hypotheses: Null and Alternative
Null hypothesis states no relationship or no statistically significant difference among groups
Denoted as H0: 𝝁 = population parameter (null hypothesized value)
Page 84:
Alternative hypotheses predict statistically significant relationships or differences among groups
Denoted as Ha
Page 85:
Directional alternative hypotheses specify the type of effect or relationship between the independent and dependent variables
Examples of directional alternative hypotheses
Page 86:
Nondirectional alternative hypotheses recognize the relationship between the independent and dependent variables without specifying the type of relationship
Page 87:
Different types of statistical tests and their conceptual meanings
One-sample tests, two-sample tests, two or more sample tests
Page 88:
Level of significance is the probability of rejecting the null hypothesis when it is actually true, that is, of concluding there is a relationship when the observed result happened by chance or sampling error
Symbolized using the Greek symbol alpha (α)
Chosen by the researcher prior to data collection and analysis
Page 89:
Explanation of why the level of significance is often set at .05
The empirical rule and the probability of selecting a mean that is greater or less than 2 standard deviations from the population mean
Page 90:
Selecting a representative sample from the population and collecting data
Caution against selection bias
Page 91:
Rejection region(s) in the sampling distribution that determine whether to reject or retain the null hypothesis
One-tailed hypotheses have one rejection region, two-tailed hypotheses have two rejection regions
Page 92:
Finding the rejection region for a one-tailed test
Finding the area under the curve and the corresponding z-score (z-critical)
Page 93:
Directional (one-tailed) hypothesis with one rejection region
Page 94:
Directional (one-tailed) hypothesis with one rejection region
Page 95:
Finding the rejection regions for a two-tailed test
Dividing alpha by 2 to create two rejection regions
Finding the area under the curve and the corresponding z-scores (z-criticals)
Page 96:
Nondirectional (two-tailed) hypothesis with two rejection regions
Page 97:
Calculating the test statistic by entering the data into the formula for the statistical test
Page 98:
P-value
Definition: probability of obtaining a sample mean at least as extreme as the one observed, given that the value stated in the null hypothesis is true.
Probability of obtaining a value as extreme as or more extreme than the calculated test statistic
Each test statistic (z, t, F, etc.) has an associated p-value
For z statistics, we can determine the exact p-value from the z-table
For one-tailed tests, the p-value is the area under the curve beyond the calculated z
Two-tailed tests require that you multiply this value by 2
For t-tests we can determine a range for the p-value from the t-table
Due to the fact that we have a family of t-distributions
Statistical software programs allow us to determine the exact p-value
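For z statistics, the p-value lookup can be sketched as follows. The function name is illustrative, and the one-tailed result assumes the test statistic falls in the direction predicted by the alternative hypothesis.

```python
from statistics import NormalDist


def p_value_from_z(z, tails=1):
    """Area beyond |z| under the standard normal curve, times the number of tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    tail_area = 1 - NormalDist().cdf(abs(z))
    return tail_area * tails


print(round(p_value_from_z(1.96, tails=1), 4))
print(round(p_value_from_z(1.96, tails=2), 4))
```

The two-tailed value is exactly twice the one-tailed value, matching the rule stated above.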
Page 99, 100, 101, 102:
MAKE THE STATISTICAL DECISION
If the calculated test statistic falls in the rejection region, then we reject the null hypothesis.
If the absolute value of the test statistic is greater than the critical value, then reject the null hypothesis.
Page 103:
INTERPRET THE FINDINGS
The process of explaining the statistical decision in relation to the research hypothesis.
Page 104:
ONE SAMPLE TESTS
Page 105:
Population Sample
Page 106:
Population 1 Sample Population 2
Page 107:
ONE SAMPLE STATISTICAL TESTS
How many samples do I have? ONE
Do I know sigma (σ)?
  YES: ONE SAMPLE Z-TEST (NORMAL DEVIATE Z-TEST)
  NO: Do I have a large sample? (n > 30)
    YES: LARGE SAMPLE Z-TEST
    NO: ONE SAMPLE T-TEST
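The decision path for one-sample tests can be written as a small function. This is a sketch of the flow described on this slide (one sample; is sigma known; is n greater than 30), with illustrative names, not a general test-selection procedure.

```python
def choose_one_sample_test(sigma_known, n):
    """Pick a one-sample test from knowledge of sigma and the sample size."""
    if sigma_known:
        return "one-sample z-test"
    if n > 30:  # large-sample cut-off as stated on the slide
        return "large-sample z-test"
    return "one-sample t-test"


print(choose_one_sample_test(sigma_known=True, n=50))
print(choose_one_sample_test(sigma_known=False, n=100))
print(choose_one_sample_test(sigma_known=False, n=15))
```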
Page 108:
TRIP DOWN MEMORY LANE….TO THE LAND OF Z-SCORES
Use these formulas, when we want to know how far a score is from the mean in standard deviation units.
Population: z = (X - μ) / σ
Sample: z = (X - X̄) / s
Page 109:
FORMULA FOR ONE-SAMPLE Z-TEST (NORMAL DEVIATE Z-TEST)
Notice this formula is very similar to the formula for individual raw scores.
Denominator is called the standard error of the mean
z = (X̄ - μ) / (σ / √n)
Data Bank:
X̄ : Sample mean
μ : Population mean
σ : Population standard deviation
n = sample size
Page 110:
FORMULA FOR LARGE SAMPLE Z-TEST (NORMAL DEVIATE Z-TEST)
Notice this formula is very similar to the formula for individual raw scores.
Denominator is called the standard error of the mean
z = (X̄ - μ) / (s / √n)
Data Bank:
X̄ : Sample mean
μ : Population mean
s : Sample standard deviation
n = sample size
Page 111:
FORMULA FOR ONE-SAMPLE T-TEST
Notice this formula is very similar to the formula for individual raw scores.
Denominator is called the standard error of the mean
t = (X̄ - μ) / (s / √n)
Data Bank:
X̄ : Sample mean
μ : Population mean
s : Sample standard deviation
n = sample size
Page 112:
WHY T?
Reason 1: As sample size (n) gets smaller, the sample standard deviation (s) becomes a poorer estimate of the population standard deviation (σ).
(Table: Sample Size, Sample Standard Deviation, Population Standard Deviation)
Page 113:
WHY T?
Reason 2: When the sample size stays the same, the standard deviation can fluctuate a lot.
(Table: Standard Deviation for Data Set 1, Data Set 2, Data Set 3, and Data Set 4)
Page 114:
WHY T?
Reason 3: As the sample size gets smaller, the sampling distribution becomes less normally distributed.
So we cannot assume that the sampling distribution of the mean is normally distributed
As a result, we no longer use the z-statistic and the z-distribution.
We use the t-statistic and one of the t-distributions instead.
Page 115:
T-DISTRIBUTIONS
Family of distributions
There are multiple t-distributions that are determined based on the sample size
As the sample size gets larger, the t-distributions look more like the normal distribution
Leptokurtic (peaked in the center)
The tails are slightly raised
Symmetric
Unimodal
Page 116:
Figure: t-distributions for different sample sizes (y-axis: f)
Curve labels: N = 5, N = 8, N = 15, and "Identical with normally distributed Z"
As the sample size increases, the more normally distributed the t-distributions are.
Page 117:
INFORMATION ABOUT T
Small sample statistic
Used when we have small samples (n < 30)
We determine which t-distribution to use based on degrees of freedom
Definition: how many numbers are free to change in a calculation sequence.
Page 118:
X Mean Deviations (x-mean) 2
In this example, one cell was free to vary.
Formula for degrees of freedom (df) for one-sample t-test is n -1
Note: Degrees of Freedom is not new! We divide by (n-1) in our unbiased variance and standard deviation formulas.
Page 119:
INFORMATION ABOUT T-TABLE
We use degrees of freedom and alpha to tell us the critical values for t.
Critical values for t are found in a t-table.
Degrees of freedom (df) are in the first column.
Alphas are in the first few rows.
When you have a one-tailed hypothesis, alpha is in the first row (Level of significance for one-tailed test)
When you have a two-tailed hypothesis, alpha is in the second row (Level of significance for two-tailed test)
Page 120:
Alpha | N | Degrees of Freedom | T-critical for one-tailed test | T-criticals for two-tailed test
.05 | 6 | 5 | 2.015 | +/- 2.571
.01 | 20 | 19 | 2.539 | +/- 2.861
.05 | 30 | 29 | 1.699 | +/- 2.045
Page 121:
EXAMPLES
Page 122:
EXAMPLE
In 2013, the average (μ) Math SAT Score was 488 (σ = 114).
Let’s say we are educational psychologists and we want to develop a program to improve student performance on the math portion of the SAT.
We recruit 50 students from Petersburg High School and ask them to participate in our Advanced Math Program.
At the end of our 6-week program, the students take the math SAT.
We are proud of them because the average score for all 50 students was 524!
Is their average significantly higher than the average score on the Math SAT?
Research Question:
Research Hypothesis:
Independent Variable:
Dependent Variable:
SOM Independent Variable:
SOM Dependent Variable:
How many samples?
Population Mean = 488
Population Standard deviation = 114
Sample mean = 524
Page 123:
STATE THE NULL AND ALTERNATIVE HYPOTHESES (STATISTICAL HYPOTHESES)
Null hypothesis: There is no difference between the average math SAT score for our sample and the average math SAT score for the population.
Directional Alternative Hypothesis: The average math SAT score of the students in the Advanced Math Program (our sample) is significantly higher than the average math SAT score for the population.
Page 124:
Statistical hypotheses:
Null hypothesis (H0): μAdvancedMathProgram = μPopulation
Alternative hypothesis (Ha): μAdvancedMathProgram > μPopulation
Population mean (μ) = 488
Page 125:
Select the statistical test:
One sample tests
Determine the number of samples, knowledge of sigma, and sample size:
One sample
Sigma is known (σ = 114)
Sample size (n) > 30
Possible tests:
One sample z-test
Large sample z-test
One sample t-test
Page 126:
Select the level of significance (alpha level):
Convention in Psychology: alpha (𝝰) level of .05
Page 127:
Select the sample and collect data:
50 students from Petersburg High School
Page 128:
Find the rejection regions:
Determined based on the level of significance and the alternative hypothesis
One-tailed test:
Find the area under the curve that equals the level of significance (e.g., .05)
Find the corresponding z-score (z-critical)
Two-tailed test:
Divide alpha by two to create two regions of rejection
Look for an area under the curve that equals half of the level of significance (e.g., .025)
The rejection regions change with the level of significance
Page 129:
Calculate the test statistic:
Data: μ = 488, σ = 114, sample mean = 524
Sample size (n) = 50
Plug the data into the formula: z = (524 - 488) / (114 / √50) = 36 / 16.12 ≈ 2.23
Page 130:
Make the statistical decision:
If the test statistic falls in the rejection region, reject the null hypothesis
If the test statistic does not fall in the rejection region, retain the null hypothesis
Page 131:
Statistical decision for directional research hypothesis:
Z-critical value: 1.645; calculated z (z-obtained): 2.23
Region of retention, region of rejection
Decision: Reject the null hypothesis
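The Math SAT example can be verified end to end with the sketch below. It uses the values from the example (population mean 488, sigma 114, sample mean 524, n = 50, alpha .05, one-tailed in the upper direction); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 488, 114       # population mean and standard deviation
sample_mean, n = 524, 50   # Advanced Math Program sample
alpha = 0.05

standard_error = sigma / sqrt(n)
z_obtained = (sample_mean - mu) / standard_error

# One-tailed (upper) test: alpha is the area beyond z-critical
z_critical = NormalDist().inv_cdf(1 - alpha)

reject_null = z_obtained > z_critical
print(round(z_obtained, 2), round(z_critical, 3), reject_null)
```

The obtained z falls beyond the critical value, which agrees with the decision to reject the null hypothesis.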
Page 132:
Interpret the findings:
The average Math SAT score for students in the Advanced Math Program (524) was statistically significantly higher than the average math SAT score for the population (μ = 488)
Page 133:
Example 2: Researchers do not know the population standard deviation
Claim: Mean salary of the company's mechanical engineers is different than the national average ($68,000)
Sample data: 30 mechanical engineers, mean salary = $66,900, standard deviation = $5,500
Test the employees' claim at α = 0.05
Page 134:
State the statistical hypotheses:
Research hypothesis: Mean salary of the company's mechanical engineers is different than the national average
Statistical hypotheses:
Null hypothesis (H0): μCompanyMechanicalEngineers = 68,000
Alternative hypothesis (Ha): μCompanyMechanicalEngineers ≠ 68,000
Page 135:
Select the statistical test:
One sample tests
Determine the number of samples, knowledge of sigma, and sample size:
One sample
No knowledge of sigma
Sample size (n) = 30 (treated as a large sample)
Possible test: Large sample z-test
Page 136:
Select the statistical test and level of significance:
Large sample z-test
Use sample standard deviation as an estimate of sigma
Page 137:
Find regions of rejection:
Z-critical values: -1.960, 1.960
𝝰/2 = .025
Page 138:
Calculate the test statistic:
Data: μ = $68,000, s = $5,500, sample mean = $66,900
Sample size (n) = 30
Calculate z: z = (66,900 - 68,000) / (5,500 / √30)
Page 139:
Make the statistical decision:
Z-critical values: -1.960, 1.960
Z-obt = -1.10
Decision: Retain the null hypothesis
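The salary example can be checked the same way. The sketch below follows the large-sample z-test as set up in this example, using the sample standard deviation as the estimate of sigma and a two-tailed test at alpha = .05; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 68_000           # national average salary
sample_mean = 66_900  # company sample mean
s = 5_500             # sample standard deviation (estimate of sigma)
n = 30
alpha = 0.05

z_obtained = (sample_mean - mu) / (s / sqrt(n))

# Two-tailed test: alpha is split across both tails
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_null = abs(z_obtained) > z_critical
p_value = 2 * (1 - NormalDist().cdf(abs(z_obtained)))

print(round(z_obtained, 4), round(z_critical, 3), reject_null, round(p_value, 3))
```

The obtained z lies between the two critical values, so the null hypothesis is retained, and the two-tailed p-value is well above .05.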
Page 140:
Interpret the findings:
The claim that the mean salary of the company's mechanical engineers is different than the national average is not supported
The average salary for the company's employees ($66,900) is not statistically significantly different than the national average ($68,000)
Page 141:
One-sample t-test
Page 142:
Select the statistical test:
One sample tests
Determine the number of samples, knowledge of sigma, and sample size:
One sample
No knowledge of sigma
Sample size (n) < 30
Possible tests:
One sample z-test
Large sample z-test
One sample t-test
Page 143:
Reasons for using t-test:
As sample size (n) gets smaller, the sample standard deviation (s) becomes a poorer estimate of the population standard deviation (σ)
Standard deviation can fluctuate even when the sample size stays constant
Page 144:
Reasons for using t-test:
As sample size gets smaller, the sampling distribution becomes less normally distributed
Cannot assume a normal distribution, so use the t-statistic and t-distributions instead
Page 145:
T-distributions:
Family of distributions
Multiple t-distributions based on sample size
As sample size increases, t-distributions resemble normal distribution
Leptokurtic, slightly raised tails, symmetric, unimodal
Page 146:
Information about t:
Used for small samples (n < 30)
Determine which t-distribution to use based on degrees of freedom
Degrees of freedom (df) = n - 1
Page 149:
Degrees of freedom (df) for one-sample t-test is n - 1
Note: Degrees of freedom is not new; it is used in the unbiased variance and standard deviation formulas, where we divide by (n - 1)
Page 150
Degrees of freedom (df) and alpha are used to determine critical values for t.
Critical values for t can be found in a t-table.
The first column of the t-table contains degrees of freedom (df).
The first few rows of the t-table contain alphas (area under the curve).
For a one-tailed hypothesis, alpha is in the first row.
For a two-tailed hypothesis, alpha is in the second row.
Page 151
Examples of t-critical values for different alphas and degrees of freedom:
Alpha = 0.05, N = 6, df = 5, t-critical for one-tailed test = 2.015, t-criticals for two-tailed test = +/-2.571
Alpha = 0.01, N = 20, df = 19, t-critical for one-tailed test = 2.539, t-criticals for two-tailed test = +/-2.861
Alpha = 0.05, N = 30, df = 29, t-critical for one-tailed test = 1.699, t-criticals for two-tailed test = +/-2.045
Page 152
Example 4: A schoolteacher wants to test the hypothesis that her students watch more TV than the average American child.
She records the number of hours of TV each of her 15 students watch per day.
The average number of hours was 5.98 and the standard deviation was 1.21.
She wants to test the hypothesis using a significance level of 0.01.
Page 154
Differences vs. Relationships:
Are faculty in one department more satisfied than faculty in another?
What is the average in the population?
Do children consume more red cookies than blue cookies?
Page 155
Differences vs. Relationships:
Is there a relationship between midterm scores and final exam scores?
Is caloric intake related to weight?
Is there a relationship between hours spent exercising and weight?
Page 156
Correlation is a statistical procedure used to describe the strength and direction of the relationship between two factors.
Correlation measures the tendency for two variables to vary or change together.
Correlation can be used to describe the pattern of change in values of two factors and determine if the pattern is present in the population.
Page 158
Correlation coefficient is a measure used to quantify the relationship between variables.
It measures the strength and direction of the relationship.
Correlation coefficient can be used to determine if the observed pattern in a sample is present in the population.
Page 159
Examples of correlation coefficients:
Pearson's r product moment correlation coefficient
Spearman's rho correlation coefficient
Point bi-serial correlation coefficient
Pearson's chi-square correlation coefficient
Page 160
Correlation does not imply causation.
There may be a relationship between variables, but it does not mean that one variable caused a change in the other variable.
Page 161
Pearson's r product moment correlation coefficient is a measure of the linear relationship between two factors.
It is used when the data for both factors are measured on an interval or ratio scale.
Page 162
Assumptions for Pearson's r correlation coefficient:
Linearity: The best way to describe the pattern of data is using a straight line.
Normality: The data points in the population for both variables are normally distributed.
Bivariate normal distribution: When the data from both variables are plotted together, they form a normal distribution.
Page 163
Pearson's r correlation coefficient is the ratio of how much the variables change together to how much they vary separately.
Covariance measures the extent to which the values of two factors vary together.
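The ratio described above can be sketched as covariance divided by the product of the two standard deviations. The data below are hypothetical (four made-up pairs of computer hours and exercise hours), and the function name is illustrative.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """r = covariance(x, y) / (sd of x * sd of y), using n - 1 throughout."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return covariance / (sd_x * sd_y)


# Hypothetical data for illustration only
computer_hours = [1, 2, 3, 4]
exercise_hours = [8, 6, 5, 1]

r = pearson_r(computer_hours, exercise_hours)
print(round(r, 4))
```

For these made-up values r is strongly negative: as computer hours go up, exercise hours go down.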
Page 164
Interpretation of Pearson's r:
Positive (+): As one variable increases, the other increases. As one variable decreases, the other decreases.
Negative (-): As one variable increases, the other decreases. As one variable decreases, the other increases.
Page 165
Interpretation of Pearson's r:
Stronger: The closer the value of r is to -1 or +1, the stronger the relationship.
Weaker: The closer the value of r is to 0, the weaker the relationship.
Page 166
Interpretation of Pearson's r:
Magnitude interpretation:
|r| < 0.10: Little if any relationship.
0.10 ≤ |r| < 0.30: Weak relationship.
0.30 ≤ |r| < 0.50: Moderate relationship.
0.50 ≤ |r| ≤ 1.0: Strong relationship.
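The magnitude guidelines can be turned into a small lookup. This is a sketch: the slide does not say which category the boundary values 0.10, 0.30, and 0.50 belong to, so the code assumes each boundary belongs to the stronger category.

```python
def describe_strength(r):
    """Label the strength of a correlation from the size of |r|."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must be between -1 and +1")
    if size < 0.10:
        return "little if any"
    if size < 0.30:   # boundary handling is an assumption
        return "weak"
    if size < 0.50:
        return "moderate"
    return "strong"


def describe_direction(r):
    """Label the direction of a correlation from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


for r in (0.05, -0.25, 0.42, -0.96):
    print(r, describe_strength(r), describe_direction(r))
```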
Page 168
Testing the null hypothesis:
Null hypothesis: There is no linear relationship between the variables.
Nondirectional alternative hypothesis: There is a linear relationship between the variables.
Directional alternative hypothesis: The relationship is negative or positive.
Page 172
Regions of rejection:
Information needed: Alpha, direction or non-directional hypothesis, degrees of freedom.
For a directional hypothesis, there is one region of rejection.
For a non-directional hypothesis, there are two regions of rejection.
Page 176
Pearson's correlation coefficient is not the best correlation coefficient if the relationship between variables is not linear.
Curvilinear relationship: A relationship between variables that can be best described with a curved line.
Page 178
One sample tests:
Compare population with one independent variable (nominal scale) and one dependent variable (interval or ratio scale).
Page 179
One sample tests:
Compare population 1 with one independent variable (nominal scale) and one dependent variable (interval or ratio scale).
Page 180
Measures of association:
Pearson's r correlation coefficient for one sample population.
Page 182
Example: A social scientist wants to study the relationship between computer use and daily exercise.
She asks 4 participants to record the number of hours they spend using a computer and the average amount of time per week they spend exercising.
The data is recorded in a table.
Page 184
Normal distribution:
Theoretical distribution based on an infinite number of scores.
Defined by an equation.
Can have various means and standard deviations.
Mean, median, and mode fall in the same place (50th percentile).
Symmetric.
Area under the curve = 1.
Tails are asymptotic (never reach the x-axis).
Empirical Rule
Page 185:
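Empirical Rule percentages (area under the curve between z-scores):
Below -3: 0.13%
-3 to -2: 2.14%
-2 to -1: 13.59%
-1 to 0: 34.13%
0 to 1: 34.13%
1 to 2: 13.59%
2 to 3: 2.14%
Above 3: 0.13%
Corresponding percentages:
Within 1 standard deviation of the mean (-1 to 1): 68.26%
Within 2 standard deviations of the mean (-2 to 2): 95.44%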