measures of central tendency
mean: affected by outliers
median: unaffected by outliers
mode: unaffected by outliers
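A small sketch of the outlier behaviour noted above, using Python's standard `statistics` module. The data values are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data: six typical values, then the same values plus one outlier.
data = [2, 3, 3, 3, 4, 5]
with_outlier = data + [100]

# The mean is pulled toward the outlier; the median and mode stay at 3.
print(mean(data), median(data), mode(data))
print(mean(with_outlier), median(with_outlier), mode(with_outlier))
```

In this example the mean moves from about 3.3 to about 17.1, while the median and mode do not change.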
measures of dispersion
min
max
range
quartiles
IQR = Q3 − Q1, spread of the middle half of the data
variance = average squared deviation of data points from the mean
std dev. = square root of the variance, typical deviation from the mean
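The dispersion measures listed above can be computed with the standard library, as a rough sketch. The data are hypothetical, and `quantiles` uses its default "exclusive" method, so other tools may report slightly different quartiles.

```python
from statistics import quantiles, variance, stdev

data = [4, 8, 15, 16, 23, 42]  # hypothetical sample

data_min, data_max = min(data), max(data)
data_range = data_max - data_min

# Three cut points that split the sorted data into four equal parts.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

sample_var = variance(data)  # sample variance (divides by n - 1)
sample_sd = stdev(data)      # square root of the sample variance

print(data_range, q1, q3, iqr, sample_var, sample_sd)
```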
parameter vs statistic
parameter = fixed, usually unknown numerical value that describes an entire population, written with a Greek letter (e.g. μ), ex = GPA of all students
statistic = value calculated from a sample and used to estimate that same parameter (e.g. sample mean x̄), ex = GPA of a sample of 50 students
descriptive vs inferential
descriptive = summarizes the main features of a dataset: summary statistics and visualization
inferential = makes predictions or inferences about a population based on a sample of data: confidence intervals, hypothesis testing, regression analysis
measure of shape
skewness = measure of asymmetry of a distribution, positive = right, negative = left
kurtosis = heaviness of tails in distribution, normal = 3, heavy >3, light <3
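A sketch of the two shape measures using moment-based (population) formulas, chosen because they match the "normal = 3" convention above. The function names are illustrative, and statistical packages often apply sample corrections or report excess kurtosis (kurtosis minus 3) instead.

```python
from statistics import fmean

def central_moment(xs, k):
    """k-th central moment: the average of (x - mean)^k."""
    m = fmean(xs)
    return fmean([(x - m) ** k for x in xs])

def skewness(xs):
    # Third central moment, standardized by variance^(3/2).
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    # Fourth central moment over variance squared (not "excess" kurtosis).
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), skewness(right_skewed), kurtosis(symmetric))
```

The symmetric data give a skewness of 0, and the data with a long right tail give a positive skewness.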
1 sample tests
z and t
z test
1-sample test on 𝝁: Z test
◼ Determine whether the sample mean is significantly different
from a known population mean
◼ $Z_{stat} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
◼ Use Case:
◼ The population standard deviation 𝜎 is known
◼ Sample size is large (𝑛 > 30)
◼ Null and Alternative Hypothesis
◼ Two-tailed: 𝐻0: 𝜇 = 𝑘; 𝐻𝑎: 𝜇 ≠ 𝑘
◼ Right-tailed: 𝐻0: 𝜇 ≤ 𝑘; 𝐻𝑎: 𝜇 > 𝑘
◼ Left-tailed: 𝐻0: 𝜇 ≥ 𝑘; 𝐻𝑎: 𝜇 < 𝑘
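A minimal sketch of the one-sample Z test for the three tail choices, using `statistics.NormalDist` for the p-value. The function name `z_test`, its `tail` argument, and the numbers are illustrative assumptions, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, tail="two"):
    """Return (z, p) for H0: mu = mu0 when the population sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # standard normal P(Z <= z)
    if tail == "two":
        p = 2 * min(cdf, 1 - cdf)
    elif tail == "right":
        p = 1 - cdf
    else:  # "left"
        p = cdf
    return z, p

# Hypothetical numbers: x-bar = 52, hypothesized mean 50, sigma = 6, n = 36.
z, p = z_test(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
alpha = 0.05
print(z, p, "reject H0" if p <= alpha else "fail to reject H0")
```

Here z is 2.0 and the two-tailed p-value is a little under 0.05, so H0 would be rejected at the 0.05 level.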
t test
Determine whether the sample mean is significantly different
from a known population mean
◼ $t_{stat} = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
◼ Use Case:
◼ The population standard deviation 𝜎 is unknown
◼ Sample size is small (𝑛 ≤ 30)
◼ Null and Alternative Hypothesis
◼ Two-tailed: 𝐻0: 𝜇 = 𝑘; 𝐻𝑎: 𝜇 ≠ 𝑘
◼ Right-tailed: 𝐻0: 𝜇 ≤ 𝑘; 𝐻𝑎: 𝜇 > 𝑘
◼ Left-tailed: 𝐻0: 𝜇 ≥ 𝑘; 𝐻𝑎: 𝜇 < 𝑘
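A sketch of the one-sample t statistic computed from raw data. The sample and the function name `one_sample_t` are hypothetical. The standard library has no t distribution, so turning the statistic into a p-value would need a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: mu = mu0 with sigma unknown."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1

sample = [12, 14, 15, 11, 13]  # hypothetical measurements
t, df = one_sample_t(sample, mu0=10)
print(t, df)
```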
2 sample tests on mean
independent t test
Determine if there’s a statistically significant difference between the means of
two independent samples.
◼ $t_{stat} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
◼ Null and Alternative Hypothesis
◼ Two-tailed: 𝐻0: 𝜇1 = 𝜇2; 𝐻𝑎: 𝜇1 ≠ 𝜇2
◼ Right-tailed: 𝐻0: 𝜇1 = 𝜇2; 𝐻𝑎: 𝜇1 > 𝜇2
◼ Left-tailed: 𝐻0: 𝜇1 = 𝜇2; 𝐻𝑎: 𝜇1 < 𝜇2
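A sketch of the two-sample t statistic with unpooled variances, meaning each sample's variance is divided by its own sample size. The data and the name `independent_t` are illustrative. The degrees of freedom and p-value are left out because they depend on whether equal variances are assumed.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """t statistic for the difference of two independent sample means."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

group_1 = [12, 14, 15, 11, 13]  # hypothetical sample 1
group_2 = [9, 10, 11, 10, 10]   # hypothetical sample 2
t = independent_t(group_1, group_2)
print(t)
```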
dependent (paired) t test
Determine if there’s a statistically significant difference
between the means of two related or dependent (paired)
groups.
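◼ $t_{stat} = \dfrac{\bar{D}}{S_D / \sqrt{n}}$, where $\bar{D}$ is the mean of the paired differences and $S_D$ is their standard deviation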
◼ Null and Alternative Hypothesis
◼ Two-tailed: 𝐻0: 𝜇𝐷 = 0; 𝐻𝑎: 𝜇𝐷 ≠ 0
◼ Right-tailed: 𝐻0: 𝜇𝐷 = 0; 𝐻𝑎: 𝜇𝐷 > 0
◼ Left-tailed: 𝐻0: 𝜇𝐷 = 0; 𝐻𝑎: 𝜇𝐷 < 0
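A sketch of the paired test statistic: take the difference within each pair, then treat the differences as a single sample. The before/after scores and the name `paired_t` are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, degrees of freedom) for H0: mean difference = 0."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

before = [80, 75, 90, 85]  # hypothetical scores before treatment
after = [84, 78, 91, 90]   # scores for the same subjects afterwards
t, df = paired_t(before, after)
print(t, df)
```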
Hypothesis
◼ Null Hypothesis: 𝐻0
◼ A statement that the value of a population parameter equals some claimed value;
◼ Always contains an equality.
◼ Alternative Hypothesis: 𝐻𝑎
◼ Statement contradictory to the null hypothesis;
◼ Always contains an inequality.
◼ If P-value ≤ 𝛼, reject 𝐻0 at 𝛼 significance level.
◼ If the 𝐻0 is rejected, the 𝐻𝑎 is accepted.
◼ If P-value > 𝛼, fail to reject 𝐻0 at 𝛼 significance level.
◼ If the 𝐻0 is not rejected, there is not enough evidence to support the 𝐻𝑎.
◼ Conclusion:
◼ Initial conclusion: acceptance or rejection of the 𝐻0.
◼ Final conclusion: expressed in terms of the original claim
2+ samples tests on mean
ANOVA
One way
Determine whether there are statistically significant differences between the means of
three or more independent groups based on one factor.
◼ 𝐻0: 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘
◼ 𝐻𝑎: At least one of the group means is different
◼ $F_{stat} = \dfrac{\text{Mean Square Between (MSB)}}{\text{Mean Square Within (MSW)}}$
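A sketch of how the one-way F statistic is assembled from the between-group and within-group sums of squares. The three groups and the function name are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = MSB / MSW for a list of independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Between-group sum of squares: group means vs. the grand mean.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations vs. their own group mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

    msb = ssb / (k - 1)        # df between = k - 1
    msw = ssw / (n_total - k)  # df within = N - k
    return msb / msw

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical data
f_stat = one_way_anova_f(groups)
print(f_stat)
```

For these groups SSB is 26 and SSW is 6, giving MSB = 13, MSW = 1, and F = 13.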
Two way
Determine whether there are statistically significant differences between the
means of groups based on two independent factors and also examines the
interaction between those factors.
◼ Main effects: The individual effect of each factor on the dependent variable.
◼ Interaction effects: Whether the effect of one factor depends on the level of the other factor
◼ 𝐻01 : The means of the first factor are equal (e.g., no effect of diet).
◼ 𝐻02 : The means of the second factor are equal (e.g., no effect of exercise).
◼ 𝐻0interaction: There is no interaction effect between the two factors (e.g., diet and
exercise do not interact to influence weight loss).
◼ 𝐻𝑎: At least one of the group means is different or there is an interaction.
post hoc test after ANOVA: Tukey’s HSD
Tukey’s HSD makes pairwise comparisons between all possible pairs of group means.
◼ 𝐻0: 𝜇𝑖 = 𝜇𝑗 for all pairs of groups
◼ The difference between the means of the two groups being compared is not significant.
◼ 𝐻𝑎: 𝜇𝑖 ≠ 𝜇𝑗 for at least one pair
◼ The difference between the means of the two groups being compared is significant
hypothesis tests on variance
Chi squared = 1 sample
To determine if the variance of a single sample is equal to a
known or hypothesized population variance.
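◼ $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$
◼ One-tailed vs Two-tailed Test :
◼ Two-tailed: $H_0: \sigma^2 = \sigma_0^2$; $H_a: \sigma^2 \neq \sigma_0^2$
◼ Right-tailed: $H_0: \sigma^2 = \sigma_0^2$; $H_a: \sigma^2 > \sigma_0^2$
◼ Left-tailed: $H_0: \sigma^2 = \sigma_0^2$; $H_a: \sigma^2 < \sigma_0^2$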
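A sketch of the chi-squared statistic for one variance, with n − 1 degrees of freedom. The sample, the hypothesized variance, and the function name are illustrative assumptions.

```python
from statistics import variance

def variance_chi2(sample, sigma0_sq):
    """Return (chi-squared statistic, degrees of freedom) for H0: sigma^2 = sigma0_sq."""
    n = len(sample)
    chi2 = (n - 1) * variance(sample) / sigma0_sq
    return chi2, n - 1

sample = [12, 14, 15, 11, 13]  # hypothetical data, sample variance 2.5
chi2, df = variance_chi2(sample, sigma0_sq=2.0)
print(chi2, df)
```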
F test = 2 sample
To determine if the variances of two independent
samples are equal.
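◼ $F_{stat} = \dfrac{s_1^2}{s_2^2}$
◼ One-tailed vs Two-tailed Test :
◼ Two-tailed: $H_0: \sigma_1^2 = \sigma_2^2$; $H_a: \sigma_1^2 \neq \sigma_2^2$
◼ Right-tailed: $H_0: \sigma_1^2 = \sigma_2^2$; $H_a: \sigma_1^2 > \sigma_2^2$
◼ Left-tailed: $H_0: \sigma_1^2 = \sigma_2^2$; $H_a: \sigma_1^2 < \sigma_2^2$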
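A sketch of the two-sample F statistic as the ratio of the sample variances, with degrees of freedom (n1 − 1, n2 − 1). The data and the name `variance_ratio_f` are hypothetical.

```python
from statistics import variance

def variance_ratio_f(sample_1, sample_2):
    """Return (F, df1, df2) for H0: the two population variances are equal."""
    f_stat = variance(sample_1) / variance(sample_2)
    return f_stat, len(sample_1) - 1, len(sample_2) - 1

sample_1 = [12, 14, 15, 11, 13]  # sample variance 2.5
sample_2 = [9, 10, 11, 10, 10]   # sample variance 0.5
f_stat, df1, df2 = variance_ratio_f(sample_1, sample_2)
print(f_stat, df1, df2)
```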
correlation
Pearson’s correlation coefficient (𝒓):
◼ Measuring a linear correlation between two continuous
variables.
◼ $r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
◼ 𝑟 ranges from -1 (perfect negative correlation) to +1 (perfect
positive correlation), with 0 indicating no correlation.
◼ Correlation Strength:
➢ Weak: Magnitude Below 0.3
➢ Moderate: Magnitude Between 0.3 and 0.6
➢ Strong: Magnitude Above 0.7
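A sketch of Pearson's r computed directly from deviations about the two means. The paired values and the function name are made up for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical paired observations
r = pearson_r(x, y)
print(r)
```

For these values r is about 0.77, which falls in the strong range listed above.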
tests of independence
chi squared
◼ 𝐻0: Variable A and Variable B are independent in population.
◼ 𝐻𝑎: Variable A and Variable B are not independent in population.
◼ $\chi^2 = \sum \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
◼ Use case:
◼ The data must be in a contingency table.
➢ Applies not only to 2×2 tables but to any 𝒊×𝒋 table
◼ Expected frequency in each cell ≥ 𝟓 for reliable results.
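A sketch of the independence statistic for a contingency table of any size. Each expected count is (row total × column total) / grand total, and the degrees of freedom are (rows − 1)(columns − 1). The counts and the function name are hypothetical.

```python
def chi2_independence(table):
    """Return (chi-squared statistic, degrees of freedom) for an i x j table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed = [[10, 20],
            [20, 10]]  # hypothetical 2 x 2 counts
chi2, df = chi2_independence(observed)
print(chi2, df)
```

Every expected count in this example is 15, so the ≥ 5 condition above is met.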
Fisher’s exact test
◼ P value is calculated directly: $P = \dfrac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{n}{a+b}}$, where $n = a + b + c + d$
◼ Use case:
◼ Data must be in a 𝟐×𝟐 contingency table.
◼ Expected frequency in each cell could be < 𝟓.
              A1     A2     Row Total
B1            a      b      a+b
B2            c      d      c+d
Column Total  a+c    b+d    n = a+b+c+d
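A sketch of the calculation with `math.comb`. The formula gives the probability of one particular table with the margins held fixed. The two-sided p-value below sums the probabilities of all tables that are no more likely than the observed one, which is one common convention and is an assumption here. The function names and counts are illustrative.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of a 2 x 2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + c, a) * comb(b + d, b) / comb(n, a + b)

def fisher_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, b, c, d)
    total = 0.0
    # x runs over every feasible value of the top-left cell.
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
        if p <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            total += p
    return total

# Hypothetical counts: a=3, b=1, c=1, d=3.
p_table = table_probability(3, 1, 1, 3)
p_value = fisher_two_sided(3, 1, 1, 3)
print(p_table, p_value)
```

For these counts the single-table probability is 16/70 and the two-sided p-value is 34/70.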
goodness of fit
chi squared
◼ 𝐻0: The population follows a specified distribution.
◼ 𝐻𝑎: The population does not follow the specified distribution.
◼ The hypotheses will depend on the research question.
◼ $\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
◼ Use Case:
◼ The data must come from one categorical variable.
◼ Expected frequency in each category ≥ 𝟓 for reliable results.
◼ Each observation falls into only one category
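A sketch of the goodness-of-fit statistic for a hypothetical fair-die check: 120 rolls, so each face is expected 20 times under 𝐻0. The counts and the function name are made up, and the degrees of freedom are taken as the number of categories minus 1, which assumes no parameters were estimated from the data.

```python
def chi2_goodness_of_fit(observed, expected):
    """Return (chi-squared statistic, degrees of freedom)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

observed = [18, 22, 20, 25, 15, 20]  # hypothetical counts for faces 1 to 6
expected = [20] * 6                  # fair die: 120 rolls / 6 faces
chi2, df = chi2_goodness_of_fit(observed, expected)
print(chi2, df)
```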