LO7 - what is the HATPC framework for hypothesis testing
Hypothesis - null - change is because of chance
alternative - change is not because of chance
two-sided - tests the extremes at both ends, i.e. whether the observed result is above or below the expected proportion
Assumptions - made through the context
independence
equal chance
Test Statistic - mimics a z-score (a standardisation)
(OV - EV) / SE, which mimics (OV - mean) / SD
P-value - tests whether the null is retained or rejected - many ways to calculate it depending on the test
Conclusion - compare p value with significance level - usually 0.05
LO7 - what is a proportion test?
we compare the observed percentage of results with the expected percentage of results
a type of hypothesis test
LO8 - what are common misconceptions about p values and how do they interact with two-sided hypotheses?
mistakes:
the p value is not the probability that the null is true
a large p value doesn’t mean the null is true - only that the data are consistent with it
p values for 1 sided - the alternative hypothesis specifies a direction, so use one tail
p values for 2 sided - the alternative doesn’t specify a direction, so we need to double the one-tail p value
How to test a hypothesis in R Studio?
set up a box model through manual calculations
run a test stat through the formula (manually created with the variables OV, EV, SE etc)
p value is: p_value = 2 * (1 - pnorm(abs(test_stat))) - multiplied by two (and using the absolute value) for a two-sided hypothesis
# Plot the normal curve
# plot the 68, 95, and 99.7 thresholds through abline()
# plot the test statistic through abline()
# plot both + and - the absolute value of the test statistic, since it's two-sided, through abline()
OR
We can use prop.test() to perform the test automatically:
prop.test(x (observed count), n (sample size), p (probability of success under the null), alternative = "two.sided", correct = FALSE)
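A minimal worked sketch of both approaches, assuming hypothetical data (60 heads in n = 100 coin flips, null p = 0.5):
OV <- 60                       # observed number of successes (hypothetical)
n  <- 100                      # sample size
p0 <- 0.5                      # probability of success under the null
EV <- n * p0                   # expected value under the null
SE <- sqrt(n * p0 * (1 - p0))  # standard error of the count
test_stat <- (OV - EV) / SE    # mimics a z-score
p_value <- 2 * (1 - pnorm(abs(test_stat)))  # two-sided: double the upper tail
p_value
# the same test done automatically, without the continuity correction
prop.test(OV, n, p = p0, alternative = "two.sided", correct = FALSE)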
LO7 & LO8 outline the details and differences between one sample z-tests and one sample t-tests?
z tests:
used for the mean of a pop if the pop sd is known
H: remains the same
A: independence, normality, popSD known
T: find the test statistic to plot on the normal curve
P: as usual - reject if below the significance threshold
C: give a statistical conclusion (reference the p value) and a scientific one (explain what this means - effect or no effect)
in comparison with the z test for proportions:
data is continuous, not binary
uses only the sample mean
must know the pop SD
t tests:
used when we don’t know the pop SD; use t distributions that look different based on the degrees of freedom (determined by sample size) - distribution: t(n-1), which accounts for the extra variability
H: remains the same
A: normality and independence
T: use the sample SD instead
P: same, but expected to be larger because of increased uncertainty (fewer degrees of freedom = heavier tails = more uncertainty)
C: as usual
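A minimal sketch contrasting the two tests on the same hypothetical sample; the numbers and the "known" pop SD are made up for illustration:
x   <- c(51.2, 49.8, 52.5, 50.1, 53.0, 48.7, 51.9, 50.4)  # hypothetical sample
mu0 <- 50                                                  # null mean
# z-test: pop SD assumed known (say sigma = 2)
sigma  <- 2
z_stat <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
2 * (1 - pnorm(abs(z_stat)))   # two-sided p-value from the normal curve
# t-test: pop SD unknown, so use the sample SD and the t(n-1) distribution
t.test(x, mu = mu0, alternative = "two.sided")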
LO7 & LO8 - outline the details of two sample t tests
has two groups - compares the means of two populations
Hypothesis:
two sided: there is a difference between the groups; one sided: one group's mean is greater than the other's
Assumptions:
2 samples are independent
2 samples have equal variance
2 populations are normal
Test Statistic
use the same (OV - EV)/SE equation, but with two groups: OV is the difference between the two sample means
P Value
same
Conclusion:
same
LO7 & LO8 How to test the assumptions for two sample t tests? and what if they are not met?
comparative boxplots - normality and equal variance
Levene’s test - an F-test - for equal spread
tests the null that the two pops have equal variance
qq-plot - for normality
shapiro-wilk test - for normality
If not met:
unequal variance - Welch two-sample t-test - more accurate
non-normality - transform the non-normal data
non-independence - e.g. one group measured before and after
paired t-test - a one-sample t-test on the within-pair differences (the mean difference, not the difference in means)
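A minimal R sketch of these checks and fallbacks, assuming a hypothetical data frame dat with a numeric value and a two-level group (Levene's test needs the car package):
dat <- data.frame(
  value = c(5.1, 6.2, 5.8, 6.0, 7.9, 8.3, 7.5, 8.8),   # made-up values
  group = factor(rep(c("A", "B"), each = 4))
)
boxplot(value ~ group, data = dat)          # comparative boxplots: symmetry + spread
shapiro.test(dat$value[dat$group == "A"])   # Shapiro-Wilk: normality within a group
car::leveneTest(value ~ group, data = dat)  # Levene's test: equal variance
# if equal variance fails: Welch two-sample t-test (the default, var.equal = FALSE)
t.test(value ~ group, data = dat, var.equal = FALSE)
# if the two "groups" are before/after measurements on the same units:
# t.test(before, after, paired = TRUE)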
what are the r studio codes for 1 and 2 sample tests?
The t.test() function takes the following inputs:
x, y: The vectors of data for group x and group y.
mu: The hypothesised mean (one-sample) or difference (two-sample, paired).
alternative: Which tail(s) you are interested in, depending on your alternative hypothesis.
var.equal: Whether the variances of group x and group y are the same.
paired: Whether the data is paired.
We can use pt() to find probabilities using a t-distribution, similar to how pnorm() uses the Normal distribution:
pt(<QUANTILE VALUE>, <df = degrees of freedom>, lower.tail = <TRUE/FALSE>)
what are the r studio codes to check assumptions?
qq plot - use the stat_qq() and stat_qq_line() functions
box-plot - use ggplot() with geom_boxplot()
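A minimal ggplot2 sketch of both checks, assuming a hypothetical data frame dat with a numeric value and a grouping variable:
library(ggplot2)
set.seed(1)
dat <- data.frame(
  value = rnorm(40),                 # made-up data
  group = rep(c("A", "B"), each = 20)
)
# qq plot: points close to the line suggest normality
ggplot(dat, aes(sample = value)) +
  stat_qq() +
  stat_qq_line()
# comparative boxplots: check symmetry and equal spread
ggplot(dat, aes(x = group, y = value)) +
  geom_boxplot()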
What are chi-squared tests?
tests the relationships between qualitative variables
tests for:
goodness of fit → 1 qual
whether the variable distribution matches the expected theoretical distribution
tests for the over- or under-representation of categories
independence → 2 qual
whether the variables affect one another
Describe the goodness of fit testing
compare the OF (observed frequency) with the EF (expected frequency) and look for gaps
H: same - there is or isn’t a difference
A: independence; no more than 20% of expected frequencies are < 5 (Cochran’s rule)
T: X^2 = sum((OF - EF)^2 / EF) - the chi-squared statistic
P: degrees of freedom = k - 1, where k = number of categories; use the upper tail of the X^2(k-1) distribution
C: usual
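A minimal sketch in R with hypothetical die-roll counts, testing a fair-die null (all six categories equally likely):
counts <- c(18, 22, 16, 25, 20, 19)   # made-up observed frequencies
chisq.test(counts, p = rep(1/6, 6))   # goodness of fit
# the same thing by hand: X^2 and the upper tail of chi-squared with k - 1 df
EF <- sum(counts) / 6
X2 <- sum((counts - EF)^2 / EF)
pchisq(X2, df = length(counts) - 1, lower.tail = FALSE)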
Describe testing for independence
H: null - independence - no association
alt - not independent - there is an association
A: Cochran’s rule, independence
T: same X^2 equation (EF = row total × column total / overall total)
P: df = (m - 1)(n - 1), where m and n = the number of categories in each variable; use the upper tail of the X^2((m-1)(n-1)) distribution
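A minimal sketch, assuming a hypothetical 2 x 3 contingency table of two qualitative variables:
tab <- matrix(c(30, 10, 20, 25, 15, 20), nrow = 2,
              dimnames = list(sex = c("F", "M"), pref = c("a", "b", "c")))
chisq.test(tab)   # df = (2 - 1)(3 - 1) = 2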
What are mosaic plots?
plots standardised residuals with shading
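In base R this can be done with mosaicplot() and shade = TRUE; the table here is hypothetical:
tab <- matrix(c(30, 10, 20, 25, 15, 20), nrow = 2,
              dimnames = list(sex = c("F", "M"), pref = c("a", "b", "c")))
mosaicplot(tab, shade = TRUE)   # cells coloured by their standardised residuals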
What if Cochran’s rule isn’t satisfied?
if you also can’t increase sample size
use fisher.test() in R
assumption: independent observations
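A minimal sketch with a hypothetical 2 x 2 table whose expected counts are too small for the chi-squared test:
small_tab <- matrix(c(3, 1, 2, 6), nrow = 2)   # made-up small counts
fisher.test(small_tab)                         # exact test, no Cochran's rule needed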
Describe a Regression Test for the Slope in 1 sample t-Tests
test if a slope is significant or not
H: there is or is not a linear trend
A:
Linearity of variables - relationships look linear - check in scatterplot, residual plot
Independence of residuals - experimental design
Normality of residuals - qqplot - indicates normality, shapiro-wilk test
homoscedasticity of residuals - residual plot
T: (OV - EV)/SE = b1 / SE(b1), with df = n - p - 1 (n = sample size, p = number of predictors)
P: use the t(n-2) curve to find tail areas (simple regression has p = 1, so df = n - 2)
C: usual
r studio for chi-tests and linear relationships
chisq.test() performs chi-squared tests
chisq.test(x, y, correct = FALSE, p = rep(1/length(x), length(x)))
lm() - fits the linear model
summary() of the fitted model gives you the test for the slope
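A minimal end-to-end sketch of the slope test, with made-up x/y data:
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)   # hypothetical responses
model <- lm(y ~ x)   # fit the linear model
summary(model)       # the coefficients table gives t = b1 / SE(b1) and its p-value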