LO7 - what is the HATPC framework for hypothesis testing
Hypothesis - null - change is because of chance
alternative - change is not because of chance
two-sided - tests the extremes at both ends, i.e. whether the observed result is above or below the expected proportion
Assumptions - made through the context
independence
equal chance
Test Statistic - mimics a z-score (a standardisation)
(OV - EV) / SE, which mimics (OV - mean) / SD
P-value - tests whether the null is retained or rejected - many ways to calculate it depending on the test
Conclusion - compare p value with significance level - usually 0.05
LO7 - what is a proportion test?
we compare the observed percentage of results with the expected percentage of results
a type of hypothesis test
LO8 - what are common misconceptions about p values and how do they interact with two-sided hypotheses?
mistakes:
the p value is not the probability that the null is true
a large p value doesn’t mean the null is true - only that the data are consistent with it
p values for 1 sided - the alternative hypothesis specifies a direction, so use one tail
p values for 2 sided - the alternative doesn’t specify a direction, so we need to double the one-tail p value
How to test a hypothesis in R Studio?
set up a box model through manual calculations
run a test stat through the formula (manually created with the variables OV, EV, SE etc)
p value is: p_value = 2 * (1 - pnorm(abs(test_stat))) - multiplied by two (and using the absolute value) for a two-sided hypothesis
# Plot the normal curve
# plot the 68, 95, and 99.7 thresholds through abline()
# plot the test statistic through abline()
# plot both + and - the absolute value of the test statistic, since it's two-sided, through abline()
OR
We can use prop.test() to perform the test automatically:
prop.test(x (observed count), n (sample size), p (probability of success under the null), alternative = "two.sided", correct = FALSE)
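A minimal worked sketch of both approaches, assuming hypothetical data (60 heads in n = 100 coin flips, null p = 0.5):
OV <- 60                       # observed number of successes (hypothetical)
n  <- 100                      # sample size
p0 <- 0.5                      # probability of success under the null
EV <- n * p0                   # expected value under the null
SE <- sqrt(n * p0 * (1 - p0))  # standard error of the count
test_stat <- (OV - EV) / SE    # mimics a z-score
p_value <- 2 * (1 - pnorm(abs(test_stat)))  # two-sided: double the upper tail
p_value
# the same test done automatically, without the continuity correction
prop.test(OV, n, p = p0, alternative = "two.sided", correct = FALSE)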
LO7 & LO8 outline the details and differences between one sample z-tests and one sample t-tests?
z tests:
used for the mean of a pop if the pop sd is known
H: remains the same
A: independence, normality, popSD known
T: find the test statistic to plot on the normal curve
P: as usual - reject if below the significance threshold
C: give a statistical conclusion (reference the p value) and a scientific one (explain what this means - effect or no effect)
in comparison with the z test for proportions:
data is continuous, not binary
uses only the sample mean
must know the pop SD
t tests:
used when we don’t know the pop SD; use t distributions that look different based on the degrees of freedom (determined by sample size) - distribution: t(n-1), which accounts for the extra variability
H: remains the same
A: normality and independence
T: use the sample SD instead
P: same, but expected to be larger because of increased uncertainty (fewer degrees of freedom = heavier tails = more uncertainty)
C: as usual
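A minimal sketch contrasting the two tests on the same hypothetical sample; the numbers and the "known" pop SD are made up for illustration:
x   <- c(51.2, 49.8, 52.5, 50.1, 53.0, 48.7, 51.9, 50.4)  # hypothetical sample
mu0 <- 50                                                  # null mean
# z-test: pop SD assumed known (say sigma = 2)
sigma  <- 2
z_stat <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
2 * (1 - pnorm(abs(z_stat)))   # two-sided p-value from the normal curve
# t-test: pop SD unknown, so use the sample SD and the t(n-1) distribution
t.test(x, mu = mu0, alternative = "two.sided")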
LO7 & LO8 - outline the details of two sample t tests
has two groups - compares the means of two populations
Hypothesis:
two sided: there is a difference between the groups; one sided: one group's mean is greater than the other's
Assumptions:
2 samples are independent
2 samples have equal variance
2 populations are normal
Test Statistic
use the same (OV - EV)/SE equation, but with two groups: OV is the difference between the two sample means
P Value
same
Conclusion:
same
LO7 & LO8 How to test the assumptions for two sample t tests? and what if they are not met?
comparative boxplots - normality and equal variance
Levene’s test - an F-test - for equal spread
tests the null that the two pops have equal variance
qq-plot - for normality
shapiro-wilk test - for normality
If not met:
unequal variance - Welch two-sample t-test - more accurate
non-normality - transform the non-normal data
non-independence - e.g. one group measured before and after
paired t-test - a one-sample t-test on the within-pair differences (the mean difference, not the difference in means)
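A minimal R sketch of these checks and fallbacks, assuming a hypothetical data frame dat with a numeric value and a two-level group (Levene's test needs the car package):
dat <- data.frame(
  value = c(5.1, 6.2, 5.8, 6.0, 7.9, 8.3, 7.5, 8.8),   # made-up values
  group = factor(rep(c("A", "B"), each = 4))
)
boxplot(value ~ group, data = dat)          # comparative boxplots: symmetry + spread
shapiro.test(dat$value[dat$group == "A"])   # Shapiro-Wilk: normality within a group
car::leveneTest(value ~ group, data = dat)  # Levene's test: equal variance
# if equal variance fails: Welch two-sample t-test (the default, var.equal = FALSE)
t.test(value ~ group, data = dat, var.equal = FALSE)
# if the two "groups" are before/after measurements on the same units:
# t.test(before, after, paired = TRUE)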
what are the r studio codes for 1 and 2 sample tests?
The t.test() function takes the following inputs:
x, y: The vectors of data for group x and group y.
mu: The hypothesised mean (one-sample) or difference (two-sample, paired).
alternative: Which tail(s) you are interested in, depending on your alternative hypothesis.
var.equal: Whether the variances of group x and group y are the same.
paired: Whether the data is paired.
We can use pt() to find probabilities using a t-distribution, similar to how pnorm() uses the Normal distribution:
pt(<QUANTILE VALUE>, <df = degrees of freedom>, lower.tail = <TRUE/FALSE>)
what are the r studio codes to check assumptions?
qq plot - use the stat_qq() and stat_qq_line() functions
box-plot - use ggplot() with geom_boxplot()
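A minimal ggplot2 sketch of both checks, assuming a hypothetical data frame dat with a numeric value and a grouping variable:
library(ggplot2)
set.seed(1)
dat <- data.frame(
  value = rnorm(40),                 # made-up data
  group = rep(c("A", "B"), each = 20)
)
# qq plot: points close to the line suggest normality
ggplot(dat, aes(sample = value)) +
  stat_qq() +
  stat_qq_line()
# comparative boxplots: check symmetry and equal spread
ggplot(dat, aes(x = group, y = value)) +
  geom_boxplot()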
What are chi-squared tests?
tests the relationships between qualitative variables
tests for:
goodness of fit → 1 qual
whether the variable distribution matches the expected theoretical distribution
tests for the over- or under-representation of categories
independence → 2 qual
whether the variables affect one another
Describe the goodness of fit testing
compare the OF (observed frequency) with the EF (expected frequency) and look for gaps
H: same - there is or isn’t a difference
A: independence; no more than 20% of expected frequencies are < 5 (Cochran’s rule)
T: X^2 = sum((OF - EF)^2 / EF) - the chi-squared statistic
P: degrees of freedom = k - 1, where k = number of categories; use the upper tail of the X^2(k-1) distribution
C: usual
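A minimal sketch in R with hypothetical die-roll counts, testing a fair-die null (all six categories equally likely):
counts <- c(18, 22, 16, 25, 20, 19)   # made-up observed frequencies
chisq.test(counts, p = rep(1/6, 6))   # goodness of fit
# the same thing by hand: X^2 and the upper tail of chi-squared with k - 1 df
EF <- sum(counts) / 6
X2 <- sum((counts - EF)^2 / EF)
pchisq(X2, df = length(counts) - 1, lower.tail = FALSE)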
Describe testing for independence
H: null - independence - no association
alt - not independent - there is an association
A: Cochran’s rule, independence
T: same X^2 equation (EF = row total × column total / overall total)
P: df = (m - 1)(n - 1), where m and n = the number of categories in each variable; use the upper tail of the X^2((m-1)(n-1)) distribution
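A minimal sketch, assuming a hypothetical 2 x 3 contingency table of two qualitative variables:
tab <- matrix(c(30, 10, 20, 25, 15, 20), nrow = 2,
              dimnames = list(sex = c("F", "M"), pref = c("a", "b", "c")))
chisq.test(tab)   # df = (2 - 1)(3 - 1) = 2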
What are mosaic plots?
plots standardised residuals with shading
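In base R this can be done with mosaicplot() and shade = TRUE; the table here is hypothetical:
tab <- matrix(c(30, 10, 20, 25, 15, 20), nrow = 2,
              dimnames = list(sex = c("F", "M"), pref = c("a", "b", "c")))
mosaicplot(tab, shade = TRUE)   # cells coloured by their standardised residuals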
What if Cochran’s rule isn’t satisfied?
if you also can’t increase sample size
use fisher.test() in R
assumption: independent observations
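A minimal sketch with a hypothetical 2 x 2 table whose expected counts are too small for the chi-squared test:
small_tab <- matrix(c(3, 1, 2, 6), nrow = 2)   # made-up small counts
fisher.test(small_tab)                         # exact test, no Cochran's rule needed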
Describe a Regression Test for the Slope in 1 sample t-Tests
test if a slope is significant or not
H: there is or is not a linear trend
A:
Linearity of variables - relationships look linear - check in scatterplot, residual plot
Independence of residuals - experimental design
Normality of residuals - qqplot - indicates normality, shapiro-wilk test
homoscedasticity of residuals - residual plot
T: (OV - EV)/SE = b1 / SE(b1), with df = n - p - 1 (n = sample size, p = number of predictors)
P: use the t(n-2) curve to find tail areas (simple regression has p = 1, so df = n - 2)
C: usual
r studio for chi-tests and linear relationships
chisq.test() performs chi-squared tests
chisq.test(x, y, correct = FALSE, p = rep(1/length(x), length(x)))
lm() - fits the linear model
summary() of the fitted model gives you the test for the slope
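A minimal end-to-end sketch of the slope test, with made-up x/y data:
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)   # hypothetical responses
model <- lm(y ~ x)   # fit the linear model
summary(model)       # the coefficients table gives t = b1 / SE(b1) and its p-value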