Decisions with Data

16 Terms

1

LO7 - What is the HATPC framework for hypothesis testing?

Hypothesis - null: the observed difference is due to chance alone

alternative: the observed difference is not due to chance alone

two-sided: looks at the extremes in both directions (an increase or a decrease) relative to the expected proportion

Assumptions - justified from the context

independence

equal chance

Test Statistic - mimics a z-score (a standardisation)

(OV - EV) / SE (observed value minus expected value, over the standard error) mimics (value - mean) / SD

P-value - the chance of seeing a test statistic at least as extreme as the observed one if the null is true; calculated differently depending on the test

Conclusion - compare the p-value with the significance level (usually 0.05): reject the null if the p-value is below it, otherwise retain the null

2

LO7 - what is a proportion test?

we compare the observed percentage of results with the expected percentage of results

a type of hypothesis test

3

LO8 - what are common misconceptions about p-values, and how do p-values interact with one-sided and two-sided hypotheses?

mistakes:

  • the p-value is not the probability that the null is true

  • a large p-value doesn’t prove the null is true

p-values for 1-sided - the alternative specifies a direction, so use the tail area in that direction only

p-values for 2-sided - the alternative doesn’t specify a direction, so we double the one-tail p-value

4

How to test a hypothesis in RStudio

  • set up a box model through manual calculations

  • calculate the test statistic with the formula (built manually from the variables OV, EV, SE, etc.)

  • p value is: p_value = 2 * (1 - pnorm(abs(test_stat))) - the upper-tail area beyond |test_stat| is multiplied by two for a two-sided hypothesis

  • # Plot the normal curve

  • # plot the 68, 95, and 99.7 thresholds through abline()

  • # plot the test statistic through abline()

  • # plot its mirror image on the other side (same absolute value), since it's two-sided, through abline()

OR

We can use prop.test() to perform the test automatically.

prop.test(x = OV (observed count of successes), n = sample size, p = probability of success under the null, alternative = "two.sided", correct = FALSE)
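
As a rough sketch of both routes on this card, the example below uses made-up numbers (60 successes in 100 draws, null proportion 0.5); the variable names ov, ev, se and the data are assumptions for illustration only.

```r
# Hypothetical data: 60 successes out of 100, null proportion 0.5
n  <- 100
p0 <- 0.5
ov <- 60                          # observed count of successes
ev <- n * p0                      # expected count under the null
se <- sqrt(n * p0 * (1 - p0))     # SE of the count for a 0/1 box model

# Manual route: standardise, then double the upper-tail area
test_stat <- (ov - ev) / se
p_value   <- 2 * (1 - pnorm(abs(test_stat)))

# Plot the normal curve with the 68/95/99.7 thresholds and the test statistic
curve(dnorm(x), from = -4, to = 4, ylab = "density")
abline(v = c(-3:-1, 1:3), lty = 2)                    # 1, 2 and 3 SEs either side
abline(v = c(-abs(test_stat), abs(test_stat)), col = "red")

# Automatic route: prop.test() without the continuity correction
auto <- prop.test(x = ov, n = n, p = p0,
                  alternative = "two.sided", correct = FALSE)
auto$p.value
```

Without the continuity correction, prop.test() reports the squared test statistic (as X-squared) and the same two-sided p-value as the manual calculation.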

5

LO7 & LO8 - outline the details of, and differences between, one-sample z-tests and one-sample t-tests

z tests:

used for the mean of a population when the population SD is known

H: same structure as before, now about the population mean

A: independence, normality, population SD known

T: (sample mean - hypothesised mean) / (population SD / sqrt(n)), located on the standard normal curve

P: as usual - the tail area beyond the test statistic, compared with the significance threshold

C: give a statistical conclusion (reference the p-value) and a scientific one (explain what it means in context: effect or no effect)

in comparison with the z-test for proportions:

  • data are continuous, not binary

  • uses the sample mean

  • must know the population SD

t tests:

used when we don’t know the population SD; uses t-distributions, whose shape depends on the degrees of freedom (set by the sample size): t(n-1). The heavier tails account for the extra variability from estimating the SD

H: remains the same

A: normality and independence

T: same formula, but with the sample SD in place of the population SD

P: same method, but expected to be larger because of the increased uncertainty (fewer degrees of freedom = heavier tails = more uncertainty)

C: same as for the z-test
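
To make the z versus t comparison concrete, here is a minimal sketch on invented data. The sample x, the hypothesised mean mu0 and the "known" population SD sigma are all assumptions for illustration.

```r
# Hypothetical sample, hypothesised mean, and (for the z-test only) known population SD
x     <- c(9.1, 10.4, 11.2, 10.9, 12.3, 9.8, 11.5, 10.7)
mu0   <- 10
sigma <- 2
n     <- length(x)

# z-test: population SD known, p-value from the normal curve
z_stat <- (mean(x) - mu0) / (sigma / sqrt(n))
z_p    <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

# t-test: sample SD instead, p-value from the t(n - 1) curve
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
t_p    <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# The built-in one-sample t-test should agree with the manual version
built_in <- t.test(x, mu = mu0, alternative = "two.sided")

# Same cut-off (2), heavier tails: the t tail area is larger than the normal one
tail_normal <- 2 * pnorm(2, lower.tail = FALSE)
tail_t      <- 2 * pt(2, df = n - 1, lower.tail = FALSE)
```

Comparing tail_t with tail_normal shows why, for the same test statistic, the t-test gives the larger p-value.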

6

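LO7 & LO8 - outline the details of two-sample t-tests

  • has two groups - compares the means of two populations

    Hypothesis:

  • null: no difference between the group means; two-sided alternative: the group means differ; one-sided alternative: one group's mean is larger than the other's

    Assumptions:

  • the 2 samples are independent

  • the 2 populations have equal variance

  • the 2 populations are normal

    Test Statistic

  • same (OV - EV) / SE form, applied to the difference in sample means: (difference in sample means - hypothesised difference) / SE of the difference

    P Value

  • same, using a t-distribution with n1 + n2 - 2 degrees of freedom

    Conclusion:

  • same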

7

LO7 & LO8 - How do we check the assumptions for two-sample t-tests, and what if they are not met?

  • comparative boxplots - normality (symmetry, outliers) and equal variance (similar spread)

  • Levene’s test or an F-test - equal spread

    • tests the null that the two populations have equal variance

  • QQ-plot - for normality

  • Shapiro-Wilk test - for normality

If not met:

  • unequal variance - Welch two-sample t-test, which doesn’t assume equal variance

  • non-normality - transform the non-normal data

  • non-independence - e.g. one group measured before and after

  • paired t-test - for such paired data; a one-sample t-test on the differences within each pair
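
The checks and fallbacks above could look roughly like this in base R. The data are simulated for illustration, and var.test() (the F-test) stands in for Levene's test, which is not in base R (the car package provides leveneTest()).

```r
set.seed(1)
# Hypothetical independent groups with different spreads
group_a <- rnorm(20, mean = 50, sd = 5)
group_b <- rnorm(20, mean = 53, sd = 9)

# Checks: comparative boxplots, Shapiro-Wilk per group, F-test for equal variance
boxplot(list(A = group_a, B = group_b))
shapiro.test(group_a)
shapiro.test(group_b)
var.test(group_a, group_b)

# Unequal variance: Welch two-sample t-test
welch <- t.test(group_a, group_b, var.equal = FALSE)

# Non-independence: hypothetical before/after scores on the same 10 subjects
before <- c(72, 65, 80, 77, 69, 74, 81, 66, 70, 75)
after  <- c(75, 70, 79, 82, 73, 78, 85, 68, 74, 77)
paired     <- t.test(after, before, paired = TRUE)
one_sample <- t.test(after - before, mu = 0)   # same test on the differences
```

The paired test and the one-sample test on after - before give the same statistic and p-value, which is the sense in which a paired t-test is a one-sample test on the differences.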

8

What is the R code for one- and two-sample t-tests?

The t.test() function takes the following inputs:

  • x, y: The vectors of data for group x and group y.

  • mu: The hypothesised mean (one-sample) or difference (two-sample, paired).

  • alternative: Which tail(s) you are interested in, depending on your alternative hypothesis.

  • var.equal: Whether the variances of group x and group y are the same.

  • paired: Whether the data is paired.

We can use pt() to find probabilities using a t-distribution, similar to how pnorm() uses the Normal distribution:

  • pt(<QUANTILE VALUE>, <df = degrees of freedom>, lower.tail = <TRUE/FALSE>)
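
A short sketch of how these inputs fit together, using two invented vectors x and y (the values and the hypothesised mean of 5 are assumptions for illustration):

```r
# Hypothetical measurements for two groups
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.7, 6.1)
y <- c(4.8, 5.0, 5.2, 4.7, 5.3, 4.9, 5.1, 5.4)

# One-sample, one-sided: is the mean of x greater than 5?
one_sample <- t.test(x, mu = 5, alternative = "greater")

# Two-sample, two-sided, assuming equal variances
two_sample <- t.test(x, y, mu = 0, alternative = "two.sided", var.equal = TRUE)

# Reproduce both p-values from the t-distribution with pt()
p_one <- unname(pt(one_sample$statistic, df = one_sample$parameter,
                   lower.tail = FALSE))
p_two <- unname(2 * pt(abs(two_sample$statistic), df = two_sample$parameter,
                       lower.tail = FALSE))
```

The one-sided p-value is a single upper-tail area, while the two-sided one doubles the tail area beyond the absolute test statistic.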

9

What is the R code to check assumptions?

QQ-plot - use the stat_qq() and stat_qq_line() functions (ggplot2)

box plot - ggplot() with geom_boxplot()
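
A possible version of both plots, assuming the ggplot2 package is installed. The data frame df and its columns group and value are made up for illustration.

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: two groups of 30 measurements
df <- data.frame(
  group = rep(c("A", "B"), each = 30),
  value = c(rnorm(30, mean = 10, sd = 2), rnorm(30, mean = 12, sd = 2))
)

# QQ-plot per group: points close to the line suggest normality
qq <- ggplot(df, aes(sample = value)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~ group)

# Comparative boxplots: compare symmetry and spread across groups
box <- ggplot(df, aes(x = group, y = value)) +
  geom_boxplot()

qq
box
```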

10

What are chi-squared tests?

tests about qualitative (categorical) variables

tests for:

  • goodness of fit → 1 qualitative variable

    • whether the variable’s distribution matches the expected theoretical distribution

    • checks for over- or under-representation of categories

  • independence → 2 qualitative variables

    • whether the variables are associated with one another

11

Describe goodness of fit testing

compare the OF (observed frequency) with the EF (expected frequency) in each category and look for gaps

  • H: null - the observed distribution matches the expected one; alternative - it does not

  • A: independent observations; Cochran’s rule - no more than 20% of expected frequencies are below 5 (and none below 1)

  • T: χ² = Σ (OF - EF)² / EF, summed over all categories

  • P: degrees of freedom = k - 1, where k = number of categories; use the upper tail of the χ²(k - 1) distribution

  • C: usual
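
As an illustration of the steps above, the sketch below tests whether a die is fair using invented counts from 120 rolls; the data and variable names are assumptions.

```r
# Hypothetical counts for faces 1 to 6 over 120 rolls
observed   <- c(18, 22, 25, 15, 20, 20)
expected_p <- rep(1/6, 6)                  # null: every face equally likely
expected   <- sum(observed) * expected_p   # EF for each category

# Cochran's rule: at most 20% of expected frequencies below 5
cochran_ok <- mean(expected < 5) <= 0.2

# Test statistic and upper-tail p-value with k - 1 degrees of freedom
chi_sq  <- sum((observed - expected)^2 / expected)
df      <- length(observed) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# The built-in test should agree with the manual version
gof <- chisq.test(observed, p = expected_p)
```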

12

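Describe testing for independence

H: null - the two variables are independent (no association)

alternative - the two variables are not independent (there is an association)

A: Cochran’s rule, independent observations

T: same χ² equation: χ² = Σ (OF - EF)² / EF, with EF = (row total × column total) / overall total

P: df = (m - 1)(n - 1), where m and n are the numbers of categories of each variable; use the upper tail of the χ²((m - 1)(n - 1)) distribution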
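
A minimal sketch of the independence test on an invented 2 x 3 table; the counts and the labels group and answer are assumptions for illustration.

```r
# Hypothetical two-way table of counts
tab <- matrix(c(30, 20, 10,
                20, 30, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(group  = c("A", "B"),
                              answer = c("yes", "maybe", "no")))

# Expected frequencies under independence: row total * column total / overall total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Test statistic, degrees of freedom and upper-tail p-value
chi_sq  <- sum((tab - expected)^2 / expected)
df      <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# The built-in test on the same table
ind <- chisq.test(tab, correct = FALSE)
```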

13

What are mosaic plots?

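  • tiles show the cell counts of a contingency table

  • shading shows each cell's standardised residual (how far the count is from what independence predicts)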
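
One way to draw a shaded mosaic plot is base R's mosaicplot() with shade = TRUE, sketched here on an invented table. By default its shading is based on Pearson residuals; chisq.test() returns both the Pearson residuals and the standardised residuals for comparison.

```r
# Hypothetical two-way table of counts
tab <- matrix(c(30, 20, 10,
                20, 30, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(group  = c("A", "B"),
                              answer = c("yes", "maybe", "no")))

res <- chisq.test(tab, correct = FALSE)
res$residuals   # Pearson residuals: (OF - EF) / sqrt(EF)
res$stdres      # standardised residuals

# Tile area reflects the counts; shading reflects the size and sign of the residuals
mosaicplot(tab, shade = TRUE, main = "Group by answer")
```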

14

What if Cochran’s rule isn’t satisfied?

  • if you also can’t increase the sample size

  • use fisher.test() (Fisher’s exact test) in R

  • assumption: independent observations
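
For example, with a small invented 2 x 2 table where every expected frequency is below 5, the fallback might look like this (the counts and labels are assumptions):

```r
# Hypothetical small-sample table
small <- matrix(c(3, 1,
                  1, 4),
                nrow = 2, byrow = TRUE,
                dimnames = list(treatment = c("drug", "placebo"),
                                outcome   = c("improved", "not improved")))

# Cochran's rule fails here: all expected frequencies are below 5
expected   <- outer(rowSums(small), colSums(small)) / sum(small)
cochran_ok <- mean(expected < 5) <= 0.2

# Fisher's exact test does not rely on the chi-squared approximation
exact <- fisher.test(small)
exact$p.value
```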

15

Describe the regression test for the slope (a one-sample t-test on the slope)

tests whether the slope is significantly different from zero

H: null - no linear trend (slope = 0); alternative - there is a linear trend (slope ≠ 0)

A:

  • Linearity of variables - relationship looks linear - check in scatterplot, residual plot

  • Independence of residuals - experimental design

  • Normality of residuals - QQ-plot, Shapiro-Wilk test

  • Homoscedasticity of residuals - residual plot

T: (OV - EV) / SE = (b1 - 0) / SE(b1), with n - p - 1 degrees of freedom (n = sample size, p = number of independent variables)

P: use the t(n - 2) curve to find tail areas (one independent variable, so p = 1)

C: usual
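
The slope test can be reproduced by hand from the lm() output, as in this sketch on simulated data (the true intercept 2, slope 0.5 and noise level are assumptions chosen for illustration).

```r
set.seed(7)
# Hypothetical data with a linear trend plus noise
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30, sd = 2)

fit   <- lm(y ~ x)
coefs <- summary(fit)$coefficients

# Test statistic: (observed slope - 0) / SE of the slope
b1     <- coefs["x", "Estimate"]
se_b1  <- coefs["x", "Std. Error"]
t_stat <- (b1 - 0) / se_b1

# Two-sided p-value from the t curve with n - p - 1 = n - 2 degrees of freedom
df      <- length(x) - 1 - 1
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# Assumption check on the residuals
shapiro.test(resid(fit))
```

The manual t_stat and p_value match the "t value" and "Pr(>|t|)" columns that summary(fit) prints for x.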

16

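R code for chi-squared tests and linear relationships

chisq.test() performs chi-squared tests: give x (counts) and p (expected proportions) for goodness of fit, or a two-way table for independence

chisq.test(x, y, correct = FALSE, p = rep(1/length(x), length(x)))

lm() - fits the linear model

summary() - applied to the fitted model, gives the test for the slope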