data science quiz three

Last updated 6:05 AM on 4/24/26

50 Terms

1

hypothesis test

a statistical technique used to evaluate competing claims using data

2

null hypothesis (Ho)

an assumption about the population. “there is nothing going on”

3

alternative hypothesis (Ha)

a research question about the population. “there is something going on”

4

what is the motivation behind a hypothesis test

decision

5

what is the motivation behind a confidence interval

estimation

6

one sided (one tailed) alternative hypothesis

the parameter is hypothesised to be less than or greater than the null value, < or >

7

two sided (two tailed) alternative hypothesis

the parameter is hypothesised to be not equal to the null value

8

what are two characteristics of two sided alternatives

the p-value is calculated as two times the tail area beyond the observed sample statistic; more objective, and hence more widely preferred

9

state the hypothesis for an independent case

null hypothesis, observed difference in proportions is simply due to chance; Ho: p(treatment) - p(control) = 0

10

state the hypothesis for a dependent case

alternative hypothesis, observed difference in proportions is not due to chance; Ha: p(treatment) - p(control) ≠ 0

11

explain the randomisation process

  1. randomly shuffle the rows in the data frame

  2. split off the first 16 rows and set them aside - these represent the people in the control group

  3. split off the final 34 rows and set them aside - these represent the people in the treatment group

  4. calculate the proportion of people in both groups who yawned

  5. calculate the difference in proportions of yawners (treatment - control) and plot it on the chart
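
The five steps above can be sketched in base R. This is a minimal illustration, not the code from the course: the yawn counts are hypothetical, and `simulate_diff` is a helper name introduced here. Only the group sizes (16 control, 34 treatment) come from the card.

```r
set.seed(123)  # make the shuffles reproducible

# Hypothetical outcomes for 50 people: 14 yawned, 36 did not (illustrative counts only)
yawned <- c(rep(TRUE, 14), rep(FALSE, 36))

# One run of the randomisation process (hypothetical helper)
simulate_diff <- function(outcomes) {
  shuffled  <- sample(outcomes)      # step 1: randomly shuffle the rows
  control   <- shuffled[1:16]        # step 2: first 16 rows form the control group
  treatment <- shuffled[17:50]       # step 3: final 34 rows form the treatment group
  mean(treatment) - mean(control)    # steps 4-5: difference in proportions of yawners
}

# Repeat many times to build up the simulated (null) distribution of differences
null_diffs <- replicate(1000, simulate_diff(yawned))
```

A histogram of the simulated differences, for example `hist(null_diffs)`, plays the role of the chart in step 5.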

12

write the code for simulation by computation

[image: code not recoverable]

13

explain the steps for writing the code for simulation by computation

  • start with the data frame

  • specify the variables

  • state the null hypothesis

  • generate simulated differences via permutation

  • calculate the sample statistic of interest

14

write the code for calculating the p value

[image: code not recoverable]

15

significance level

the cutoff value for whether the p-value is low enough that the data are unlikely to have come from the null model

16

when is Ho rejected

if p-value < alpha, reject Ho in favour of Ha - the data provide convincing evidence for the alternative hypothesis

17

when is Ho not rejected

if p-value > alpha, fail to reject Ho - the data do not provide convincing evidence for the alternative hypothesis
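
The decision rule can be sketched in base R as follows. All numbers are hypothetical: `null_diffs` is a stand-in null distribution drawn from a normal distribution rather than produced by permutation, and `obs_diff` is an invented observed statistic. The two-sided p-value is computed here as the share of simulated values at least as far from zero as the observed one, which is an assumption about how "extreme" is measured.

```r
set.seed(1)  # make the simulated values reproducible

# Hypothetical stand-in for a simulated null distribution of differences
null_diffs <- rnorm(1000, mean = 0, sd = 0.15)

obs_diff <- 0.04   # hypothetical observed difference in proportions
alpha    <- 0.05   # significance level

# Two-sided p-value: proportion of simulated differences at least as extreme as observed
p_value <- mean(abs(null_diffs) >= abs(obs_diff))

# Reject Ho only when the p-value falls below alpha
decision <- if (p_value < alpha) "reject Ho" else "fail to reject Ho"
```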

18

false positive

rejecting the null hypothesis when it is correct

19

false negative

failing to reject the null hypothesis when it is incorrect

20

assumptions of the central limit theorem

  • conclusion rather than an assumption: the sampling distribution of the sample statistic is approximately normal when the two conditions below hold

  • observations in the sample are independent

  • the sample size is sufficiently large
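
A small base R simulation can make the theorem concrete. This sketch assumes an exponential population with rate 1 (a skewed distribution whose mean and standard deviation are both 1) and a sample size of 50; both choices are illustrative and not from the course.

```r
set.seed(42)  # make the simulation reproducible

n <- 50  # sample size (assumed large enough for the approximation)

# Draw 5000 independent samples of size n and record each sample mean
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))

# The sample means should centre near the population mean (1)
centre <- mean(sample_means)

# Their spread should be near the theoretical standard error, 1 / sqrt(n)
spread <- sd(sample_means)
```

Although the population is strongly skewed, `hist(sample_means)` should look roughly bell-shaped.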

21

sketch the normal distribution curve

[image: sketch of the normal distribution curve]

22

p-value

the probability of observing a test statistic at least as extreme as the one computed from the sample data, assuming the null hypothesis is true

23

why are permutation-based approaches used

they repeat simulations to estimate the distribution of the test statistic under the null hypothesis

24

code for generating the null distribution

set.seed(123)

25

what is the code for visualising a simulated p-value

visualize(null_dist) +
  shade_p_value(obs_stat = d_hat, direction = "two-sided")

26

what is the code for calculating a simulated p-value

null_dist |>
  get_p_value(obs_stat = d_hat, direction = "two-sided")

27

what function displays linear regression

fit() fits the model; tidy() displays the fitted coefficients

28

statistical inference

the process of using sample data to make conclusions about the underlying population from which the sample came

29

estimation

uses data from samples to calculate sample statistics (mean, median, slope) which can then be used as estimates for population parameters

30

hypothesis testing

use data from samples to calculate p values which can then be used to evaluate competing claims about the population

31

confidence intervals

a plausible range of values for a population parameter; need to quantify the variability of the sample statistic in order to construct one

32

code for sampling without replacement

sample(x = 1:10, size = 10, replace = FALSE)

33

code for sampling with replacement

sample(x = 1:10, size = 10, replace = TRUE)

34

explain the bootstrapping scheme

  1. take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample

  2. calculate the bootstrap statistic - a statistic such as mean, median, proportion, or slope computed on the bootstrap sample

  3. repeat steps 1 and 2 to create a bootstrap distribution

  4. calculate the bounds of the XX% confidence interval as the bounds of the middle XX% of the bootstrap distribution (for 95%, the 2.5th and 97.5th percentiles)
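
The scheme above can be sketched in base R with a percentile interval for the mean. The sample `x` is hypothetical, and the choices of 2000 repetitions and a 95% level are illustrative.

```r
set.seed(123)  # make the resampling reproducible

# Hypothetical original sample of 10 observations
x <- c(12, 15, 9, 20, 14, 11, 17, 13, 16, 10)

# Steps 1-3: resample with replacement (same size as x), compute the mean, repeat
boot_means <- replicate(
  2000,
  mean(sample(x, size = length(x), replace = TRUE))
)

# Step 4: the middle 95% of the bootstrap distribution gives the interval bounds
ci <- quantile(boot_means, probs = c(0.025, 0.975))
```

The percentile method is used here because it mirrors step 4 directly; other bootstrap interval types exist but are outside what the card describes.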

35

code for taking a bootstrap sample

economy_boot_1 <- economy |>
  slice_sample(n = nrow(economy), replace = TRUE)

36

explain the difference between confidence intervals and p values

  • confidence interval: range of plausible values for the population parameter; distribution centred around the observed sample statistic

  • p value: probability of observing data at least as extreme as the sample, given the null hypothesis is true; distribution centred around the value from the null hypothesis

  • a 95% confidence interval corresponds in practice to a two-sided hypothesis test with alpha = 0.05

37

code for calculating mean

calculate(stat = "mean")

38

code for obtaining the confidence interval

get_ci(x = boot_df, level = 0.95)

39

entire code for the chipotle confidence interval problem

[image: code not recoverable]

40

modeling

the use of models to explain the relationship between variables and to make predictions

41

linear models

classic forms used for statistical inference

42

nonlinear models

much more common in machine learning for prediction

43

correlation

ranges between -1 and 1, same sign as the slope

44

regression model

a function that describes the relationship between the outcome and the predictor; Y = Model + Error

45

simple linear regression

used to model the relationship between a quantitative outcome and a single quantitative predictor

46

residual formula

observed - predicted

47

least squares line

minimises the sum of squared residuals

48

code for simple linear regression

movies_fit <- linear_reg() |>
  fit(audience ~ critics, data = movie_scores)

tidy(movies_fit)

49

properties of least squares regression

  • The regression line goes through the center of mass point (the coordinates corresponding to average x and y coordinates)

  • Slope has the same sign as the correlation coefficient

  • Sum of the residuals is zero

  • Residuals and predictor (x) values are uncorrelated
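
These properties can be checked numerically. The sketch below uses base R's `lm()` on the built-in `cars` dataset (stopping distance against speed) instead of the `movie_scores` data from the earlier card, purely so that it runs without extra packages or files.

```r
# Fit a simple linear regression on a built-in dataset (stand-in for movie_scores)
fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)

# Property: the line passes through the point (mean of x, mean of y)
pred_at_mean_x <- as.numeric(
  predict(fit, newdata = data.frame(speed = mean(cars$speed)))
)
through_centre <- abs(pred_at_mean_x - mean(cars$dist)) < 1e-8

# Property: the slope has the same sign as the correlation coefficient
same_sign <- sign(coef(fit)[["speed"]]) == sign(cor(cars$speed, cars$dist))

# Property: the residuals sum to zero (up to floating-point error)
sum_res <- sum(res)

# Property: residuals are uncorrelated with the predictor values
cor_res_x <- cor(res, cars$speed)
```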

50

in what context is the intercept meaningful

when the predictor has values near zero