STAT 201: Module 10


14 Terms

1

Response Variable

  • the outcome or metric being measured to compare the performance of the two groups (A and B)

2

Covariate

  • variable that is not the main focus of the test but may influence the response variable

  • often used to control for external factors or reduce variability in the analysis

  • the variable that splits our sample into different groups

3

Randomization

  • randomization will average out any hidden differences (aka. lurking variables)

  • this ensures the only systematic difference between the groups is the treatment, so any difference in the response variable can be attributed to it

  • this is done by randomly assigning users to each group, thus minimizing the chances that other variables will cause a difference
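The random assignment described above can be sketched as follows; `randomize_groups` and its arguments are hypothetical names for illustration, not from the course materials.

```python
import random

def randomize_groups(user_ids, seed=None):
    """Randomly split users into groups A and B of (near-)equal size.

    Hypothetical helper: shuffling before splitting spreads any lurking
    variable across both groups on average.
    """
    rng = random.Random(seed)
    shuffled = user_ids[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

group_a, group_b = randomize_groups(list(range(100)), seed=42)
```

Because assignment depends only on the shuffle, no user characteristic can systematically land in one group.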

4

A/B Testing

  • compare two or more treatments or interventions (Group A and Group B) to assess their performance regarding a specific variable of interest (the response variable)

  • goal is to determine whether any observed differences are statistically significant or merely due to chance

  • data is analyzed as it arrives; we don't wait for the entire sample before testing the hypothesis

    • the sample size is flexible (data are monitored as they come in)

  • the testing framework is dynamic; decisions are made as the experiment goes for a more iterative process (aka peeking)

    • controlling the risk of wrongly rejecting the null hypothesis is not an easy task in A/B testing if peeking and early stops are allowed
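For contrast with the sequential approach, a conventional fixed-sample analysis looks like this. The course materials use R's `t.test()`; this sketch uses SciPy's equivalent, and the group means, standard deviations, and sample size are made-up illustration values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(201)

# Simulated response variable for each group; parameters are illustrative only
group_a = rng.normal(loc=5.0, scale=1.0, size=200)   # control
group_b = rng.normal(loc=5.3, scale=1.0, size=200)   # treatment

# Welch's two-sample t-test on the complete, fixed sample (no peeking)
result = stats.ttest_ind(group_a, group_b, equal_var=False)
reject_h0 = result.pvalue < 0.05
```

Here the hypothesis is tested exactly once, at the planned sample size, so the nominal 5% type 1 error rate holds.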

5

What does the Structure of A/B Testing consist of?

  • response variable (Y)

  • one or more covariates that split the population into groups

  • randomization, individuals are randomly assigned to the groups to avoid bias

  • statistical comparison of the groups’ parameter of interest

    • remember that all we have is a sample, so we need to account for the sampling variability

6

Peeking

  • checking the results of an ongoing A/B test with the intent to stop it and make a decision or inference based on the observed outcome

  • the more times you peek, the more hypothesis tests you're performing, which inflates the overall chance of at least one false positive

7

Early Stopping

  • stopping the experiment as soon as the p-value drops below the significance level (e.g., 0.05), before the planned sample size is reached

8

Consequences of Peeking/Early Stopping

  • Type 1 error rate increases

  • effect size is biased upward: experiments stopped early are more likely to have caught a random fluctuation (an overestimate of the true effect) than a real effect of that size

However:

  • when done correctly, stopping an experiment earlier can be beneficial in many contexts
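The type 1 error inflation can be demonstrated with a small simulation: generate A/A data (so H0 is true), peek after every batch, and count how often the test ever crosses p < 0.05. The batch size, maximum n, and number of simulations below are arbitrary choices, and SciPy stands in for R's `t.test()`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n_sims, batch, n_max = 200, 20, 500   # arbitrary simulation settings

def ever_rejects(rng):
    """One A/A experiment: both groups share the same distribution (H0 true).

    Peek after every batch of 20 units per group and report whether any peek
    showed p < 0.05, i.e. whether early stopping would declare a 'winner'.
    """
    a = rng.normal(size=n_max)
    b = rng.normal(size=n_max)
    for n in range(batch, n_max + 1, batch):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
            return True
    return False

type1_rate = sum(ever_rejects(rng) for _ in range(n_sims)) / n_sims
# With 25 peeks per experiment, this rate typically lands far above 5%
```

Each experiment peeks 25 times, so the family of tests has a much larger chance of at least one false positive than the nominal 5%.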

9

A/A Testing

  • ensures both groups receive the same treatment

  • since H0 is known to be true by design, any rejection of H0 constitutes a false positive

    • we know that there is no effect (H0 is true)

10

Steps of a Sequential Analysis with Early Stopping

  1. run a balanced experiment with a pre-set sample size of n visitors per variation

  2. sequentially collect the data in batches of visitors per group

  3. sequentially analyze the data using two sample t-tests

  4. sequentially compute and monitor the p-values (non-adjusted)

  5. stop the experiment once a significant result is found

11

How to analyze the data in an incremental way

If sample_increase_step is 20 and n=500, the function will:

  • draw the first 20 experimental units from each group

  • perform the two sample t-test and return the associated t-statistic and p-value

  • draw 20 more experimental units for each group

  • perform the two sample t-test (now based on 40 experimental units per group) and return the associated t-statistic and p-value

  • draw another 20 experimental units for each group

  • perform the two sample t-test (now based on 60 experimental units per group) and return the associated t-statistic and p-value

And so on, until the total sample size in each group is 500 (as originally planned).

  • there’s a cost to having longer experiments

12

incremental_t_test Function

incremental_t_test(n = ..., d_0 = ..., sample_increase_step = ..., mean_current = ..., sd_current = ..., sd_new = ...)

Returns a tibble with 3 columns:

  • inc_sample_size: the sample size of the set of data analyzed

  • statistic: the t-statistic calculated by the t.test() function

  • p_value: p-value calculated by the t.test() function
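A Python sketch of what `incremental_t_test` plausibly does. The course's function is written in R and returns a tibble; this version returns a list of rows, and details the notes leave out are assumptions here — in particular, the "new" group is simulated with mean `mean_current + d_0`.

```python
import numpy as np
from scipy import stats

def incremental_t_test(n, d_0, sample_increase_step,
                       mean_current, sd_current, sd_new, seed=None):
    """Sketch of the course's incremental_t_test (original is in R).

    Assumption (not stated in the notes): the new group's mean is
    mean_current + d_0.  Returns one row per interim analysis with the
    cumulative sample size, t-statistic, and non-adjusted p-value.
    """
    rng = np.random.default_rng(seed)
    current = rng.normal(mean_current, sd_current, size=n)
    new = rng.normal(mean_current + d_0, sd_new, size=n)
    rows = []
    for m in range(sample_increase_step, n + 1, sample_increase_step):
        res = stats.ttest_ind(current[:m], new[:m], equal_var=False)
        rows.append({"inc_sample_size": m,
                     "statistic": res.statistic,
                     "p_value": res.pvalue})
    return rows

results = incremental_t_test(n=500, d_0=0.1, sample_increase_step=20,
                             mean_current=5.0, sd_current=1.0, sd_new=1.0,
                             seed=201)
```

With n = 500 and a step of 20, the output has 25 rows, one per peek, mirroring the three-column tibble described above.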

13

p-value adjustments: Bonferroni

  • using a Bonferroni correction, the data can be sequentially analyzed and can be stopped earlier while controlling the type 1 error rate

  • Bonferroni's correction in sequential analysis is very conservative and reduces the power of the test (making type 2 errors more likely)
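With m planned interim analyses, the Bonferroni approach compares every interim p-value to α/m instead of α. A minimal sketch; the interim p-values are invented for illustration.

```python
alpha, m = 0.05, 25            # e.g. peeking every 20 units up to n = 500
bonferroni_cutoff = alpha / m  # each peek must clear this stricter bar

interim_p_values = [0.04, 0.012, 0.0009]   # made-up interim results
stop_at = next((i for i, p in enumerate(interim_p_values)
                if p < bonferroni_cutoff), None)  # first significant peek, if any
```

Only the third peek (p = 0.0009) clears the 0.002 bar, even though the first two would have looked "significant" at the unadjusted 0.05 level — which is exactly why the method is conservative.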

14

p-value adjustments: Pocock

  • like Bonferroni's method, it applies a single common threshold at every interim analysis; however, it derives an adjusted critical value (cutoff) directly to control the overall type 1 error rate, rather than simply dividing the significance level

  • not as conservative as Bonferroni’s method

  • the data can be sequentially analyzed and the experiment can be stopped earlier while controlling the type 1 error rate
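Pocock's common nominal significance levels are tabulated rather than computed as α/K; the values below are the commonly cited ones from Pocock (1977) for K equally spaced interim analyses at overall α = 0.05, shown against the Bonferroni per-look level for comparison.

```python
# Commonly cited Pocock nominal per-look levels (overall alpha = 0.05)
pocock_nominal = {2: 0.0294, 3: 0.0221, 4: 0.0182, 5: 0.0158}

K = 5                           # number of planned analyses
bonferroni_nominal = 0.05 / K   # Bonferroni's per-look level: 0.01

# Pocock's per-look threshold is larger, i.e. less conservative
less_conservative = pocock_nominal[K] > bonferroni_nominal
```

A p-value of 0.012 at any look would stop the experiment under Pocock's rule (0.012 < 0.0158) but not under Bonferroni's (0.012 > 0.01), illustrating the power advantage.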