Response Variable
the outcome or metric being measured to compare the performance of the two groups (A and B)
Covariate
variable that is not the main focus of the test but may influence the response variable
often used to control for external factors or reduce variability in the analysis
what splits our sample into different groups
Randomization
randomization averages out any hidden differences between groups (a.k.a. lurking variables)
this ensures the only systematic difference between the groups is the treatment, so any difference in the response variable can be attributed to it
this is done by randomly assigning users to each group, minimizing the chance that other variables cause a difference
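A minimal sketch of random assignment (Python for illustration; the function name and seed handling are my own):

```python
import random

def randomize(users, seed=None):
    # Shuffle, then split in half: each user is equally likely to land
    # in either group, which averages out lurking variables.
    rng = random.Random(seed)
    shuffled = list(users)   # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

group_a, group_b = randomize(range(100), seed=42)
```

With an even number of users the split is balanced; with an odd number the groups differ by one.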
A/B Testing
compare two or more treatments or interventions (Group A and Group B) to assess their performance regarding a specific variable of interest (the response variable)
goal is to determine whether any observed differences are statistically significant or merely due to chance
data is analyzed as it comes in; we don’t wait for the entire sample to arrive before testing the hypothesis
the sample size is flexible (data is monitored as it comes in)
the testing framework is dynamic; decisions are made as the experiment runs, allowing a more iterative process (a.k.a. peeking)
controlling the risk of wrongly rejecting the null hypothesis is not an easy task in A/B testing if peeking and early stops are allowed
What does the Structure of A/B Testing consist of?
response variable (Y)
one or more covariates that split the population into groups
randomization, individuals are randomly assigned to the groups to avoid bias
statistical comparison of the groups’ parameter of interest
remember that all we have is a sample, so we need to account for the sampling variability
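The statistical comparison step can be sketched as a two-sample (Welch) t-statistic. This is Python for illustration; the course uses R's t.test(), which uses the exact t distribution, whereas this sketch approximates the p-value with a normal distribution (reasonable for large samples):

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(sample_a, sample_b):
    # Welch two-sample t-statistic; the p-value uses a large-sample
    # normal approximation instead of the exact t distribution.
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    t = (mean(sample_a) - mean(sample_b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided
    return t, p

t, p = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [1.5, 2.5, 3.5, 4.5, 5.5])
```

Here the large p-value reflects sampling variability: the observed difference in means is small relative to its standard error.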
Peeking
checking the results of an ongoing A/B test with the intent to stop it and make a decision or inference based on the observed outcome
the more times you peek, the more hypothesis tests you are implicitly running, which inflates the overall type 1 error rate
Early Stopping
stopping the experiment as soon as the p-value drops below 0.05, before the planned sample size is reached
Consequences of Peeking/Early Stopping
Type 1 error rate increases
effect size is biased upward: a significant result obtained by stopping early is more likely to reflect a random fluctuation (an overestimate of the true effect) than a real effect
However:
when done correctly, stopping an experiment earlier can be beneficial in many contexts
A/A Testing
both groups receive the same treatment
by construction there is no effect, so we know H0 is true
any rejection of H0 therefore constitutes a false positive, which lets us check that the testing procedure behaves as expected
Steps
run a balanced experiment with a pre-set sample size of n visitors per variation
sequentially collect the data in batches of visitors per group
sequentially analyze the data using two sample t-tests
sequentially compute and monitor the p-values (non-adjusted)
stop the experiment once a significant result is found
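The steps above can be simulated to see how peeking inflates the false positive rate. A Python sketch (the course does this in R; the batch z-test is a large-sample stand-in for the two sample t-test):

```python
import math
import random
from statistics import NormalDist, variance

def z_p_value(a, b):
    # two-sided large-sample z-test for a difference in means
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def aa_false_positive_rate(n=500, step=100, reps=200, alpha=0.05, seed=0):
    # Fraction of A/A experiments stopped early at p < alpha when
    # peeking after every batch of `step` visitors per group.
    # Both groups draw from the same N(0, 1), so H0 is true and
    # every stop is a false positive.
    rng = random.Random(seed)
    stopped = 0
    for _ in range(reps):
        a, b = [], []
        for _ in range(n // step):
            a += [rng.gauss(0, 1) for _ in range(step)]
            b += [rng.gauss(0, 1) for _ in range(step)]
            if z_p_value(a, b) < alpha:  # unadjusted threshold: peeking
                stopped += 1
                break
    return stopped / reps

rate = aa_false_positive_rate()
```

With a single final look the rate would be about alpha; with repeated unadjusted looks it is typically well above the nominal 0.05.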
How to analyze the data in an incremental way
If sample_increase_step is 20 and n = 500, the function will:
draw the first 20 experimental units from each group
perform the two sample t-test and return the associated t-statistic and p-value
draw 20 more experimental units for each group
perform the two sample t-test (now based on 40 experimental units per group) and return the associated t-statistic and p-value
draw another 20 experimental units for each group
perform the two sample t-test (now based on 60 experimental units per group) and return the associated t-statistic and p-value
and so on, until the total sample size in each group is 500 (as originally planned)
there’s a cost to having longer experiments
incremental_t_test
Function: incremental_t_test(n=..., d_0=..., sample_increase_step=..., mean_current=..., sd_current=..., sd_new=...)
Returns a tibble with 3 columns:
inc_sample_size: the sample size of the set of data analyzed
statistic: the t-statistic calculated by the t.test() function
p_value: the p-value calculated by the t.test() function
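The course's incremental_t_test() is an R function; a rough Python analogue could look like this. Assumptions for illustration: d_0 is taken to be the true mean difference used to simulate the new variant, the exact t distribution of t.test() is replaced by a large-sample normal approximation, and a list of dicts stands in for the tibble:

```python
import math
import random
from statistics import NormalDist, variance

def incremental_t_test(n, d_0, sample_increase_step,
                       mean_current, sd_current, sd_new, seed=None):
    # Assumption: the new variant's mean is mean_current + d_0.
    rng = random.Random(seed)
    current, new, rows = [], [], []
    for _ in range(n // sample_increase_step):
        current += [rng.gauss(mean_current, sd_current)
                    for _ in range(sample_increase_step)]
        new += [rng.gauss(mean_current + d_0, sd_new)
                for _ in range(sample_increase_step)]
        se = math.sqrt(variance(current) / len(current)
                       + variance(new) / len(new))
        stat = (sum(new) / len(new) - sum(current) / len(current)) / se
        rows.append({
            "inc_sample_size": len(current),  # per-group sample size so far
            "statistic": stat,                # Welch-type t-statistic
            "p_value": 2 * (1 - NormalDist().cdf(abs(stat))),
        })
    return rows
```

Each row is one interim analysis; with n = 500 and sample_increase_step = 20 there are 25 rows, the last based on the full 500 units per group.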
p-value adjustments: Bonferroni
using a Bonferroni correction, the data can be sequentially analyzed and can be stopped earlier while controlling the type 1 error rate
Bonferroni’s correction in sequential analysis is very conservative and can reduce the power of the test (making us more vulnerable to type 2 errors)
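With K planned interim looks, the Bonferroni rule compares each unadjusted p-value against alpha / K:

```python
def bonferroni_threshold(alpha, n_looks):
    # Per-look significance threshold under a Bonferroni correction:
    # testing each p-value against alpha / n_looks keeps the overall
    # type 1 error rate at most alpha across all interim analyses.
    return alpha / n_looks

# e.g. n = 500 with sample_increase_step = 20 gives 25 looks
threshold = bonferroni_threshold(0.05, 25)  # 0.05 / 25 = 0.002
```

The tiny per-look threshold is what makes the method conservative: real effects need very small p-values to be detected at any interim look.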
p-value adjustments: Pocock
like Bonferroni’s method, it uses a common critical value for all interim analyses; however, that value is derived from the joint distribution of the sequential test statistics rather than by simply dividing the significance level
not as conservative as Bonferroni’s method
the data can be sequentially analyzed and the experiment can be stopped earlier while controlling the type 1 error rate