Response Variable
the outcome or metric being measured to compare the performance of the two groups (A and B)
Covariate
variable that is not the main focus of the test but may influence the response variable
often used to control for external factors or reduce variability in the analysis
what splits our sample into different groups
Randomization
randomization averages out any hidden differences between groups (a.k.a. lurking variables)
this ensures the only systematic difference between the groups is the treatment, so any difference in the response variable can be attributed to it
this is done by randomly assigning users to each group, minimizing the chance that other variables cause a difference
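A minimal sketch of random assignment (Python for illustration; the function name and seed handling are my own):

```python
import random

def randomize(users, seed=None):
    # Shuffle, then split in half: each user is equally likely to land
    # in either group, which averages out lurking variables.
    rng = random.Random(seed)
    shuffled = list(users)   # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

group_a, group_b = randomize(range(100), seed=42)
```

With an even number of users the split is balanced; with an odd number the groups differ by one.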
A/B Testing
compare two or more treatments or interventions (Group A and Group B) to assess their performance regarding a specific variable of interest (the response variable)
goal is to determine whether any observed differences are statistically significant or merely due to chance
data is analyzed as it comes in; we don’t wait for the entire sample to arrive before testing the hypothesis
the sample size is flexible (data is monitored as it comes in)
the testing framework is dynamic; decisions are made as the experiment runs, allowing a more iterative process (a.k.a. peeking)
controlling the risk of wrongly rejecting the null hypothesis is not an easy task in A/B testing if peeking and early stops are allowed
What does the Structure of A/B Testing consist of?
response variable (Y)
one or more covariates that split the population into groups
randomization, individuals are randomly assigned to the groups to avoid bias
statistical comparison of the groups’ parameter of interest
remember that all we have is a sample, so we need to account for the sampling variability
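The statistical comparison step can be sketched as a two-sample (Welch) t-statistic. This is Python for illustration; the course uses R's t.test(), which uses the exact t distribution, whereas this sketch approximates the p-value with a normal distribution (reasonable for large samples):

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(sample_a, sample_b):
    # Welch two-sample t-statistic; the p-value uses a large-sample
    # normal approximation instead of the exact t distribution.
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    t = (mean(sample_a) - mean(sample_b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided
    return t, p

t, p = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [1.5, 2.5, 3.5, 4.5, 5.5])
```

Here the large p-value reflects sampling variability: the observed difference in means is small relative to its standard error.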
Peeking
checking the results of an ongoing A/B test with the intent to stop it and make a decision or inference based on the observed outcome
the more times you peek, the more hypothesis tests you are implicitly running, which inflates the overall type 1 error rate
Early Stopping
stopping the experiment as soon as the p-value drops below 0.05, before the planned sample size is reached
Consequences of Peeking/Early Stopping
Type 1 error rate increases
effect size is biased upward: a significant result obtained by stopping early is more likely to reflect a random fluctuation (an overestimate of the true effect) than a real effect
However:
when done correctly, stopping an experiment earlier can be beneficial in many contexts
A/A Testing
both groups receive the same treatment
by construction there is no effect, so we know H0 is true
any rejection of H0 therefore constitutes a false positive, which lets us check that the testing procedure behaves as expected
Steps
run a balanced experiment with a pre-set sample size of n visitors per variation
sequentially collect the data in batches of visitors per group
sequentially analyze the data using two sample t-tests
sequentially compute and monitor the p-values (non-adjusted)
stop the experiment once a significant result is found
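The steps above can be simulated to see how peeking inflates the false positive rate. A Python sketch (the course does this in R; the batch z-test is a large-sample stand-in for the two sample t-test):

```python
import math
import random
from statistics import NormalDist, variance

def z_p_value(a, b):
    # two-sided large-sample z-test for a difference in means
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def aa_false_positive_rate(n=500, step=100, reps=200, alpha=0.05, seed=0):
    # Fraction of A/A experiments stopped early at p < alpha when
    # peeking after every batch of `step` visitors per group.
    # Both groups draw from the same N(0, 1), so H0 is true and
    # every stop is a false positive.
    rng = random.Random(seed)
    stopped = 0
    for _ in range(reps):
        a, b = [], []
        for _ in range(n // step):
            a += [rng.gauss(0, 1) for _ in range(step)]
            b += [rng.gauss(0, 1) for _ in range(step)]
            if z_p_value(a, b) < alpha:  # unadjusted threshold: peeking
                stopped += 1
                break
    return stopped / reps

rate = aa_false_positive_rate()
```

With a single final look the rate would be about alpha; with repeated unadjusted looks it is typically well above the nominal 0.05.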
How to analyze the data in an incremental way
If sample_increase_step is 20 and n = 500, the function will:
draw the first 20 experimental units from each group
perform the two sample t-test and return the associated t-statistic and p-value
draw 20 more experimental units for each group
perform the two sample t-test (now based on 40 experimental units per group) and return the associated t-statistic and p-value
draw another 20 experimental units for each group
perform the two sample t-test (now based on 60 experimental units per group) and return the associated t-statistic and p-value
and so on, until the total sample size in each group is 500 (as originally planned)
there’s a cost to having longer experiments
incremental_t_test
Function: incremental_t_test(n=..., d_0=..., sample_increase_step=..., mean_current=..., sd_current=..., sd_new=...)
Returns a tibble with 3 columns:
inc_sample_size: the sample size of the set of data analyzed
statistic: the t-statistic calculated by the t.test() function
p_value: the p-value calculated by the t.test() function
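The course's incremental_t_test() is an R function; a rough Python analogue could look like this. Assumptions for illustration: d_0 is taken to be the true mean difference used to simulate the new variant, the exact t distribution of t.test() is replaced by a large-sample normal approximation, and a list of dicts stands in for the tibble:

```python
import math
import random
from statistics import NormalDist, variance

def incremental_t_test(n, d_0, sample_increase_step,
                       mean_current, sd_current, sd_new, seed=None):
    # Assumption: the new variant's mean is mean_current + d_0.
    rng = random.Random(seed)
    current, new, rows = [], [], []
    for _ in range(n // sample_increase_step):
        current += [rng.gauss(mean_current, sd_current)
                    for _ in range(sample_increase_step)]
        new += [rng.gauss(mean_current + d_0, sd_new)
                for _ in range(sample_increase_step)]
        se = math.sqrt(variance(current) / len(current)
                       + variance(new) / len(new))
        stat = (sum(new) / len(new) - sum(current) / len(current)) / se
        rows.append({
            "inc_sample_size": len(current),  # per-group sample size so far
            "statistic": stat,                # Welch-type t-statistic
            "p_value": 2 * (1 - NormalDist().cdf(abs(stat))),
        })
    return rows
```

Each row is one interim analysis; with n = 500 and sample_increase_step = 20 there are 25 rows, the last based on the full 500 units per group.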
p-value adjustments: Bonferroni
using a Bonferroni correction, the data can be sequentially analyzed and can be stopped earlier while controlling the type 1 error rate
Bonferroni’s correction in sequential analysis is very conservative and can reduce the power of the test (making us more vulnerable to type 2 errors)
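With K planned interim looks, the Bonferroni rule compares each unadjusted p-value against alpha / K:

```python
def bonferroni_threshold(alpha, n_looks):
    # Per-look significance threshold under a Bonferroni correction:
    # testing each p-value against alpha / n_looks keeps the overall
    # type 1 error rate at most alpha across all interim analyses.
    return alpha / n_looks

# e.g. n = 500 with sample_increase_step = 20 gives 25 looks
threshold = bonferroni_threshold(0.05, 25)  # 0.05 / 25 = 0.002
```

The tiny per-look threshold is what makes the method conservative: real effects need very small p-values to be detected at any interim look.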
p-value adjustments: Pocock
like Bonferroni’s method, it uses a common critical value for all interim analyses; however, that value is derived from the joint distribution of the sequential test statistics rather than by simply dividing the significance level
not as conservative as Bonferroni’s method
the data can be sequentially analyzed and the experiment can be stopped earlier while controlling the type 1 error rate