Contextual bandits - general outline
Contextual bandits - how is reward drawn?
Contextual bandits - what is the expected reward?
Contextual bandits - what is the best response policy / optimal arm
Contextual bandits - how do we write the regret?
A simple contextual bandit algorithm - general outline
A simple contextual bandit algorithm - Regret/arm and total regret?
The problem with this bound is that it becomes very large when the cardinality of X is large (many contexts) - therefore we either need to assume some structure or change the objective
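To make the dependence on the number of contexts concrete, here is a sketch of the usual bound. It assumes the simple algorithm runs an independent UCB instance per context, with K arms, horizon T, and n_x rounds in which context x appears; these symbols are assumptions, not notation taken from the cards.

```latex
% Assumed setup: one UCB instance per context x, K arms, n_x = number of rounds with context x.
R_T \;\le\; \sum_{x \in \mathcal{X}} O\!\left(\sqrt{K\, n_x \log T}\right)
    \;\le\; O\!\left(\sqrt{K\, T\, |\mathcal{X}| \log T}\right),
\qquad \text{using Cauchy--Schwarz and } \sum_{x \in \mathcal{X}} n_x = T .
```

The factor |X| under the square root is what blows up when there are many contexts.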
Adding structure: Lipschitz contextual bandits - definition
the idea is to reduce the problem into a smaller set of contexts by discretizing the context space
we treat each grid point as a separate context - so each context x is mapped onto the closest grid point
in total the grid has 1 + 1/eps points (for a context space [0, 1] with grid spacing eps)
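The discretization steps above can be sketched as follows. This is a minimal illustration assuming a one-dimensional context space [0, 1] and a grid spacing eps for which 1/eps is an integer; the function names `grid_size`, `make_grid`, and `nearest_grid_index` are hypothetical.

```python
def grid_size(eps):
    """Number of cells in a uniform grid on [0, 1] with spacing eps."""
    return round(1 / eps)  # assumption: 1/eps is (close to) an integer


def make_grid(eps):
    """All grid points 0, eps, 2*eps, ..., 1: there are 1 + 1/eps of them."""
    n_cells = grid_size(eps)
    return [i / n_cells for i in range(n_cells + 1)]


def nearest_grid_index(x, eps):
    """Index of the grid point closest to context x in [0, 1]."""
    n_cells = grid_size(eps)
    idx = round(x * n_cells)
    return min(n_cells, max(0, idx))  # clamp contexts slightly outside [0, 1]


eps = 0.25
grid = make_grid(eps)
x = 0.3
i = nearest_grid_index(x, eps)
print(len(grid), grid[i])
```

Each grid index can then be used as the key for its own bandit statistics. Every context lies within eps/2 of its grid point, so under an L-Lipschitz expected reward the approximation error per round is at most L * eps / 2.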
Stochastic Linear Bandits - general outline of algorithm? goal? how do we write pseudo regret?
LinUCB - what is the goal of LinUCB? How do we build LinUCB
The goal is to select actions optimistically by considering the best-case plausible outcomes given the uncertainty in theta*
The problem is that we don’t know the true theta* - so we build a confidence set Omega_t that contains theta* with high probability
Then for any action we compute the best-case (lowest) loss it might incur under any plausible theta in our confidence set - this gives us a lower confidence bound
Then we act optimistically by choosing the action with the lowest bound
In the update rule:
the first term drives exploitation - the best guess on past data
the second term drives exploration - how uncertain we are in the direction a
the term is larger if a is not aligned with previous actions (unexplored direction) or the design matrix doesn’t have much information in that direction
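The two terms of the update rule can be written out explicitly. The sketch below assumes the loss formulation used above and an ellipsoidal confidence set; the symbols V_t (regularized design matrix), beta_t (confidence radius), lambda (regularizer), and the action set A are assumed notation, not taken from the cards.

```latex
% Assumed confidence set: an ellipsoid around the estimate \hat\theta_t.
\Omega_t = \left\{ \theta : \|\theta - \hat\theta_t\|_{V_t} \le \sqrt{\beta_t} \right\},
\qquad V_t = \lambda I + \sum_{s \le t} a_s a_s^\top .

% Best-case (lowest) loss of action a over the confidence set, in closed form:
\min_{\theta \in \Omega_t} \langle a, \theta \rangle
  = \underbrace{\langle a, \hat\theta_t \rangle}_{\text{exploitation}}
  \;-\; \underbrace{\sqrt{\beta_t}\, \|a\|_{V_t^{-1}}}_{\text{exploration}},
\qquad \|a\|_{V_t^{-1}} = \sqrt{a^\top V_t^{-1} a} .

% Optimistic choice: the action with the lowest bound.
a_{t+1} = \arg\min_{a \in \mathcal{A}}
  \left[ \langle a, \hat\theta_t \rangle - \sqrt{\beta_t}\, \|a\|_{V_t^{-1}} \right].
```

The norm of a under V_t^{-1} is large exactly when V_t has small eigenvalues along a, that is, when past actions carry little information in that direction.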
Center of the confidence set - what does it look like?
What does theta^ - theta* look like?
What does the final confidence set look like?
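As a hedged reference for these three cards, here is the standard ridge-regression form. It assumes observations y_s = <a_s, theta*> + eta_s with noise eta_s and a regularizer lambda; these symbols are assumptions, and the exact radius used in the original cards is not recoverable from the text.

```latex
% Center of the confidence set: the regularized least-squares estimate.
\hat\theta_t = V_t^{-1} \sum_{s \le t} a_s y_s,
\qquad V_t = \lambda I + \sum_{s \le t} a_s a_s^\top .

% Substituting y_s = \langle a_s, \theta^* \rangle + \eta_s gives the error decomposition:
\hat\theta_t - \theta^*
  = V_t^{-1} \sum_{s \le t} a_s \eta_s \;-\; \lambda V_t^{-1} \theta^* .
```

The first term is a noise term and the second is the bias introduced by regularization; bounding both in the V_t-norm yields an ellipsoid centered at the estimate.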
Upper bound on the regret of LinUCB