Contextual bandits - general outline

Contextual bandits - how is reward drawn?

Contextual bandits - what is the expected reward?

Contextual bandits - what is the best response policy / optimal arm

Contextual bandits - how do we write the regret?

A simple contextual bandit algorithm - general outline

A simple contextual bandit algorithm - Regret/arm and total regret?
The problem with this bound is that it becomes very large when the cardinality of X is large (many contexts) - so we either need to assume some structure or change the objective
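A minimal sketch of where the dependence on the number of contexts comes from. It assumes the simple algorithm keeps independent statistics for every context and runs UCB1 inside each one, with rewards in [0, 1]. The class name `PerContextUCB` and its methods are illustrative, not taken from these cards:

```python
import math
from collections import defaultdict


class PerContextUCB:
    """Hypothetical sketch: one independent UCB1 learner per context."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        # Per-context pull counts and reward sums; nothing is shared across contexts.
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.sums = defaultdict(lambda: [0.0] * n_arms)

    def select(self, context):
        counts, sums = self.counts[context], self.sums[context]
        # Pull every arm once in this context before using the UCB index.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        t = sum(counts)  # number of rounds seen in this context only
        ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
               for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: ucb[a])

    def update(self, context, arm, reward):
        self.counts[context][arm] += 1
        self.sums[context][arm] += reward
```

Every new context starts from empty statistics and has to explore all arms again, which is why the total regret of this kind of scheme grows with the number of contexts.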

Adding structure: Lipschitz contextual bandits - definition
the idea is to reduce the problem to a smaller set of contexts by discretizing the context space
we treat each grid point as a separate context - so each context x is mapped onto the closest grid point
in total the grid has 1 + 1/eps points (taking the context space to be [0, 1] with grid spacing eps)
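The discretization step can be sketched as below, assuming contexts live in [0, 1] and that 1/eps is an integer. The helper names `make_grid` and `snap_to_grid` are illustrative:

```python
def make_grid(eps):
    """Uniform grid on [0, 1] with spacing eps (assumes 1/eps is an integer)."""
    n_cells = round(1.0 / eps)
    return [k * eps for k in range(n_cells + 1)]  # 1 + 1/eps points


def snap_to_grid(x, grid):
    """Index of the grid point closest to context x."""
    return min(range(len(grid)), key=lambda k: abs(grid[k] - x))


grid = make_grid(0.25)         # points 0, 0.25, 0.5, 0.75, 1
idx = snap_to_grid(0.6, grid)  # 0.6 is closest to 0.5
```

The returned index can then be used as the context in any finite-context bandit algorithm, so the number of contexts the learner sees is the grid size rather than the size of the original context space.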

Stochastic Linear Bandits - general outline of the algorithm? goal? how do we write the pseudo-regret?

LinUCB - what is the goal of LinUCB? How do we build LinUCB?
The goal is to select actions optimistically by considering the best-case plausible outcomes given the uncertainty in theta*
The problem is that we don’t know the true theta* - so we build a confidence set Omega_t that contains theta* with high probability
Then for any action we compute the best-case (lowest) loss it might incur under any plausible theta in our confidence set - this gives us a lower confidence bound
Then we act optimistically by choosing the action with the lowest bound
In the update rule:
the first term drives exploitation - the best guess based on past data
the second term drives exploration - how uncertain we are in the direction a
the second term is larger if a is not aligned with previous actions (unexplored direction) or the design matrix doesn’t have much information in that direction
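The selection rule above can be sketched as follows. This is a minimal sketch under assumptions that these cards do not state: the loss is linear in the action, theta is estimated by ridge regression with regulariser `lam`, the confidence set is an ellipsoid of squared radius `beta` around that estimate, and the action set is finite. The names `LinUCB`, `lam` and `beta` are illustrative:

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LinUCB:
    """Hypothetical loss-minimising LinUCB over a finite action set."""

    def __init__(self, dim, lam=1.0, beta=1.0):
        self.beta = beta
        # Inverse of the regularised design matrix V = lam * I + sum_s a_s a_s^T.
        self.V_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of loss_s * a_s

    def index(self, a):
        theta_hat = matvec(self.V_inv, self.b)            # ridge estimate
        exploit = dot(theta_hat, a)                       # estimated loss of a
        width = math.sqrt(dot(a, matvec(self.V_inv, a)))  # uncertainty along a
        return exploit - math.sqrt(self.beta) * width     # lower confidence bound

    def select(self, actions):
        # Optimism for losses: pick the action with the lowest bound.
        return min(range(len(actions)), key=lambda i: self.index(actions[i]))

    def update(self, a, loss):
        # Sherman-Morrison rank-one update of V^{-1} after observing (a, loss).
        Va = matvec(self.V_inv, a)
        denom = 1.0 + dot(a, Va)
        d = len(a)
        self.V_inv = [[self.V_inv[i][j] - Va[i] * Va[j] / denom
                       for j in range(d)] for i in range(d)]
        self.b = [bi + loss * ai for bi, ai in zip(self.b, a)]
```

In this sketch, after one pull of the first basis direction its width shrinks, so an unexplored orthogonal direction keeps the larger exploration term and gets the lower bound.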

Center of the confidence set - what does it look like?

What does theta^ - theta* look like?

What does the final confidence set look like?

Upper bound on the regret of LinUCB
