7 contextual and linear bandits

14 Terms

Contextual bandits - general outline

Contextual bandits - how is reward drawn?

Contextual bandits - what is the expected reward?

Contextual bandits - what is the best response policy / optimal arm

Contextual bandits - how do we write the regret?

A simple contextual bandit algorithm - general outline

A simple contextual bandit algorithm - Regret/arm and total regret?

The problem with this bound is that it becomes very large when the cardinality of X is large (many contexts) - therefore we either need to assume some structure or change the objective
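
For reference, a commonly stated form of this kind of bound, assuming one independent UCB instance per context. The symbols K (number of arms), T (horizon) and n_x (number of rounds in which context x appears) are assumptions introduced here, not taken from the card, and constants are hidden in the O-notation. The last step uses Cauchy-Schwarz together with sum_x n_x = T:

```latex
R_T \;=\; \sum_{x \in \mathcal{X}} R_x
\;\le\; \sum_{x \in \mathcal{X}} O\!\left(\sqrt{K\, n_x \log T}\right)
\;\le\; O\!\left(\sqrt{|\mathcal{X}|\, K\, T \log T}\right)
```

The factor sqrt(|X|) is what makes the bound blow up when there are many contexts.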

Adding structure: Lipschitz contextual bandits - definition

the idea is to reduce the problem to a smaller set of contexts by discretizing the context space
we treat each grid point as a separate context - so each context x is mapped onto the closest grid point
in total the grid has 1 + 1/eps points
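
The discretization step above can be sketched as follows. This is a minimal illustration, assuming contexts live in [0, 1] and that 1/eps is an integer (both are assumptions consistent with the 1 + 1/eps count, not stated on the card). The function names `make_grid` and `nearest_grid_point` are hypothetical.

```python
def make_grid(eps):
    """Return the uniform grid 0, eps, 2*eps, ..., 1 on [0, 1].

    Assumes 1/eps is (close to) an integer, so the grid has 1 + 1/eps points.
    """
    n_cells = round(1 / eps)
    return [i / n_cells for i in range(n_cells + 1)]


def nearest_grid_point(x, eps):
    """Map a context x in [0, 1] to (index, value) of its closest grid point."""
    n_cells = round(1 / eps)
    # Round to the nearest cell boundary and clip to the valid index range.
    idx = min(max(round(x * n_cells), 0), n_cells)
    return idx, idx / n_cells


grid = make_grid(0.25)                      # 5 points: 0, 0.25, 0.5, 0.75, 1
idx, point = nearest_grid_point(0.3, 0.25)  # closest grid point is 0.25
```

Each grid index can then be treated as its own context by any finite-context bandit algorithm.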

Stochastic Linear Bandits - general outline of algorithm? goal? how do we write pseudo regret?

LinUCB - what is the goal of LinUCB? How do we build LinUCB?

The goal is to select actions optimistically by considering the best-case plausible outcome given the uncertainty in theta*
The problem is that we don't know the true theta* - so we build a confidence set Omega_t that contains theta* with high probability
Then for any action we compute the best-case (lowest) loss it might incur under any plausible theta in our confidence set - this gives us a lower confidence bound
Then we act optimistically by choosing the action with the lowest bound
In the update rule:
- the first term drives exploitation - the best guess based on past data
- the second term drives exploration - how uncertain we are in the direction a
- the second term is larger if a is not aligned with previous actions (unexplored direction), i.e. the design matrix doesn't have much information in that direction
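
The steps above can be sketched in code. This is a minimal illustration for the loss setting described on the card, not a definitive implementation. It assumes a ridge-regression estimate as the center of the confidence set and a constant confidence radius `beta`; the class name `LinUCB`, the regularizer `lam`, and the fixed `beta` are assumptions introduced for this sketch.

```python
import numpy as np


class LinUCB:
    """Optimistic action selection for linear bandits with losses (sketch)."""

    def __init__(self, dim, lam=1.0, beta=1.0):
        self.V = lam * np.eye(dim)  # regularized design matrix
        self.b = np.zeros(dim)      # running sum of loss * action
        self.beta = beta            # confidence radius (assumed constant)

    def select(self, actions):
        """Return the index of the action with the lowest confidence bound."""
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b  # center of the confidence set
        # First term: estimated loss of each action (exploitation).
        est = actions @ theta_hat
        # Second term: uncertainty in direction a, sqrt(a^T V^{-1} a) (exploration).
        width = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
        lcb = est - self.beta * width
        return int(np.argmin(lcb))

    def update(self, action, loss):
        """Add the played action and its observed loss to the statistics."""
        self.V += np.outer(action, action)
        self.b += loss * action


bandit = LinUCB(dim=2)
actions = np.array([[1.0, 0.0], [0.0, 2.0]])
first = bandit.select(actions)   # no data yet: the widest (least explored) action wins
bandit.update(actions[first], loss=1.0)
second = bandit.select(actions)  # the played direction is now less uncertain
```

In theory the radius grows slowly with t rather than staying constant; the fixed `beta` here only keeps the sketch short.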

Center of the confidence set - what does it look like?

What does theta_hat - theta* look like?

What does the final confidence set look like?

Upper bound on the regret of LinUCB