7 contextual and linear bandits

0.0(0)
studied byStudied by 0 people
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/13

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

14 Terms

1
New cards

Contextual bandits - general outline

knowt flashcard image
2
New cards

Contextual bandits - how is reward drawn?

knowt flashcard image
3
New cards

Contextual bandits - what is the expected reward?

knowt flashcard image
4
New cards

Contextual bandits - what is the best response policy / optimal arm

knowt flashcard image
5
New cards

Contextual bandits - how do we write the regret?

knowt flashcard image
6
New cards

A simple contextual bandit algorithm - general outline

knowt flashcard image
7
New cards

A simple contextual bandit algorithm - Regret/arm and total regret?

Problem with this bound is that the bound becomes very large if the cardinality of X is large (many context) - therefore we either need to assume some structure or change the objective

<p>Problem with this bound is that the bound becomes very large if the cardinality of X is large (many context) - therefore we either need to assume some structure or change the objective</p>
8
New cards

Adding structure: Lipshitz contextual bandits - definition

  • the idea is to reduce the problem into a smaller set of contexts by discretizing the context space

  • we treat each grid point as a seperate context - so for each context x we map it onto the closet grid point

  • we have in total 1 + 1/eps points in our grid

<ul><li><p>the idea is to reduce the problem into a smaller set of contexts by discretizing the context space</p></li><li><p>we treat each grid point as a seperate context - so for each context x we map it onto the closet grid point</p></li><li><p>we have in total 1 + 1/eps points in our grid</p></li></ul><p></p>
9
New cards

Stochastic Linear Bandits - general outline of algorithm? goal? how to we write pseudo regret?

knowt flashcard image
10
New cards

LinUCB - what is the goal of LinUCB? How do we build LinUCB

  • The goal is to select actions optimistically by considering the best-case plausible outcomes given in the uncertainty in theta*

  • The problem is that we don’t know the true \theta - so we build a confidence set Omegat such that there exists candidates of theta in it with high probability

  • Then for any action we compute the best-case (lowest) loss it might incur under any plausible theta in our confidence set - this gives us a lower confidence bound

  • Then we act optimistically by choosing the action with the lowest bound

  • In the update rule:

    • the first term drives exploitation - the best guess on past data

    • the second term drive exploration - how uncertain we are in the direction a

    • the term is larger if a is not aligned with previous actions (unexplored direction) or the design matrix doesn’t have much information in that direction

<ul><li><p>The goal is to select actions optimistically by considering the best-case plausible outcomes given in the uncertainty in theta*</p></li><li><p>The problem is that we don’t know the true \theta - so we build a confidence set Omega<sub>t</sub> such that there exists candidates of theta in it with high probability</p></li><li><p>Then for any action we compute the best-case (lowest) loss it might incur under any plausible theta in our confidence set - this gives us a lower confidence bound</p></li><li><p>Then we act optimistically by choosing the action with the lowest bound</p></li><li><p>In the update rule:</p><ul><li><p>the first term drives exploitation - the best guess on past data</p></li><li><p>the second term drive exploration - how uncertain we are in the direction a</p></li><li><p>the term is larger if a is not aligned with previous actions (unexplored direction) or the design matrix doesn’t have much information in that direction</p></li></ul></li></ul><p></p>
11
New cards

Center of the confidence set - how does it look like?

knowt flashcard image
12
New cards

How is theta^ - theta* look like?

knowt flashcard image
13
New cards

How does the final confidence set look like?

knowt flashcard image
14
New cards

UB Regret for LinUCB

knowt flashcard image