2-3 stochastic bandits


20 Terms

Card 1

Upper bound on the regret of OGD (Online Gradient Descent)
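For reference, a standard form of this bound, assuming convex, G-Lipschitz losses over a domain of diameter D and step size η = D/(G√T) (these symbols are the usual conventions, not taken from the card):

```latex
R_T \;\le\; \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2}
\;\overset{\eta = D/(G\sqrt{T})}{=}\; D G \sqrt{T}.
```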

Card 2

Describe the stochastic multi-armed bandit setup
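For reference, the usual formalisation (standard notation, assumed rather than taken from the card's answer):

```latex
\text{Arms } a \in \{1,\dots,K\}, \text{ each with an unknown reward distribution } \nu_a \text{ and mean } \mu_a.\\
\text{For } t = 1,\dots,T: \text{ the learner picks an arm } A_t \text{ and observes } X_t \sim \nu_{A_t},
\text{ drawn independently of the past.}
```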

Card 3

Pseudo-regret for the stochastic multi-armed bandit setup
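The standard definition, for reference (notation assumed to match the setup card):

```latex
\bar{R}(T) \;=\; T\,\mu^* - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{A_t}\Big],
\qquad \mu^* = \max_{a} \mu_a .
```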

Card 4

Performance measures: optimal arm, best arm, suboptimality gap

Card 5

Regret decomposition lemma
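The lemma's usual statement, for reference (Δ_a is the suboptimality gap, N_a(T) the number of pulls of arm a; notation assumed):

```latex
\bar{R}(T) \;=\; \sum_{a=1}^{K} \Delta_a\, \mathbb{E}\big[N_a(T)\big],
\qquad \Delta_a = \mu^* - \mu_a,\quad
N_a(T) = \sum_{t=1}^{T} \mathbb{1}\{A_t = a\}.
```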

Card 6

Regret of the simple pure-exploration strategy

Card 7

Regret of the simple pure-exploitation strategy (Follow the Leader, FTL)

Ω(T)

Card 8

ETC (Explore-Then-Commit): give an outline of the algorithm
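A minimal sketch of the usual explore-then-commit structure; the `arms` list-of-callables interface and the per-arm exploration budget `m` are illustrative assumptions, not taken from the card:

```python
def etc(arms, T, m):
    """Explore-Then-Commit sketch (common variant: m pulls per arm, then
    commit to the empirical best for the rest of the horizon)."""
    K = len(arms)
    means = [0.0] * K
    total = 0.0
    # Exploration phase: pull every arm m times, tracking empirical means.
    for a in range(K):
        for _ in range(m):
            r = arms[a]()
            means[a] += r / m
            total += r
    # Commit phase: play the empirically best arm for the remaining rounds.
    best = max(range(K), key=lambda a: means[a])
    for _ in range(T - m * K):
        total += arms[best]()
    return best, total
```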

Card 9

Upper bound on the regret of ETC

Card 10

General lower bound on the regret of any bandit algorithm
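The classic minimax statement, for reference (this is the standard result over K-armed instances, assumed to match the card):

```latex
\sup_{\text{instances}} \bar{R}(T) \;=\; \Omega\big(\sqrt{K\,T}\big)
\quad \text{for every bandit algorithm.}
```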

Card 11

Best-arm identification: what is it, and what is the lower bound on the number of rounds T needed?

Card 12

ε-greedy: give an outline of the algorithm

  • With probability ε, explore: pick an arm at random

  • With probability 1 − ε, exploit: pick the arm with the highest empirical mean so far
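The two bullets above can be sketched as follows; the `arms` list-of-callables interface and the seeded RNG are illustrative assumptions, not from the card:

```python
import random

def epsilon_greedy(arms, T, eps, seed=0):
    """epsilon-greedy sketch: with probability eps explore (uniform random arm),
    otherwise exploit the arm with the best empirical mean so far."""
    rng = random.Random(seed)
    K = len(arms)
    counts = [0] * K
    sums = [0.0] * K
    total = 0.0
    for t in range(T):
        if t < K:
            a = t  # play each arm once to initialise the empirical means
        elif rng.random() < eps:
            a = rng.randrange(K)  # explore
        else:
            a = max(range(K), key=lambda i: sums[i] / counts[i])  # exploit
        r = arms[a]()
        counts[a] += 1
        sums[a] += r
        total += r
    return total, counts
```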
Card 13

Lower bound on the regret of ε-greedy

Card 14

UCB: outline of the general algorithm

  • In UCB we try to be greedy but use optimism to encourage exploration

  • The idea is to pick the arm with the highest upper confidence bound, the arm we are most optimistic about

    • if an arm has been played many times, its confidence interval is narrow

    • the algorithm eventually concentrates on the best arm, but only does so by exploring

  • UCB encourages exploration without the need for explicit randomness
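The bullets above can be sketched as a UCB1-style loop; the `arms` list-of-callables interface is an illustrative assumption, and the index shown is the common `mean + sqrt(2 ln t / n)` form, which may differ in constants from the card's version:

```python
import math

def ucb1(arms, T):
    """UCB1 sketch: play the arm with the highest optimistic index
    empirical_mean + sqrt(2 * ln(t) / n_pulls)."""
    K = len(arms)
    counts = [0] * K
    sums = [0.0] * K
    total = 0.0
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1  # initialise: play each arm once
        else:
            a = max(range(K), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = arms[a]()
        counts[a] += 1
        sums[a] += r
        total += r
    return total, counts
```

Note how no randomness is used: exploration comes entirely from the optimism bonus shrinking as an arm is pulled more often.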
Card 15

Clean-event lemma: calculation

Card 16

How many times do we play a sub-optimal arm?

Card 17

UCB: instance-dependent regret bound
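The usual shape of this bound, for reference (summing over sub-optimal arms; constants vary between courses, so treat this as the standard form rather than the card's exact statement):

```latex
\bar{R}(T) \;=\; O\Big(\sum_{a:\,\Delta_a > 0} \frac{\log T}{\Delta_a}\Big).
```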

Card 18

UCB: instance-independent regret bound

Card 19

Define the Optimism Paradigm

Card 20

Describe Thompson Sampling
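As a concrete illustration, a minimal Beta-Bernoulli Thompson Sampling sketch; the Bernoulli reward model, Beta(1, 1) priors, and the `arms` list-of-callables interface are illustrative assumptions, not from the card:

```python
import random

def thompson_bernoulli(arms, T, seed=0):
    """Thompson Sampling sketch for Bernoulli arms with Beta(1, 1) priors:
    sample a plausible mean per arm from its posterior, play the argmax."""
    rng = random.Random(seed)
    K = len(arms)
    alpha = [1] * K  # 1 + number of observed successes per arm
    beta = [1] * K   # 1 + number of observed failures per arm
    total = 0
    for _ in range(T):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(K)]
        a = max(range(K), key=lambda i: samples[i])
        r = arms[a]()   # expects a 0/1 reward
        alpha[a] += r
        beta[a] += 1 - r
        total += r
    return total, alpha, beta
```

Exploration here comes from posterior sampling: an under-explored arm has a wide posterior, so it occasionally produces the largest sample and gets played.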
