2-3 stochastic bandits

19 Terms

1. Describe the stochastic multi-armed bandit setup

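A standard formal description (in the usual notation): there are $K$ arms with unknown reward distributions $\nu_1,\dots,\nu_K$ and means $\mu_1,\dots,\mu_K$. In each round $t = 1,\dots,T$ the learner picks an arm $A_t \in \{1,\dots,K\}$ and observes a reward $X_t \sim \nu_{A_t}$, independent of the past; the goal is to maximize the expected total reward $\mathbb{E}\big[\sum_{t=1}^{T} X_t\big]$.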
2. Pseudo-regret for the stochastic multi-armed bandit setup
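The usual definition, writing $\mu^* = \max_a \mu_a$ for the best mean:

$\bar{R}_T = T\mu^* - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{A_t}\Big]$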

3. Performance measures: optimal arm, best arm, suboptimality gap

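In standard notation: the best (optimal) arm is $a^* = \arg\max_a \mu_a$ with mean $\mu^* = \mu_{a^*}$, and the suboptimality gap of arm $a$ is $\Delta_a = \mu^* - \mu_a$.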
4. Regret decomposition lemma
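The lemma, with $N_a(T)$ the number of times arm $a$ is played up to round $T$:

$\bar{R}_T = \sum_{a=1}^{K} \Delta_a\, \mathbb{E}\big[N_a(T)\big]$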

5. Regret of the simple strategy: pure exploration
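Assuming the simple strategy is uniform exploration (pick an arm uniformly at random every round), each round incurs expected regret $\frac{1}{K}\sum_a \Delta_a$, so

$\bar{R}_T = \frac{T}{K}\sum_{a} \Delta_a = \Theta(T),$

i.e. linear regret whenever some gap is positive.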

6. Regret of the simple strategy: pure exploitation (Follow the Leader, FTL)

$\Omega(T)$: after unlucky early draws, FTL can commit to a suboptimal arm forever, so its worst-case expected regret grows linearly with $T$.

7. ETC (Explore-Then-Commit): give an outline of the algorithm

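A minimal sketch of the algorithm (the reward oracle `pull` and the exploration length `m` are illustrative assumptions, not from the deck):

```python
import numpy as np

def etc(pull, K, T, m):
    """Explore-Then-Commit: play each of the K arms m times,
    then commit to the empirically best arm for the remaining
    rounds. `pull(a)` returns a stochastic reward for arm a."""
    # Exploration phase: m pulls per arm, track empirical means.
    means = np.array([np.mean([pull(a) for _ in range(m)]) for a in range(K)])
    best = int(np.argmax(means))
    # Commit phase: play the empirical best arm for the rest of T.
    rewards = [pull(best) for _ in range(T - m * K)]
    return best, rewards
```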
8. Upper bound on the regret of ETC
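One common form of the bound (for 1-subgaussian rewards, exploring each arm $m$ times):

$\bar{R}_T \le m\sum_{a}\Delta_a + (T - mK)\sum_{a}\Delta_a \exp\!\Big(-\frac{m\Delta_a^2}{4}\Big)$

Without knowledge of the gaps, choosing $m \approx (T/K)^{2/3}(\log T)^{1/3}$ yields $\bar{R}_T = O\big(T^{2/3}(K\log T)^{1/3}\big)$.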

9. General lower bound on the regret of any bandit algorithm
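The minimax statement: for any algorithm there is a bandit instance on which

$\bar{R}_T = \Omega\big(\sqrt{KT}\big)$

(for $T \ge K$).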

10. Best-arm identification: what is it, and what is the bound on at least how many rounds T we need?

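In outline (standard statement, constants omitted): the goal is to output the identity of the best arm with probability at least $1-\delta$, rather than to minimize regret. Separating arm $a$ from the best requires on the order of $1/\Delta_a^2$ samples, so

$T = \Omega\Big(\sum_{a \ne a^*} \frac{1}{\Delta_a^2}\Big)$

rounds are needed, up to $\log(1/\delta)$ factors.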
11. epsilon-greedy algorithm: give an outline of the algorithm

  • with probability epsilon, explore: pick an arm uniformly at random

  • with probability 1 − epsilon, exploit: pick the arm with the highest empirical mean so far (see the sketch below)

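A minimal sketch (the reward oracle `pull` and the fixed rate `eps` are illustrative assumptions):

```python
import numpy as np

def epsilon_greedy(pull, K, T, eps=0.1):
    """epsilon-greedy: explore a uniformly random arm w.p. eps,
    otherwise play the arm with the highest empirical mean.
    `pull(a)` returns a stochastic reward for arm a."""
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(T):
        if np.random.rand() < eps or counts.min() == 0:
            a = np.random.randint(K)           # explore
        else:
            a = int(np.argmax(sums / counts))  # exploit
        counts[a] += 1
        sums[a] += pull(a)
    return sums / counts, counts
```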
12. Lower bound on the regret of epsilon-greedy

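With a constant exploration rate $\varepsilon$, exploration alone costs expected regret $\frac{1}{K}\sum_a \Delta_a$ per exploring round, so

$\bar{R}_T \ge \frac{\varepsilon T}{K}\sum_{a}\Delta_a = \Omega(T)$

for any fixed $\varepsilon > 0$; decaying schedules (e.g. $\varepsilon_t \sim t^{-1/3}$) are needed for sublinear regret.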
13. UCB: outline of the general algorithm

  • In UCB we try to be greedy but use optimism to encourage exploration

  • the idea is to pick the arm with the highest UCB, i.e. the arm we are most optimistic about

    • if we have played an arm many times, its confidence interval narrows

    • the algorithm eventually concentrates on the best arm, but only by exploring first

  • UCB encourages exploration without the need for explicit randomness (see the sketch below)

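A minimal sketch using the confidence radius $\sqrt{2\log T / n_a}$ (some variants use $\log t$ instead of $\log T$; the reward oracle `pull` is an illustrative assumption):

```python
import numpy as np

def ucb1(pull, K, T):
    """UCB: play each arm once, then always pick the arm whose
    upper confidence bound mean + sqrt(2 log T / n) is largest."""
    counts = np.ones(K)
    sums = np.array([pull(a) for a in range(K)], dtype=float)  # one pull per arm
    for t in range(K, T):
        ucb = sums / counts + np.sqrt(2.0 * np.log(T) / counts)
        a = int(np.argmax(ucb))
        sums[a] += pull(a)
        counts[a] += 1
    return sums / counts, counts
```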
14. Clean event lemma: calculation

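The usual calculation: with confidence radius $r_t(a) = \sqrt{2\log T / n_t(a)}$, define the clean event

$\mathcal{E} = \big\{ |\bar{\mu}_t(a) - \mu_a| \le r_t(a) \ \text{ for all arms } a \text{ and rounds } t \big\}.$

Hoeffding's inequality bounds each deviation's failure probability by $2e^{-2 n_t(a) r_t(a)^2} = 2T^{-4}$, and a union bound over all arm-round pairs gives $\Pr[\mathcal{E}] \ge 1 - O(T^{-2})$.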
15. How many times do we play a sub-optimal arm?

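Under the clean event, if a suboptimal arm $a$ is still played at round $t$ then $\Delta_a \le 2 r_t(a) = 2\sqrt{2\log T / n_t(a)}$, which rearranges to

$n_T(a) \le \frac{8\log T}{\Delta_a^2}$

(constants depend on the chosen confidence radius).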
16. UCB: instance-dependent regret bound

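Combining the regret decomposition lemma with the bound on suboptimal plays:

$\bar{R}_T \le \sum_{a:\,\Delta_a>0} \Delta_a \cdot \frac{8\log T}{\Delta_a^2} = O\Big(\sum_{a:\,\Delta_a>0} \frac{\log T}{\Delta_a}\Big)$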
17. UCB: instance-independent regret bound

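Optimizing the instance-dependent bound over the worst-case gaps gives

$\bar{R}_T = O\big(\sqrt{KT\log T}\big)$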
18. Define the Optimism Paradigm

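In the usual phrasing: act as if the environment is as favorable as is plausibly consistent with the data seen so far, e.g. play $A_t = \arg\max_a \mathrm{UCB}_t(a)$; exploration happens automatically because poorly sampled arms keep wide confidence intervals.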
19. Describe Thompson Sampling

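A minimal sketch for Bernoulli rewards with Beta priors (the 0/1 reward oracle `pull` is an illustrative assumption):

```python
import numpy as np

def thompson_sampling(pull, K, T):
    """Thompson Sampling (Beta-Bernoulli): keep a Beta(s+1, f+1)
    posterior per arm, sample a plausible mean from each posterior,
    and play the arm whose sample is largest."""
    succ = np.zeros(K)
    fail = np.zeros(K)
    for t in range(T):
        theta = np.random.beta(succ + 1, fail + 1)  # one posterior sample per arm
        a = int(np.argmax(theta))
        r = pull(a)                                  # observe 0/1 reward
        succ[a] += r
        fail[a] += 1 - r
    return succ, fail
```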