UB Regret for OGD
Describe the stochastic multi-armed bandit setup
Pseudo-regret for the stochastic multi-armed bandit setup
Performance measures: optimal arm, best arm, suboptimality gap
Regret decomposition lemma
Regret of simple strategy: pure exploration
Regret of simple strategy: pure exploitation (FTL)
$\Omega(T)$ (linear regret)
ETC - give outline of algorithm
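A minimal Python sketch of Explore-Then-Commit (ETC), assuming Bernoulli arms; the exploration budget m, the arm means, and the horizon below are illustrative assumptions, not from the deck.

```python
import random

def explore_then_commit(arm_means, T, m):
    """ETC: play every arm m times (explore), then commit to the
    arm with the highest empirical mean for the remaining rounds."""
    K = len(arm_means)
    sums = [0.0] * K

    def pull(a):
        # Bernoulli reward with mean arm_means[a] (illustrative model)
        return 1.0 if random.random() < arm_means[a] else 0.0

    total = 0.0
    for a in range(K):             # exploration phase: m pulls per arm
        for _ in range(m):
            r = pull(a)
            sums[a] += r
            total += r
    best = max(range(K), key=lambda a: sums[a] / m)
    for _ in range(T - K * m):     # commit phase: play the empirical best
        total += pull(best)
    return total

# Illustrative numbers: 3 arms, horizon 10_000, m = 100 pulls per arm
print(explore_then_commit([0.3, 0.5, 0.7], 10_000, m=100))
```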
UB Regret ETC
General LB Regret for any Bandit Algorithm
Best-arm identification - what is it, and what is the lower bound on how many rounds T we need?
epsilon-greedy - give outline of algorithm (see sketch after this card)
do exploration w.p. epsilon
do exploitation w.p. 1-epsilon => pick the arm with the highest empirical mean so far
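A minimal Python sketch of the epsilon-greedy outline above, assuming Bernoulli arms; epsilon, the arm means, and the horizon are illustrative choices, not from the deck.

```python
import random

def epsilon_greedy(arm_means, T, epsilon=0.1):
    """Epsilon-greedy: explore w.p. epsilon, otherwise exploit by
    playing the arm with the highest empirical mean so far."""
    K = len(arm_means)
    counts = [0] * K          # pulls per arm
    sums = [0.0] * K          # cumulative reward per arm
    total = 0.0
    for _ in range(T):
        if random.random() < epsilon or 0 in counts:
            arm = random.randrange(K)    # explore (uniformly at random)
        else:
            # exploit: highest empirical mean
            arm = max(range(K), key=lambda a: sums[a] / counts[a])
        # Bernoulli reward (illustrative model)
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

# Illustrative numbers: 3 arms, horizon 10_000
print(epsilon_greedy([0.3, 0.5, 0.7], 10_000))
```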
LB regret e-greedy
UCB - outline of general algorithm (see sketch after this card)
In UCB we try to be greedy but use optimism to encourage exploration.
The idea is to pick the arm with the highest UCB, i.e. the arm we are most optimistic about.
If we have played an arm many times, its confidence interval gets narrow,
so the algorithm eventually concentrates on the best arm, but only does so by exploring.
UCB encourages exploration without the need for explicit randomness.
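A minimal Python sketch of UCB1 with the standard sqrt(2 ln t / n) confidence radius, assuming Bernoulli arms; the bonus constant and all numbers are illustrative assumptions.

```python
import math
import random

def ucb1(arm_means, T):
    """UCB1: play the arm with the highest upper confidence bound
    (empirical mean + exploration bonus). The arm choice itself uses
    no explicit randomness; optimism drives exploration."""
    K = len(arm_means)
    counts = [0] * K
    sums = [0.0] * K
    total = 0.0
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1            # play each arm once to initialize
        else:
            # UCB(a) = empirical mean + sqrt(2 ln t / n_a)
            arm = max(range(K), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        # Bernoulli reward (illustrative model)
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

# Illustrative numbers: 3 arms, horizon 10_000
print(ucb1([0.3, 0.5, 0.7], 10_000))
```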
Clean-event lemma - calculation
How many times do we play a sub-optimal arm?
UCB: regret instance-dependent bound
UCB: regret instance-INdependent bound
Define the Optimism Paradigm
Describe Thompson Sampling
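A minimal Python sketch of Thompson Sampling for Bernoulli arms with Beta(1, 1) priors; the arm means and horizon are illustrative, not from the deck.

```python
import random

def thompson_sampling(arm_means, T):
    """Thompson Sampling (Bernoulli bandit): sample a mean from each
    arm's Beta posterior and play the arm with the largest sample."""
    K = len(arm_means)
    alpha = [1] * K   # successes + 1 (Beta posterior parameter)
    beta = [1] * K    # failures + 1
    total = 0.0
    for _ in range(T):
        samples = [random.betavariate(alpha[a], beta[a]) for a in range(K)]
        arm = max(range(K), key=lambda a: samples[a])
        # Bernoulli reward (illustrative model)
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        alpha[arm] += int(reward)
        beta[arm] += 1 - int(reward)
        total += reward
    return total

# Illustrative numbers: 3 arms, horizon 10_000
print(thompson_sampling([0.3, 0.5, 0.7], 10_000))
```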