Policy gradient algorithm

17 Terms

1. Expected return J of a policy
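
A standard definition in the discounted, finite-horizon setting (the answer sides of these cards are not shown, so this and the formulas below are standard textbook forms with assumed notation, not the deck's own answers):

J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right] = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \gamma^{t} r_t \right]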

2. Optimal policy using the expected return value J
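
Usually stated as the parameters (and hence the policy) that maximize the expected return:

\theta^{*} = \arg\max_{\theta} J(\theta), \qquad \pi^{*} = \pi_{\theta^{*}}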

3. Policy gradient theorem: gradient of J
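
Common statement of the theorem, with Q^{\pi_\theta} the state-action value function of the current policy:

\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t) \right]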

4. Approximation of the state-action value function at time t using Monte Carlo
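
A Monte Carlo estimate typically uses the sampled reward-to-go from time t:

\hat{Q}(s_t, a_t) \approx \sum_{t'=t}^{T} \gamma^{t'-t}\, r_{t'}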

5. Policy gradient theorem: gradient approximation using Monte Carlo over n trajectories
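
REINFORCE-style estimator over n sampled trajectories (the superscript (i) indexes the trajectory):

\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big)\, \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big)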

6. Loss function used in automatic differentiation (AD) for the basic policy gradient algorithm
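
A common surrogate loss whose automatic-differentiation gradient reproduces the estimator above (the returns are treated as constants, and the sign is flipped because optimizers minimize):

L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big)\, \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big)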

7. Policy gradient: basic update of the parameters
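
Plain gradient ascent on J with step size \alpha:

\theta \leftarrow \theta + \alpha\, \nabla_\theta J(\theta)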

8. Policy gradient theorem with a baseline: estimation of the gradient
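
Same estimator with a baseline b(s_t) subtracted from the return, which reduces variance without adding bias:

\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big) \Big( \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big) - b\big(s_t^{(i)}\big) \Big)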

9. Policy gradient theorem: baseline equation
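
The usual choice of baseline is the state-value function of the current policy; any state-dependent baseline leaves the estimator unbiased:

b(s_t) = V^{\pi_\theta}(s_t), \qquad \mathbb{E}_{a_t \sim \pi_\theta}\big[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t) \big] = 0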

10. Actor-critic policy gradient algorithm: gradient estimation over n trajectories
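
With a learned critic V_\phi serving as the baseline, the weight on each score term becomes an advantage estimate:

\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big) \Big( \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big) - V_\phi\big(s_t^{(i)}\big) \Big)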

11. Actor-critic policy gradient algorithm: loss of the critic with the Monte Carlo approach
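
A common Monte Carlo critic loss is the squared error against the sampled return-to-go:

L(\phi) = \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \Big( V_\phi\big(s_t^{(i)}\big) - \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big) \Big)^{2}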

12. Actor-critic policy gradient algorithm: loss of the critic with temporal-difference learning
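
One-step temporal-difference version, where the bootstrapped target r_t + \gamma V_\phi(s_{t+1}) is usually treated as a constant when differentiating:

L(\phi) = \sum_{t} \Big( r_t + \gamma\, V_\phi(s_{t+1}) - V_\phi(s_t) \Big)^{2}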

13. Actor-critic policy gradient algorithm: loss of the critic with multistep temporal-difference learning
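
The m-step variant typically accumulates m rewards before bootstrapping from the critic:

L(\phi) = \sum_{t} \Big( \sum_{k=0}^{m-1} \gamma^{k} r_{t+k} + \gamma^{m} V_\phi(s_{t+m}) - V_\phi(s_t) \Big)^{2}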

14. Entropy bonus term H
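
H is typically the entropy of the action distribution at a state (discrete-action form shown):

H\big(\pi_\theta(\cdot \mid s_t)\big) = -\sum_{a} \pi_\theta(a \mid s_t)\, \log \pi_\theta(a \mid s_t)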

15. Update of the actor using the entropy bonus
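
Gradient ascent on the return objective plus an entropy term weighted by a coefficient \beta, which discourages premature collapse to a deterministic policy:

\theta \leftarrow \theta + \alpha\, \nabla_\theta \Big( J(\theta) + \beta \sum_{t} H\big(\pi_\theta(\cdot \mid s_t)\big) \Big)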

16. TRPO update rule
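
TRPO maximizes a surrogate objective subject to a KL-divergence trust region around the old policy, then applies the resulting step:

\max_{\theta}\; \mathbb{E}\!\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)}\, A^{\pi_{\theta_{\mathrm{old}}}}(s, a) \right] \quad \text{s.t.} \quad \mathbb{E}\big[ D_{\mathrm{KL}}\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)\big) \big] \le \delta, \qquad \theta \leftarrow \theta + d\theta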

17. TRPO dθ equation
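
The step is the natural-gradient direction scaled to lie on the trust-region boundary, with g the policy gradient and F the Fisher information matrix of the policy:

d\theta = \sqrt{\frac{2\delta}{g^{\top} F^{-1} g}}\; F^{-1} g, \qquad g = \nabla_\theta J(\theta)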