Policy gradient algorithm

17 Terms

1. Expected return J of a policy
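
A standard definition in the discounted, finite-horizon setting (the answer sides of these cards are not shown, so this and the formulas below are standard textbook forms with assumed notation, not the deck's own answers):

J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right] = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \gamma^{t} r_t \right]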

2. Optimal policy using the expected return value J
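
Usually stated as the parameters (and hence the policy) that maximize the expected return:

\theta^{*} = \arg\max_{\theta} J(\theta), \qquad \pi^{*} = \pi_{\theta^{*}}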

3. Policy gradient theorem: gradient of J
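
Common statement of the theorem, with Q^{\pi_\theta} the state-action value function of the current policy:

\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t) \right]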

4. Approximation of the state-action value function at time t using Monte Carlo
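
A Monte Carlo estimate typically uses the sampled reward-to-go from time t:

\hat{Q}(s_t, a_t) \approx \sum_{t'=t}^{T} \gamma^{t'-t}\, r_{t'}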

5. Policy gradient theorem: gradient approximation using Monte Carlo over n trajectories
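
REINFORCE-style estimator over n sampled trajectories (the superscript (i) indexes the trajectory):

\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big)\, \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big)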

6. Loss function used in automatic differentiation (AD) for the basic policy gradient algorithm
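
A common surrogate loss whose automatic-differentiation gradient reproduces the estimator above (the returns are treated as constants, and the sign is flipped because optimizers minimize):

L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big)\, \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big)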

7. Policy gradient: basic update of the parameters
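
Plain gradient ascent on J with step size \alpha:

\theta \leftarrow \theta + \alpha\, \nabla_\theta J(\theta)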

8. Policy gradient theorem with a baseline: estimation of the gradient
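
Same estimator with a baseline b(s_t) subtracted from the return, which reduces variance without adding bias:

\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big) \Big( \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big) - b\big(s_t^{(i)}\big) \Big)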

9. Policy gradient theorem: baseline equation
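
The usual choice of baseline is the state-value function of the current policy; any state-dependent baseline leaves the estimator unbiased:

b(s_t) = V^{\pi_\theta}(s_t), \qquad \mathbb{E}_{a_t \sim \pi_\theta}\big[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t) \big] = 0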

10. Actor-critic policy gradient algorithm: gradient estimation over n trajectories
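
With a learned critic V_\phi serving as the baseline, the weight on each score term becomes an advantage estimate:

\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big) \Big( \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big) - V_\phi\big(s_t^{(i)}\big) \Big)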

11. Actor-critic policy gradient algorithm: loss of the critic with the Monte Carlo approach
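
A common Monte Carlo critic loss is the squared error against the sampled return-to-go:

L(\phi) = \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \Big( V_\phi\big(s_t^{(i)}\big) - \hat{Q}\big(s_t^{(i)}, a_t^{(i)}\big) \Big)^{2}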

12. Actor-critic policy gradient algorithm: loss of the critic with temporal-difference learning
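
One-step temporal-difference version, where the bootstrapped target r_t + \gamma V_\phi(s_{t+1}) is usually treated as a constant when differentiating:

L(\phi) = \sum_{t} \Big( r_t + \gamma\, V_\phi(s_{t+1}) - V_\phi(s_t) \Big)^{2}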

13. Actor-critic policy gradient algorithm: loss of the critic with multistep temporal-difference learning
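
The m-step variant typically accumulates m rewards before bootstrapping from the critic:

L(\phi) = \sum_{t} \Big( \sum_{k=0}^{m-1} \gamma^{k} r_{t+k} + \gamma^{m} V_\phi(s_{t+m}) - V_\phi(s_t) \Big)^{2}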

14. Entropy bonus term H
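
H is typically the entropy of the action distribution at a state (discrete-action form shown):

H\big(\pi_\theta(\cdot \mid s_t)\big) = -\sum_{a} \pi_\theta(a \mid s_t)\, \log \pi_\theta(a \mid s_t)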

15. Update of the actor using the entropy bonus
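
Gradient ascent on the return objective plus an entropy term weighted by a coefficient \beta, which discourages premature collapse to a deterministic policy:

\theta \leftarrow \theta + \alpha\, \nabla_\theta \Big( J(\theta) + \beta \sum_{t} H\big(\pi_\theta(\cdot \mid s_t)\big) \Big)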

16. TRPO update rule
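
TRPO maximizes a surrogate objective subject to a KL-divergence trust region around the old policy, then applies the resulting step:

\max_{\theta}\; \mathbb{E}\!\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)}\, A^{\pi_{\theta_{\mathrm{old}}}}(s, a) \right] \quad \text{s.t.} \quad \mathbb{E}\big[ D_{\mathrm{KL}}\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)\big) \big] \le \delta, \qquad \theta \leftarrow \theta + d\theta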

17. TRPO dθ equation
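
The step is the natural-gradient direction scaled to lie on the trust-region boundary, with g the policy gradient and F the Fisher information matrix of the policy:

d\theta = \sqrt{\frac{2\delta}{g^{\top} F^{-1} g}}\; F^{-1} g, \qquad g = \nabla_\theta J(\theta)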