45
expected return J of a policy
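A standard definition (assuming an episodic, discounted setting; the notation here is a sketch, not the deck's own answer):

```latex
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ R(\tau) \right]
          = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t=0}^{T} \gamma^{t} \, r_t \right]
```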
46
optimal policy using the expected return value J
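In the usual formulation, the optimal policy parameters maximize the expected return:

```latex
\theta^{*} = \arg\max_{\theta} \; J(\theta)
```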
47
policy gradient theorem gradient of J
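The standard statement of the policy gradient theorem (one common form, using the state-action value of the current policy):

```latex
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, Q^{\pi_\theta}(s_t, a_t) \right]
```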
48
Approximation of the state-action value function at time t using Monte Carlo
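The usual Monte Carlo estimate is the discounted return-to-go along the sampled trajectory:

```latex
\hat{Q}(s_t, a_t) \approx \sum_{t'=t}^{T} \gamma^{\,t'-t} \, r_{t'}
```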
49
policy gradient theorem gradient approximation using Monte Carlo over n trajectories
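Averaging the per-trajectory estimates over n sampled trajectories (a standard REINFORCE-style estimator; superscript (i) indexes trajectories):

```latex
\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta\!\left(a_t^{(i)} \mid s_t^{(i)}\right) \hat{Q}\!\left(s_t^{(i)}, a_t^{(i)}\right)
```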
51
Loss function used in A-D (automatic differentiation) for basic policy gradient algorithm
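A common surrogate loss: its gradient under automatic differentiation equals the negative of the estimator above (the returns are treated as constants, and the minus sign turns gradient ascent into a minimization problem):

```latex
L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \log \pi_\theta\!\left(a_t^{(i)} \mid s_t^{(i)}\right) \hat{Q}\!\left(s_t^{(i)}, a_t^{(i)}\right)
```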
52
policy gradient basic update of parameters
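The basic gradient-ascent step with learning rate α:

```latex
\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)
```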
53
policy gradient theorem with a baseline estimation of the gradient
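Subtracting a state-dependent baseline b leaves the estimator unbiased while reducing variance (standard form):

```latex
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \left( Q^{\pi_\theta}(s_t, a_t) - b(s_t) \right) \right]
```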
54
policy gradient theorem baseline equation
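One common choice of baseline is the state value function, which turns the weighting term into the advantage (this is the usual textbook choice, though other baselines exist):

```latex
b(s_t) = V^{\pi_\theta}(s_t), \qquad A^{\pi_\theta}(s_t, a_t) = Q^{\pi_\theta}(s_t, a_t) - V^{\pi_\theta}(s_t)
```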
55
Actor critic policy gradient algorithm gradient estimation over n trajectories
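In the actor-critic setting, a learned critic with parameters w supplies the advantage estimate in place of the Monte Carlo return (a standard formulation):

```latex
\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta\!\left(a_t^{(i)} \mid s_t^{(i)}\right) A_w\!\left(s_t^{(i)}, a_t^{(i)}\right)
```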
56
Actor critic policy gradient algorithm loss of the critic with Monte Carlo approach
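With a Monte Carlo target, the critic is regressed onto the observed discounted return G_t (standard squared-error loss):

```latex
L(w) = \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T} \left( V_w\!\left(s_t^{(i)}\right) - G_t^{(i)} \right)^{2},
\qquad G_t = \sum_{t'=t}^{T} \gamma^{\,t'-t} \, r_{t'}
```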
57
Actor critic policy gradient algorithm loss of the critic with temporal difference learning
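With one-step temporal difference learning, the target bootstraps from the critic's own next-state estimate; by convention the target is treated as a constant (no gradient flows through it):

```latex
L(w) = \left( V_w(s_t) - \left( r_t + \gamma \, V_w(s_{t+1}) \right) \right)^{2}
```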
58
Actor critic policy gradient algorithm loss of the critic with multistep temporal difference learning
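The multistep variant accumulates m rewards before bootstrapping (m is my notation for the step count, to avoid clashing with the n trajectories above):

```latex
L(w) = \left( V_w(s_t) - \left( \sum_{k=0}^{m-1} \gamma^{k} \, r_{t+k} + \gamma^{m} \, V_w(s_{t+m}) \right) \right)^{2}
```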
59
Entropy bonus term H
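The entropy of the policy at a state, for a discrete action space (the standard definition):

```latex
H\!\left(\pi_\theta(\cdot \mid s_t)\right) = -\sum_{a} \pi_\theta(a \mid s_t) \log \pi_\theta(a \mid s_t)
```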
60
update of the actor using entropy bonus
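A common form of the actor update with an entropy bonus, where β weights the bonus to encourage exploration (the exact placement of β varies across presentations):

```latex
\theta \leftarrow \theta + \alpha \left( \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, A_w(s_t, a_t) + \beta \, \nabla_\theta H\!\left(\pi_\theta(\cdot \mid s_t)\right) \right)
```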
61
TRPO update rule
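TRPO maximizes a surrogate objective subject to a KL-divergence trust-region constraint on the policy update (standard constrained form, with trust-region size δ):

```latex
\max_{\theta} \; \mathbb{E}\!\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, A^{\pi_{\theta_{\text{old}}}}(s, a) \right]
\quad \text{s.t.} \quad
\mathbb{E}\!\left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\middle\|\, \pi_\theta(\cdot \mid s) \right) \right] \le \delta
```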
62
TRPO dtheta equation
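Solving the constrained problem to first order gives a natural-gradient step, where g is the gradient of the surrogate objective and F is the Fisher information matrix of the policy (the step size scales so the KL constraint is met with equality):

```latex
\Delta\theta = \sqrt{\frac{2\delta}{g^{\top} F^{-1} g}} \; F^{-1} g
```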