
Untitled

a guest
Mar 25th, 2019
# Sketch of the actor (policy-gradient) update; assumes `import tensorflow as tf`.
def update(self, states, actions, rewards, values):
    # Advantage = returns minus the critic's value estimates; both are
    # computed outside of this update step.
    advantages = rewards - values
    action_probs = self.actor(states)
    # Probability of the action that was actually taken in each state.
    selected_action_probs = tf.reduce_sum(
        action_probs * self.to_one_hot(actions), axis=1)
    neg_logs = -tf.math.log(selected_action_probs)
    # Policy-gradient loss: -log pi(a|s) weighted by the advantage, averaged.
    policy_loss = tf.reduce_mean(neg_logs * advantages)
    return policy_loss
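
# For context, a minimal sketch of how this loss might be minimized, assuming
# TensorFlow 2, that self.actor is a tf.keras model, and that the class holds
# an optimizer (self.optimizer is illustrative, not part of the original paste):
import tensorflow as tf

def train_step(self, states, actions, rewards, values):
    # Differentiate the policy loss computed by update() above and apply the
    # gradients to the actor's weights.
    with tf.GradientTape() as tape:
        policy_loss = self.update(states, actions, rewards, values)
    grads = tape.gradient(policy_loss, self.actor.trainable_variables)
    self.optimizer.apply_gradients(zip(grads, self.actor.trainable_variables))
    return policy_loss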