import torch.nn.functional as F  # needed for F.mse_loss below

# Methods of a DDPG agent class (excerpt).
def reset(self):
    """Reset the exploration noise process between episodes."""
    self.noise.reset()

def learn(self, experiences, gamma):
    """Update actor and critic networks from a batch of experience tuples."""
    # Unpack the sampled batch (assumed order: s, a, r, s', done);
    # these names are undefined otherwise.
    states, actions, rewards, next_states, dones = experiences

    # --- update critic: minimize TD error against the target networks ---
    actions_next = self.actor_target(next_states)
    Q_targets_next = self.critic_target(next_states, actions_next)
    # Bootstrap only on non-terminal transitions (1 - dones).
    Q_targets = rewards + (gamma * Q_targets_next * (1 - dones))
    Q_expected = self.critic_local(states, actions)
    critic_loss = F.mse_loss(Q_expected, Q_targets)
    self.critic_optimizer.zero_grad()
    critic_loss.backward()
    self.critic_optimizer.step()

    # --- update actor: ascend the critic's Q-value of the actor's actions ---
    actions_pred = self.actor_local(states)
    actor_loss = -self.critic_local(states, actions_pred).mean()
    self.actor_optimizer.zero_grad()
    actor_loss.backward()
    self.actor_optimizer.step()

    # --- update target networks: Polyak averaging with factor TAU ---
    self.soft_update(self.critic_local, self.critic_target, TAU)
    self.soft_update(self.actor_local, self.actor_target, TAU)
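
The snippet calls self.soft_update(...) without defining it. Below is a minimal sketch of that method, assuming the signature soft_update(local_model, target_model, tau) implied by the calls above and the standard DDPG soft-update rule theta_target = tau * theta_local + (1 - tau) * theta_target; the body is an assumption, not taken from the paste.

def soft_update(self, local_model, target_model, tau):
    """Polyak-average local network weights into the target network.

    Assumed standard DDPG rule: theta_target = tau*theta_local + (1-tau)*theta_target.
    """
    for target_param, local_param in zip(target_model.parameters(),
                                         local_model.parameters()):
        target_param.data.copy_(
            tau * local_param.data + (1.0 - tau) * target_param.data)

With a small tau (e.g. 1e-3), the target networks change slowly, which keeps the critic's bootstrap targets stable while the local networks are trained.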