# Excerpt of DDPG agent object.
import torch
import torch.nn.functional as F


class Agent:
    '''Interact with and learn from the environment.'''

    def learn(self, experiences, gamma):
        """Update policy and value parameters using a given batch of experience tuples.

        Q_targets = r + γ * critic_target(next_state, actor_target(next_state))
        where:
            actor_target(state) -> action
            critic_target(state, action) -> Q-value

        Params
        ======
            experiences (Tuple[torch.Tensor]): tuple of (s, a, r, s', done) tensors
            gamma (float): discount factor
        """
        states, actions, rewards, next_states, dones = experiences

        # ---------------------------- update critic ---------------------------- #
        # Get predicted next-state actions and Q-values from the target models
        actions_next = self.actor_target(next_states)
        Q_targets_next = self.critic_target(next_states, actions_next)
        # Compute Q targets for current states (y_i); (1 - dones) zeroes out the
        # bootstrap term for terminal transitions
        Q_targets = rewards + (gamma * Q_targets_next * (1 - dones))
        # Compute critic loss
        Q_expected = self.critic_local(states, actions)
        critic_loss = F.mse_loss(Q_expected, Q_targets)
        # Minimize the loss
        self.critic_optimizer.zero_grad()
        critic_loss.backward()
        torch.nn.utils.clip_grad_norm_(self.critic_local.parameters(), 1)  # clip gradient norm to max 1
        self.critic_optimizer.step()

        # ---------------------------- update actor ----------------------------- #
        # Compute actor loss: maximize Q for the locally predicted actions
        actions_pred = self.actor_local(states)
        actor_loss = -self.critic_local(states, actions_pred).mean()
        # Minimize the loss
        self.actor_optimizer.zero_grad()
        actor_loss.backward()
        self.actor_optimizer.step()
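
# Below is a minimal sketch of how this learn() method might be driven end to end.
# The Actor and Critic networks, optimizer settings, and the fake experience batch
# are illustrative assumptions standing in for the parts of the agent not shown in
# this excerpt (the full class would presumably build them in __init__ and sample
# real batches from a replay buffer).
import torch.nn as nn
import torch.optim as optim


class Actor(nn.Module):
    # Hypothetical deterministic policy network: state -> action in [-1, 1].
    def __init__(self, state_size, action_size):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_size, 64), nn.ReLU(),
                                 nn.Linear(64, action_size), nn.Tanh())

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    # Hypothetical Q-network: (state, action) -> scalar Q-value.
    def __init__(self, state_size, action_size):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_size + action_size, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))


# Wire up an Agent instance by hand with assumed sizes and learning rates.
state_size, action_size, batch_size = 8, 2, 64
agent = Agent()
agent.actor_local = Actor(state_size, action_size)
agent.actor_target = Actor(state_size, action_size)
agent.critic_local = Critic(state_size, action_size)
agent.critic_target = Critic(state_size, action_size)
agent.actor_optimizer = optim.Adam(agent.actor_local.parameters(), lr=1e-4)
agent.critic_optimizer = optim.Adam(agent.critic_local.parameters(), lr=1e-3)

# A fake batch standing in for a replay-buffer sample of (s, a, r, s', done).
experiences = (torch.randn(batch_size, state_size),
               torch.rand(batch_size, action_size) * 2 - 1,
               torch.randn(batch_size, 1),
               torch.randn(batch_size, state_size),
               torch.zeros(batch_size, 1))
agent.learn(experiences, gamma=0.99)

# Note: a full DDPG agent would typically also soft-update actor_target and
# critic_target toward the local networks after each learn() call; that step is
# outside this excerpt.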