Untitled

a guest
Mar 20th, 2018
import numpy as np

# We compute the cost-to-go using the following formula:
# Jpi = (I - gamma * Ppi)^-1 * Cpi, with gamma = 0.99

# We rearranged the policy so that the rows and the columns correspond to the states
Ppi = np.array([[0, 0.5, 0, 0.5, 0, 0],
                [0, 0, 0.5, 0, 0.5, 0],
                [0, 0, 0, 0, 0, 1],
                [0, 0, 0, 0, 1, 0],
                [0, 0, 0, 0, 0, 1],
                [0, 0, 0, 0, 0, 0]])

# We have a new c(pi)
Cpi = np.array([[0, 0.1, 0, 0.1],
                [0, 0.1, 0, 0.1],
                [0, 1, 0, 0],
                [0, 0, 0, 0.2],
                [0, 0, 0, 0.2],
                [0, 0, 0, 0]])

I = np.eye(6)

# QUESTION: are we supposed to use this formula? Since the policy is not an n x n matrix, it cannot be subtracted from the identity, so we changed Ppi <-----------
Jpi = np.dot(np.linalg.inv(I - 0.99 * Ppi), Cpi)

print("Cost-to-go: \n", Jpi)
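
On the question of the policy not being n x n: one common way to handle it is to average the per-action transition matrices and costs under the policy, which gives an n x n matrix and a length-n cost vector that fit the formula directly. The sketch below is only an illustration of that idea. The 3-state, 2-action MDP, its numbers, and the names `P`, `c`, `pi`, `P_pi`, `c_pi` and `J_pi` are all hypothetical and are not taken from the problem above.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP; every number here is illustrative only.
gamma = 0.99

# P[a] is the n x n transition matrix under action a (each row sums to 1).
P = np.array([
    [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]],
    [[0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]],
])

# c[x, a] is the immediate cost of taking action a in state x.
c = np.array([[0.1, 0.2],
              [1.0, 0.5],
              [0.0, 0.0]])

# pi[x, a] is the probability of choosing action a in state x (n x |A|, not n x n).
pi = np.array([[0.5, 0.5],
               [1.0, 0.0],
               [0.5, 0.5]])

# Average over actions: P_pi[x, y] = sum_a pi[x, a] * P[a, x, y]  -> shape (n, n)
P_pi = np.einsum("xa,axy->xy", pi, P)

# Average over actions: c_pi[x] = sum_a pi[x, a] * c[x, a]  -> shape (n,)
c_pi = (pi * c).sum(axis=1)

# Solve (I - gamma * P_pi) J = c_pi instead of forming the inverse explicitly.
J_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, c_pi)

print("Cost-to-go: \n", J_pi)
```

With this construction the cost-to-go is one value per state. A quick sanity check is that the result satisfies `J_pi = c_pi + gamma * P_pi @ J_pi`, and that every row of `P_pi` sums to 1.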