import numpy as np

# Cost-to-go of the policy pi, computed in closed form:
#   Jpi = (I - gamma * Ppi)^-1 * Cpi
gamma = 0.99  # discount factor

# Ppi was rearranged so that rows and columns both correspond to the 6 states.
Ppi = np.array([[0, 0.5, 0, 0.5, 0, 0], [0, 0, 0.5, 0, 0.5, 0], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0]])

# New c(pi). NOTE: this array is 6x4, so Jpi below is also 6x4;
# the formula yields a length-6 cost-to-go only if c(pi) has one entry per state.
Cpi = np.array([[0, 0.1, 0, 0.1], [0, 0.1, 0, 0.1], [0, 1, 0, 0], [0, 0, 0, 0.2], [0, 0, 0, 0.2], [0, 0, 0, 0]])
I = np.eye(6)

# QUESTION: is this the formula we are supposed to use? The policy is not an n x n
# matrix, so it cannot be subtracted from the identity; that is why we changed Ppi. <-----------
Jpi = np.dot(np.linalg.inv(I - gamma * Ppi), Cpi)
print("Cost-to-go: \n", Jpi)
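
The closed-form formula needs Ppi as an n x n state-to-state matrix and c(pi) as a vector with one entry per state. A minimal sketch of one way to obtain both from a stochastic policy is shown here. The names `evaluate_policy`, `P`, `C` and `pi`, their array layouts, and the toy 2-state numbers are assumptions made for illustration; they are not the data used in the code above.

```python
import numpy as np

def evaluate_policy(P, C, pi, gamma=0.99):
    """Return J_pi = (I - gamma * P_pi)^-1 c_pi as a vector of length n_states.

    Assumed (hypothetical) layouts:
    P  : (n_actions, n_states, n_states), P[a, s, t] = Pr(t | s, a)
    C  : (n_states, n_actions), immediate cost of action a in state s
    pi : (n_states, n_actions), pi[s, a] = Pr(a | s)
    """
    n_states = pi.shape[0]
    # P_pi[s, t] = sum_a pi[s, a] * P[a, s, t]  -> shape (n_states, n_states)
    P_pi = np.einsum("sa,ast->st", pi, P)
    # c_pi[s] = sum_a pi[s, a] * C[s, a]  -> shape (n_states,)
    c_pi = np.sum(pi * C, axis=1)
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)

# Toy 2-state, 2-action problem with made-up numbers.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in the current state
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1: move to the other state
C = np.array([[1.0, 0.5],
              [0.0, 0.5]])
pi = np.array([[0.5, 0.5],                # state 0: pick either action uniformly
               [1.0, 0.0]])               # state 1: always stay

J_pi = evaluate_policy(P, C, pi)
print("Cost-to-go:", J_pi)
```

With c(pi) collapsed to a vector, the result has one cost-to-go value per state. `np.linalg.solve` is used in place of `np.linalg.inv` followed by `np.dot`; both implement the same formula, and solving the system directly is the usual numerical choice.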