Untitled

a guest
Aug 20th, 2017
# For practice purposes, as part of a Reinforcement Learning course.

import gym
from gym import wrappers

## Optimized policy via Genetic Algorithm
# One action per state of the 8x8 grid (64 entries), indexed by the observation.
best_policy = [3, 2, 3, 2, 3, 2, 2, 0, 3, 3, 3, 3, 2, 3, 2, 1, 2, 2, 0, 2, 2, 2, 2, 2, 0, 3, 1, 1, 0, 2, 2, 2, 0, 3, 3, 0, 2, 1, 3, 2, 1, 2, 0, 0, 1, 0, 3, 2, 2, 0, 0, 0, 3, 2, 0, 2, 0, 1, 2, 1, 3, 3, 0, 0]

env = gym.make('FrozenLake8x8-v0')
# The Monitor wrapper writes its results to this directory.
env = wrappers.Monitor(env, '/tmp/frozenlake-experiment-2')
for i_episode in range(4000):
    observation = env.reset()
    for t in range(400):
        env.render()
        # Look up the action for the current state in the fixed policy.
        observation, reward, done, info = env.step(best_policy[observation])
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()
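
The script above only prints episode lengths. A rough way to see how well a fixed policy performs is to average the total reward over many episodes. The sketch below is not part of the original paste: `evaluate_policy` and `CorridorEnv` are hypothetical names, and `CorridorEnv` is a small stand-in so the example runs without gym. It assumes the same older interface the script uses, where `reset()` returns an observation and `step()` returns a 4-tuple.

```python
class CorridorEnv:
    """Hypothetical stand-in environment (not part of gym): a 1-D corridor.

    The agent starts at state 0 and receives reward 1.0 on reaching the
    last state, which also ends the episode.
    """

    def __init__(self, length=4):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right; any other action moves left (clamped at 0).
        if action == 1:
            self.state += 1
        else:
            self.state = max(0, self.state - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}


def evaluate_policy(env, policy, episodes=100, max_steps=400):
    """Return the mean total reward per episode when following `policy`.

    `policy` is a list indexed by observation, as with `best_policy` above.
    """
    total = 0.0
    for _ in range(episodes):
        observation = env.reset()
        for _ in range(max_steps):
            observation, reward, done, _info = env.step(policy[observation])
            total += reward
            if done:
                break
    return total / episodes


if __name__ == "__main__":
    # Always moving right reaches the goal in every episode.
    print(evaluate_policy(CorridorEnv(4), [1, 1, 1, 1], episodes=10))
```

Under the same interface assumption, passing the unwrapped FrozenLake environment and `best_policy` to `evaluate_policy` would give the fraction of episodes that reach the goal, since that environment rewards 1 only on success.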