  1. """A simple Markov Decision Process describing a painting robot.
  2.  
  3. This problem is inspired by the "Planning" lesson from the CS 7637 (Knowledge-
  4. Based Artificial Intelligence) course at Georgia Institue of Technology, with
  5. Prof. Ashok Goyel. It was created in support of an MDP assignment in CS 7641
  6. (Machine Learning), with Profs. Charles Isbell and Michael Littman.
  7.  
  8. A robot is tasked with painting a ladder and a ceiling. The following
  9. independent states are possible, for a total of 8:
  10.  
  11. - The robot may be positioned on the ladder or on the floor.
  12. - The ceiling may be wet (painted) or dry (unpainted).
  13. - The ladder may be wet (painted) or dry (unpainted).
  14.  
  15. The initial state is robot: on floor, ceiling: dry, ladder: dry.
  16.  
  17. The robot may take the following actions:
  18.  
  19. - Climb the ladder
  20. - Descend the ladder
  21. - Paint the ladder
  22. - Paint the ceiling
  23.  
  24. The robot's actions are reliable. There is no stochastic element to this
  25. problem.
  26.  
  27. Climbing or descending the ladder changes the robot's state. Painting the
  28. ceiling or ladder changes the ceiling's or ladder's state. Some action-state
  29. combinations result in no change in state, such as painting the ceiling while
  30. on the floor, or climbing the ladder while already on the ladder.
  31.  
  32. The reward for each action is -1, with the following exceptions:
  33.  
  34. -5 if the robot paints the ladder while on it.
  35. -5 if the robot climbs a wet ladder.
  36. +10 (and the problem ends) when the ceiling and ladder are both wet, and the
  37. robot is on the floor.
  38.  
  39. """

import numpy as np
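

# Hypothetical convenience helper, not part of the original paste: the state
# ordering documented in get_matrices below treats the index as three bits,
# (robot position, ladder paint, ceiling paint), so it can be computed
# directly from the three binary state variables.
def state_index(on_ladder, ladder_wet, ceiling_wet):
    """Map the three binary state variables to a state index in 0-7."""
    return 4 * int(on_ladder) + 2 * int(ladder_wet) + int(ceiling_wet)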


def get_matrices():
  45. """Provide transition and rewards matrices for a Robot Painter MDP.
  46.  
  47. Returns (P, R), where P contains the transition probability matrices, and
  48. R is the rewards matrix. Both P and R are numpy arrays. P is a 3-D array
  49. of shape (Actions, States, States). R is a 2-D array of shape (State,
  50. Action).
  51.  
  52. The order of states is:
  53. robot: on floor, ladder: dry, ceiling: dry
  54. robot: on floor, ladder: dry, ceiling: wet
  55. robot: on floor, ladder: wet, ceiling: dry
  56. robot: on floor, ladder: wet, ceiling: wet
  57. robot: on ladder, ladder: dry, ceiling: dry
  58. robot: on ladder, ladder: dry, ceiling: wet
  59. robot: on ladder, ladder: wet, ceiling: dry
  60. robot: on ladder, ladder: wet, ceiling: wet
  61.  
  62. The order of actions is:
  63. climb ladder
  64. descend ladder
  65. paint ladder
  66. paint ceiling
  67.  
  68. """

    # P[a, s, s2] = 1 when taking action a in state s leads to state s2;
    # all transitions are deterministic.
    P = np.array([
        # Action: climb ladder
        [
            [0., 0., 0., 0., 1., 0., 0., 0.],
            [0., 0., 0., 0., 0., 1., 0., 0.],
            [0., 0., 0., 0., 0., 0., 1., 0.],
            [0., 0., 0., 1., 0., 0., 0., 0.],
            [0., 0., 0., 0., 1., 0., 0., 0.],
            [0., 0., 0., 0., 0., 1., 0., 0.],
            [0., 0., 0., 0., 0., 0., 1., 0.],
            [0., 0., 0., 0., 0., 0., 0., 1.]
        ],
        # Action: descend ladder
        [
            [1., 0., 0., 0., 0., 0., 0., 0.],
            [0., 1., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0.],
            [0., 0., 0., 1., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0.],
            [0., 1., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0.],
            [0., 0., 0., 1., 0., 0., 0., 0.]
        ],
        # Action: paint ladder
        [
            [0., 0., 1., 0., 0., 0., 0., 0.],
            [0., 0., 0., 1., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0.],
            [0., 0., 0., 1., 0., 0., 0., 0.],
            [0., 0., 0., 0., 0., 0., 1., 0.],
            [0., 0., 0., 0., 0., 0., 0., 1.],
            [0., 0., 0., 0., 0., 0., 1., 0.],
            [0., 0., 0., 0., 0., 0., 0., 1.]
        ],
        # Action: paint ceiling
        [
            [1., 0., 0., 0., 0., 0., 0., 0.],
            [0., 1., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0.],
            [0., 0., 0., 1., 0., 0., 0., 0.],
            [0., 0., 0., 0., 0., 1., 0., 0.],
            [0., 0., 0., 0., 0., 1., 0., 0.],
            [0., 0., 0., 0., 0., 0., 0., 1.],
            [0., 0., 0., 0., 0., 0., 0., 1.]
        ]
    ])
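
    # Illustrative sanity check (not in the original paste): the problem is
    # deterministic, so every row of each action's matrix must sum to 1.
    assert np.allclose(P.sum(axis=2), 1.0)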

    # R[s, a]; rows follow the state order above, columns the action order.
    R = np.array([
        [-1., -1., -1., -1.],  # on floor, ladder dry, ceiling dry
        [-1., -1., 10., -1.],  # on floor, ladder dry, ceiling wet
        [-5., -1., -1., -1.],  # on floor, ladder wet, ceiling dry
        [0., 0., 0., 0.],      # on floor, ladder wet, ceiling wet (terminal)
        [-1., -1., -5., -1.],  # on ladder, ladder dry, ceiling dry
        [-1., -1., -5., -1.],  # on ladder, ladder dry, ceiling wet
        [-1., -1., -5., -1.],  # on ladder, ladder wet, ceiling dry
        [-1., 10., -5., -1.]   # on ladder, ladder wet, ceiling wet
    ])

    return (P, R)
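

if __name__ == '__main__':
    # Usage sketch, not part of the original assignment code: solve the MDP
    # with basic value iteration using only numpy. The discount factor and
    # convergence tolerance below are assumptions chosen for illustration.
    P, R = get_matrices()
    gamma = 0.9
    V = np.zeros(8)
    for _ in range(1000):
        # Q[a, s] = R[s, a] + gamma * sum over s2 of P[a, s, s2] * V[s2]
        Q = R.T + gamma * P.dot(V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < 1e-9:
            V = V_new
            break
        V = V_new
    # The greedy policy picks the highest-value action in each state.
    policy = Q.argmax(axis=0)
    actions = ('climb ladder', 'descend ladder', 'paint ladder',
               'paint ceiling')
    for s, a in enumerate(policy):
        print('state %d: V = %6.2f, best action: %s' % (s, V[s], actions[a]))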