- """A simple Markov Decision Process describing a painting robot.
- This problem is inspired by the "Planning" lesson from the CS 7637
- (Knowledge-Based Artificial Intelligence) course at Georgia Institute of
- Technology, with Prof. Ashok Goel. It was created in support of an MDP
- assignment in CS 7641 (Machine Learning), with Profs. Charles Isbell and
- Michael Littman.
- A robot is tasked with painting a ladder and a ceiling. The state consists of
- three independent binary variables, for a total of 8 states:
- - The robot may be positioned on the ladder or on the floor.
- - The ceiling may be wet (painted) or dry (unpainted).
- - The ladder may be wet (painted) or dry (unpainted).
- The initial state is robot: on floor, ceiling: dry, ladder: dry.
- The robot may take the following actions:
- - Climb the ladder
- - Descend the ladder
- - Paint the ladder
- - Paint the ceiling
- The robot's actions are reliable. There is no stochastic element to this
- problem.
- Climbing or descending the ladder changes the robot's state. Painting the
- ceiling or ladder changes the ceiling's or ladder's state. Some action-state
- combinations result in no change in state, such as painting the ceiling while
- on the floor, or climbing the ladder while already on the ladder.
- The reward for each action is -1, with the following exceptions:
- -5 if the robot paints the ladder while on it.
- -5 if the robot climbs a wet ladder.
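- +10 (and the problem ends) when an action leaves the ceiling and ladder both
- wet and the robot on the floor. This terminal state is absorbing: every action
- taken from it leaves the state unchanged and earns a reward of 0.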
- """
- import numpy as np
- def get_matrices():
- """Provide transition and rewards matrices for a Robot Painter MDP.
- Returns (P, R), where P contains the transition probability matrices, and
- R is the rewards matrix. Both P and R are numpy arrays. P is a 3-D array
- of shape (Actions, States, States), indexed as P[action, state, next_state].
- R is a 2-D array of shape (States, Actions), indexed as R[state, action].
- The order of states is:
- robot: on floor, ladder: dry, ceiling: dry
- robot: on floor, ladder: dry, ceiling: wet
- robot: on floor, ladder: wet, ceiling: dry
- robot: on floor, ladder: wet, ceiling: wet
- robot: on ladder, ladder: dry, ceiling: dry
- robot: on ladder, ladder: dry, ceiling: wet
- robot: on ladder, ladder: wet, ceiling: dry
- robot: on ladder, ladder: wet, ceiling: wet
- The order of actions is:
- climb ladder
- descend ladder
- paint ladder
- paint ceiling
- """
- P = np.array([
- # Action: climb ladder
- [
- [0., 0., 0., 0., 1., 0., 0., 0.],
- [0., 0., 0., 0., 0., 1., 0., 0.],
- [0., 0., 0., 0., 0., 0., 1., 0.],
- [0., 0., 0., 1., 0., 0., 0., 0.],
- [0., 0., 0., 0., 1., 0., 0., 0.],
- [0., 0., 0., 0., 0., 1., 0., 0.],
- [0., 0., 0., 0., 0., 0., 1., 0.],
- [0., 0., 0., 0., 0., 0., 0., 1.]
- ],
- # Action: descend ladder
- [
- [1., 0., 0., 0., 0., 0., 0., 0.],
- [0., 1., 0., 0., 0., 0., 0., 0.],
- [0., 0., 1., 0., 0., 0., 0., 0.],
- [0., 0., 0., 1., 0., 0., 0., 0.],
- [1., 0., 0., 0., 0., 0., 0., 0.],
- [0., 1., 0., 0., 0., 0., 0., 0.],
- [0., 0., 1., 0., 0., 0., 0., 0.],
- [0., 0., 0., 1., 0., 0., 0., 0.]
- ],
- # Action: paint ladder
- [
- [0., 0., 1., 0., 0., 0., 0., 0.],
- [0., 0., 0., 1., 0., 0., 0., 0.],
- [0., 0., 1., 0., 0., 0., 0., 0.],
- [0., 0., 0., 1., 0., 0., 0., 0.],
- [0., 0., 0., 0., 0., 0., 1., 0.],
- [0., 0., 0., 0., 0., 0., 0., 1.],
- [0., 0., 0., 0., 0., 0., 1., 0.],
- [0., 0., 0., 0., 0., 0., 0., 1.]
- ],
- # Action: paint ceiling
- [
- [1., 0., 0., 0., 0., 0., 0., 0.],
- [0., 1., 0., 0., 0., 0., 0., 0.],
- [0., 0., 1., 0., 0., 0., 0., 0.],
- [0., 0., 0., 1., 0., 0., 0., 0.],
- [0., 0., 0., 0., 0., 1., 0., 0.],
- [0., 0., 0., 0., 0., 1., 0., 0.],
- [0., 0., 0., 0., 0., 0., 0., 1.],
- [0., 0., 0., 0., 0., 0., 0., 1.]
- ]
- ])
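- R = np.array([
- # Columns: climb ladder, descend ladder, paint ladder, paint ceiling
- [-1., -1., -1., -1.],  # on floor, ladder dry, ceiling dry
- [-1., -1., 10., -1.],  # on floor, ladder dry, ceiling wet
- [-5., -1., -1., -1.],  # on floor, ladder wet, ceiling dry
- [0., 0., 0., 0.],  # on floor, ladder wet, ceiling wet (terminal)
- [-1., -1., -5., -1.],  # on ladder, ladder dry, ceiling dry
- [-1., -1., -5., -1.],  # on ladder, ladder dry, ceiling wet
- [-1., -1., -5., -1.],  # on ladder, ladder wet, ceiling dry
- [-1., 10., -5., -1.]  # on ladder, ladder wet, ceiling wet
- ])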
- return (P, R)
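The matrices above can be checked and solved with a short script. What follows is a minimal sketch, not part of the original module. To stay self-contained it rebuilds P and R from the rules in the module docstring instead of importing `get_matrices`, then runs plain value iteration. The helper names (`step`, `build_matrices`, `value_iteration`), the index encoding `4 * on_ladder + 2 * ladder_wet + ceiling_wet`, and the discount factor of 0.9 are assumptions made for illustration.

```python
import numpy as np

CLIMB, DESCEND, PAINT_LADDER, PAINT_CEILING = range(4)
N_STATES, N_ACTIONS = 8, 4
TERMINAL = 3  # robot on floor, ladder wet, ceiling wet


def step(state, action):
    """Return (next_state, reward) for one deterministic move (hypothetical helper)."""
    if state == TERMINAL:
        return state, 0.0  # absorbing: nothing changes, no reward
    # Assumed encoding: state = 4 * on_ladder + 2 * ladder_wet + ceiling_wet.
    on_ladder, ladder_wet, ceiling_wet = bool(state & 4), bool(state & 2), bool(state & 1)
    reward = -1.0
    if action == CLIMB:
        if ladder_wet and not on_ladder:
            reward = -5.0  # climbing a wet ladder
        on_ladder = True
    elif action == DESCEND:
        on_ladder = False
    elif action == PAINT_LADDER:
        if on_ladder:
            reward = -5.0  # painting the ladder while standing on it
        ladder_wet = True
    elif action == PAINT_CEILING and on_ladder:
        ceiling_wet = True  # the ceiling can only be painted from the ladder
    next_state = 4 * on_ladder + 2 * ladder_wet + ceiling_wet
    if next_state == TERMINAL:
        reward = 10.0  # both surfaces painted and robot back on the floor
    return next_state, reward


def build_matrices():
    """Build P with shape (Actions, States, States) and R with shape (States, Actions)."""
    P = np.zeros((N_ACTIONS, N_STATES, N_STATES))
    R = np.zeros((N_STATES, N_ACTIONS))
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            s_next, R[s, a] = step(s, a)
            P[a, s, s_next] = 1.0
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-9):
    """Return (V, policy) from repeated Bellman optimality backups."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum over s' of P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V).T
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


P, R = build_matrices()
V, policy = value_iteration(P, R)

# Follow the greedy policy from the initial state (index 0) to the terminal state.
state, path = 0, [0]
while state != TERMINAL:
    state = int(P[policy[state], state].argmax())
    path.append(state)
print(path)  # [0, 4, 5, 1, 3]
```

Under these assumptions the greedy policy climbs the ladder, paints the ceiling, descends, and then paints the ladder from the floor. This avoids both -5 penalties. The array shapes follow the convention in the `get_matrices` docstring, so the same `value_iteration` sketch should accept the hand-written P and R unchanged.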