- An example: consider three high-level strategies for a dog (stay_with_the_master, run_after_chicken, and run_after_car), three high-level sensory inputs (sees_chicken, sees_car, sees_master), and the initial state (at time t) of a simple strategy-picking model. The probabilities below are conditioned on one example sensory state at time t:
- $P_t(stay_with_the_master=1|sees_chicken=1, sees_car=0, sees_master=1) = 0.5$
- $P_t(run_after_chicken=1|sees_chicken=1, sees_car=0, sees_master=1) = 0.5$
- $P_t(run_after_car=1|sees_chicken=1, sees_car=0, sees_master=1) = 0$
- Based on this, the dog chooses to run after the chicken, and gets an angry shout from the master in return. As a result, let's say:
- $P_{t+1}(run_after_chicken=1|sees_chicken=1, sees_car=0, sees_master=1) = 0.75 \cdot P_t(run_after_chicken=1|sees_chicken=1, sees_car=0, sees_master=1)$
- $P_{t+1}(stay_with_the_master=1|sees_chicken=1, sees_car=0, sees_master=1)$ is proportional to $P_t(stay_with_the_master=1|sees_chicken=1, sees_car=0, sees_master=1)$
- $P_{t+1}(run_after_car=1|sees_chicken=1, sees_car=0, sees_master=1)$ is proportional to $P_t(run_after_car=1|sees_chicken=1, sees_car=0, sees_master=1)$
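To make the update concrete, here is a minimal sketch of one way to read it. It assumes that "is proportional to" means the two unpunished strategies share a single normalization constant, chosen so that the three probabilities still sum to 1. That reading, the function name `punish_and_renormalize`, and the `factor` parameter are illustrative assumptions rather than part of the model above.

```python
def punish_and_renormalize(probs, punished, factor=0.75):
    """Scale the punished strategy by `factor`, then rescale the other
    strategies by a common constant so the distribution sums to 1.

    Assumption: "proportional to" in the notes means one shared constant.
    """
    new_punished = factor * probs[punished]
    rest_total = sum(p for name, p in probs.items() if name != punished)
    if rest_total == 0:
        # No other strategy has any mass to scale up; leave things unchanged.
        return dict(probs)
    scale = (1.0 - new_punished) / rest_total
    return {
        name: new_punished if name == punished else scale * p
        for name, p in probs.items()
    }


# Probabilities at time t, for sees_chicken=1, sees_car=0, sees_master=1.
p_t = {
    "stay_with_the_master": 0.5,
    "run_after_chicken": 0.5,
    "run_after_car": 0.0,
}

# The dog ran after the chicken and got shouted at.
p_next = punish_and_renormalize(p_t, "run_after_chicken")
print(p_next)
# {'stay_with_the_master': 0.625, 'run_after_chicken': 0.375, 'run_after_car': 0.0}
```

Under this reading, run_after_chicken drops from 0.5 to 0.375, stay_with_the_master rises to 0.625, and run_after_car stays at 0, because a probability of zero remains zero under any proportional rescaling.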