Untitled

a guest
Feb 25th, 2020
Working on the load-teachers branch for now. After commit 856aac34dcf22d08425473dd8dd8d2c6e90ba086, I ran these commands:

```
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game alien
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game boxing
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game breakout
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game pong
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game seaquest
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game beam_rider
```

- First three commands (alien, boxing, breakout): Feb 24 on takeshi
- Last three commands (pong, seaquest, beam_rider): Feb 25 on stout

which will:

- Generate one student per game, where we train for 10M steps.
- BUT, after 1M steps, we save not only the parameters but THE ENTIRE REPLAY BUFFER AT THAT TIME, which is ~16G. The buffer is pre-allocated, so we don't need to worry about its size growing unexpectedly.
- This way, another script can reset the entire student state to that point and resume training as if nothing had changed. Repeating this many times shows how much variability there is across training runs when we effectively retrain from the same state.
- The more interesting thing, of course, would be to see what happens if we load in a teacher or several teachers. This would involve using the teacher data only (as in BatchRL) or maybe a mix of teacher and student data [but that is less interesting, I think].
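
To make the save-and-reset idea concrete, here is a minimal sketch of snapshotting a pre-allocated replay buffer together with the parameters, then restoring both into a fresh buffer. Everything in it is an assumption for illustration: `ReplayBuffer`, `save_snapshot`, `load_snapshot`, the field names, and the single `.npz` file layout are hypothetical and are not taken from `dqn_train_save_sshots.py`.

```python
import numpy as np


class ReplayBuffer:
    """Hypothetical fixed-size ring buffer; all arrays are allocated up front."""

    def __init__(self, capacity, obs_shape):
        self.capacity = capacity
        self.obs = np.zeros((capacity, *obs_shape), dtype=np.uint8)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=bool)
        self.cursor = 0  # index of the next write
        self.size = 0    # number of valid entries so far

    def add(self, obs, action, reward, done):
        i = self.cursor
        self.obs[i] = obs
        self.actions[i] = action
        self.rewards[i] = reward
        self.dones[i] = done
        self.cursor = (i + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)


def save_snapshot(path, params, buffer, step):
    """Write parameters, the whole buffer, and the step count to one .npz file."""
    np.savez(
        path,
        step=step,
        cursor=buffer.cursor,
        size=buffer.size,
        obs=buffer.obs,
        actions=buffer.actions,
        rewards=buffer.rewards,
        dones=buffer.dones,
        **{f"param_{name}": value for name, value in params.items()},
    )


def load_snapshot(path, buffer):
    """Restore the buffer in place and return (params, step)."""
    with np.load(path) as data:
        buffer.obs[...] = data["obs"]
        buffer.actions[...] = data["actions"]
        buffer.rewards[...] = data["rewards"]
        buffer.dones[...] = data["dones"]
        buffer.cursor = int(data["cursor"])
        buffer.size = int(data["size"])
        params = {
            key[len("param_"):]: data[key]
            for key in data.files
            if key.startswith("param_")
        }
        return params, int(data["step"])
```

Because the arrays are pre-allocated, the snapshot size is fixed by the capacity regardless of how full the buffer is. For a resume that really behaves as if nothing had changed, the optimizer state, the exploration schedule position, and the random number generator state would presumably need to be saved as well; the sketch leaves those out.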

- Alien
- Boxing
- BeamRider
- Breakout
- Pong
- Qbert
- Robotank
- Seaquest
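
For the teacher-data idea in the notes above (teacher data only, as in BatchRL, or a mix), one possible way to build a minibatch is to pick a fixed fraction of indices from the teacher buffer and the rest from the student buffer. This is only a sketch: `sample_mixed_indices` and `teacher_frac` are hypothetical names, and nothing here reflects how the load-teachers branch actually does it.

```python
import numpy as np


def sample_mixed_indices(rng, n_student, n_teacher, batch_size, teacher_frac):
    """Return (student_idx, teacher_idx) for one minibatch.

    teacher_frac=1.0 draws from teacher data only (the BatchRL-style case);
    teacher_frac=0.0 draws from the student's own buffer only.
    """
    if not 0.0 <= teacher_frac <= 1.0:
        raise ValueError("teacher_frac must be in [0, 1]")
    n_from_teacher = int(round(batch_size * teacher_frac))
    n_from_student = batch_size - n_from_teacher
    empty = np.empty(0, dtype=np.int64)
    # Sample uniformly with replacement from each source.
    teacher_idx = (
        rng.integers(0, n_teacher, size=n_from_teacher) if n_from_teacher else empty
    )
    student_idx = (
        rng.integers(0, n_student, size=n_from_student) if n_from_student else empty
    )
    return student_idx, teacher_idx
```

With `rng = np.random.default_rng(0)`, calling `sample_mixed_indices(rng, 1000, 500, 32, 0.25)` would give 24 student indices and 8 teacher indices, which can then be used to gather transitions from the two buffers.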