- Working on the `load-teachers` branch for now. After commit 856aac34dcf22d08425473dd8dd8d2c6e90ba086, run these commands:
```
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game alien
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game boxing
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game breakout
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game pong
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game seaquest
python experiments/dqn_train_save_sshots.py --n_parallel 10 --run_ID 0 --n_steps 1000000 --save_as_student --game beam_rider
```
- - First three commands run Feb 24 on takeshi
- - Last three commands run Feb 25 on stout
- which will:
- - generate one student per game, where we train for 10M steps.
- - BUT, after 1M steps, we save not only the parameters but the ENTIRE replay buffer at that time, which is ~16G. The buffer is pre-allocated, so we don't need to worry about the space growing unexpectedly.
- - This way, another script can reset the entire student state to that point and resume training as if nothing had changed. Repeating this many times shows how much variability there is across training runs when we effectively retrain.
- - But of course the more interesting thing would be to see what happens if we load in one or several teachers. This would involve using the teacher data only (as in BatchRL), or maybe a mix of the data [but that is less interesting, I think].
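The snapshot idea above (parameters plus the full pre-allocated replay buffer, so a later script can reset and resume) can be sketched roughly as below. All names here are illustrative, not rlpyt's actual API; the real script presumably goes through the framework's own checkpoint utilities. At Atari scale (~1M frames of 84x84 uint8), the frames alone are ~7 GB, so a snapshot of this kind lands in the multi-gigabyte range consistent with the ~16G noted above.

```python
import numpy as np

BUFFER_KEYS = ("cursor", "frames", "actions", "rewards", "dones")

def save_student_snapshot(path, params, buffer):
    """Save network parameters plus the full pre-allocated replay buffer.

    `params` is a dict of name -> np.ndarray; `buffer` is a dict of parallel
    numpy arrays (frames, actions, rewards, done flags) plus the current
    write cursor. These structures are hypothetical stand-ins for the real
    agent/buffer objects.
    """
    np.savez(path,
             **{k: buffer[k] for k in BUFFER_KEYS},
             **{f"param__{k}": v for k, v in params.items()})

def load_student_snapshot(path):
    """Restore (params, buffer) so training can resume from that point."""
    data = np.load(path)
    buffer = {k: data[k] for k in BUFFER_KEYS}
    params = {k[len("param__"):]: data[k]
              for k in data.files if k.startswith("param__")}
    return params, buffer
```

Because the buffer arrays are pre-allocated to full capacity, the snapshot file has a fixed size regardless of when during training it is written.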
- - Alien
- - Boxing
- - BeamRider
- - Breakout
- - Pong
- - Qbert
- - Robotank
- - Seaquest
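The teacher-only variant (as in BatchRL) amounts to overwriting the student's pre-allocated buffer with teacher transitions before training, rather than with the student's own experience. A minimal sketch, again with hypothetical dict-of-arrays buffers rather than the real buffer class:

```python
import numpy as np

FIELD_KEYS = ("frames", "actions", "rewards", "dones")

def fill_buffer_from_teachers(student_buffer, teacher_buffers, rng):
    """Fill the student's pre-allocated buffer with teacher data only.

    Pools all teacher transitions and subsamples uniformly without
    replacement until the student buffer is full (assumes the pooled
    teacher data is at least as large as the student buffer's capacity).
    No fresh environment interaction is involved, batch-RL style.
    """
    capacity = len(student_buffer["actions"])
    pooled = {k: np.concatenate([t[k] for t in teacher_buffers])
              for k in FIELD_KEYS}
    idx = rng.choice(len(pooled["actions"]), size=capacity, replace=False)
    for k in FIELD_KEYS:
        student_buffer[k][:] = pooled[k][idx]
    student_buffer["cursor"] = np.array(capacity)  # buffer now full
    return student_buffer
```

The mixed-data variant mentioned above would just fill part of the capacity this way and let subsequent student experience overwrite the rest via the cursor.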