behaviors:
  Agent Controller:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: constant
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 4
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        gamma: 0.99
        strength: 0.02
        encoding_size: 256
        learning_rate: 0.0003
    keep_checkpoints: 5
    max_steps: 50000000
    time_horizon: 1000
    summary_freq: 10000
    threaded: false
    self_play:
      save_steps: 50000
      team_change: 100000
      swap_steps: 2000
      window: 10
      play_against_latest_model_ratio: 0.5
      initial_elo: 1200.0
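You can sanity-check these settings by computing the quantities they imply. The Python sketch below is illustrative only and makes a few assumptions. First, PPO performs one policy update each time it collects a full buffer of `buffer_size` experiences. Second, that buffer is split into `batch_size` minibatches for `num_epoch` passes. Third, rating changes follow the standard Elo formula with a K-factor of 16. That K-factor is a hypothetical value, not something taken from this config or from ML-Agents. The helper names are also hypothetical.

```python
# Hyperparameter values copied from the config above.
BATCH_SIZE = 2048
BUFFER_SIZE = 20480
NUM_EPOCH = 3
MAX_STEPS = 50_000_000
GAMMA = 0.99
SAVE_STEPS = 50_000
TEAM_CHANGE = 100_000
INITIAL_ELO = 1200.0

# Assumption: hypothetical Elo K-factor, not read from the config or ML-Agents.
K_FACTOR = 16.0


def ppo_update_stats(batch_size: int, buffer_size: int, num_epoch: int, max_steps: int) -> dict:
    """Derived PPO bookkeeping, assuming one policy update per full buffer."""
    if buffer_size % batch_size != 0:
        raise ValueError("buffer_size is not a whole multiple of batch_size")
    minibatches_per_epoch = buffer_size // batch_size
    return {
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_steps_per_update": minibatches_per_epoch * num_epoch,
        # Approximate: whole buffers that fit into max_steps.
        "policy_updates": max_steps // buffer_size,
    }


def effective_horizon(gamma: float) -> float:
    """Rough 1 / (1 - gamma) horizon over which discounted rewards still matter."""
    return 1.0 / (1.0 - gamma)


def elo_expected(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = K_FACTOR) -> float:
    """New rating for A after one game; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating_a + k * (score_a - elo_expected(rating_a, rating_b))


if __name__ == "__main__":
    print(ppo_update_stats(BATCH_SIZE, BUFFER_SIZE, NUM_EPOCH, MAX_STEPS))
    print(round(effective_horizon(GAMMA), 2))
    # Snapshots saved within one team_change period.
    print(TEAM_CHANGE // SAVE_STEPS)
    # Rating after a win between two agents that both start at initial_elo.
    print(elo_update(INITIAL_ELO, INITIAL_ELO, 1.0))
```

Under these assumptions, each update runs 10 minibatches per epoch, which gives 30 gradient steps per update. The run performs roughly 2441 updates over `max_steps`. A `gamma` of 0.99 weights rewards over a horizon of about 100 steps. Two snapshots are saved per `team_change` period.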