output

a guest
Jul 26th, 2025
  1. python test.py
  2. <removed>/envs/rllib/lib/python3.9/site-packages/ray/tune/impl/tuner_internal.py:144: RayDeprecationWarning: The `RunConfig` class should be imported from `ray.tune` when passing it to the Tuner. Please update your imports. See this issue for more context and migration options: https://github.com/ray-project/ray/issues/49454. Disable these warnings by setting the environment variable: RAY_TRAIN_ENABLE_V2_MIGRATION_WARNINGS=0
  3. _log_deprecation_warning(
  4. 2025-07-26 05:23:08,028 INFO worker.py:1917 -- Started a local Ray instance.
  5. 2025-07-26 05:23:08,671 INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
  6. 2025-07-26 05:23:08,676 INFO tensorboardx.py:193 -- pip install "ray[tune]" to see TensorBoard files.
  7. 2025-07-26 05:23:08,676 WARNING callback.py:136 -- The TensorboardX logger cannot be instantiated because either TensorboardX or one of it's dependencies is not installed. Please make sure you have the latest version of TensorboardX installed: `pip install -U tensorboardx`
  8. ╭────────────────────────────────────────────────────────────╮
  9. │ Configuration for experiment PPO_2025-07-26_05-23-05 │
  10. ├────────────────────────────────────────────────────────────┤
  11. │ Search algorithm BasicVariantGenerator │
  12. │ Scheduler FIFOScheduler │
  13. │ Number of trials 2 │
  14. ╰────────────────────────────────────────────────────────────╯
  15.  
  16. View detailed results here: <removed>
  17. 2025-07-26 05:23:08,687 WARNING algorithm_config.py:5033 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
  18.  
  19. Trial status: 2 PENDING
  20. Current time: 2025-07-26 05:23:08. Total running time: 0s
  21. Logical resource usage: 0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
  22. ╭─────────────────────────────────────────────────╮
  23. │ Trial name status lr │
  24. ├─────────────────────────────────────────────────┤
  25. │ PPO_Pendulum-v1_86789_00000 PENDING 0.001 │
  26. │ PPO_Pendulum-v1_86789_00001 PENDING 0.0001 │
  27. ╰─────────────────────────────────────────────────╯
  28. (PPO pid=1241) 2025-07-26 05:23:13,597 WARNING algorithm_config.py:5033 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
  29. (SingleAgentEnvRunner pid=1337) 2025-07-26 05:23:18,240 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
  30. (_WrappedExecutable pid=1460) Setting up process group for: env:// [rank=0, world_size=1]
  31. (PPO pid=1241) 2025-07-26 05:23:20,142 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future! [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
  32.  
  33. Trial PPO_Pendulum-v1_86789_00000 started with configuration:
  34. ╭───────────────────────────────────────────────────────────────────────────╮
  35. │ Trial PPO_Pendulum-v1_86789_00000 config │
  36. ├───────────────────────────────────────────────────────────────────────────┤
  37. │ _disable_action_flattening False │
  38. │ _disable_execution_plan_api -1 │
  39. │ _disable_initialize_loss_from_dummy_batch False │
  40. │ _disable_preprocessor_api False │
  41. │ _dont_auto_sync_env_runner_states False │
  42. │ _enable_rl_module_api -1 │
  43. │ _env_to_module_connector │
  44. │ _fake_gpus False │
  45. │ _is_atari │
  46. │ _is_online True │
  47. │ _learner_class │
  48. │ _learner_connector │
  49. │ _module_to_env_connector │
  50. │ _prior_exploration_config/type StochasticSampling │
  51. │ _rl_module_spec │
  52. │ _tf_policy_handles_more_than_one_loss False │
  53. │ _torch_grad_scaler_class │
  54. │ _torch_lr_scheduler_classes │
  55. │ _train_batch_size_per_learner │
  56. │ _use_msgpack_checkpoints False │
  57. │ _validate_config True │
  58. │ action_mask_key action_mask │
  59. │ action_space │
  60. │ actions_in_input_normalized False │
  61. │ add_default_connectors_to_env_to_module_pipeline True │
  62. │ add_default_connectors_to_learner_pipeline True │
  63. │ add_default_connectors_to_module_to_env_pipeline True │
  64. │ always_attach_evaluation_results -1 │
  65. │ auto_wrap_old_gym_envs -1 │
  66. │ batch_mode complete_episodes │
  67. │ broadcast_env_runner_states True │
  68. │ broadcast_offline_eval_runner_states False │
  69. │ callbacks ...s.RLlibCallback'> │
  70. │ callbacks_on_algorithm_init │
  71. │ callbacks_on_checkpoint_loaded │
  72. │ callbacks_on_env_runners_recreated │
  73. │ callbacks_on_environment_created │
  74. │ callbacks_on_episode_created │
  75. │ callbacks_on_episode_end │
  76. │ callbacks_on_episode_start │
  77. │ callbacks_on_episode_step │
  78. │ callbacks_on_evaluate_end │
  79. │ callbacks_on_evaluate_offline_end │
  80. │ callbacks_on_evaluate_offline_start │
  81. │ callbacks_on_evaluate_start │
  82. │ callbacks_on_offline_eval_runners_recreated │
  83. │ callbacks_on_sample_end │
  84. │ callbacks_on_train_result │
  85. │ checkpoint_trainable_policies_only False │
  86. │ clip_actions False │
  87. │ clip_param 0.3 │
  88. │ clip_rewards │
  89. │ compress_observations False │
  90. │ count_steps_by env_steps │
  91. │ create_env_on_driver False │
  92. │ create_local_env_runner True │
  93. │ custom_async_evaluation_function -1 │
  94. │ custom_eval_function │
  95. │ dataset_num_iters_per_eval_runner 1 │
  96. │ dataset_num_iters_per_learner │
  97. │ delay_between_env_runner_restarts_s 60. │
  98. │ disable_env_checking False │
  99. │ eager_max_retraces 20 │
  100. │ eager_tracing True │
  101. │ enable_async_evaluation -1 │
  102. │ enable_connectors -1 │
  103. │ enable_env_runner_and_connector_v2 True │
  104. │ enable_rl_module_and_learner True │
  105. │ enable_tf1_exec_eagerly False │
  106. │ entropy_coeff 0. │
  107. │ entropy_coeff_schedule │
  108. │ env Pendulum-v1 │
  109. │ env_runner_cls │
  110. │ env_runner_health_probe_timeout_s 30. │
  111. │ env_runner_restore_timeout_s 1800. │
  112. │ env_task_fn -1 │
  113. │ episode_lookback_horizon 1 │
  114. │ episodes_to_numpy True │
  115. │ evaluation_auto_duration_max_env_steps_per_sample 2000 │
  116. │ evaluation_auto_duration_min_env_steps_per_sample 100 │
  117. │ evaluation_config │
  118. │ evaluation_duration 10 │
  119. │ evaluation_duration_unit episodes │
  120. │ evaluation_force_reset_envs_before_iteration True │
  121. │ evaluation_interval │
  122. │ evaluation_num_env_runners 0 │
  123. │ evaluation_parallel_to_training False │
  124. │ evaluation_sample_timeout_s 120. │
  125. │ explore True │
  126. │ export_native_model_files False │
  127. │ fake_sampler False │
  128. │ framework torch │
  129. │ gamma 0.99 │
  130. │ grad_clip │
  131. │ grad_clip_by global_norm │
  132. │ gym_env_vectorize_mode SYNC │
  133. │ ignore_env_runner_failures False │
  134. │ ignore_final_observation False │
  135. │ ignore_offline_eval_runner_failures False │
  136. │ in_evaluation False │
  137. │ input sampler │
  138. │ input_compress_columns ['obs', 'new_obs'] │
  139. │ input_filesystem │
  140. │ input_read_batch_size │
  141. │ input_read_episodes False │
  142. │ input_read_method read_parquet │
  143. │ input_read_sample_batches False │
  144. │ input_spaces_jsonable True │
  145. │ keep_per_episode_custom_metrics False │
  146. │ kl_coeff 0.2 │
  147. │ kl_target 0.01 │
  148. │ lambda 1. │
  149. │ local_gpu_idx 0 │
  150. │ local_tf_session_args/inter_op_parallelism_threads 8 │
  151. │ local_tf_session_args/intra_op_parallelism_threads 8 │
  152. │ log_gradients True │
  153. │ log_level WARN │
  154. │ log_sys_usage True │
  155. │ logger_config │
  156. │ logger_creator │
  157. │ lr 0.001 │
  158. │ lr_schedule │
  159. │ materialize_data False │
  160. │ materialize_mapped_data True │
  161. │ max_num_env_runner_restarts 1000 │
  162. │ max_num_offline_eval_runner_restarts 1000 │
  163. │ max_requests_in_flight_per_aggregator_actor 3 │
  164. │ max_requests_in_flight_per_env_runner 1 │
  165. │ max_requests_in_flight_per_learner 3 │
  166. │ max_requests_in_flight_per_offline_eval_runner 1 │
  167. │ merge_env_runner_states training_only │
  168. │ metrics_episode_collection_timeout_s 60. │
  169. │ metrics_num_episodes_for_smoothing 100 │
  170. │ min_sample_timesteps_per_iteration 0 │
  171. │ min_time_s_per_iteration │
  172. │ min_train_timesteps_per_iteration 0 │
  173. │ minibatch_size 128 │
  174. │ model/_disable_action_flattening False │
  175. │ model/_disable_preprocessor_api False │
  176. │ model/_time_major False │
  177. │ model/_use_default_native_models -1 │
  178. │ model/always_check_shapes False │
  179. │ model/attention_dim 64 │
  180. │ model/attention_head_dim 32 │
  181. │ model/attention_init_gru_gate_bias 2.0 │
  182. │ model/attention_memory_inference 50 │
  183. │ model/attention_memory_training 50 │
  184. │ model/attention_num_heads 1 │
  185. │ model/attention_num_transformer_units 1 │
  186. │ model/attention_position_wise_mlp_dim 32 │
  187. │ model/attention_use_n_prev_actions 0 │
  188. │ model/attention_use_n_prev_rewards 0 │
  189. │ model/conv_activation relu │
  190. │ model/conv_bias_initializer │
  191. │ model/conv_bias_initializer_config │
  192. │ model/conv_filters │
  193. │ model/conv_kernel_initializer │
  194. │ model/conv_kernel_initializer_config │
  195. │ model/conv_transpose_bias_initializer │
  196. │ model/conv_transpose_bias_initializer_config │
  197. │ model/conv_transpose_kernel_initializer │
  198. │ model/conv_transpose_kernel_initializer_config │
  199. │ model/custom_action_dist │
  200. │ model/custom_model │
  201. │ model/custom_preprocessor │
  202. │ model/dim 84 │
  203. │ model/encoder_latent_dim │
  204. │ model/fcnet_activation tanh │
  205. │ model/fcnet_bias_initializer │
  206. │ model/fcnet_bias_initializer_config │
  207. │ model/fcnet_hiddens [256, 256] │
  208. │ model/fcnet_weights_initializer │
  209. │ model/fcnet_weights_initializer_config │
  210. │ model/framestack True │
  211. │ model/free_log_std False │
  212. │ model/grayscale False │
  213. │ model/log_std_clip_param 20.0 │
  214. │ model/lstm_bias_initializer │
  215. │ model/lstm_bias_initializer_config │
  216. │ model/lstm_cell_size 256 │
  217. │ model/lstm_use_prev_action False │
  218. │ model/lstm_use_prev_action_reward -1 │
  219. │ model/lstm_use_prev_reward False │
  220. │ model/lstm_weights_initializer │
  221. │ model/lstm_weights_initializer_config │
  222. │ model/max_seq_len 20 │
  223. │ model/no_final_linear False │
  224. │ model/post_fcnet_activation relu │
  225. │ model/post_fcnet_bias_initializer │
  226. │ model/post_fcnet_bias_initializer_config │
  227. │ model/post_fcnet_hiddens [] │
  228. │ model/post_fcnet_weights_initializer │
  229. │ model/post_fcnet_weights_initializer_config │
  230. │ model/use_attention False │
  231. │ model/use_lstm False │
  232. │ model/vf_share_layers False │
  233. │ model/zero_mean True │
  234. │ normalize_actions True │
  235. │ num_aggregator_actors_per_learner 0 │
  236. │ num_consecutive_env_runner_failures_tolerance 100 │
  237. │ num_cpus_for_main_process 1 │
  238. │ num_cpus_per_env_runner 1 │
  239. │ num_cpus_per_learner auto │
  240. │ num_cpus_per_offline_eval_runner 1 │
  241. │ num_env_runners 2 │
  242. │ num_envs_per_env_runner 1 │
  243. │ num_epochs 30 │
  244. │ num_gpus 0 │
  245. │ num_gpus_per_env_runner 0 │
  246. │ num_gpus_per_learner 1 │
  247. │ num_gpus_per_offline_eval_runner 0 │
  248. │ num_learners 1 │
  249. │ num_offline_eval_runners 0 │
  250. │ observation_filter NoFilter │
  251. │ observation_fn │
  252. │ observation_space │
  253. │ offline_data_class │
  254. │ offline_eval_batch_size_per_runner 256 │
  255. │ offline_eval_rl_module_inference_only False │
  256. │ offline_eval_runner_class │
  257. │ offline_eval_runner_health_probe_timeout_s 30. │
  258. │ offline_eval_runner_restore_timeout_s 1800. │
  259. │ offline_evaluation_duration 1 │
  260. │ offline_evaluation_interval │
  261. │ offline_evaluation_parallel_to_training False │
  262. │ offline_evaluation_timeout_s 120. │
  263. │ offline_evaluation_type │
  264. │ offline_loss_for_module_fn │
  265. │ offline_sampling False │
  266. │ ope_split_batch_by_episode True │
  267. │ output │
  268. │ output_compress_columns ['obs', 'new_obs'] │
  269. │ output_filesystem │
  270. │ output_max_file_size 67108864 │
  271. │ output_max_rows_per_file │
  272. │ output_write_episodes True │
  273. │ output_write_method write_parquet │
  274. │ output_write_remaining_data False │
  275. │ placement_strategy PACK │
  276. │ policies/default_policy ...None, None, None) │
  277. │ policies_to_train │
  278. │ policy_map_cache -1 │
  279. │ policy_map_capacity 100 │
  280. │ policy_mapping_fn ...t 0x7f4ac3745670> │
  281. │ policy_states_are_swappable False │
  282. │ postprocess_inputs False │
  283. │ prelearner_buffer_class │
  284. │ prelearner_class │
  285. │ prelearner_module_synch_period 10 │
  286. │ preprocessor_pref deepmind │
  287. │ remote_env_batch_wait_ms 0 │
  288. │ remote_worker_envs False │
  289. │ render_env False │
  290. │ replay_sequence_length │
  291. │ restart_failed_env_runners True │
  292. │ restart_failed_offline_eval_runners True │
  293. │ restart_failed_sub_environments False │
  294. │ rollout_fragment_length auto │
  295. │ sample_collector ...leListCollector'> │
  296. │ sample_timeout_s 60. │
  297. │ sampler_perf_stats_ema_coef │
  298. │ seed │
  299. │ sgd_minibatch_size -1 │
  300. │ shuffle_batch_per_epoch True │
  301. │ shuffle_buffer_size 0 │
  302. │ simple_optimizer -1 │
  303. │ sync_filters_on_rollout_workers_timeout_s 10. │
  304. │ synchronize_filters -1 │
  305. │ tf_session_args/allow_soft_placement True │
  306. │ tf_session_args/device_count/CPU 1 │
  307. │ tf_session_args/gpu_options/allow_growth True │
  308. │ tf_session_args/inter_op_parallelism_threads 2 │
  309. │ tf_session_args/intra_op_parallelism_threads 2 │
  310. │ tf_session_args/log_device_placement False │
  311. │ torch_compile_learner False │
  312. │ torch_compile_learner_dynamo_backend inductor │
  313. │ torch_compile_learner_dynamo_mode │
  314. │ torch_compile_learner_what_to_compile ...ile.FORWARD_TRAIN │
  315. │ torch_compile_worker False │
  316. │ torch_compile_worker_dynamo_backend onnxrt │
  317. │ torch_compile_worker_dynamo_mode │
  318. │ torch_skip_nan_gradients False │
  319. │ train_batch_size 4000 │
  320. │ update_worker_filter_stats True │
  321. │ use_critic True │
  322. │ use_gae True │
  323. │ use_kl_loss True │
  324. │ use_worker_filter_stats True │
  325. │ validate_env_runners_after_construction True │
  326. │ validate_offline_eval_runners_after_construction True │
  327. │ vf_clip_param 10. │
  328. │ vf_loss_coeff 1. │
  329. │ vf_share_layers -1 │
  330. │ worker_cls -1 │
  331. ╰───────────────────────────────────────────────────────────────────────────╯
  332. (PPO pid=1241) Trainable.setup took 17.221 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
  333. (PPO pid=1241) Install gputil for GPU system monitoring.
  334. (_WrappedExecutable pid=1460) 2025-07-26 05:23:28,733 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
  335.  
  336. Trial status: 1 RUNNING | 1 PENDING
  337. Current time: 2025-07-26 05:23:38. Total running time: 30s
  338. Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
  339. ╭─────────────────────────────────────────────────╮
  340. │ Trial name status lr │
  341. ├─────────────────────────────────────────────────┤
  342. │ PPO_Pendulum-v1_86789_00000 RUNNING 0.001 │
  343. │ PPO_Pendulum-v1_86789_00001 PENDING 0.0001 │
  344. ╰─────────────────────────────────────────────────╯
  345. Trial status: 1 RUNNING | 1 PENDING
  346. Current time: 2025-07-26 05:24:08. Total running time: 1min 0s
  347. Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
  348. ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
  349. │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
  350. ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  351. │ PPO_Pendulum-v1_86789_00000 RUNNING 0.001 4 37.7262 1 34000 │
  352. │ PPO_Pendulum-v1_86789_00001 PENDING 0.0001 │
  353. ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  354.  
  355. Trial PPO_Pendulum-v1_86789_00000 completed after 5 iterations at 2025-07-26 05:24:17. Total running time: 1min 8s
  356. ╭───────────────────────────────────────────────────────╮
  357. │ Trial PPO_Pendulum-v1_86789_00000 result │
  358. ├───────────────────────────────────────────────────────┤
  359. │ env_runners/episode_len_mean 200 │
  360. │ env_runners/episode_return_mean -1036.44 │
  361. │ num_env_steps_sampled_lifetime 38000 │
  362. ╰───────────────────────────────────────────────────────╯
  363. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=1241) Checkpoint successfully created at: Checkpoint(filesystem=local, path=<removed>)
  364. (_WrappedExecutable pid=1460) [rank0]:[W726 05:24:18.981079025 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
  365. (PPO pid=1557) 2025-07-26 05:24:23,947 WARNING algorithm_config.py:5033 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
  366. (SingleAgentEnvRunner pid=1620) 2025-07-26 05:24:28,905 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
  367. (_WrappedExecutable pid=1744) Setting up process group for: env:// [rank=0, world_size=1]
  368. (PPO pid=1557) 2025-07-26 05:24:30,851 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future! [repeated 2x across cluster]
  369.  
  370. Trial status: 1 TERMINATED | 1 PENDING
  371. Current time: 2025-07-26 05:24:39. Total running time: 1min 30s
  372. Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
  373. ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
  374. │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
  375. ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  376. │ PPO_Pendulum-v1_86789_00000 TERMINATED 0.001 5 46.425 1 38000 │
  377. │ PPO_Pendulum-v1_86789_00001 PENDING 0.0001 │
  378. ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  379. (PPO pid=1557) Trainable.setup took 17.251 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
  380. (PPO pid=1557) Install gputil for GPU system monitoring.
  381.  
  382. Trial PPO_Pendulum-v1_86789_00001 started with configuration:
  383. ╭───────────────────────────────────────────────────────────────────────────╮
  384. │ Trial PPO_Pendulum-v1_86789_00001 config │
  385. ├───────────────────────────────────────────────────────────────────────────┤
  386. │ _disable_action_flattening False │
  387. │ _disable_execution_plan_api -1 │
  388. │ _disable_initialize_loss_from_dummy_batch False │
  389. │ _disable_preprocessor_api False │
  390. │ _dont_auto_sync_env_runner_states False │
  391. │ _enable_rl_module_api -1 │
  392. │ _env_to_module_connector │
  393. │ _fake_gpus False │
  394. │ _is_atari │
  395. │ _is_online True │
  396. │ _learner_class │
  397. │ _learner_connector │
  398. │ _module_to_env_connector │
  399. │ _prior_exploration_config/type StochasticSampling │
  400. │ _rl_module_spec │
  401. │ _tf_policy_handles_more_than_one_loss False │
  402. │ _torch_grad_scaler_class │
  403. │ _torch_lr_scheduler_classes │
  404. │ _train_batch_size_per_learner │
  405. │ _use_msgpack_checkpoints False │
  406. │ _validate_config True │
  407. │ action_mask_key action_mask │
  408. │ action_space │
  409. │ actions_in_input_normalized False │
  410. │ add_default_connectors_to_env_to_module_pipeline True │
  411. │ add_default_connectors_to_learner_pipeline True │
  412. │ add_default_connectors_to_module_to_env_pipeline True │
  413. │ always_attach_evaluation_results -1 │
  414. │ auto_wrap_old_gym_envs -1 │
  415. │ batch_mode complete_episodes │
  416. │ broadcast_env_runner_states True │
  417. │ broadcast_offline_eval_runner_states False │
  418. │ callbacks ...s.RLlibCallback'> │
  419. │ callbacks_on_algorithm_init │
  420. │ callbacks_on_checkpoint_loaded │
  421. │ callbacks_on_env_runners_recreated │
  422. │ callbacks_on_environment_created │
  423. │ callbacks_on_episode_created │
  424. │ callbacks_on_episode_end │
  425. │ callbacks_on_episode_start │
  426. │ callbacks_on_episode_step │
  427. │ callbacks_on_evaluate_end │
  428. │ callbacks_on_evaluate_offline_end │
  429. │ callbacks_on_evaluate_offline_start │
  430. │ callbacks_on_evaluate_start │
  431. │ callbacks_on_offline_eval_runners_recreated │
  432. │ callbacks_on_sample_end │
  433. │ callbacks_on_train_result │
  434. │ checkpoint_trainable_policies_only False │
  435. │ clip_actions False │
  436. │ clip_param 0.3 │
  437. │ clip_rewards │
  438. │ compress_observations False │
  439. │ count_steps_by env_steps │
  440. │ create_env_on_driver False │
  441. │ create_local_env_runner True │
  442. │ custom_async_evaluation_function -1 │
  443. │ custom_eval_function │
  444. │ dataset_num_iters_per_eval_runner 1 │
  445. │ dataset_num_iters_per_learner │
  446. │ delay_between_env_runner_restarts_s 60. │
  447. │ disable_env_checking False │
  448. │ eager_max_retraces 20 │
  449. │ eager_tracing True │
  450. │ enable_async_evaluation -1 │
  451. │ enable_connectors -1 │
  452. │ enable_env_runner_and_connector_v2 True │
  453. │ enable_rl_module_and_learner True │
  454. │ enable_tf1_exec_eagerly False │
  455. │ entropy_coeff 0. │
  456. │ entropy_coeff_schedule │
  457. │ env Pendulum-v1 │
  458. │ env_runner_cls │
  459. │ env_runner_health_probe_timeout_s 30. │
  460. │ env_runner_restore_timeout_s 1800. │
  461. │ env_task_fn -1 │
  462. │ episode_lookback_horizon 1 │
  463. │ episodes_to_numpy True │
  464. │ evaluation_auto_duration_max_env_steps_per_sample 2000 │
  465. │ evaluation_auto_duration_min_env_steps_per_sample 100 │
  466. │ evaluation_config │
  467. │ evaluation_duration 10 │
  468. │ evaluation_duration_unit episodes │
  469. │ evaluation_force_reset_envs_before_iteration True │
  470. │ evaluation_interval │
  471. │ evaluation_num_env_runners 0 │
  472. │ evaluation_parallel_to_training False │
  473. │ evaluation_sample_timeout_s 120. │
  474. │ explore True │
  475. │ export_native_model_files False │
  476. │ fake_sampler False │
  477. │ framework torch │
  478. │ gamma 0.99 │
  479. │ grad_clip │
  480. │ grad_clip_by global_norm │
  481. │ gym_env_vectorize_mode SYNC │
  482. │ ignore_env_runner_failures False │
  483. │ ignore_final_observation False │
  484. │ ignore_offline_eval_runner_failures False │
  485. │ in_evaluation False │
  486. │ input sampler │
  487. │ input_compress_columns ['obs', 'new_obs'] │
  488. │ input_filesystem │
  489. │ input_read_batch_size │
  490. │ input_read_episodes False │
  491. │ input_read_method read_parquet │
  492. │ input_read_sample_batches False │
  493. │ input_spaces_jsonable True │
  494. │ keep_per_episode_custom_metrics False │
  495. │ kl_coeff 0.2 │
  496. │ kl_target 0.01 │
  497. │ lambda 1. │
  498. │ local_gpu_idx 0 │
  499. │ local_tf_session_args/inter_op_parallelism_threads 8 │
  500. │ local_tf_session_args/intra_op_parallelism_threads 8 │
  501. │ log_gradients True │
  502. │ log_level WARN │
  503. │ log_sys_usage True │
  504. │ logger_config │
  505. │ logger_creator │
  506. │ lr 0.0001 │
  507. │ lr_schedule │
  508. │ materialize_data False │
  509. │ materialize_mapped_data True │
  510. │ max_num_env_runner_restarts 1000 │
  511. │ max_num_offline_eval_runner_restarts 1000 │
  512. │ max_requests_in_flight_per_aggregator_actor 3 │
  513. │ max_requests_in_flight_per_env_runner 1 │
  514. │ max_requests_in_flight_per_learner 3 │
  515. │ max_requests_in_flight_per_offline_eval_runner 1 │
  516. │ merge_env_runner_states training_only │
  517. │ metrics_episode_collection_timeout_s 60. │
  518. │ metrics_num_episodes_for_smoothing 100 │
  519. │ min_sample_timesteps_per_iteration 0 │
  520. │ min_time_s_per_iteration │
  521. │ min_train_timesteps_per_iteration 0 │
  522. │ minibatch_size 128 │
  523. │ model/_disable_action_flattening False │
  524. │ model/_disable_preprocessor_api False │
  525. │ model/_time_major False │
  526. │ model/_use_default_native_models -1 │
  527. │ model/always_check_shapes False │
  528. │ model/attention_dim 64 │
  529. │ model/attention_head_dim 32 │
  530. │ model/attention_init_gru_gate_bias 2.0 │
  531. │ model/attention_memory_inference 50 │
  532. │ model/attention_memory_training 50 │
  533. │ model/attention_num_heads 1 │
  534. │ model/attention_num_transformer_units 1 │
  535. │ model/attention_position_wise_mlp_dim 32 │
  536. │ model/attention_use_n_prev_actions 0 │
  537. │ model/attention_use_n_prev_rewards 0 │
  538. │ model/conv_activation relu │
  539. │ model/conv_bias_initializer │
  540. │ model/conv_bias_initializer_config │
  541. │ model/conv_filters │
  542. │ model/conv_kernel_initializer │
  543. │ model/conv_kernel_initializer_config │
  544. │ model/conv_transpose_bias_initializer │
  545. │ model/conv_transpose_bias_initializer_config │
  546. │ model/conv_transpose_kernel_initializer │
  547. │ model/conv_transpose_kernel_initializer_config │
  548. │ model/custom_action_dist │
  549. │ model/custom_model │
  550. │ model/custom_preprocessor │
  551. │ model/dim 84 │
  552. │ model/encoder_latent_dim │
  553. │ model/fcnet_activation tanh │
  554. │ model/fcnet_bias_initializer │
  555. │ model/fcnet_bias_initializer_config │
  556. │ model/fcnet_hiddens [256, 256] │
  557. │ model/fcnet_weights_initializer │
  558. │ model/fcnet_weights_initializer_config │
  559. │ model/framestack True │
  560. │ model/free_log_std False │
  561. │ model/grayscale False │
  562. │ model/log_std_clip_param 20.0 │
  563. │ model/lstm_bias_initializer │
  564. │ model/lstm_bias_initializer_config │
  565. │ model/lstm_cell_size 256 │
  566. │ model/lstm_use_prev_action False │
  567. │ model/lstm_use_prev_action_reward -1 │
  568. │ model/lstm_use_prev_reward False │
  569. │ model/lstm_weights_initializer │
  570. │ model/lstm_weights_initializer_config │
  571. │ model/max_seq_len 20 │
  572. │ model/no_final_linear False │
  573. │ model/post_fcnet_activation relu │
  574. │ model/post_fcnet_bias_initializer │
  575. │ model/post_fcnet_bias_initializer_config │
  576. │ model/post_fcnet_hiddens [] │
  577. │ model/post_fcnet_weights_initializer │
  578. │ model/post_fcnet_weights_initializer_config │
  579. │ model/use_attention False │
  580. │ model/use_lstm False │
  581. │ model/vf_share_layers False │
  582. │ model/zero_mean True │
  583. │ normalize_actions True │
  584. │ num_aggregator_actors_per_learner 0 │
  585. │ num_consecutive_env_runner_failures_tolerance 100 │
  586. │ num_cpus_for_main_process 1 │
  587. │ num_cpus_per_env_runner 1 │
  588. │ num_cpus_per_learner auto │
  589. │ num_cpus_per_offline_eval_runner 1 │
  590. │ num_env_runners 2 │
  591. │ num_envs_per_env_runner 1 │
  592. │ num_epochs 30 │
  593. │ num_gpus 0 │
  594. │ num_gpus_per_env_runner 0 │
  595. │ num_gpus_per_learner 1 │
  596. │ num_gpus_per_offline_eval_runner 0 │
  597. │ num_learners 1 │
  598. │ num_offline_eval_runners 0 │
  599. │ observation_filter NoFilter │
  600. │ observation_fn │
  601. │ observation_space │
  602. │ offline_data_class │
  603. │ offline_eval_batch_size_per_runner 256 │
  604. │ offline_eval_rl_module_inference_only False │
  605. │ offline_eval_runner_class │
  606. │ offline_eval_runner_health_probe_timeout_s 30. │
  607. │ offline_eval_runner_restore_timeout_s 1800. │
  608. │ offline_evaluation_duration 1 │
  609. │ offline_evaluation_interval │
  610. │ offline_evaluation_parallel_to_training False │
  611. │ offline_evaluation_timeout_s 120. │
  612. │ offline_evaluation_type │
  613. │ offline_loss_for_module_fn │
  614. │ offline_sampling False │
  615. │ ope_split_batch_by_episode True │
  616. │ output │
  617. │ output_compress_columns ['obs', 'new_obs'] │
  618. │ output_filesystem │
  619. │ output_max_file_size 67108864 │
  620. │ output_max_rows_per_file │
  621. │ output_write_episodes True │
  622. │ output_write_method write_parquet │
  623. │ output_write_remaining_data False │
  624. │ placement_strategy PACK │
  625. │ policies/default_policy ...None, None, None) │
  626. │ policies_to_train │
  627. │ policy_map_cache -1 │
  628. │ policy_map_capacity 100 │
  629. │ policy_mapping_fn ...t 0x7f4ac3745670> │
  630. │ policy_states_are_swappable False │
  631. │ postprocess_inputs False │
  632. │ prelearner_buffer_class │
  633. │ prelearner_class │
  634. │ prelearner_module_synch_period 10 │
  635. │ preprocessor_pref deepmind │
  636. │ remote_env_batch_wait_ms 0 │
  637. │ remote_worker_envs False │
  638. │ render_env False │
  639. │ replay_sequence_length │
  640. │ restart_failed_env_runners True │
  641. │ restart_failed_offline_eval_runners True │
  642. │ restart_failed_sub_environments False │
  643. │ rollout_fragment_length auto │
  644. │ sample_collector ...leListCollector'> │
  645. │ sample_timeout_s 60. │
  646. │ sampler_perf_stats_ema_coef │
  647. │ seed │
  648. │ sgd_minibatch_size -1 │
  649. │ shuffle_batch_per_epoch True │
  650. │ shuffle_buffer_size 0 │
  651. │ simple_optimizer -1 │
  652. │ sync_filters_on_rollout_workers_timeout_s 10. │
  653. │ synchronize_filters -1 │
  654. │ tf_session_args/allow_soft_placement True │
  655. │ tf_session_args/device_count/CPU 1 │
  656. │ tf_session_args/gpu_options/allow_growth True │
  657. │ tf_session_args/inter_op_parallelism_threads 2 │
  658. │ tf_session_args/intra_op_parallelism_threads 2 │
  659. │ tf_session_args/log_device_placement False │
  660. │ torch_compile_learner False │
  661. │ torch_compile_learner_dynamo_backend inductor │
  662. │ torch_compile_learner_dynamo_mode │
  663. │ torch_compile_learner_what_to_compile ...ile.FORWARD_TRAIN │
  664. │ torch_compile_worker False │
  665. │ torch_compile_worker_dynamo_backend onnxrt │
  666. │ torch_compile_worker_dynamo_mode │
  667. │ torch_skip_nan_gradients False │
  668. │ train_batch_size 4000 │
  669. │ update_worker_filter_stats True │
  670. │ use_critic True │
  671. │ use_gae True │
  672. │ use_kl_loss True │
  673. │ use_worker_filter_stats True │
  674. │ validate_env_runners_after_construction True │
  675. │ validate_offline_eval_runners_after_construction True │
  676. │ vf_clip_param 10. │
  677. │ vf_loss_coeff 1. │
  678. │ vf_share_layers -1 │
  679. │ worker_cls -1 │
  680. ╰───────────────────────────────────────────────────────────────────────────╯
  681.  
  682. Trial status: 1 TERMINATED | 1 RUNNING
  683. Current time: 2025-07-26 05:25:09. Total running time: 2min 0s
  684. Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
  685. ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
  686. │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
  687. ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  688. │ PPO_Pendulum-v1_86789_00001 RUNNING 0.0001 2 19.0405 1 26000 │
  689. │ PPO_Pendulum-v1_86789_00000 TERMINATED 0.001 5 46.425 1 38000 │
  690. ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  691.  
  692. Trial PPO_Pendulum-v1_86789_00001 completed after 5 iterations at 2025-07-26 05:25:28. Total running time: 2min 19s
  693. ╭───────────────────────────────────────────────────────╮
  694. │ Trial PPO_Pendulum-v1_86789_00001 result │
  695. ├───────────────────────────────────────────────────────┤
  696. │ env_runners/episode_len_mean 200 │
  697. │ env_runners/episode_return_mean -1239.85 │
  698. │ num_env_steps_sampled_lifetime 38000 │
  699. ╰───────────────────────────────────────────────────────╯
  700. 2025-07-26 05:25:28,608 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '<removed>' in 0.0109s.
  701.  
  702. Trial status: 2 TERMINATED
  703. Current time: 2025-07-26 05:25:28. Total running time: 2min 19s
  704. Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
  705. ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
  706. │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
  707. ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  708. │ PPO_Pendulum-v1_86789_00000 TERMINATED 0.001 5 46.425 1 38000 │
  709. │ PPO_Pendulum-v1_86789_00001 TERMINATED 0.0001 5 47.2958 1 38000 │
  710. ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  711. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=1557) Checkpoint successfully created at: Checkpoint(filesystem=local, path=<removed>)
  712. (_WrappedExecutable pid=1744) 2025-07-26 05:24:39,662 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
  713.  
  714. (_WrappedExecutable pid=1744) [rank0]:[W726 05:25:29.008109760 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())