theo_Fan

Untitled

Jul 30th, 2025 (edited)
(rl) ~ % python test_ray.py
/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py:144: RayDeprecationWarning: The `RunConfig` class should be imported from `ray.tune` when passing it to the Tuner. Please update your imports. See this issue for more context and migration options: https://github.com/ray-project/ray/issues/49454. Disable these warnings by setting the environment variable: RAY_TRAIN_ENABLE_V2_MIGRATION_WARNINGS=0
  _log_deprecation_warning(
2025-07-31 09:09:11,056 INFO worker.py:1917 -- Started a local Ray instance.
2025-07-31 09:09:12,656 INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
2025-07-31 09:09:12,721 WARNING tune_controller.py:2132 -- The maximum number of pending trials has been automatically set to the number of available cluster CPUs, which is high (281 CPUs/pending trials). If you're running an experiment with a large number of trials, this could lead to scheduling overhead. In this case, consider setting the `TUNE_MAX_PENDING_TRIALS_PG` environment variable to the desired maximum number of concurrent pending trials.
2025-07-31 09:09:12,723 WARNING tune_controller.py:2132 -- The maximum number of pending trials has been automatically set to the number of available cluster CPUs, which is high (281 CPUs/pending trials). If you're running an experiment with a large number of trials, this could lead to scheduling overhead. In this case, consider setting the `TUNE_MAX_PENDING_TRIALS_PG` environment variable to the desired maximum number of concurrent pending trials.
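The pending-trials warning above can be addressed as the message itself suggests; a minimal sketch, setting the variable before Tune starts scheduling (the value 2 is an assumption matching the two trials in this run):

```python
import os

# Cap concurrent pending trials, as suggested by the tune_controller warning.
# Must be set before Tuner(...) begins scheduling trials.
os.environ["TUNE_MAX_PENDING_TRIALS_PG"] = "2"
```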
╭────────────────────────────────────────────────────────────╮
│ Configuration for experiment PPO_2025-07-31_09-09-09 │
├────────────────────────────────────────────────────────────┤
│ Search algorithm BasicVariantGenerator │
│ Scheduler FIFOScheduler │
│ Number of trials 2 │
╰────────────────────────────────────────────────────────────╯

View detailed results here: /data/home/fzy/ray_results/PPO_2025-07-31_09-09-09
To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts`
2025-07-31 09:09:12,733 WARNING algorithm_config.py:5014 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
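If the old API stack is wanted, the warning spells out the call; a config fragment sketch (assumes RLlib is installed and `PPOConfig` is the config class in use, as the log indicates):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Opt back into the old API stack, exactly as the warning message suggests.
config = PPOConfig().api_stack(
    enable_rl_module_and_learner=False,
    enable_env_runner_and_connector_v2=False,
)
```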

Trial status: 2 PENDING
Current time: 2025-07-31 09:09:13. Total running time: 0s
Logical resource usage: 0/256 CPUs, 0/2 GPUs (0.0/1.0 accelerator_type:G)
╭─────────────────────────────────────────────────╮
│ Trial name status lr │
├─────────────────────────────────────────────────┤
│ PPO_Pendulum-v1_f8541_00000 PENDING 0.001 │
│ PPO_Pendulum-v1_f8541_00001 PENDING 0.0001 │
╰─────────────────────────────────────────────────╯
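This output is consistent with a driver script along the following lines. This is a speculative reconstruction of `test_ray.py`, not the actual file: the environment, runner/learner counts, and the lr grid are read off the config tables below, while the stopping criterion is invented for illustration.

```python
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

# Values below mirror the config dump in this log; the stop condition is a guess.
config = (
    PPOConfig()
    .environment("Pendulum-v1")
    .env_runners(num_env_runners=2)
    .learners(num_learners=1, num_gpus_per_learner=1)
    .training(lr=tune.grid_search([0.001, 0.0001]))
)

tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=tune.RunConfig(stop={"training_iteration": 5}),
)
tuner.fit()
```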
(PPO pid=4074735) 2025-07-31 09:09:16,616 WARNING algorithm_config.py:5014 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
(SingleAgentEnvRunner pid=4075495) 2025-07-31 09:09:20,263 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
(_WrappedExecutable pid=4076167) Setting up process group for: env:// [rank=0, world_size=1]
(PPO pid=4074734) 2025-07-31 09:09:16,634 WARNING algorithm_config.py:5014 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html

Trial PPO_Pendulum-v1_f8541_00000 started with configuration:
╭───────────────────────────────────────────────────────────────────────────╮
│ Trial PPO_Pendulum-v1_f8541_00000 config │
├───────────────────────────────────────────────────────────────────────────┤
│ _disable_action_flattening False │
│ _disable_execution_plan_api -1 │
│ _disable_initialize_loss_from_dummy_batch False │
│ _disable_preprocessor_api False │
│ _dont_auto_sync_env_runner_states False │
│ _enable_rl_module_api -1 │
│ _env_to_module_connector │
│ _fake_gpus False │
│ _is_atari │
│ _is_online True │
│ _learner_class │
│ _learner_connector │
│ _module_to_env_connector │
│ _prior_exploration_config/type StochasticSampling │
│ _rl_module_spec │
│ _tf_policy_handles_more_than_one_loss False │
│ _torch_grad_scaler_class │
│ _torch_lr_scheduler_classes │
│ _train_batch_size_per_learner │
│ _use_msgpack_checkpoints False │
│ _validate_config True │
│ action_mask_key action_mask │
│ action_space │
│ actions_in_input_normalized False │
│ add_default_connectors_to_env_to_module_pipeline True │
│ add_default_connectors_to_learner_pipeline True │
│ add_default_connectors_to_module_to_env_pipeline True │
│ always_attach_evaluation_results -1 │
│ auto_wrap_old_gym_envs -1 │
│ batch_mode complete_episodes │
│ broadcast_env_runner_states True │
│ broadcast_offline_eval_runner_states False │
│ callbacks ...s.RLlibCallback'> │
│ callbacks_on_algorithm_init │
│ callbacks_on_checkpoint_loaded │
│ callbacks_on_env_runners_recreated │
│ callbacks_on_environment_created │
│ callbacks_on_episode_created │
│ callbacks_on_episode_end │
│ callbacks_on_episode_start │
│ callbacks_on_episode_step │
│ callbacks_on_evaluate_end │
│ callbacks_on_evaluate_offline_end │
│ callbacks_on_evaluate_offline_start │
│ callbacks_on_evaluate_start │
│ callbacks_on_offline_eval_runners_recreated │
│ callbacks_on_sample_end │
│ callbacks_on_train_result │
│ checkpoint_trainable_policies_only False │
│ clip_actions False │
│ clip_param 0.3 │
│ clip_rewards │
│ compress_observations False │
│ count_steps_by env_steps │
│ create_env_on_driver False │
│ create_local_env_runner True │
│ custom_async_evaluation_function -1 │
│ custom_eval_function │
│ dataset_num_iters_per_eval_runner 1 │
│ dataset_num_iters_per_learner │
│ delay_between_env_runner_restarts_s 60. │
│ disable_env_checking False │
│ eager_max_retraces 20 │
│ eager_tracing True │
│ enable_async_evaluation -1 │
│ enable_connectors -1 │
│ enable_env_runner_and_connector_v2 True │
│ enable_rl_module_and_learner True │
│ enable_tf1_exec_eagerly False │
│ entropy_coeff 0. │
│ entropy_coeff_schedule │
│ env Pendulum-v1 │
│ env_runner_cls │
│ env_runner_health_probe_timeout_s 30. │
│ env_runner_restore_timeout_s 1800. │
│ env_task_fn -1 │
│ episode_lookback_horizon 1 │
│ episodes_to_numpy True │
│ evaluation_auto_duration_max_env_steps_per_sample 2000 │
│ evaluation_auto_duration_min_env_steps_per_sample 100 │
│ evaluation_config │
│ evaluation_duration 10 │
│ evaluation_duration_unit episodes │
│ evaluation_force_reset_envs_before_iteration True │
│ evaluation_interval │
│ evaluation_num_env_runners 0 │
│ evaluation_parallel_to_training False │
│ evaluation_sample_timeout_s 120. │
│ explore True │
│ export_native_model_files False │
│ fake_sampler False │
│ framework torch │
│ gamma 0.99 │
│ grad_clip │
│ grad_clip_by global_norm │
│ gym_env_vectorize_mode SYNC │
│ ignore_env_runner_failures False │
│ ignore_final_observation False │
│ ignore_offline_eval_runner_failures False │
│ in_evaluation False │
│ input sampler │
│ input_compress_columns ['obs', 'new_obs'] │
│ input_filesystem │
│ input_read_batch_size │
│ input_read_episodes False │
│ input_read_method read_parquet │
│ input_read_sample_batches False │
│ input_spaces_jsonable True │
│ keep_per_episode_custom_metrics False │
│ kl_coeff 0.2 │
│ kl_target 0.01 │
│ lambda 1. │
│ local_gpu_idx 0 │
│ local_tf_session_args/inter_op_parallelism_threads 8 │
│ local_tf_session_args/intra_op_parallelism_threads 8 │
│ log_gradients True │
│ log_level WARN │
│ log_sys_usage True │
│ logger_config │
│ logger_creator │
│ lr 0.001 │
│ lr_schedule │
│ materialize_data False │
│ materialize_mapped_data True │
│ max_num_env_runner_restarts 1000 │
│ max_num_offline_eval_runner_restarts 1000 │
│ max_requests_in_flight_per_aggregator_actor 3 │
│ max_requests_in_flight_per_env_runner 1 │
│ max_requests_in_flight_per_learner 3 │
│ max_requests_in_flight_per_offline_eval_runner 1 │
│ merge_env_runner_states training_only │
│ metrics_episode_collection_timeout_s 60. │
│ metrics_num_episodes_for_smoothing 100 │
│ min_sample_timesteps_per_iteration 0 │
│ min_time_s_per_iteration │
│ min_train_timesteps_per_iteration 0 │
│ minibatch_size 128 │
│ model/_disable_action_flattening False │
│ model/_disable_preprocessor_api False │
│ model/_time_major False │
│ model/_use_default_native_models -1 │
│ model/always_check_shapes False │
│ model/attention_dim 64 │
│ model/attention_head_dim 32 │
│ model/attention_init_gru_gate_bias 2.0 │
│ model/attention_memory_inference 50 │
│ model/attention_memory_training 50 │
│ model/attention_num_heads 1 │
│ model/attention_num_transformer_units 1 │
│ model/attention_position_wise_mlp_dim 32 │
│ model/attention_use_n_prev_actions 0 │
│ model/attention_use_n_prev_rewards 0 │
│ model/conv_activation relu │
│ model/conv_bias_initializer │
│ model/conv_bias_initializer_config │
│ model/conv_filters │
│ model/conv_kernel_initializer │
│ model/conv_kernel_initializer_config │
│ model/conv_transpose_bias_initializer │
│ model/conv_transpose_bias_initializer_config │
│ model/conv_transpose_kernel_initializer │
│ model/conv_transpose_kernel_initializer_config │
│ model/custom_action_dist │
│ model/custom_model │
│ model/custom_preprocessor │
│ model/dim 84 │
│ model/encoder_latent_dim │
│ model/fcnet_activation tanh │
│ model/fcnet_bias_initializer │
│ model/fcnet_bias_initializer_config │
│ model/fcnet_hiddens [256, 256] │
│ model/fcnet_weights_initializer │
│ model/fcnet_weights_initializer_config │
│ model/framestack True │
│ model/free_log_std False │
│ model/grayscale False │
│ model/log_std_clip_param 20.0 │
│ model/lstm_bias_initializer │
│ model/lstm_bias_initializer_config │
│ model/lstm_cell_size 256 │
│ model/lstm_use_prev_action False │
│ model/lstm_use_prev_action_reward -1 │
│ model/lstm_use_prev_reward False │
│ model/lstm_weights_initializer │
│ model/lstm_weights_initializer_config │
│ model/max_seq_len 20 │
│ model/no_final_linear False │
│ model/post_fcnet_activation relu │
│ model/post_fcnet_bias_initializer │
│ model/post_fcnet_bias_initializer_config │
│ model/post_fcnet_hiddens [] │
│ model/post_fcnet_weights_initializer │
│ model/post_fcnet_weights_initializer_config │
│ model/use_attention False │
│ model/use_lstm False │
│ model/vf_share_layers False │
│ model/zero_mean True │
│ normalize_actions True │
│ num_aggregator_actors_per_learner 0 │
│ num_consecutive_env_runner_failures_tolerance 100 │
│ num_cpus_for_main_process 1 │
│ num_cpus_per_env_runner 1 │
│ num_cpus_per_learner auto │
│ num_cpus_per_offline_eval_runner 1 │
│ num_env_runners 2 │
│ num_envs_per_env_runner 1 │
│ num_epochs 30 │
│ num_gpus 0 │
│ num_gpus_per_env_runner 0 │
│ num_gpus_per_learner 1 │
│ num_gpus_per_offline_eval_runner 0 │
│ num_learners 1 │
│ num_offline_eval_runners 0 │
│ observation_filter NoFilter │
│ observation_fn │
│ observation_space │
│ offline_data_class │
│ offline_eval_batch_size_per_runner 256 │
│ offline_eval_rl_module_inference_only False │
│ offline_eval_runner_health_probe_timeout_s 30. │
│ offline_eval_runner_restore_timeout_s 1800. │
│ offline_evaluation_duration 1 │
│ offline_evaluation_interval │
│ offline_evaluation_parallel_to_training False │
│ offline_evaluation_timeout_s 120. │
│ offline_loss_for_module_fn │
│ offline_sampling False │
│ ope_split_batch_by_episode True │
│ output │
│ output_compress_columns ['obs', 'new_obs'] │
│ output_filesystem │
│ output_max_file_size 67108864 │
│ output_max_rows_per_file │
│ output_write_episodes True │
│ output_write_method write_parquet │
│ output_write_remaining_data False │
│ placement_strategy PACK │
│ policies/default_policy ...None, None, None) │
│ policies_to_train │
│ policy_map_cache -1 │
│ policy_map_capacity 100 │
│ policy_mapping_fn ...t 0x7f197f9e8670> │
│ policy_states_are_swappable False │
│ postprocess_inputs False │
│ prelearner_buffer_class │
│ prelearner_class │
│ prelearner_module_synch_period 10 │
│ preprocessor_pref deepmind │
│ remote_env_batch_wait_ms 0 │
│ remote_worker_envs False │
│ render_env False │
│ replay_sequence_length │
│ restart_failed_env_runners True │
│ restart_failed_offline_eval_runners True │
│ restart_failed_sub_environments False │
│ rollout_fragment_length auto │
│ sample_collector ...leListCollector'> │
│ sample_timeout_s 60. │
│ sampler_perf_stats_ema_coef │
│ seed │
│ sgd_minibatch_size -1 │
│ shuffle_batch_per_epoch True │
│ shuffle_buffer_size 0 │
│ simple_optimizer -1 │
│ sync_filters_on_rollout_workers_timeout_s 10. │
│ synchronize_filters -1 │
│ tf_session_args/allow_soft_placement True │
│ tf_session_args/device_count/CPU 1 │
│ tf_session_args/gpu_options/allow_growth True │
│ tf_session_args/inter_op_parallelism_threads 2 │
│ tf_session_args/intra_op_parallelism_threads 2 │
│ tf_session_args/log_device_placement False │
│ torch_compile_learner False │
│ torch_compile_learner_dynamo_backend inductor │
│ torch_compile_learner_dynamo_mode │
│ torch_compile_learner_what_to_compile ...ile.FORWARD_TRAIN │
│ torch_compile_worker False │
│ torch_compile_worker_dynamo_backend onnxrt │
│ torch_compile_worker_dynamo_mode │
│ torch_skip_nan_gradients False │
│ train_batch_size 4000 │
│ update_worker_filter_stats True │
│ use_critic True │
│ use_gae True │
│ use_kl_loss True │
│ use_worker_filter_stats True │
│ validate_env_runners_after_construction True │
│ validate_offline_eval_runners_after_construction True │
│ vf_clip_param 10. │
│ vf_loss_coeff 1. │
│ vf_share_layers -1 │
│ worker_cls -1 │
╰───────────────────────────────────────────────────────────────────────────╯
(PPO pid=4074735) Install gputil for GPU system monitoring.
(_WrappedExecutable pid=4076168) 2025-07-31 09:09:24,500 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future! [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)

Trial PPO_Pendulum-v1_f8541_00001 started with configuration:
╭───────────────────────────────────────────────────────────────────────────╮
│ Trial PPO_Pendulum-v1_f8541_00001 config │
├───────────────────────────────────────────────────────────────────────────┤
│ _disable_action_flattening False │
│ _disable_execution_plan_api -1 │
│ _disable_initialize_loss_from_dummy_batch False │
│ _disable_preprocessor_api False │
│ _dont_auto_sync_env_runner_states False │
│ _enable_rl_module_api -1 │
│ _env_to_module_connector │
│ _fake_gpus False │
│ _is_atari │
│ _is_online True │
│ _learner_class │
│ _learner_connector │
│ _module_to_env_connector │
│ _prior_exploration_config/type StochasticSampling │
│ _rl_module_spec │
│ _tf_policy_handles_more_than_one_loss False │
│ _torch_grad_scaler_class │
│ _torch_lr_scheduler_classes │
│ _train_batch_size_per_learner │
│ _use_msgpack_checkpoints False │
│ _validate_config True │
│ action_mask_key action_mask │
│ action_space │
│ actions_in_input_normalized False │
│ add_default_connectors_to_env_to_module_pipeline True │
│ add_default_connectors_to_learner_pipeline True │
│ add_default_connectors_to_module_to_env_pipeline True │
│ always_attach_evaluation_results -1 │
│ auto_wrap_old_gym_envs -1 │
│ batch_mode complete_episodes │
│ broadcast_env_runner_states True │
│ broadcast_offline_eval_runner_states False │
│ callbacks ...s.RLlibCallback'> │
│ callbacks_on_algorithm_init │
│ callbacks_on_checkpoint_loaded │
│ callbacks_on_env_runners_recreated │
│ callbacks_on_environment_created │
│ callbacks_on_episode_created │
│ callbacks_on_episode_end │
│ callbacks_on_episode_start │
│ callbacks_on_episode_step │
│ callbacks_on_evaluate_end │
│ callbacks_on_evaluate_offline_end │
│ callbacks_on_evaluate_offline_start │
│ callbacks_on_evaluate_start │
│ callbacks_on_offline_eval_runners_recreated │
│ callbacks_on_sample_end │
│ callbacks_on_train_result │
│ checkpoint_trainable_policies_only False │
│ clip_actions False │
│ clip_param 0.3 │
│ clip_rewards │
│ compress_observations False │
│ count_steps_by env_steps │
│ create_env_on_driver False │
│ create_local_env_runner True │
│ custom_async_evaluation_function -1 │
│ custom_eval_function │
│ dataset_num_iters_per_eval_runner 1 │
│ dataset_num_iters_per_learner │
│ delay_between_env_runner_restarts_s 60. │
│ disable_env_checking False │
│ eager_max_retraces 20 │
│ eager_tracing True │
│ enable_async_evaluation -1 │
│ enable_connectors -1 │
│ enable_env_runner_and_connector_v2 True │
│ enable_rl_module_and_learner True │
│ enable_tf1_exec_eagerly False │
│ entropy_coeff 0. │
│ entropy_coeff_schedule │
│ env Pendulum-v1 │
│ env_runner_cls │
│ env_runner_health_probe_timeout_s 30. │
│ env_runner_restore_timeout_s 1800. │
│ env_task_fn -1 │
│ episode_lookback_horizon 1 │
│ episodes_to_numpy True │
│ evaluation_auto_duration_max_env_steps_per_sample 2000 │
│ evaluation_auto_duration_min_env_steps_per_sample 100 │
│ evaluation_config │
│ evaluation_duration 10 │
│ evaluation_duration_unit episodes │
│ evaluation_force_reset_envs_before_iteration True │
│ evaluation_interval │
│ evaluation_num_env_runners 0 │
│ evaluation_parallel_to_training False │
│ evaluation_sample_timeout_s 120. │
│ explore True │
│ export_native_model_files False │
│ fake_sampler False │
│ framework torch │
│ gamma 0.99 │
│ grad_clip │
│ grad_clip_by global_norm │
│ gym_env_vectorize_mode SYNC │
│ ignore_env_runner_failures False │
│ ignore_final_observation False │
│ ignore_offline_eval_runner_failures False │
│ in_evaluation False │
│ input sampler │
│ input_compress_columns ['obs', 'new_obs'] │
│ input_filesystem │
│ input_read_batch_size │
│ input_read_episodes False │
│ input_read_method read_parquet │
│ input_read_sample_batches False │
│ input_spaces_jsonable True │
│ keep_per_episode_custom_metrics False │
│ kl_coeff 0.2 │
│ kl_target 0.01 │
│ lambda 1. │
│ local_gpu_idx 0 │
│ local_tf_session_args/inter_op_parallelism_threads 8 │
│ local_tf_session_args/intra_op_parallelism_threads 8 │
│ log_gradients True │
│ log_level WARN │
│ log_sys_usage True │
│ logger_config │
│ logger_creator │
│ lr 0.0001 │
│ lr_schedule │
│ materialize_data False │
│ materialize_mapped_data True │
│ max_num_env_runner_restarts 1000 │
│ max_num_offline_eval_runner_restarts 1000 │
│ max_requests_in_flight_per_aggregator_actor 3 │
│ max_requests_in_flight_per_env_runner 1 │
│ max_requests_in_flight_per_learner 3 │
│ max_requests_in_flight_per_offline_eval_runner 1 │
│ merge_env_runner_states training_only │
│ metrics_episode_collection_timeout_s 60. │
│ metrics_num_episodes_for_smoothing 100 │
│ min_sample_timesteps_per_iteration 0 │
│ min_time_s_per_iteration │
│ min_train_timesteps_per_iteration 0 │
│ minibatch_size 128 │
│ model/_disable_action_flattening False │
│ model/_disable_preprocessor_api False │
│ model/_time_major False │
│ model/_use_default_native_models -1 │
│ model/always_check_shapes False │
│ model/attention_dim 64 │
│ model/attention_head_dim 32 │
│ model/attention_init_gru_gate_bias 2.0 │
│ model/attention_memory_inference 50 │
│ model/attention_memory_training 50 │
│ model/attention_num_heads 1 │
│ model/attention_num_transformer_units 1 │
│ model/attention_position_wise_mlp_dim 32 │
│ model/attention_use_n_prev_actions 0 │
│ model/attention_use_n_prev_rewards 0 │
│ model/conv_activation relu │
│ model/conv_bias_initializer │
│ model/conv_bias_initializer_config │
│ model/conv_filters │
│ model/conv_kernel_initializer │
│ model/conv_kernel_initializer_config │
│ model/conv_transpose_bias_initializer │
│ model/conv_transpose_bias_initializer_config │
│ model/conv_transpose_kernel_initializer │
│ model/conv_transpose_kernel_initializer_config │
│ model/custom_action_dist │
│ model/custom_model │
│ model/custom_preprocessor │
│ model/dim 84 │
│ model/encoder_latent_dim │
│ model/fcnet_activation tanh │
│ model/fcnet_bias_initializer │
│ model/fcnet_bias_initializer_config │
│ model/fcnet_hiddens [256, 256] │
│ model/fcnet_weights_initializer │
│ model/fcnet_weights_initializer_config │
│ model/framestack True │
│ model/free_log_std False │
│ model/grayscale False │
│ model/log_std_clip_param 20.0 │
│ model/lstm_bias_initializer │
│ model/lstm_bias_initializer_config │
│ model/lstm_cell_size 256 │
│ model/lstm_use_prev_action False │
│ model/lstm_use_prev_action_reward -1 │
│ model/lstm_use_prev_reward False │
│ model/lstm_weights_initializer │
│ model/lstm_weights_initializer_config │
│ model/max_seq_len 20 │
│ model/no_final_linear False │
│ model/post_fcnet_activation relu │
│ model/post_fcnet_bias_initializer │
│ model/post_fcnet_bias_initializer_config │
│ model/post_fcnet_hiddens [] │
│ model/post_fcnet_weights_initializer │
│ model/post_fcnet_weights_initializer_config │
│ model/use_attention False │
│ model/use_lstm False │
│ model/vf_share_layers False │
│ model/zero_mean True │
│ normalize_actions True │
│ num_aggregator_actors_per_learner 0 │
│ num_consecutive_env_runner_failures_tolerance 100 │
│ num_cpus_for_main_process 1 │
│ num_cpus_per_env_runner 1 │
│ num_cpus_per_learner auto │
│ num_cpus_per_offline_eval_runner 1 │
│ num_env_runners 2 │
│ num_envs_per_env_runner 1 │
│ num_epochs 30 │
│ num_gpus 0 │
│ num_gpus_per_env_runner 0 │
│ num_gpus_per_learner 1 │
│ num_gpus_per_offline_eval_runner 0 │
│ num_learners 1 │
│ num_offline_eval_runners 0 │
│ observation_filter NoFilter │
│ observation_fn │
│ observation_space │
│ offline_data_class │
│ offline_eval_batch_size_per_runner 256 │
│ offline_eval_rl_module_inference_only False │
│ offline_eval_runner_health_probe_timeout_s 30. │
│ offline_eval_runner_restore_timeout_s 1800. │
│ offline_evaluation_duration 1 │
│ offline_evaluation_interval │
│ offline_evaluation_parallel_to_training False │
│ offline_evaluation_timeout_s 120. │
│ offline_loss_for_module_fn │
│ offline_sampling False │
│ ope_split_batch_by_episode True │
│ output │
│ output_compress_columns ['obs', 'new_obs'] │
│ output_filesystem │
│ output_max_file_size 67108864 │
│ output_max_rows_per_file │
│ output_write_episodes True │
│ output_write_method write_parquet │
│ output_write_remaining_data False │
│ placement_strategy PACK │
│ policies/default_policy ...None, None, None) │
│ policies_to_train │
│ policy_map_cache -1 │
│ policy_map_capacity 100 │
│ policy_mapping_fn ...t 0x7f197f9e8670> │
│ policy_states_are_swappable False │
│ postprocess_inputs False │
│ prelearner_buffer_class │
│ prelearner_class │
│ prelearner_module_synch_period 10 │
│ preprocessor_pref deepmind │
│ remote_env_batch_wait_ms 0 │
│ remote_worker_envs False │
│ render_env False │
│ replay_sequence_length │
│ restart_failed_env_runners True │
│ restart_failed_offline_eval_runners True │
│ restart_failed_sub_environments False │
│ rollout_fragment_length auto │
│ sample_collector ...leListCollector'> │
│ sample_timeout_s 60. │
│ sampler_perf_stats_ema_coef │
│ seed │
│ sgd_minibatch_size -1 │
│ shuffle_batch_per_epoch True │
│ shuffle_buffer_size 0 │
│ simple_optimizer -1 │
│ sync_filters_on_rollout_workers_timeout_s 10. │
│ synchronize_filters -1 │
│ tf_session_args/allow_soft_placement True │
│ tf_session_args/device_count/CPU 1 │
│ tf_session_args/gpu_options/allow_growth True │
│ tf_session_args/inter_op_parallelism_threads 2 │
│ tf_session_args/intra_op_parallelism_threads 2 │
│ tf_session_args/log_device_placement False │
│ torch_compile_learner False │
│ torch_compile_learner_dynamo_backend inductor │
│ torch_compile_learner_dynamo_mode │
│ torch_compile_learner_what_to_compile ...ile.FORWARD_TRAIN │
│ torch_compile_worker False │
│ torch_compile_worker_dynamo_backend onnxrt │
│ torch_compile_worker_dynamo_mode │
│ torch_skip_nan_gradients False │
│ train_batch_size 4000 │
│ update_worker_filter_stats True │
│ use_critic True │
│ use_gae True │
│ use_kl_loss True │
│ use_worker_filter_stats True │
│ validate_env_runners_after_construction True │
│ validate_offline_eval_runners_after_construction True │
│ vf_clip_param 10. │
│ vf_loss_coeff 1. │
│ vf_share_layers -1 │
│ worker_cls -1 │
╰───────────────────────────────────────────────────────────────────────────╯

Trial status: 2 RUNNING
Current time: 2025-07-31 09:09:43. Total running time: 30s
Logical resource usage: 6.0/256 CPUs, 2.0/2 GPUs (0.0/1.0 accelerator_type:G)
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PPO_Pendulum-v1_f8541_00000 RUNNING 0.001 1 9.84908 1 22000 │
│ PPO_Pendulum-v1_f8541_00001 RUNNING 0.0001 1 9.86235 1 22000 │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
2025-07-31 09:10:12,806 ERROR tune_controller.py:1331 -- Trial task failed for trial PPO_Pendulum-v1_f8541_00000
Traceback (most recent call last):
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/worker.py", line 2849, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/worker.py", line 937, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RaySystemError): ray::PPO.save() (pid=4074735, ip=10.25.12.104, actor_id=0cd8458b8ccdf603e2e8bfa101000000, repr=PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False))
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 486, in save
    checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2690, in save_checkpoint
    self.save_to_path(
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/utils/checkpoints.py", line 300, in save_to_path
    comp_state = self.get_state(components=comp_name)[comp_name]
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2834, in get_state
    state[COMPONENT_LEARNER_GROUP] = self.learner_group.get_state(
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 521, in get_state
    state[COMPONENT_LEARNER] = self._get_results(results)[0]
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 672, in _get_results
    raise result_or_error
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 861, in _fetch_result
    result = ray.get(ready)
ray.exceptions.RaySystemError: System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
traceback: Traceback (most recent call last):
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes
    return torch.load(io.BytesIO(b), weights_only=False)
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load
    return _legacy_load(
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load
    result = unpickler.load()
  File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load
  677. obj = restore_location(obj, location)
  678. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location
  679. result = fn(storage, location)
  680. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize
  681. device = _validate_device(location, backend_name)
  682. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device
  683. raise RuntimeError(
  684. RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  685.  
  686. Trial PPO_Pendulum-v1_f8541_00000 errored after 5 iterations at 2025-07-31 09:10:12. Total running time: 1min 0s
  687. Error file: /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts/PPO_Pendulum-v1_f8541_00000_0_lr=0.0010_2025-07-31_09-09-12/error.txt
  688. ╭───────────────────────────────────────────────────────╮
  689. │ Trial PPO_Pendulum-v1_f8541_00000 result │
  690. ├───────────────────────────────────────────────────────┤
  691. │ env_runners/episode_len_mean 200 │
  692. │ env_runners/episode_return_mean -1276.32 │
  693. │ num_env_steps_sampled_lifetime 38000 │
  694. ╰───────────────────────────────────────────────────────╯
  695. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  696. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) Traceback (most recent call last):
  697. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 458, in deserialize_objects
  698. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = self._deserialize_object(data, metadata, object_ref)
  699. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 315, in _deserialize_object
  700. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return self._deserialize_msgpack_data(data, metadata_fields)
  701. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 270, in _deserialize_msgpack_data
  702. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) python_objects = self._deserialize_pickle5_data(pickle5_data)
  703. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 258, in _deserialize_pickle5_data
  704. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = pickle.loads(in_band, buffers=buffers)
  705. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes
  706. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return torch.load(io.BytesIO(b), weights_only=False)
  707. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load
  708. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return _legacy_load(
  709. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load
  710. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) result = unpickler.load()
  711. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load
  712. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = restore_location(obj, location)
  713. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location
  714. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) result = fn(storage, location)
  715. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize
  716. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) device = _validate_device(location, backend_name)
  717. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device
  718. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) raise RuntimeError(
  719. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  720. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) 2025-07-31 09:10:12,804 ERROR actor_manager.py:873 -- Ray error (System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  721. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) traceback: Traceback (most recent call last):
  722. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 458, in deserialize_objects
  723. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = self._deserialize_object(data, metadata, object_ref)
  724. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 315, in _deserialize_object
  725. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return self._deserialize_msgpack_data(data, metadata_fields)
  726. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 270, in _deserialize_msgpack_data
  727. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) python_objects = self._deserialize_pickle5_data(pickle5_data)
  728. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 258, in _deserialize_pickle5_data
  729. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = pickle.loads(in_band, buffers=buffers)
  730. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes
  731. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return torch.load(io.BytesIO(b), weights_only=False)
  732. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load
  733. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return _legacy_load(
  734. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load
  735. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) result = unpickler.load()
  736. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load
  737. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = restore_location(obj, location)
  738. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location
  739. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) result = fn(storage, location)
  740. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize
  741. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) device = _validate_device(location, backend_name)
  742. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device
  743. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) raise RuntimeError(
  744. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  745. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) ), taking actor 0 out of service.
  746. (_WrappedExecutable pid=4076168) Setting up process group for: env:// [rank=0, world_size=1]
  747. (PPO pid=4074734) Install gputil for GPU system monitoring.
  748.  
  749. Trial status: 1 ERROR | 1 RUNNING
  750. Current time: 2025-07-31 09:10:13. Total running time: 1min 0s
  751. Logical resource usage: 6.0/256 CPUs, 2.0/2 GPUs (0.0/1.0 accelerator_type:G)
  752. ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
  753. │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
  754. ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  755. │ PPO_Pendulum-v1_f8541_00001 RUNNING 0.0001 4 38.6533 1 34000 │
  756. │ PPO_Pendulum-v1_f8541_00000 ERROR 0.001 5 47.2436 1 38000 │
  757. ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  758. 2025-07-31 09:10:13,711 ERROR tune_controller.py:1331 -- Trial task failed for trial PPO_Pendulum-v1_f8541_00001
  759. Traceback (most recent call last):
  760. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
  761. result = ray.get(future)
  762. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
  763. return fn(*args, **kwargs)
  764. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
  765. return func(*args, **kwargs)
  766. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/worker.py", line 2849, in get
  767. values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  768. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/worker.py", line 937, in get_objects
  769. raise value.as_instanceof_cause()
  770. ray.exceptions.RayTaskError(RaySystemError): ray::PPO.save() (pid=4074734, ip=10.25.12.104, actor_id=1de47fa4d390bd07305272a501000000, repr=PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False))
  771. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 486, in save
  772. checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)
  773. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2690, in save_checkpoint
  774. self.save_to_path(
  775. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/utils/checkpoints.py", line 300, in save_to_path
  776. comp_state = self.get_state(components=comp_name)[comp_name]
  777. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2834, in get_state
  778. state[COMPONENT_LEARNER_GROUP] = self.learner_group.get_state(
  779. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 521, in get_state
  780. state[COMPONENT_LEARNER] = self._get_results(results)[0]
  781. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 672, in _get_results
  782. raise result_or_error
  783. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 861, in _fetch_result
  784. result = ray.get(ready)
  785. ray.exceptions.RaySystemError: System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  786. traceback: Traceback (most recent call last):
  787. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes
  788. return torch.load(io.BytesIO(b), weights_only=False)
  789. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load
  790. return _legacy_load(
  791. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load
  792. result = unpickler.load()
  793. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load
  794. obj = restore_location(obj, location)
  795. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location
  796. result = fn(storage, location)
  797. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize
  798. device = _validate_device(location, backend_name)
  799. File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device
  800. raise RuntimeError(
  801. RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  802.  
  803. Trial PPO_Pendulum-v1_f8541_00001 errored after 5 iterations at 2025-07-31 09:10:13. Total running time: 1min 0s
  804. Error file: /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts/PPO_Pendulum-v1_f8541_00001_1_lr=0.0001_2025-07-31_09-09-12/error.txt
  805. ╭───────────────────────────────────────────────────────╮
  806. │ Trial PPO_Pendulum-v1_f8541_00001 result │
  807. ├───────────────────────────────────────────────────────┤
  808. │ env_runners/episode_len_mean 200 │
  809. │ env_runners/episode_return_mean -1261.03 │
  810. │ num_env_steps_sampled_lifetime 38000 │
  811. ╰───────────────────────────────────────────────────────╯
  812. 2025-07-31 09:10:13,751 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/data/home/fzy/ray_results/PPO_2025-07-31_09-09-09' in 0.0336s.
  813.  
  814. Trial status: 2 ERROR
  815. Current time: 2025-07-31 09:10:13. Total running time: 1min 1s
  816. Logical resource usage: 3.0/256 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:G)
  817. ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
  818. │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
  819. ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  820. │ PPO_Pendulum-v1_f8541_00000 ERROR 0.001 5 47.2436 1 38000 │
  821. │ PPO_Pendulum-v1_f8541_00001 ERROR 0.0001 5 47.9919 1 38000 │
  822. ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  823.  
  824. Number of errored trials: 2
  825. ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
  826. │ Trial name # failures error file │
  827. ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  828. │ PPO_Pendulum-v1_f8541_00000 1 /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts/PPO_Pendulum-v1_f8541_00000_0_lr=0.0010_2025-07-31_09-09-12/error.txt │
  829. │ PPO_Pendulum-v1_f8541_00001 1 /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts/PPO_Pendulum-v1_f8541_00001_1_lr=0.0001_2025-07-31_09-09-12/error.txt │
  830. ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  831. (_WrappedExecutable pid=4076167) [rank0]:[W731 09:10:14.248110984 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
  832.  
  833. 2025-07-31 09:10:14,165 ERROR tune.py:1037 -- Trials did not complete: [PPO_Pendulum-v1_f8541_00000, PPO_Pendulum-v1_f8541_00001]
  834. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  835. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) Traceback (most recent call last):
  836. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 458, in deserialize_objects [repeated 2x across cluster]
  837. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) obj = self._deserialize_object(data, metadata, object_ref) [repeated 2x across cluster]
  838. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 315, in _deserialize_object [repeated 2x across cluster]
  839. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) return self._deserialize_msgpack_data(data, metadata_fields) [repeated 2x across cluster]
  840. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 270, in _deserialize_msgpack_data [repeated 2x across cluster]
  841. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) python_objects = self._deserialize_pickle5_data(pickle5_data) [repeated 2x across cluster]
  842. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 258, in _deserialize_pickle5_data [repeated 2x across cluster]
  843. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) obj = pickle.loads(in_band, buffers=buffers) [repeated 2x across cluster]
  844. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes [repeated 2x across cluster]
  845. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) return torch.load(io.BytesIO(b), weights_only=False) [repeated 2x across cluster]
  846. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load [repeated 2x across cluster]
  847. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) return _legacy_load( [repeated 2x across cluster]
  848. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load [repeated 2x across cluster]
  849. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) result = unpickler.load() [repeated 2x across cluster]
  850. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load [repeated 2x across cluster]
  851. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) obj = restore_location(obj, location) [repeated 2x across cluster]
  852. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location [repeated 2x across cluster]
  853. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) result = fn(storage, location) [repeated 2x across cluster]
  854. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize [repeated 2x across cluster]
  855. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) device = _validate_device(location, backend_name) [repeated 2x across cluster]
  856. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device [repeated 2x across cluster]
  857. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) raise RuntimeError( [repeated 2x across cluster]
  858. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU. [repeated 2x across cluster]
  859. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) 2025-07-31 09:10:13,708 ERROR actor_manager.py:873 -- Ray error (System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  860. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) traceback: Traceback (most recent call last):
  861. (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) ), taking actor 0 out of service.
  862. (rl) ~ %
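The repeated `RuntimeError` above is raised while the PPO driver actor (which does not see a CUDA device) deserializes learner state containing CUDA tensors. The error message itself points at the generic torch-level remedy: loading with `map_location=torch.device('cpu')` remaps CUDA storages to CPU. A minimal sketch of that remedy, assuming only plain `torch` (the actual fix inside RLlib's checkpointing path would additionally involve how GPUs are allocated to the Algorithm actor versus its learner workers):

```python
import io

import torch

# Serialize a tensor to an in-memory buffer, as a stand-in for the
# pickled learner state that crosses actor boundaries in the log above.
device = "cuda" if torch.cuda.is_available() else "cpu"
buf = io.BytesIO()
torch.save(torch.ones(3, device=device), buf)
buf.seek(0)

# map_location="cpu" remaps any CUDA storages to CPU at load time, so
# deserialization succeeds even when torch.cuda.is_available() is False.
t = torch.load(buf, map_location=torch.device("cpu"))
print(t.device)  # cpu
```

Note that in the failing run the deserialization happens inside Ray's pickle machinery, not in user code, so applying `map_location` directly is not possible there; the usual workaround is to ensure the process that receives the state also has GPU access (or that the learner moves its state to CPU before returning it).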