- (rl) ~ % python test_ray.py
- /data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py:144: RayDeprecationWarning: The `RunConfig` class should be imported from `ray.tune` when passing it to the Tuner. Please update your imports. See this issue for more context and migration options: https://github.com/ray-project/ray/issues/49454. Disable these warnings by setting the environment variable: RAY_TRAIN_ENABLE_V2_MIGRATION_WARNINGS=0
- _log_deprecation_warning(
- 2025-07-31 09:09:11,056 INFO worker.py:1917 -- Started a local Ray instance.
- 2025-07-31 09:09:12,656 INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
- 2025-07-31 09:09:12,721 WARNING tune_controller.py:2132 -- The maximum number of pending trials has been automatically set to the number of available cluster CPUs, which is high (281 CPUs/pending trials). If you're running an experiment with a large number of trials, this could lead to scheduling overhead. In this case, consider setting the `TUNE_MAX_PENDING_TRIALS_PG` environment variable to the desired maximum number of concurrent pending trials.
- 2025-07-31 09:09:12,723 WARNING tune_controller.py:2132 -- The maximum number of pending trials has been automatically set to the number of available cluster CPUs, which is high (281 CPUs/pending trials). If you're running an experiment with a large number of trials, this could lead to scheduling overhead. In this case, consider setting the `TUNE_MAX_PENDING_TRIALS_PG` environment variable to the desired maximum number of concurrent pending trials.
- ╭────────────────────────────────────────────────────────────╮
- │ Configuration for experiment PPO_2025-07-31_09-09-09 │
- ├────────────────────────────────────────────────────────────┤
- │ Search algorithm BasicVariantGenerator │
- │ Scheduler FIFOScheduler │
- │ Number of trials 2 │
- ╰────────────────────────────────────────────────────────────╯
- View detailed results here: /data/home/fzy/ray_results/PPO_2025-07-31_09-09-09
- To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts`
- 2025-07-31 09:09:12,733 WARNING algorithm_config.py:5014 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
- Trial status: 2 PENDING
- Current time: 2025-07-31 09:09:13. Total running time: 0s
- Logical resource usage: 0/256 CPUs, 0/2 GPUs (0.0/1.0 accelerator_type:G)
- ╭─────────────────────────────────────────────────╮
- │ Trial name status lr │
- ├─────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_f8541_00000 PENDING 0.001 │
- │ PPO_Pendulum-v1_f8541_00001 PENDING 0.0001 │
- ╰─────────────────────────────────────────────────╯
- (PPO pid=4074735) 2025-07-31 09:09:16,616 WARNING algorithm_config.py:5014 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
- (SingleAgentEnvRunner pid=4075495) 2025-07-31 09:09:20,263 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
- (_WrappedExecutable pid=4076167) Setting up process group for: env:// [rank=0, world_size=1]
- (PPO pid=4074734) 2025-07-31 09:09:16,634 WARNING algorithm_config.py:5014 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
- Trial PPO_Pendulum-v1_f8541_00000 started with configuration:
- ╭───────────────────────────────────────────────────────────────────────────╮
- │ Trial PPO_Pendulum-v1_f8541_00000 config │
- ├───────────────────────────────────────────────────────────────────────────┤
- │ _disable_action_flattening False │
- │ _disable_execution_plan_api -1 │
- │ _disable_initialize_loss_from_dummy_batch False │
- │ _disable_preprocessor_api False │
- │ _dont_auto_sync_env_runner_states False │
- │ _enable_rl_module_api -1 │
- │ _env_to_module_connector │
- │ _fake_gpus False │
- │ _is_atari │
- │ _is_online True │
- │ _learner_class │
- │ _learner_connector │
- │ _module_to_env_connector │
- │ _prior_exploration_config/type StochasticSampling │
- │ _rl_module_spec │
- │ _tf_policy_handles_more_than_one_loss False │
- │ _torch_grad_scaler_class │
- │ _torch_lr_scheduler_classes │
- │ _train_batch_size_per_learner │
- │ _use_msgpack_checkpoints False │
- │ _validate_config True │
- │ action_mask_key action_mask │
- │ action_space │
- │ actions_in_input_normalized False │
- │ add_default_connectors_to_env_to_module_pipeline True │
- │ add_default_connectors_to_learner_pipeline True │
- │ add_default_connectors_to_module_to_env_pipeline True │
- │ always_attach_evaluation_results -1 │
- │ auto_wrap_old_gym_envs -1 │
- │ batch_mode complete_episodes │
- │ broadcast_env_runner_states True │
- │ broadcast_offline_eval_runner_states False │
- │ callbacks ...s.RLlibCallback'> │
- │ callbacks_on_algorithm_init │
- │ callbacks_on_checkpoint_loaded │
- │ callbacks_on_env_runners_recreated │
- │ callbacks_on_environment_created │
- │ callbacks_on_episode_created │
- │ callbacks_on_episode_end │
- │ callbacks_on_episode_start │
- │ callbacks_on_episode_step │
- │ callbacks_on_evaluate_end │
- │ callbacks_on_evaluate_offline_end │
- │ callbacks_on_evaluate_offline_start │
- │ callbacks_on_evaluate_start │
- │ callbacks_on_offline_eval_runners_recreated │
- │ callbacks_on_sample_end │
- │ callbacks_on_train_result │
- │ checkpoint_trainable_policies_only False │
- │ clip_actions False │
- │ clip_param 0.3 │
- │ clip_rewards │
- │ compress_observations False │
- │ count_steps_by env_steps │
- │ create_env_on_driver False │
- │ create_local_env_runner True │
- │ custom_async_evaluation_function -1 │
- │ custom_eval_function │
- │ dataset_num_iters_per_eval_runner 1 │
- │ dataset_num_iters_per_learner │
- │ delay_between_env_runner_restarts_s 60. │
- │ disable_env_checking False │
- │ eager_max_retraces 20 │
- │ eager_tracing True │
- │ enable_async_evaluation -1 │
- │ enable_connectors -1 │
- │ enable_env_runner_and_connector_v2 True │
- │ enable_rl_module_and_learner True │
- │ enable_tf1_exec_eagerly False │
- │ entropy_coeff 0. │
- │ entropy_coeff_schedule │
- │ env Pendulum-v1 │
- │ env_runner_cls │
- │ env_runner_health_probe_timeout_s 30. │
- │ env_runner_restore_timeout_s 1800. │
- │ env_task_fn -1 │
- │ episode_lookback_horizon 1 │
- │ episodes_to_numpy True │
- │ evaluation_auto_duration_max_env_steps_per_sample 2000 │
- │ evaluation_auto_duration_min_env_steps_per_sample 100 │
- │ evaluation_config │
- │ evaluation_duration 10 │
- │ evaluation_duration_unit episodes │
- │ evaluation_force_reset_envs_before_iteration True │
- │ evaluation_interval │
- │ evaluation_num_env_runners 0 │
- │ evaluation_parallel_to_training False │
- │ evaluation_sample_timeout_s 120. │
- │ explore True │
- │ export_native_model_files False │
- │ fake_sampler False │
- │ framework torch │
- │ gamma 0.99 │
- │ grad_clip │
- │ grad_clip_by global_norm │
- │ gym_env_vectorize_mode SYNC │
- │ ignore_env_runner_failures False │
- │ ignore_final_observation False │
- │ ignore_offline_eval_runner_failures False │
- │ in_evaluation False │
- │ input sampler │
- │ input_compress_columns ['obs', 'new_obs'] │
- │ input_filesystem │
- │ input_read_batch_size │
- │ input_read_episodes False │
- │ input_read_method read_parquet │
- │ input_read_sample_batches False │
- │ input_spaces_jsonable True │
- │ keep_per_episode_custom_metrics False │
- │ kl_coeff 0.2 │
- │ kl_target 0.01 │
- │ lambda 1. │
- │ local_gpu_idx 0 │
- │ local_tf_session_args/inter_op_parallelism_threads 8 │
- │ local_tf_session_args/intra_op_parallelism_threads 8 │
- │ log_gradients True │
- │ log_level WARN │
- │ log_sys_usage True │
- │ logger_config │
- │ logger_creator │
- │ lr 0.001 │
- │ lr_schedule │
- │ materialize_data False │
- │ materialize_mapped_data True │
- │ max_num_env_runner_restarts 1000 │
- │ max_num_offline_eval_runner_restarts 1000 │
- │ max_requests_in_flight_per_aggregator_actor 3 │
- │ max_requests_in_flight_per_env_runner 1 │
- │ max_requests_in_flight_per_learner 3 │
- │ max_requests_in_flight_per_offline_eval_runner 1 │
- │ merge_env_runner_states training_only │
- │ metrics_episode_collection_timeout_s 60. │
- │ metrics_num_episodes_for_smoothing 100 │
- │ min_sample_timesteps_per_iteration 0 │
- │ min_time_s_per_iteration │
- │ min_train_timesteps_per_iteration 0 │
- │ minibatch_size 128 │
- │ model/_disable_action_flattening False │
- │ model/_disable_preprocessor_api False │
- │ model/_time_major False │
- │ model/_use_default_native_models -1 │
- │ model/always_check_shapes False │
- │ model/attention_dim 64 │
- │ model/attention_head_dim 32 │
- │ model/attention_init_gru_gate_bias 2.0 │
- │ model/attention_memory_inference 50 │
- │ model/attention_memory_training 50 │
- │ model/attention_num_heads 1 │
- │ model/attention_num_transformer_units 1 │
- │ model/attention_position_wise_mlp_dim 32 │
- │ model/attention_use_n_prev_actions 0 │
- │ model/attention_use_n_prev_rewards 0 │
- │ model/conv_activation relu │
- │ model/conv_bias_initializer │
- │ model/conv_bias_initializer_config │
- │ model/conv_filters │
- │ model/conv_kernel_initializer │
- │ model/conv_kernel_initializer_config │
- │ model/conv_transpose_bias_initializer │
- │ model/conv_transpose_bias_initializer_config │
- │ model/conv_transpose_kernel_initializer │
- │ model/conv_transpose_kernel_initializer_config │
- │ model/custom_action_dist │
- │ model/custom_model │
- │ model/custom_preprocessor │
- │ model/dim 84 │
- │ model/encoder_latent_dim │
- │ model/fcnet_activation tanh │
- │ model/fcnet_bias_initializer │
- │ model/fcnet_bias_initializer_config │
- │ model/fcnet_hiddens [256, 256] │
- │ model/fcnet_weights_initializer │
- │ model/fcnet_weights_initializer_config │
- │ model/framestack True │
- │ model/free_log_std False │
- │ model/grayscale False │
- │ model/log_std_clip_param 20.0 │
- │ model/lstm_bias_initializer │
- │ model/lstm_bias_initializer_config │
- │ model/lstm_cell_size 256 │
- │ model/lstm_use_prev_action False │
- │ model/lstm_use_prev_action_reward -1 │
- │ model/lstm_use_prev_reward False │
- │ model/lstm_weights_initializer │
- │ model/lstm_weights_initializer_config │
- │ model/max_seq_len 20 │
- │ model/no_final_linear False │
- │ model/post_fcnet_activation relu │
- │ model/post_fcnet_bias_initializer │
- │ model/post_fcnet_bias_initializer_config │
- │ model/post_fcnet_hiddens [] │
- │ model/post_fcnet_weights_initializer │
- │ model/post_fcnet_weights_initializer_config │
- │ model/use_attention False │
- │ model/use_lstm False │
- │ model/vf_share_layers False │
- │ model/zero_mean True │
- │ normalize_actions True │
- │ num_aggregator_actors_per_learner 0 │
- │ num_consecutive_env_runner_failures_tolerance 100 │
- │ num_cpus_for_main_process 1 │
- │ num_cpus_per_env_runner 1 │
- │ num_cpus_per_learner auto │
- │ num_cpus_per_offline_eval_runner 1 │
- │ num_env_runners 2 │
- │ num_envs_per_env_runner 1 │
- │ num_epochs 30 │
- │ num_gpus 0 │
- │ num_gpus_per_env_runner 0 │
- │ num_gpus_per_learner 1 │
- │ num_gpus_per_offline_eval_runner 0 │
- │ num_learners 1 │
- │ num_offline_eval_runners 0 │
- │ observation_filter NoFilter │
- │ observation_fn │
- │ observation_space │
- │ offline_data_class │
- │ offline_eval_batch_size_per_runner 256 │
- │ offline_eval_rl_module_inference_only False │
- │ offline_eval_runner_health_probe_timeout_s 30. │
- │ offline_eval_runner_restore_timeout_s 1800. │
- │ offline_evaluation_duration 1 │
- │ offline_evaluation_interval │
- │ offline_evaluation_parallel_to_training False │
- │ offline_evaluation_timeout_s 120. │
- │ offline_loss_for_module_fn │
- │ offline_sampling False │
- │ ope_split_batch_by_episode True │
- │ output │
- │ output_compress_columns ['obs', 'new_obs'] │
- │ output_filesystem │
- │ output_max_file_size 67108864 │
- │ output_max_rows_per_file │
- │ output_write_episodes True │
- │ output_write_method write_parquet │
- │ output_write_remaining_data False │
- │ placement_strategy PACK │
- │ policies/default_policy ...None, None, None) │
- │ policies_to_train │
- │ policy_map_cache -1 │
- │ policy_map_capacity 100 │
- │ policy_mapping_fn ...t 0x7f197f9e8670> │
- │ policy_states_are_swappable False │
- │ postprocess_inputs False │
- │ prelearner_buffer_class │
- │ prelearner_class │
- │ prelearner_module_synch_period 10 │
- │ preprocessor_pref deepmind │
- │ remote_env_batch_wait_ms 0 │
- │ remote_worker_envs False │
- │ render_env False │
- │ replay_sequence_length │
- │ restart_failed_env_runners True │
- │ restart_failed_offline_eval_runners True │
- │ restart_failed_sub_environments False │
- │ rollout_fragment_length auto │
- │ sample_collector ...leListCollector'> │
- │ sample_timeout_s 60. │
- │ sampler_perf_stats_ema_coef │
- │ seed │
- │ sgd_minibatch_size -1 │
- │ shuffle_batch_per_epoch True │
- │ shuffle_buffer_size 0 │
- │ simple_optimizer -1 │
- │ sync_filters_on_rollout_workers_timeout_s 10. │
- │ synchronize_filters -1 │
- │ tf_session_args/allow_soft_placement True │
- │ tf_session_args/device_count/CPU 1 │
- │ tf_session_args/gpu_options/allow_growth True │
- │ tf_session_args/inter_op_parallelism_threads 2 │
- │ tf_session_args/intra_op_parallelism_threads 2 │
- │ tf_session_args/log_device_placement False │
- │ torch_compile_learner False │
- │ torch_compile_learner_dynamo_backend inductor │
- │ torch_compile_learner_dynamo_mode │
- │ torch_compile_learner_what_to_compile ...ile.FORWARD_TRAIN │
- │ torch_compile_worker False │
- │ torch_compile_worker_dynamo_backend onnxrt │
- │ torch_compile_worker_dynamo_mode │
- │ torch_skip_nan_gradients False │
- │ train_batch_size 4000 │
- │ update_worker_filter_stats True │
- │ use_critic True │
- │ use_gae True │
- │ use_kl_loss True │
- │ use_worker_filter_stats True │
- │ validate_env_runners_after_construction True │
- │ validate_offline_eval_runners_after_construction True │
- │ vf_clip_param 10. │
- │ vf_loss_coeff 1. │
- │ vf_share_layers -1 │
- │ worker_cls -1 │
- ╰───────────────────────────────────────────────────────────────────────────╯
- (PPO pid=4074735) Install gputil for GPU system monitoring.
- (_WrappedExecutable pid=4076168) 2025-07-31 09:09:24,500 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future! [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
- Trial PPO_Pendulum-v1_f8541_00001 started with configuration:
- ╭───────────────────────────────────────────────────────────────────────────╮
- │ Trial PPO_Pendulum-v1_f8541_00001 config │
- ├───────────────────────────────────────────────────────────────────────────┤
- │ _disable_action_flattening False │
- │ _disable_execution_plan_api -1 │
- │ _disable_initialize_loss_from_dummy_batch False │
- │ _disable_preprocessor_api False │
- │ _dont_auto_sync_env_runner_states False │
- │ _enable_rl_module_api -1 │
- │ _env_to_module_connector │
- │ _fake_gpus False │
- │ _is_atari │
- │ _is_online True │
- │ _learner_class │
- │ _learner_connector │
- │ _module_to_env_connector │
- │ _prior_exploration_config/type StochasticSampling │
- │ _rl_module_spec │
- │ _tf_policy_handles_more_than_one_loss False │
- │ _torch_grad_scaler_class │
- │ _torch_lr_scheduler_classes │
- │ _train_batch_size_per_learner │
- │ _use_msgpack_checkpoints False │
- │ _validate_config True │
- │ action_mask_key action_mask │
- │ action_space │
- │ actions_in_input_normalized False │
- │ add_default_connectors_to_env_to_module_pipeline True │
- │ add_default_connectors_to_learner_pipeline True │
- │ add_default_connectors_to_module_to_env_pipeline True │
- │ always_attach_evaluation_results -1 │
- │ auto_wrap_old_gym_envs -1 │
- │ batch_mode complete_episodes │
- │ broadcast_env_runner_states True │
- │ broadcast_offline_eval_runner_states False │
- │ callbacks ...s.RLlibCallback'> │
- │ callbacks_on_algorithm_init │
- │ callbacks_on_checkpoint_loaded │
- │ callbacks_on_env_runners_recreated │
- │ callbacks_on_environment_created │
- │ callbacks_on_episode_created │
- │ callbacks_on_episode_end │
- │ callbacks_on_episode_start │
- │ callbacks_on_episode_step │
- │ callbacks_on_evaluate_end │
- │ callbacks_on_evaluate_offline_end │
- │ callbacks_on_evaluate_offline_start │
- │ callbacks_on_evaluate_start │
- │ callbacks_on_offline_eval_runners_recreated │
- │ callbacks_on_sample_end │
- │ callbacks_on_train_result │
- │ checkpoint_trainable_policies_only False │
- │ clip_actions False │
- │ clip_param 0.3 │
- │ clip_rewards │
- │ compress_observations False │
- │ count_steps_by env_steps │
- │ create_env_on_driver False │
- │ create_local_env_runner True │
- │ custom_async_evaluation_function -1 │
- │ custom_eval_function │
- │ dataset_num_iters_per_eval_runner 1 │
- │ dataset_num_iters_per_learner │
- │ delay_between_env_runner_restarts_s 60. │
- │ disable_env_checking False │
- │ eager_max_retraces 20 │
- │ eager_tracing True │
- │ enable_async_evaluation -1 │
- │ enable_connectors -1 │
- │ enable_env_runner_and_connector_v2 True │
- │ enable_rl_module_and_learner True │
- │ enable_tf1_exec_eagerly False │
- │ entropy_coeff 0. │
- │ entropy_coeff_schedule │
- │ env Pendulum-v1 │
- │ env_runner_cls │
- │ env_runner_health_probe_timeout_s 30. │
- │ env_runner_restore_timeout_s 1800. │
- │ env_task_fn -1 │
- │ episode_lookback_horizon 1 │
- │ episodes_to_numpy True │
- │ evaluation_auto_duration_max_env_steps_per_sample 2000 │
- │ evaluation_auto_duration_min_env_steps_per_sample 100 │
- │ evaluation_config │
- │ evaluation_duration 10 │
- │ evaluation_duration_unit episodes │
- │ evaluation_force_reset_envs_before_iteration True │
- │ evaluation_interval │
- │ evaluation_num_env_runners 0 │
- │ evaluation_parallel_to_training False │
- │ evaluation_sample_timeout_s 120. │
- │ explore True │
- │ export_native_model_files False │
- │ fake_sampler False │
- │ framework torch │
- │ gamma 0.99 │
- │ grad_clip │
- │ grad_clip_by global_norm │
- │ gym_env_vectorize_mode SYNC │
- │ ignore_env_runner_failures False │
- │ ignore_final_observation False │
- │ ignore_offline_eval_runner_failures False │
- │ in_evaluation False │
- │ input sampler │
- │ input_compress_columns ['obs', 'new_obs'] │
- │ input_filesystem │
- │ input_read_batch_size │
- │ input_read_episodes False │
- │ input_read_method read_parquet │
- │ input_read_sample_batches False │
- │ input_spaces_jsonable True │
- │ keep_per_episode_custom_metrics False │
- │ kl_coeff 0.2 │
- │ kl_target 0.01 │
- │ lambda 1. │
- │ local_gpu_idx 0 │
- │ local_tf_session_args/inter_op_parallelism_threads 8 │
- │ local_tf_session_args/intra_op_parallelism_threads 8 │
- │ log_gradients True │
- │ log_level WARN │
- │ log_sys_usage True │
- │ logger_config │
- │ logger_creator │
- │ lr 0.0001 │
- │ lr_schedule │
- │ materialize_data False │
- │ materialize_mapped_data True │
- │ max_num_env_runner_restarts 1000 │
- │ max_num_offline_eval_runner_restarts 1000 │
- │ max_requests_in_flight_per_aggregator_actor 3 │
- │ max_requests_in_flight_per_env_runner 1 │
- │ max_requests_in_flight_per_learner 3 │
- │ max_requests_in_flight_per_offline_eval_runner 1 │
- │ merge_env_runner_states training_only │
- │ metrics_episode_collection_timeout_s 60. │
- │ metrics_num_episodes_for_smoothing 100 │
- │ min_sample_timesteps_per_iteration 0 │
- │ min_time_s_per_iteration │
- │ min_train_timesteps_per_iteration 0 │
- │ minibatch_size 128 │
- │ model/_disable_action_flattening False │
- │ model/_disable_preprocessor_api False │
- │ model/_time_major False │
- │ model/_use_default_native_models -1 │
- │ model/always_check_shapes False │
- │ model/attention_dim 64 │
- │ model/attention_head_dim 32 │
- │ model/attention_init_gru_gate_bias 2.0 │
- │ model/attention_memory_inference 50 │
- │ model/attention_memory_training 50 │
- │ model/attention_num_heads 1 │
- │ model/attention_num_transformer_units 1 │
- │ model/attention_position_wise_mlp_dim 32 │
- │ model/attention_use_n_prev_actions 0 │
- │ model/attention_use_n_prev_rewards 0 │
- │ model/conv_activation relu │
- │ model/conv_bias_initializer │
- │ model/conv_bias_initializer_config │
- │ model/conv_filters │
- │ model/conv_kernel_initializer │
- │ model/conv_kernel_initializer_config │
- │ model/conv_transpose_bias_initializer │
- │ model/conv_transpose_bias_initializer_config │
- │ model/conv_transpose_kernel_initializer │
- │ model/conv_transpose_kernel_initializer_config │
- │ model/custom_action_dist │
- │ model/custom_model │
- │ model/custom_preprocessor │
- │ model/dim 84 │
- │ model/encoder_latent_dim │
- │ model/fcnet_activation tanh │
- │ model/fcnet_bias_initializer │
- │ model/fcnet_bias_initializer_config │
- │ model/fcnet_hiddens [256, 256] │
- │ model/fcnet_weights_initializer │
- │ model/fcnet_weights_initializer_config │
- │ model/framestack True │
- │ model/free_log_std False │
- │ model/grayscale False │
- │ model/log_std_clip_param 20.0 │
- │ model/lstm_bias_initializer │
- │ model/lstm_bias_initializer_config │
- │ model/lstm_cell_size 256 │
- │ model/lstm_use_prev_action False │
- │ model/lstm_use_prev_action_reward -1 │
- │ model/lstm_use_prev_reward False │
- │ model/lstm_weights_initializer │
- │ model/lstm_weights_initializer_config │
- │ model/max_seq_len 20 │
- │ model/no_final_linear False │
- │ model/post_fcnet_activation relu │
- │ model/post_fcnet_bias_initializer │
- │ model/post_fcnet_bias_initializer_config │
- │ model/post_fcnet_hiddens [] │
- │ model/post_fcnet_weights_initializer │
- │ model/post_fcnet_weights_initializer_config │
- │ model/use_attention False │
- │ model/use_lstm False │
- │ model/vf_share_layers False │
- │ model/zero_mean True │
- │ normalize_actions True │
- │ num_aggregator_actors_per_learner 0 │
- │ num_consecutive_env_runner_failures_tolerance 100 │
- │ num_cpus_for_main_process 1 │
- │ num_cpus_per_env_runner 1 │
- │ num_cpus_per_learner auto │
- │ num_cpus_per_offline_eval_runner 1 │
- │ num_env_runners 2 │
- │ num_envs_per_env_runner 1 │
- │ num_epochs 30 │
- │ num_gpus 0 │
- │ num_gpus_per_env_runner 0 │
- │ num_gpus_per_learner 1 │
- │ num_gpus_per_offline_eval_runner 0 │
- │ num_learners 1 │
- │ num_offline_eval_runners 0 │
- │ observation_filter NoFilter │
- │ observation_fn │
- │ observation_space │
- │ offline_data_class │
- │ offline_eval_batch_size_per_runner 256 │
- │ offline_eval_rl_module_inference_only False │
- │ offline_eval_runner_health_probe_timeout_s 30. │
- │ offline_eval_runner_restore_timeout_s 1800. │
- │ offline_evaluation_duration 1 │
- │ offline_evaluation_interval │
- │ offline_evaluation_parallel_to_training False │
- │ offline_evaluation_timeout_s 120. │
- │ offline_loss_for_module_fn │
- │ offline_sampling False │
- │ ope_split_batch_by_episode True │
- │ output │
- │ output_compress_columns ['obs', 'new_obs'] │
- │ output_filesystem │
- │ output_max_file_size 67108864 │
- │ output_max_rows_per_file │
- │ output_write_episodes True │
- │ output_write_method write_parquet │
- │ output_write_remaining_data False │
- │ placement_strategy PACK │
- │ policies/default_policy ...None, None, None) │
- │ policies_to_train │
- │ policy_map_cache -1 │
- │ policy_map_capacity 100 │
- │ policy_mapping_fn ...t 0x7f197f9e8670> │
- │ policy_states_are_swappable False │
- │ postprocess_inputs False │
- │ prelearner_buffer_class │
- │ prelearner_class │
- │ prelearner_module_synch_period 10 │
- │ preprocessor_pref deepmind │
- │ remote_env_batch_wait_ms 0 │
- │ remote_worker_envs False │
- │ render_env False │
- │ replay_sequence_length │
- │ restart_failed_env_runners True │
- │ restart_failed_offline_eval_runners True │
- │ restart_failed_sub_environments False │
- │ rollout_fragment_length auto │
- │ sample_collector ...leListCollector'> │
- │ sample_timeout_s 60. │
- │ sampler_perf_stats_ema_coef │
- │ seed │
- │ sgd_minibatch_size -1 │
- │ shuffle_batch_per_epoch True │
- │ shuffle_buffer_size 0 │
- │ simple_optimizer -1 │
- │ sync_filters_on_rollout_workers_timeout_s 10. │
- │ synchronize_filters -1 │
- │ tf_session_args/allow_soft_placement True │
- │ tf_session_args/device_count/CPU 1 │
- │ tf_session_args/gpu_options/allow_growth True │
- │ tf_session_args/inter_op_parallelism_threads 2 │
- │ tf_session_args/intra_op_parallelism_threads 2 │
- │ tf_session_args/log_device_placement False │
- │ torch_compile_learner False │
- │ torch_compile_learner_dynamo_backend inductor │
- │ torch_compile_learner_dynamo_mode │
- │ torch_compile_learner_what_to_compile ...ile.FORWARD_TRAIN │
- │ torch_compile_worker False │
- │ torch_compile_worker_dynamo_backend onnxrt │
- │ torch_compile_worker_dynamo_mode │
- │ torch_skip_nan_gradients False │
- │ train_batch_size 4000 │
- │ update_worker_filter_stats True │
- │ use_critic True │
- │ use_gae True │
- │ use_kl_loss True │
- │ use_worker_filter_stats True │
- │ validate_env_runners_after_construction True │
- │ validate_offline_eval_runners_after_construction True │
- │ vf_clip_param 10. │
- │ vf_loss_coeff 1. │
- │ vf_share_layers -1 │
- │ worker_cls -1 │
- ╰───────────────────────────────────────────────────────────────────────────╯
- Trial status: 2 RUNNING
- Current time: 2025-07-31 09:09:43. Total running time: 30s
- Logical resource usage: 6.0/256 CPUs, 2.0/2 GPUs (0.0/1.0 accelerator_type:G)
- ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
- ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_f8541_00000 RUNNING 0.001 1 9.84908 1 22000 │
- │ PPO_Pendulum-v1_f8541_00001 RUNNING 0.0001 1 9.86235 1 22000 │
- ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- 2025-07-31 09:10:12,806 ERROR tune_controller.py:1331 -- Trial task failed for trial PPO_Pendulum-v1_f8541_00000
- Traceback (most recent call last):
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
- result = ray.get(future)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
- return fn(*args, **kwargs)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
- return func(*args, **kwargs)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/worker.py", line 2849, in get
- values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/worker.py", line 937, in get_objects
- raise value.as_instanceof_cause()
- ray.exceptions.RayTaskError(RaySystemError): ray::PPO.save() (pid=4074735, ip=10.25.12.104, actor_id=0cd8458b8ccdf603e2e8bfa101000000, repr=PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False))
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 486, in save
- checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2690, in save_checkpoint
- self.save_to_path(
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/utils/checkpoints.py", line 300, in save_to_path
- comp_state = self.get_state(components=comp_name)[comp_name]
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2834, in get_state
- state[COMPONENT_LEARNER_GROUP] = self.learner_group.get_state(
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 521, in get_state
- state[COMPONENT_LEARNER] = self._get_results(results)[0]
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 672, in _get_results
- raise result_or_error
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 861, in _fetch_result
- result = ray.get(ready)
- ray.exceptions.RaySystemError: System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- traceback: Traceback (most recent call last):
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes
- return torch.load(io.BytesIO(b), weights_only=False)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load
- return _legacy_load(
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load
- result = unpickler.load()
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load
- obj = restore_location(obj, location)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location
- result = fn(storage, location)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize
- device = _validate_device(location, backend_name)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device
- raise RuntimeError(
- RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- Trial PPO_Pendulum-v1_f8541_00000 errored after 5 iterations at 2025-07-31 09:10:12. Total running time: 1min 0s
- Error file: /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts/PPO_Pendulum-v1_f8541_00000_0_lr=0.0010_2025-07-31_09-09-12/error.txt
- ╭───────────────────────────────────────────────────────╮
- │ Trial PPO_Pendulum-v1_f8541_00000 result │
- ├───────────────────────────────────────────────────────┤
- │ env_runners/episode_len_mean 200 │
- │ env_runners/episode_return_mean -1276.32 │
- │ num_env_steps_sampled_lifetime 38000 │
- ╰───────────────────────────────────────────────────────╯
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) Traceback (most recent call last):
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 458, in deserialize_objects
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = self._deserialize_object(data, metadata, object_ref)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 315, in _deserialize_object
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return self._deserialize_msgpack_data(data, metadata_fields)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 270, in _deserialize_msgpack_data
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) python_objects = self._deserialize_pickle5_data(pickle5_data)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 258, in _deserialize_pickle5_data
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = pickle.loads(in_band, buffers=buffers)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return torch.load(io.BytesIO(b), weights_only=False)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return _legacy_load(
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) result = unpickler.load()
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = restore_location(obj, location)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) result = fn(storage, location)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) device = _validate_device(location, backend_name)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) raise RuntimeError(
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) 2025-07-31 09:10:12,804 ERROR actor_manager.py:873 -- Ray error (System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) traceback: Traceback (most recent call last):
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 458, in deserialize_objects
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = self._deserialize_object(data, metadata, object_ref)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 315, in _deserialize_object
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return self._deserialize_msgpack_data(data, metadata_fields)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 270, in _deserialize_msgpack_data
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) python_objects = self._deserialize_pickle5_data(pickle5_data)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 258, in _deserialize_pickle5_data
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = pickle.loads(in_band, buffers=buffers)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return torch.load(io.BytesIO(b), weights_only=False)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) return _legacy_load(
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) result = unpickler.load()
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) obj = restore_location(obj, location)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) result = fn(storage, location)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) device = _validate_device(location, backend_name)
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) raise RuntimeError(
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074735) ), taking actor 0 out of service.
- (_WrappedExecutable pid=4076168) Setting up process group for: env:// [rank=0, world_size=1]
- (PPO pid=4074734) Install gputil for GPU system monitoring.
- Trial status: 1 ERROR | 1 RUNNING
- Current time: 2025-07-31 09:10:13. Total running time: 1min 0s
- Logical resource usage: 6.0/256 CPUs, 2.0/2 GPUs (0.0/1.0 accelerator_type:G)
- ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
- ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_f8541_00001 RUNNING 0.0001 4 38.6533 1 34000 │
- │ PPO_Pendulum-v1_f8541_00000 ERROR 0.001 5 47.2436 1 38000 │
- ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- 2025-07-31 09:10:13,711 ERROR tune_controller.py:1331 -- Trial task failed for trial PPO_Pendulum-v1_f8541_00001
- Traceback (most recent call last):
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
- result = ray.get(future)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
- return fn(*args, **kwargs)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
- return func(*args, **kwargs)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/worker.py", line 2849, in get
- values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/worker.py", line 937, in get_objects
- raise value.as_instanceof_cause()
- ray.exceptions.RayTaskError(RaySystemError): ray::PPO.save() (pid=4074734, ip=10.25.12.104, actor_id=1de47fa4d390bd07305272a501000000, repr=PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False))
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 486, in save
- checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2690, in save_checkpoint
- self.save_to_path(
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/utils/checkpoints.py", line 300, in save_to_path
- comp_state = self.get_state(components=comp_name)[comp_name]
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2834, in get_state
- state[COMPONENT_LEARNER_GROUP] = self.learner_group.get_state(
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 521, in get_state
- state[COMPONENT_LEARNER] = self._get_results(results)[0]
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 672, in _get_results
- raise result_or_error
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 861, in _fetch_result
- result = ray.get(ready)
- ray.exceptions.RaySystemError: System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- traceback: Traceback (most recent call last):
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes
- return torch.load(io.BytesIO(b), weights_only=False)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load
- return _legacy_load(
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load
- result = unpickler.load()
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load
- obj = restore_location(obj, location)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location
- result = fn(storage, location)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize
- device = _validate_device(location, backend_name)
- File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device
- raise RuntimeError(
- RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- Trial PPO_Pendulum-v1_f8541_00001 errored after 5 iterations at 2025-07-31 09:10:13. Total running time: 1min 0s
- Error file: /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts/PPO_Pendulum-v1_f8541_00001_1_lr=0.0001_2025-07-31_09-09-12/error.txt
- ╭───────────────────────────────────────────────────────╮
- │ Trial PPO_Pendulum-v1_f8541_00001 result │
- ├───────────────────────────────────────────────────────┤
- │ env_runners/episode_len_mean 200 │
- │ env_runners/episode_return_mean -1261.03 │
- │ num_env_steps_sampled_lifetime 38000 │
- ╰───────────────────────────────────────────────────────╯
- 2025-07-31 09:10:13,751 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/data/home/fzy/ray_results/PPO_2025-07-31_09-09-09' in 0.0336s.
- Trial status: 2 ERROR
- Current time: 2025-07-31 09:10:13. Total running time: 1min 1s
- Logical resource usage: 3.0/256 CPUs, 1.0/2 GPUs (0.0/1.0 accelerator_type:G)
- ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
- ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_f8541_00000 ERROR 0.001 5 47.2436 1 38000 │
- │ PPO_Pendulum-v1_f8541_00001 ERROR 0.0001 5 47.9919 1 38000 │
- ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- Number of errored trials: 2
- ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ Trial name # failures error file │
- ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_f8541_00000 1 /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts/PPO_Pendulum-v1_f8541_00000_0_lr=0.0010_2025-07-31_09-09-12/error.txt │
- │ PPO_Pendulum-v1_f8541_00001 1 /tmp/ray/session_2025-07-31_09-09-09_900802_4059124/artifacts/2025-07-31_09-09-12/PPO_2025-07-31_09-09-09/driver_artifacts/PPO_Pendulum-v1_f8541_00001_1_lr=0.0001_2025-07-31_09-09-12/error.txt │
- ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- (_WrappedExecutable pid=4076167) [rank0]:[W731 09:10:14.248110984 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
- 2025-07-31 09:10:14,165 ERROR tune.py:1037 -- Trials did not complete: [PPO_Pendulum-v1_f8541_00000, PPO_Pendulum-v1_f8541_00001]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) Traceback (most recent call last):
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 458, in deserialize_objects [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) obj = self._deserialize_object(data, metadata, object_ref) [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 315, in _deserialize_object [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) return self._deserialize_msgpack_data(data, metadata_fields) [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 270, in _deserialize_msgpack_data [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) python_objects = self._deserialize_pickle5_data(pickle5_data) [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/ray/_private/serialization.py", line 258, in _deserialize_pickle5_data [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) obj = pickle.loads(in_band, buffers=buffers) [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/storage.py", line 530, in _load_from_bytes [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) return torch.load(io.BytesIO(b), weights_only=False) [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1549, in load [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) return _legacy_load( [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1807, in _legacy_load [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) result = unpickler.load() [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 1742, in persistent_load [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) obj = restore_location(obj, location) [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 698, in default_restore_location [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) result = fn(storage, location) [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 636, in _deserialize [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) device = _validate_device(location, backend_name) [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) File "/data/home/fzy/miniconda3/envs/rl/lib/python3.10/site-packages/torch/serialization.py", line 605, in _validate_device [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) raise RuntimeError( [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU. [repeated 2x across cluster]
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) 2025-07-31 09:10:13,708 ERROR actor_manager.py:873 -- Ray error (System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) traceback: Traceback (most recent call last):
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=4074734) ), taking actor 0 out of service.
- (rl) ~ %