- python test.py
- <removed>/envs/rllib/lib/python3.9/site-packages/ray/tune/impl/tuner_internal.py:144: RayDeprecationWarning: The `RunConfig` class should be imported from `ray.tune` when passing it to the Tuner. Please update your imports. See this issue for more context and migration options: https://github.com/ray-project/ray/issues/49454. Disable these warnings by setting the environment variable: RAY_TRAIN_ENABLE_V2_MIGRATION_WARNINGS=0
- _log_deprecation_warning(
- 2025-07-26 05:23:08,028 INFO worker.py:1917 -- Started a local Ray instance.
- 2025-07-26 05:23:08,671 INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
- 2025-07-26 05:23:08,676 INFO tensorboardx.py:193 -- pip install "ray[tune]" to see TensorBoard files.
- 2025-07-26 05:23:08,676 WARNING callback.py:136 -- The TensorboardX logger cannot be instantiated because either TensorboardX or one of it's dependencies is not installed. Please make sure you have the latest version of TensorboardX installed: `pip install -U tensorboardx`
- ╭────────────────────────────────────────────────────────────╮
- │ Configuration for experiment PPO_2025-07-26_05-23-05 │
- ├────────────────────────────────────────────────────────────┤
- │ Search algorithm BasicVariantGenerator │
- │ Scheduler FIFOScheduler │
- │ Number of trials 2 │
- ╰────────────────────────────────────────────────────────────╯
- View detailed results here: <removed>
- 2025-07-26 05:23:08,687 WARNING algorithm_config.py:5033 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
- Trial status: 2 PENDING
- Current time: 2025-07-26 05:23:08. Total running time: 0s
- Logical resource usage: 0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
- ╭─────────────────────────────────────────────────╮
- │ Trial name status lr │
- ├─────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_86789_00000 PENDING 0.001 │
- │ PPO_Pendulum-v1_86789_00001 PENDING 0.0001 │
- ╰─────────────────────────────────────────────────╯
- (PPO pid=1241) 2025-07-26 05:23:13,597 WARNING algorithm_config.py:5033 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
- (SingleAgentEnvRunner pid=1337) 2025-07-26 05:23:18,240 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
- (_WrappedExecutable pid=1460) Setting up process group for: env:// [rank=0, world_size=1]
- (PPO pid=1241) 2025-07-26 05:23:20,142 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future! [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
- Trial PPO_Pendulum-v1_86789_00000 started with configuration:
- ╭───────────────────────────────────────────────────────────────────────────╮
- │ Trial PPO_Pendulum-v1_86789_00000 config │
- ├───────────────────────────────────────────────────────────────────────────┤
- │ _disable_action_flattening False │
- │ _disable_execution_plan_api -1 │
- │ _disable_initialize_loss_from_dummy_batch False │
- │ _disable_preprocessor_api False │
- │ _dont_auto_sync_env_runner_states False │
- │ _enable_rl_module_api -1 │
- │ _env_to_module_connector │
- │ _fake_gpus False │
- │ _is_atari │
- │ _is_online True │
- │ _learner_class │
- │ _learner_connector │
- │ _module_to_env_connector │
- │ _prior_exploration_config/type StochasticSampling │
- │ _rl_module_spec │
- │ _tf_policy_handles_more_than_one_loss False │
- │ _torch_grad_scaler_class │
- │ _torch_lr_scheduler_classes │
- │ _train_batch_size_per_learner │
- │ _use_msgpack_checkpoints False │
- │ _validate_config True │
- │ action_mask_key action_mask │
- │ action_space │
- │ actions_in_input_normalized False │
- │ add_default_connectors_to_env_to_module_pipeline True │
- │ add_default_connectors_to_learner_pipeline True │
- │ add_default_connectors_to_module_to_env_pipeline True │
- │ always_attach_evaluation_results -1 │
- │ auto_wrap_old_gym_envs -1 │
- │ batch_mode complete_episodes │
- │ broadcast_env_runner_states True │
- │ broadcast_offline_eval_runner_states False │
- │ callbacks ...s.RLlibCallback'> │
- │ callbacks_on_algorithm_init │
- │ callbacks_on_checkpoint_loaded │
- │ callbacks_on_env_runners_recreated │
- │ callbacks_on_environment_created │
- │ callbacks_on_episode_created │
- │ callbacks_on_episode_end │
- │ callbacks_on_episode_start │
- │ callbacks_on_episode_step │
- │ callbacks_on_evaluate_end │
- │ callbacks_on_evaluate_offline_end │
- │ callbacks_on_evaluate_offline_start │
- │ callbacks_on_evaluate_start │
- │ callbacks_on_offline_eval_runners_recreated │
- │ callbacks_on_sample_end │
- │ callbacks_on_train_result │
- │ checkpoint_trainable_policies_only False │
- │ clip_actions False │
- │ clip_param 0.3 │
- │ clip_rewards │
- │ compress_observations False │
- │ count_steps_by env_steps │
- │ create_env_on_driver False │
- │ create_local_env_runner True │
- │ custom_async_evaluation_function -1 │
- │ custom_eval_function │
- │ dataset_num_iters_per_eval_runner 1 │
- │ dataset_num_iters_per_learner │
- │ delay_between_env_runner_restarts_s 60. │
- │ disable_env_checking False │
- │ eager_max_retraces 20 │
- │ eager_tracing True │
- │ enable_async_evaluation -1 │
- │ enable_connectors -1 │
- │ enable_env_runner_and_connector_v2 True │
- │ enable_rl_module_and_learner True │
- │ enable_tf1_exec_eagerly False │
- │ entropy_coeff 0. │
- │ entropy_coeff_schedule │
- │ env Pendulum-v1 │
- │ env_runner_cls │
- │ env_runner_health_probe_timeout_s 30. │
- │ env_runner_restore_timeout_s 1800. │
- │ env_task_fn -1 │
- │ episode_lookback_horizon 1 │
- │ episodes_to_numpy True │
- │ evaluation_auto_duration_max_env_steps_per_sample 2000 │
- │ evaluation_auto_duration_min_env_steps_per_sample 100 │
- │ evaluation_config │
- │ evaluation_duration 10 │
- │ evaluation_duration_unit episodes │
- │ evaluation_force_reset_envs_before_iteration True │
- │ evaluation_interval │
- │ evaluation_num_env_runners 0 │
- │ evaluation_parallel_to_training False │
- │ evaluation_sample_timeout_s 120. │
- │ explore True │
- │ export_native_model_files False │
- │ fake_sampler False │
- │ framework torch │
- │ gamma 0.99 │
- │ grad_clip │
- │ grad_clip_by global_norm │
- │ gym_env_vectorize_mode SYNC │
- │ ignore_env_runner_failures False │
- │ ignore_final_observation False │
- │ ignore_offline_eval_runner_failures False │
- │ in_evaluation False │
- │ input sampler │
- │ input_compress_columns ['obs', 'new_obs'] │
- │ input_filesystem │
- │ input_read_batch_size │
- │ input_read_episodes False │
- │ input_read_method read_parquet │
- │ input_read_sample_batches False │
- │ input_spaces_jsonable True │
- │ keep_per_episode_custom_metrics False │
- │ kl_coeff 0.2 │
- │ kl_target 0.01 │
- │ lambda 1. │
- │ local_gpu_idx 0 │
- │ local_tf_session_args/inter_op_parallelism_threads 8 │
- │ local_tf_session_args/intra_op_parallelism_threads 8 │
- │ log_gradients True │
- │ log_level WARN │
- │ log_sys_usage True │
- │ logger_config │
- │ logger_creator │
- │ lr 0.001 │
- │ lr_schedule │
- │ materialize_data False │
- │ materialize_mapped_data True │
- │ max_num_env_runner_restarts 1000 │
- │ max_num_offline_eval_runner_restarts 1000 │
- │ max_requests_in_flight_per_aggregator_actor 3 │
- │ max_requests_in_flight_per_env_runner 1 │
- │ max_requests_in_flight_per_learner 3 │
- │ max_requests_in_flight_per_offline_eval_runner 1 │
- │ merge_env_runner_states training_only │
- │ metrics_episode_collection_timeout_s 60. │
- │ metrics_num_episodes_for_smoothing 100 │
- │ min_sample_timesteps_per_iteration 0 │
- │ min_time_s_per_iteration │
- │ min_train_timesteps_per_iteration 0 │
- │ minibatch_size 128 │
- │ model/_disable_action_flattening False │
- │ model/_disable_preprocessor_api False │
- │ model/_time_major False │
- │ model/_use_default_native_models -1 │
- │ model/always_check_shapes False │
- │ model/attention_dim 64 │
- │ model/attention_head_dim 32 │
- │ model/attention_init_gru_gate_bias 2.0 │
- │ model/attention_memory_inference 50 │
- │ model/attention_memory_training 50 │
- │ model/attention_num_heads 1 │
- │ model/attention_num_transformer_units 1 │
- │ model/attention_position_wise_mlp_dim 32 │
- │ model/attention_use_n_prev_actions 0 │
- │ model/attention_use_n_prev_rewards 0 │
- │ model/conv_activation relu │
- │ model/conv_bias_initializer │
- │ model/conv_bias_initializer_config │
- │ model/conv_filters │
- │ model/conv_kernel_initializer │
- │ model/conv_kernel_initializer_config │
- │ model/conv_transpose_bias_initializer │
- │ model/conv_transpose_bias_initializer_config │
- │ model/conv_transpose_kernel_initializer │
- │ model/conv_transpose_kernel_initializer_config │
- │ model/custom_action_dist │
- │ model/custom_model │
- │ model/custom_preprocessor │
- │ model/dim 84 │
- │ model/encoder_latent_dim │
- │ model/fcnet_activation tanh │
- │ model/fcnet_bias_initializer │
- │ model/fcnet_bias_initializer_config │
- │ model/fcnet_hiddens [256, 256] │
- │ model/fcnet_weights_initializer │
- │ model/fcnet_weights_initializer_config │
- │ model/framestack True │
- │ model/free_log_std False │
- │ model/grayscale False │
- │ model/log_std_clip_param 20.0 │
- │ model/lstm_bias_initializer │
- │ model/lstm_bias_initializer_config │
- │ model/lstm_cell_size 256 │
- │ model/lstm_use_prev_action False │
- │ model/lstm_use_prev_action_reward -1 │
- │ model/lstm_use_prev_reward False │
- │ model/lstm_weights_initializer │
- │ model/lstm_weights_initializer_config │
- │ model/max_seq_len 20 │
- │ model/no_final_linear False │
- │ model/post_fcnet_activation relu │
- │ model/post_fcnet_bias_initializer │
- │ model/post_fcnet_bias_initializer_config │
- │ model/post_fcnet_hiddens [] │
- │ model/post_fcnet_weights_initializer │
- │ model/post_fcnet_weights_initializer_config │
- │ model/use_attention False │
- │ model/use_lstm False │
- │ model/vf_share_layers False │
- │ model/zero_mean True │
- │ normalize_actions True │
- │ num_aggregator_actors_per_learner 0 │
- │ num_consecutive_env_runner_failures_tolerance 100 │
- │ num_cpus_for_main_process 1 │
- │ num_cpus_per_env_runner 1 │
- │ num_cpus_per_learner auto │
- │ num_cpus_per_offline_eval_runner 1 │
- │ num_env_runners 2 │
- │ num_envs_per_env_runner 1 │
- │ num_epochs 30 │
- │ num_gpus 0 │
- │ num_gpus_per_env_runner 0 │
- │ num_gpus_per_learner 1 │
- │ num_gpus_per_offline_eval_runner 0 │
- │ num_learners 1 │
- │ num_offline_eval_runners 0 │
- │ observation_filter NoFilter │
- │ observation_fn │
- │ observation_space │
- │ offline_data_class │
- │ offline_eval_batch_size_per_runner 256 │
- │ offline_eval_rl_module_inference_only False │
- │ offline_eval_runner_class │
- │ offline_eval_runner_health_probe_timeout_s 30. │
- │ offline_eval_runner_restore_timeout_s 1800. │
- │ offline_evaluation_duration 1 │
- │ offline_evaluation_interval │
- │ offline_evaluation_parallel_to_training False │
- │ offline_evaluation_timeout_s 120. │
- │ offline_evaluation_type │
- │ offline_loss_for_module_fn │
- │ offline_sampling False │
- │ ope_split_batch_by_episode True │
- │ output │
- │ output_compress_columns ['obs', 'new_obs'] │
- │ output_filesystem │
- │ output_max_file_size 67108864 │
- │ output_max_rows_per_file │
- │ output_write_episodes True │
- │ output_write_method write_parquet │
- │ output_write_remaining_data False │
- │ placement_strategy PACK │
- │ policies/default_policy ...None, None, None) │
- │ policies_to_train │
- │ policy_map_cache -1 │
- │ policy_map_capacity 100 │
- │ policy_mapping_fn ...t 0x7f4ac3745670> │
- │ policy_states_are_swappable False │
- │ postprocess_inputs False │
- │ prelearner_buffer_class │
- │ prelearner_class │
- │ prelearner_module_synch_period 10 │
- │ preprocessor_pref deepmind │
- │ remote_env_batch_wait_ms 0 │
- │ remote_worker_envs False │
- │ render_env False │
- │ replay_sequence_length │
- │ restart_failed_env_runners True │
- │ restart_failed_offline_eval_runners True │
- │ restart_failed_sub_environments False │
- │ rollout_fragment_length auto │
- │ sample_collector ...leListCollector'> │
- │ sample_timeout_s 60. │
- │ sampler_perf_stats_ema_coef │
- │ seed │
- │ sgd_minibatch_size -1 │
- │ shuffle_batch_per_epoch True │
- │ shuffle_buffer_size 0 │
- │ simple_optimizer -1 │
- │ sync_filters_on_rollout_workers_timeout_s 10. │
- │ synchronize_filters -1 │
- │ tf_session_args/allow_soft_placement True │
- │ tf_session_args/device_count/CPU 1 │
- │ tf_session_args/gpu_options/allow_growth True │
- │ tf_session_args/inter_op_parallelism_threads 2 │
- │ tf_session_args/intra_op_parallelism_threads 2 │
- │ tf_session_args/log_device_placement False │
- │ torch_compile_learner False │
- │ torch_compile_learner_dynamo_backend inductor │
- │ torch_compile_learner_dynamo_mode │
- │ torch_compile_learner_what_to_compile ...ile.FORWARD_TRAIN │
- │ torch_compile_worker False │
- │ torch_compile_worker_dynamo_backend onnxrt │
- │ torch_compile_worker_dynamo_mode │
- │ torch_skip_nan_gradients False │
- │ train_batch_size 4000 │
- │ update_worker_filter_stats True │
- │ use_critic True │
- │ use_gae True │
- │ use_kl_loss True │
- │ use_worker_filter_stats True │
- │ validate_env_runners_after_construction True │
- │ validate_offline_eval_runners_after_construction True │
- │ vf_clip_param 10. │
- │ vf_loss_coeff 1. │
- │ vf_share_layers -1 │
- │ worker_cls -1 │
- ╰───────────────────────────────────────────────────────────────────────────╯
- (PPO pid=1241) Trainable.setup took 17.221 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
- (PPO pid=1241) Install gputil for GPU system monitoring.
- (_WrappedExecutable pid=1460) 2025-07-26 05:23:28,733 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
- Trial status: 1 RUNNING | 1 PENDING
- Current time: 2025-07-26 05:23:38. Total running time: 30s
- Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
- ╭─────────────────────────────────────────────────╮
- │ Trial name status lr │
- ├─────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_86789_00000 RUNNING 0.001 │
- │ PPO_Pendulum-v1_86789_00001 PENDING 0.0001 │
- ╰─────────────────────────────────────────────────╯
- Trial status: 1 RUNNING | 1 PENDING
- Current time: 2025-07-26 05:24:08. Total running time: 1min 0s
- Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
- ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
- ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_86789_00000 RUNNING 0.001 4 37.7262 1 34000 │
- │ PPO_Pendulum-v1_86789_00001 PENDING 0.0001 │
- ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- Trial PPO_Pendulum-v1_86789_00000 completed after 5 iterations at 2025-07-26 05:24:17. Total running time: 1min 8s
- ╭───────────────────────────────────────────────────────╮
- │ Trial PPO_Pendulum-v1_86789_00000 result │
- ├───────────────────────────────────────────────────────┤
- │ env_runners/episode_len_mean 200 │
- │ env_runners/episode_return_mean -1036.44 │
- │ num_env_steps_sampled_lifetime 38000 │
- ╰───────────────────────────────────────────────────────╯
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=1241) Checkpoint successfully created at: Checkpoint(filesystem=local, path=<removed>)
- (_WrappedExecutable pid=1460) [rank0]:[W726 05:24:18.981079025 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
- (PPO pid=1557) 2025-07-26 05:24:23,947 WARNING algorithm_config.py:5033 -- You are running PPO on the new API stack! This is the new default behavior for this algorithm. If you don't want to use the new API stack, set `config.api_stack(enable_rl_module_and_learner=False,enable_env_runner_and_connector_v2=False)`. For a detailed migration guide, see here: https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
- (SingleAgentEnvRunner pid=1620) 2025-07-26 05:24:28,905 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
- (_WrappedExecutable pid=1744) Setting up process group for: env:// [rank=0, world_size=1]
- (PPO pid=1557) 2025-07-26 05:24:30,851 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future! [repeated 2x across cluster]
- Trial status: 1 TERMINATED | 1 PENDING
- Current time: 2025-07-26 05:24:39. Total running time: 1min 30s
- Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
- ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
- ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_86789_00000 TERMINATED 0.001 5 46.425 1 38000 │
- │ PPO_Pendulum-v1_86789_00001 PENDING 0.0001 │
- ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- (PPO pid=1557) Trainable.setup took 17.251 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
- (PPO pid=1557) Install gputil for GPU system monitoring.
- Trial PPO_Pendulum-v1_86789_00001 started with configuration:
- ╭───────────────────────────────────────────────────────────────────────────╮
- │ Trial PPO_Pendulum-v1_86789_00001 config │
- ├───────────────────────────────────────────────────────────────────────────┤
- │ _disable_action_flattening False │
- │ _disable_execution_plan_api -1 │
- │ _disable_initialize_loss_from_dummy_batch False │
- │ _disable_preprocessor_api False │
- │ _dont_auto_sync_env_runner_states False │
- │ _enable_rl_module_api -1 │
- │ _env_to_module_connector │
- │ _fake_gpus False │
- │ _is_atari │
- │ _is_online True │
- │ _learner_class │
- │ _learner_connector │
- │ _module_to_env_connector │
- │ _prior_exploration_config/type StochasticSampling │
- │ _rl_module_spec │
- │ _tf_policy_handles_more_than_one_loss False │
- │ _torch_grad_scaler_class │
- │ _torch_lr_scheduler_classes │
- │ _train_batch_size_per_learner │
- │ _use_msgpack_checkpoints False │
- │ _validate_config True │
- │ action_mask_key action_mask │
- │ action_space │
- │ actions_in_input_normalized False │
- │ add_default_connectors_to_env_to_module_pipeline True │
- │ add_default_connectors_to_learner_pipeline True │
- │ add_default_connectors_to_module_to_env_pipeline True │
- │ always_attach_evaluation_results -1 │
- │ auto_wrap_old_gym_envs -1 │
- │ batch_mode complete_episodes │
- │ broadcast_env_runner_states True │
- │ broadcast_offline_eval_runner_states False │
- │ callbacks ...s.RLlibCallback'> │
- │ callbacks_on_algorithm_init │
- │ callbacks_on_checkpoint_loaded │
- │ callbacks_on_env_runners_recreated │
- │ callbacks_on_environment_created │
- │ callbacks_on_episode_created │
- │ callbacks_on_episode_end │
- │ callbacks_on_episode_start │
- │ callbacks_on_episode_step │
- │ callbacks_on_evaluate_end │
- │ callbacks_on_evaluate_offline_end │
- │ callbacks_on_evaluate_offline_start │
- │ callbacks_on_evaluate_start │
- │ callbacks_on_offline_eval_runners_recreated │
- │ callbacks_on_sample_end │
- │ callbacks_on_train_result │
- │ checkpoint_trainable_policies_only False │
- │ clip_actions False │
- │ clip_param 0.3 │
- │ clip_rewards │
- │ compress_observations False │
- │ count_steps_by env_steps │
- │ create_env_on_driver False │
- │ create_local_env_runner True │
- │ custom_async_evaluation_function -1 │
- │ custom_eval_function │
- │ dataset_num_iters_per_eval_runner 1 │
- │ dataset_num_iters_per_learner │
- │ delay_between_env_runner_restarts_s 60. │
- │ disable_env_checking False │
- │ eager_max_retraces 20 │
- │ eager_tracing True │
- │ enable_async_evaluation -1 │
- │ enable_connectors -1 │
- │ enable_env_runner_and_connector_v2 True │
- │ enable_rl_module_and_learner True │
- │ enable_tf1_exec_eagerly False │
- │ entropy_coeff 0. │
- │ entropy_coeff_schedule │
- │ env Pendulum-v1 │
- │ env_runner_cls │
- │ env_runner_health_probe_timeout_s 30. │
- │ env_runner_restore_timeout_s 1800. │
- │ env_task_fn -1 │
- │ episode_lookback_horizon 1 │
- │ episodes_to_numpy True │
- │ evaluation_auto_duration_max_env_steps_per_sample 2000 │
- │ evaluation_auto_duration_min_env_steps_per_sample 100 │
- │ evaluation_config │
- │ evaluation_duration 10 │
- │ evaluation_duration_unit episodes │
- │ evaluation_force_reset_envs_before_iteration True │
- │ evaluation_interval │
- │ evaluation_num_env_runners 0 │
- │ evaluation_parallel_to_training False │
- │ evaluation_sample_timeout_s 120. │
- │ explore True │
- │ export_native_model_files False │
- │ fake_sampler False │
- │ framework torch │
- │ gamma 0.99 │
- │ grad_clip │
- │ grad_clip_by global_norm │
- │ gym_env_vectorize_mode SYNC │
- │ ignore_env_runner_failures False │
- │ ignore_final_observation False │
- │ ignore_offline_eval_runner_failures False │
- │ in_evaluation False │
- │ input sampler │
- │ input_compress_columns ['obs', 'new_obs'] │
- │ input_filesystem │
- │ input_read_batch_size │
- │ input_read_episodes False │
- │ input_read_method read_parquet │
- │ input_read_sample_batches False │
- │ input_spaces_jsonable True │
- │ keep_per_episode_custom_metrics False │
- │ kl_coeff 0.2 │
- │ kl_target 0.01 │
- │ lambda 1. │
- │ local_gpu_idx 0 │
- │ local_tf_session_args/inter_op_parallelism_threads 8 │
- │ local_tf_session_args/intra_op_parallelism_threads 8 │
- │ log_gradients True │
- │ log_level WARN │
- │ log_sys_usage True │
- │ logger_config │
- │ logger_creator │
- │ lr 0.0001 │
- │ lr_schedule │
- │ materialize_data False │
- │ materialize_mapped_data True │
- │ max_num_env_runner_restarts 1000 │
- │ max_num_offline_eval_runner_restarts 1000 │
- │ max_requests_in_flight_per_aggregator_actor 3 │
- │ max_requests_in_flight_per_env_runner 1 │
- │ max_requests_in_flight_per_learner 3 │
- │ max_requests_in_flight_per_offline_eval_runner 1 │
- │ merge_env_runner_states training_only │
- │ metrics_episode_collection_timeout_s 60. │
- │ metrics_num_episodes_for_smoothing 100 │
- │ min_sample_timesteps_per_iteration 0 │
- │ min_time_s_per_iteration │
- │ min_train_timesteps_per_iteration 0 │
- │ minibatch_size 128 │
- │ model/_disable_action_flattening False │
- │ model/_disable_preprocessor_api False │
- │ model/_time_major False │
- │ model/_use_default_native_models -1 │
- │ model/always_check_shapes False │
- │ model/attention_dim 64 │
- │ model/attention_head_dim 32 │
- │ model/attention_init_gru_gate_bias 2.0 │
- │ model/attention_memory_inference 50 │
- │ model/attention_memory_training 50 │
- │ model/attention_num_heads 1 │
- │ model/attention_num_transformer_units 1 │
- │ model/attention_position_wise_mlp_dim 32 │
- │ model/attention_use_n_prev_actions 0 │
- │ model/attention_use_n_prev_rewards 0 │
- │ model/conv_activation relu │
- │ model/conv_bias_initializer │
- │ model/conv_bias_initializer_config │
- │ model/conv_filters │
- │ model/conv_kernel_initializer │
- │ model/conv_kernel_initializer_config │
- │ model/conv_transpose_bias_initializer │
- │ model/conv_transpose_bias_initializer_config │
- │ model/conv_transpose_kernel_initializer │
- │ model/conv_transpose_kernel_initializer_config │
- │ model/custom_action_dist │
- │ model/custom_model │
- │ model/custom_preprocessor │
- │ model/dim 84 │
- │ model/encoder_latent_dim │
- │ model/fcnet_activation tanh │
- │ model/fcnet_bias_initializer │
- │ model/fcnet_bias_initializer_config │
- │ model/fcnet_hiddens [256, 256] │
- │ model/fcnet_weights_initializer │
- │ model/fcnet_weights_initializer_config │
- │ model/framestack True │
- │ model/free_log_std False │
- │ model/grayscale False │
- │ model/log_std_clip_param 20.0 │
- │ model/lstm_bias_initializer │
- │ model/lstm_bias_initializer_config │
- │ model/lstm_cell_size 256 │
- │ model/lstm_use_prev_action False │
- │ model/lstm_use_prev_action_reward -1 │
- │ model/lstm_use_prev_reward False │
- │ model/lstm_weights_initializer │
- │ model/lstm_weights_initializer_config │
- │ model/max_seq_len 20 │
- │ model/no_final_linear False │
- │ model/post_fcnet_activation relu │
- │ model/post_fcnet_bias_initializer │
- │ model/post_fcnet_bias_initializer_config │
- │ model/post_fcnet_hiddens [] │
- │ model/post_fcnet_weights_initializer │
- │ model/post_fcnet_weights_initializer_config │
- │ model/use_attention False │
- │ model/use_lstm False │
- │ model/vf_share_layers False │
- │ model/zero_mean True │
- │ normalize_actions True │
- │ num_aggregator_actors_per_learner 0 │
- │ num_consecutive_env_runner_failures_tolerance 100 │
- │ num_cpus_for_main_process 1 │
- │ num_cpus_per_env_runner 1 │
- │ num_cpus_per_learner auto │
- │ num_cpus_per_offline_eval_runner 1 │
- │ num_env_runners 2 │
- │ num_envs_per_env_runner 1 │
- │ num_epochs 30 │
- │ num_gpus 0 │
- │ num_gpus_per_env_runner 0 │
- │ num_gpus_per_learner 1 │
- │ num_gpus_per_offline_eval_runner 0 │
- │ num_learners 1 │
- │ num_offline_eval_runners 0 │
- │ observation_filter NoFilter │
- │ observation_fn │
- │ observation_space │
- │ offline_data_class │
- │ offline_eval_batch_size_per_runner 256 │
- │ offline_eval_rl_module_inference_only False │
- │ offline_eval_runner_class │
- │ offline_eval_runner_health_probe_timeout_s 30. │
- │ offline_eval_runner_restore_timeout_s 1800. │
- │ offline_evaluation_duration 1 │
- │ offline_evaluation_interval │
- │ offline_evaluation_parallel_to_training False │
- │ offline_evaluation_timeout_s 120. │
- │ offline_evaluation_type │
- │ offline_loss_for_module_fn │
- │ offline_sampling False │
- │ ope_split_batch_by_episode True │
- │ output │
- │ output_compress_columns ['obs', 'new_obs'] │
- │ output_filesystem │
- │ output_max_file_size 67108864 │
- │ output_max_rows_per_file │
- │ output_write_episodes True │
- │ output_write_method write_parquet │
- │ output_write_remaining_data False │
- │ placement_strategy PACK │
- │ policies/default_policy ...None, None, None) │
- │ policies_to_train │
- │ policy_map_cache -1 │
- │ policy_map_capacity 100 │
- │ policy_mapping_fn ...t 0x7f4ac3745670> │
- │ policy_states_are_swappable False │
- │ postprocess_inputs False │
- │ prelearner_buffer_class │
- │ prelearner_class │
- │ prelearner_module_synch_period 10 │
- │ preprocessor_pref deepmind │
- │ remote_env_batch_wait_ms 0 │
- │ remote_worker_envs False │
- │ render_env False │
- │ replay_sequence_length │
- │ restart_failed_env_runners True │
- │ restart_failed_offline_eval_runners True │
- │ restart_failed_sub_environments False │
- │ rollout_fragment_length auto │
- │ sample_collector ...leListCollector'> │
- │ sample_timeout_s 60. │
- │ sampler_perf_stats_ema_coef │
- │ seed │
- │ sgd_minibatch_size -1 │
- │ shuffle_batch_per_epoch True │
- │ shuffle_buffer_size 0 │
- │ simple_optimizer -1 │
- │ sync_filters_on_rollout_workers_timeout_s 10. │
- │ synchronize_filters -1 │
- │ tf_session_args/allow_soft_placement True │
- │ tf_session_args/device_count/CPU 1 │
- │ tf_session_args/gpu_options/allow_growth True │
- │ tf_session_args/inter_op_parallelism_threads 2 │
- │ tf_session_args/intra_op_parallelism_threads 2 │
- │ tf_session_args/log_device_placement False │
- │ torch_compile_learner False │
- │ torch_compile_learner_dynamo_backend inductor │
- │ torch_compile_learner_dynamo_mode │
- │ torch_compile_learner_what_to_compile ...ile.FORWARD_TRAIN │
- │ torch_compile_worker False │
- │ torch_compile_worker_dynamo_backend onnxrt │
- │ torch_compile_worker_dynamo_mode │
- │ torch_skip_nan_gradients False │
- │ train_batch_size 4000 │
- │ update_worker_filter_stats True │
- │ use_critic True │
- │ use_gae True │
- │ use_kl_loss True │
- │ use_worker_filter_stats True │
- │ validate_env_runners_after_construction True │
- │ validate_offline_eval_runners_after_construction True │
- │ vf_clip_param 10. │
- │ vf_loss_coeff 1. │
- │ vf_share_layers -1 │
- │ worker_cls -1 │
- ╰───────────────────────────────────────────────────────────────────────────╯
- Trial status: 1 TERMINATED | 1 RUNNING
- Current time: 2025-07-26 05:25:09. Total running time: 2min 0s
- Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
- ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
- ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_86789_00001 RUNNING 0.0001 2 19.0405 1 26000 │
- │ PPO_Pendulum-v1_86789_00000 TERMINATED 0.001 5 46.425 1 38000 │
- ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- Trial PPO_Pendulum-v1_86789_00001 completed after 5 iterations at 2025-07-26 05:25:28. Total running time: 2min 19s
- ╭───────────────────────────────────────────────────────╮
- │ Trial PPO_Pendulum-v1_86789_00001 result │
- ├───────────────────────────────────────────────────────┤
- │ env_runners/episode_len_mean 200 │
- │ env_runners/episode_return_mean -1239.85 │
- │ num_env_steps_sampled_lifetime 38000 │
- ╰───────────────────────────────────────────────────────╯
- 2025-07-26 05:25:28,608 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '<removed>' in 0.0109s.
- Trial status: 2 TERMINATED
- Current time: 2025-07-26 05:25:28. Total running time: 2min 19s
- Logical resource usage: 3.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
- ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
- │ Trial name status lr iter total time (s) ...lls_per_iteration ..._sampled_lifetime │
- ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
- │ PPO_Pendulum-v1_86789_00000 TERMINATED 0.001 5 46.425 1 38000 │
- │ PPO_Pendulum-v1_86789_00001 TERMINATED 0.0001 5 47.2958 1 38000 │
- ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- (PPO(env=Pendulum-v1; env-runners=2; learners=1; multi-agent=False) pid=1557) Checkpoint successfully created at: Checkpoint(filesystem=local, path=<removed>)
- (_WrappedExecutable pid=1744) 2025-07-26 05:24:39,662 WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
- (_WrappedExecutable pid=1744) [rank0]:[W726 05:25:29.008109760 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
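
A minimal sketch for comparing the two trials, assuming the values below are copied by hand from the two result tables in the log above. The `results` list and the `best_trial` helper are illustrative names, not part of Ray Tune:

```python
# Values transcribed by hand from the two "result" tables in the log above.
results = [
    {"trial": "PPO_Pendulum-v1_86789_00000", "lr": 0.001, "episode_return_mean": -1036.44},
    {"trial": "PPO_Pendulum-v1_86789_00001", "lr": 0.0001, "episode_return_mean": -1239.85},
]


def best_trial(rows):
    """Return the row with the highest mean episode return (less negative is better)."""
    return max(rows, key=lambda r: r["episode_return_mean"])


best = best_trial(results)
print(f"{best['trial']} (lr={best['lr']}) return={best['episode_return_mean']}")
```

Both trials stopped after 5 iterations and 38000 sampled environment steps, so the gap between them is only a rough early signal about the two learning rates.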