pratkpranav

dashboard_agent.log on worker IP which failed

Oct 6th, 2022
2022-10-05 08:11:31,109 INFO runtime_env_agent.py:410 -- Runtime env already created successfully. Env: {"env_vars": {"OMP_NUM_THREADS": "4"}}, context: {"command_prefix": [], "env_vars": {"OMP_NUM_THREADS": "4"}, "py_executable": "/usr/bin/python3", "resources_dir": null, "container": {}, "java_jars": []}
2022-10-05 10:30:47,443 ERROR agent.py:217 -- Raylet is terminated: ip=172.29.58.148, id=d2a5c564a71c468ddacdab4a7f8e1c29c69b626acc3d5f1457731b28. Termination is unexpected. Possible reasons include: (1) SIGKILL by the user or system OOM killer, (2) Invalid memory access from Raylet causing SIGSEGV or SIGBUS, (3) Other termination signals. Last 20 lines of the Raylet logs:
[2022-10-05 10:30:47,416 D 288188 288188] (raylet) dependency_manager.cc:113: Starting get request for worker 4217842478183aad4ee085e197151a2d5a9b0f012b048e9f297b8d76
[2022-10-05 10:30:47,416 D 288188 288188] (raylet) dependency_manager.cc:119: Worker 4217842478183aad4ee085e197151a2d5a9b0f012b048e9f297b8d76 called ray.get on object 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000
[2022-10-05 10:30:47,416 D 288188 288212] (raylet) protocol.cc:607: Sending object info, id: 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000 data_size: -1 metadata_size: 0
[2022-10-05 10:30:47,416 D 288188 288188] (raylet) pull_manager.cc:65: Start pull request 3084. Bundle size: 1
[2022-10-05 10:30:47,416 D 288188 288188] (raylet) pull_manager.cc:72: Pull of object 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000
[2022-10-05 10:30:47,416 D 288188 288188] (raylet) core_worker_client_pool.cc:42: Connected to 172.29.58.134:10002
[2022-10-05 10:30:47,416 D 288188 288188] (raylet) subscriber.cc:348: Make a long polling request to 6c3422accadcf97468841e990cd9e02bc588e79b769bbcfdf45e0627
[2022-10-05 10:30:47,416 D 288188 288188] (raylet) dependency_manager.cc:142: Started pull for get request from worker 4217842478183aad4ee085e197151a2d5a9b0f012b048e9f297b8d76 request: 3084
[2022-10-05 10:30:47,416 D 288188 288188] (raylet) node_manager.cc:1190: [Worker] Message FetchOrReconstruct(9) from worker with PID 288252
[2022-10-05 10:30:47,416 D 288188 288188] (raylet) dependency_manager.cc:113: Starting get request for worker 4217842478183aad4ee085e197151a2d5a9b0f012b048e9f297b8d76
[2022-10-05 10:30:47,418 D 288188 288188] (raylet) subscriber.cc:365: Long polling request has replied from 6c3422accadcf97468841e990cd9e02bc588e79b769bbcfdf45e0627
[2022-10-05 10:30:47,418 D 288188 288188] (raylet) subscriber.cc:348: Make a long polling request to 6c3422accadcf97468841e990cd9e02bc588e79b769bbcfdf45e0627
[2022-10-05 10:30:47,418 D 288188 288188] (raylet) ownership_based_object_directory.cc:282: Object 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000 is on node 9d7a7d67e5090ad628ceafd8f1b501795487b89712cf4acc15b51808 alive? 1
[2022-10-05 10:30:47,418 D 288188 288188] (raylet) ownership_based_object_directory.cc:296: Pushing location updates to subscribers for object 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000: 1 locations, spilled_url: , spilled node ID: NIL_ID, object size: 58335926, lookup failed: 1
[2022-10-05 10:30:47,418 D 288188 288188] (raylet) pull_manager.cc:150: Activating request 3084 num bytes being pulled: 0 num bytes available: 9990232473
[2022-10-05 10:30:47,418 D 288188 288188] (raylet) pull_manager.cc:158: Activating pull for object 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000
[2022-10-05 10:30:47,418 D 288188 288212] (raylet) protocol.cc:607: Sending object info, id: 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000 data_size: -1 metadata_size: 0
[2022-10-05 10:30:47,418 D 288188 288188] (raylet) pull_manager.cc:533: Sending pull request from d2a5c564a71c468ddacdab4a7f8e1c29c69b626acc3d5f1457731b28 to in-memory location at 9d7a7d67e5090ad628ceafd8f1b501795487b89712cf4acc15b51808 of object 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000
[2022-10-05 10:30:47,418 D 288188 288188] (raylet) pull_manager.cc:428: Updated location of 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000, num bytes being pulled is now 58335926
[2022-10-05 10:30:47,418 D 288188 288188] (raylet) pull_manager.cc:432: 00418e198a2c7d9b631180897f8183ed75364a5f0100000003000000 OnLocationChange num clients 1

2022-10-05 10:31:47,590 INFO agent.py:102 -- Parent pid is 326097
2022-10-05 10:31:47,591 INFO agent.py:128 -- Dashboard agent grpc address: 0.0.0.0:60691
2022-10-05 10:31:47,592 INFO utils.py:105 -- Get all modules by type: DashboardAgentModule
2022-10-05 10:31:48,269 INFO utils.py:138 -- Available modules: [<class 'ray.dashboard.modules.event.event_agent.EventAgent'>, <class 'ray.dashboard.modules.healthz.healthz_agent.HealthzAgent'>, <class 'ray.dashboard.modules.log.log_agent.LogAgent'>, <class 'ray.dashboard.modules.log.log_agent.LogAgentV1Grpc'>, <class 'ray.dashboard.modules.reporter.reporter_agent.ReporterAgent'>, <class 'ray.dashboard.modules.runtime_env.runtime_env_agent.RuntimeEnvAgent'>, <class 'ray.dashboard.modules.serve.serve_agent.ServeAgent'>]
2022-10-05 10:31:48,270 INFO agent.py:157 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.event.event_agent.EventAgent'>
2022-10-05 10:31:48,270 INFO event_agent.py:28 -- Event agent cache buffer size: 10240
2022-10-05 10:31:48,270 INFO agent.py:157 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.healthz.healthz_agent.HealthzAgent'>
2022-10-05 10:31:48,270 INFO agent.py:157 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.log.log_agent.LogAgent'>
2022-10-05 10:31:48,270 INFO agent.py:157 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.log.log_agent.LogAgentV1Grpc'>
2022-10-05 10:31:48,270 INFO agent.py:157 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.reporter.reporter_agent.ReporterAgent'>
2022-10-05 10:31:48,276 INFO agent.py:157 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.runtime_env.runtime_env_agent.RuntimeEnvAgent'>
2022-10-05 10:31:48,276 INFO agent.py:157 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.serve.serve_agent.ServeAgent'>
2022-10-05 10:31:48,276 INFO agent.py:162 -- Loaded 7 modules.
2022-10-05 10:31:48,280 INFO http_server_agent.py:71 -- Dashboard agent http address: 0.0.0.0:52365
2022-10-05 10:31:48,280 INFO http_server_agent.py:78 -- <ResourceRoute [GET] <PlainResource /api/local_raylet_healthz> -> <function HealthzAgent.health_check at 0x7f6b5edb9e50>
2022-10-05 10:31:48,280 INFO http_server_agent.py:78 -- <ResourceRoute [OPTIONS] <PlainResource /api/local_raylet_healthz> -> <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f6b5c69feb0>>
2022-10-05 10:31:48,280 INFO http_server_agent.py:78 -- <ResourceRoute [GET] <PlainResource /api/ray/version> -> <function ServeAgent.get_version at 0x7f6b7a967310>
2022-10-05 10:31:48,280 INFO http_server_agent.py:78 -- <ResourceRoute [OPTIONS] <PlainResource /api/ray/version> -> <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f6b5c69feb0>>
2022-10-05 10:31:48,280 INFO http_server_agent.py:78 -- <ResourceRoute [GET] <PlainResource /api/serve/deployments/> -> <function ServeAgent.get_all_deployments at 0x7f6b7a9673a0>
2022-10-05 10:31:48,280 INFO http_server_agent.py:78 -- <ResourceRoute [OPTIONS] <PlainResource /api/serve/deployments/> -> <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f6b5c69feb0>>
2022-10-05 10:31:48,280 INFO http_server_agent.py:78 -- <ResourceRoute [GET] <PlainResource /api/serve/deployments/status> -> <function ServeAgent.get_all_deployment_statuses at 0x7f6b7a967550>
2022-10-05 10:31:48,280 INFO http_server_agent.py:78 -- <ResourceRoute [OPTIONS] <PlainResource /api/serve/deployments/status> -> <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f6b5c69feb0>>
2022-10-05 10:31:48,280 INFO http_server_agent.py:78 -- <ResourceRoute [DELETE] <PlainResource /api/serve/deployments/> -> <function ServeAgent.delete_serve_application at 0x7f6b7a967700>
2022-10-05 10:31:48,281 INFO http_server_agent.py:78 -- <ResourceRoute [PUT] <PlainResource /api/serve/deployments/> -> <function ServeAgent.put_all_deployments at 0x7f6b7a9678b0>
2022-10-05 10:31:48,281 INFO http_server_agent.py:78 -- <ResourceRoute [OPTIONS] <PlainResource /api/serve/deployments/> -> <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f6b5c69feb0>>
2022-10-05 10:31:48,281 INFO http_server_agent.py:78 -- <ResourceRoute [GET] <StaticResource /logs -> PosixPath('/tmp/ray/session_2022-10-05_08-10-32_169619_1588351/logs')> -> <bound method StaticResource._handle of <StaticResource /logs -> PosixPath('/tmp/ray/session_2022-10-05_08-10-32_169619_1588351/logs')>>
2022-10-05 10:31:48,281 INFO http_server_agent.py:78 -- <ResourceRoute [OPTIONS] <StaticResource /logs -> PosixPath('/tmp/ray/session_2022-10-05_08-10-32_169619_1588351/logs')> -> <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f6b5c69feb0>>
2022-10-05 10:31:48,281 INFO http_server_agent.py:79 -- Registered 13 routes.
2022-10-05 10:31:48,287 INFO event_agent.py:46 -- Report events to 172.29.58.24:45667
2022-10-05 10:31:48,288 INFO event_utils.py:123 -- Monitor events logs modified after 1664978507.7902436 on /tmp/ray/session_2022-10-05_08-10-32_169619_1588351/logs/events, the source types are ['CORE_WORKER', 'RAYLET', 'COMMON'].
2022-10-05 10:31:49,090 INFO runtime_env_agent.py:481 -- Got request from raylet to decrease reference for runtime env: {"env_vars": {"OMP_NUM_THREADS": "4"}}.
2022-10-05 10:31:49,090 WARNING runtime_env_agent.py:125 -- Runtime env {"env_vars": {"OMP_NUM_THREADS": "4"}} does not exist.
2022-10-05 10:31:49,091 INFO runtime_env_agent.py:323 -- Creating runtime env: {"env_vars": {"OMP_NUM_THREADS": "4"}} with timeout 600 seconds.
2022-10-05 10:31:49,092 INFO runtime_env_agent.py:374 -- Successfully created runtime env: {"env_vars": {"OMP_NUM_THREADS": "4"}}, the context: {"command_prefix": [], "env_vars": {"OMP_NUM_THREADS": "4"}, "py_executable": "/usr/bin/python3", "resources_dir": null, "container": {}, "java_jars": []}