- ==> /tmp/ray/session_latest/logs/monitor.err <==
- Connection to 172.31.32.125 closed.
- Warning: Permanently added '172.31.32.125' (ECDSA) to the list of known hosts.
- Connection to 172.31.37.219 closed.
- Connection to 172.31.32.125 closed.
- ssh: connect to host 172.31.19.55 port 22: Connection timed out
- ssh: connect to host 172.31.24.19 port 22: Connection timed out
- ssh: connect to host 172.31.17.106 port 22: Connection timed out
- ssh: connect to host 172.31.19.55 port 22: Connection refused
- ssh: connect to host 172.31.17.106 port 22: Connection refused
- ssh: connect to host 172.31.24.19 port 22: Connection refused
- Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
- Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
- Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
- Shared connection to 172.31.17.106 closed.
- Shared connection to 172.31.19.55 closed.
- Shared connection to 172.31.24.19 closed.
- Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
- Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
- Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
- Connection to 172.31.24.19 closed.
- Connection to 172.31.17.106 closed.
- Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
- Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
- Connection to 172.31.19.55 closed.
- Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
- Connection to 172.31.17.106 closed.
- Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
- Connection to 172.31.24.19 closed.
- Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
- Connection to 172.31.17.106 closed.
- Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
- Connection to 172.31.24.19 closed.
- Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
- Connection to 172.31.19.55 closed.
- Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
- Connection to 172.31.19.55 closed.
- Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
- Connection to 172.31.17.106 closed.
- Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
- Connection to 172.31.24.19 closed.
- Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
- Connection to 172.31.19.55 closed.
- Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
- Connection to 172.31.24.19 closed.
- Connection to 172.31.17.106 closed.
- Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
- Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
- Connection to 172.31.17.106 closed.
- Connection to 172.31.24.19 closed.
- Connection to 172.31.19.55 closed.
- Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
- Connection to 172.31.19.55 closed.
- ssh: connect to host 172.31.7.210 port 22: Connection timed out
- ssh: connect to host 172.31.3.75 port 22: Connection timed out
- ssh: connect to host 172.31.5.6 port 22: Connection timed out
- ssh: connect to host 172.31.7.210 port 22: Connection refused
- ssh: connect to host 172.31.5.6 port 22: Connection refused
- ssh: connect to host 172.31.3.75 port 22: Connection refused
- Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
- Shared connection to 172.31.7.210 closed.
- Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
- Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
- Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
- Connection to 172.31.7.210 closed.
- Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
- Shared connection to 172.31.5.6 closed.
- Shared connection to 172.31.3.75 closed.
- Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
- Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
- Connection to 172.31.3.75 closed.
- Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
- Connection to 172.31.5.6 closed.
- Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
- Connection to 172.31.7.210 closed.
- Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
- Connection to 172.31.7.210 closed.
- Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
- Connection to 172.31.7.210 closed.
- Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
- Connection to 172.31.3.75 closed.
- Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
- Connection to 172.31.5.6 closed.
- Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
- Connection to 172.31.3.75 closed.
- Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
- Connection to 172.31.5.6 closed.
- Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
- Connection to 172.31.3.75 closed.
- Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
- Connection to 172.31.5.6 closed.
- Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
- Connection to 172.31.7.210 closed.
- Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
- Connection to 172.31.7.210 closed.
- Connection to 172.31.5.6 closed.
- Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
- Connection to 172.31.3.75 closed.
- Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
- Connection to 172.31.5.6 closed.
- Connection to 172.31.3.75 closed.
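The churn above is mostly benign: each new worker is probed over SSH before setup, and "Connection timed out" / "Connection refused" are expected while the instance is still booting and sshd is not yet listening. The repeated known-hosts warnings come from the `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null` options visible in the full commands logged later. A minimal sketch of that readiness probe, with a hypothetical HOST and the key path and `uptime` test taken from the logged commands:

    # Sketch of the updater's SSH-readiness probe (cf. the updater.py messages
    # further down); HOST is hypothetical, ssh options copied from the log.
    HOST=172.31.19.55
    until ssh -i ~/ray_bootstrap_key.pem \
          -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
          -o ConnectTimeout=5 ubuntu@"$HOST" uptime
    do
        echo "SSH still not available, retrying in 5 seconds."
        sleep 5
    done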
- ==> /tmp/ray/session_latest/logs/monitor.log <==
- Demands:
- {'CPU': 1.0}: 20+ pending tasks/actors
- 2021-05-28 11:42:46,408 INFO autoscaler.py:309 --
- ======== Autoscaler status: 2021-05-28 11:42:46.408447 ========
- Node status
- ---------------------------------------------------------------
- Healthy:
- 1 ray.head.default
- Pending:
- 172.31.6.62: ray.head.default, uninitialized
- 172.31.5.6: ray.worker.default, setting-up
- 172.31.3.75: ray.worker.default, setting-up
- 172.31.7.210: ray.worker.default, setting-up
- Recent failures:
- 172.31.19.55: ray.worker.default
- 172.31.24.19: ray.worker.default
- 172.31.17.106: ray.worker.default
- 172.31.32.125: ray.worker.default
- 172.31.39.243: ray.worker.default
- 172.31.37.219: ray.worker.default
- Resources
- ---------------------------------------------------------------
- Usage:
- 2.0/2.0 CPU
- 0.00/4.518 GiB memory
- 0.00/2.259 GiB object_store_memory
- Demands:
- {'CPU': 1.0}: 20+ pending tasks/actors
- 2021-05-28 11:42:51,955 ERROR autoscaler.py:270 -- StandardAutoscaler: i-0c0fac5b7dd7133a4: Terminating failed to setup/initialize node.
- 2021-05-28 11:42:52,254 INFO autoscaler.py:309 --
- ======== Autoscaler status: 2021-05-28 11:42:52.254463 ========
- Node status
- ---------------------------------------------------------------
- Healthy:
- 1 ray.head.default
- Pending:
- 172.31.6.62: ray.head.default, uninitialized
- 172.31.5.6: ray.worker.default, setting-up
- 172.31.3.75: ray.worker.default, setting-up
- Recent failures:
- 172.31.7.210: ray.worker.default
- 172.31.19.55: ray.worker.default
- 172.31.24.19: ray.worker.default
- 172.31.17.106: ray.worker.default
- 172.31.32.125: ray.worker.default
- 172.31.39.243: ray.worker.default
- 172.31.37.219: ray.worker.default
- Resources
- ---------------------------------------------------------------
- Usage:
- 2.0/2.0 CPU
- 0.00/4.518 GiB memory
- 0.00/2.259 GiB object_store_memory
- Demands:
- {'CPU': 1.0}: 20+ pending tasks/actors
- 2021-05-28 11:42:52,338 INFO monitor.py:192 -- :event_summary:Removing 1 nodes of type ray.worker.default (launch failed).
- 2021-05-28 11:42:57,737 INFO autoscaler.py:705 -- StandardAutoscaler: Queue 1 new nodes for launch
- 2021-05-28 11:42:57,745 INFO node_launcher.py:78 -- NodeLauncher1: Got 1 nodes to launch.
- 2021-05-28 11:42:57,812 ERROR autoscaler.py:270 -- StandardAutoscaler: i-01af7ca3cd041f2e0: Terminating failed to setup/initialize node.
- 2021-05-28 11:42:57,812 ERROR autoscaler.py:270 -- StandardAutoscaler: i-0cddf93ff7d9428df: Terminating failed to setup/initialize node.
- 2021-05-28 11:42:57,837 INFO node_launcher.py:78 -- NodeLauncher1: Launching 1 nodes, type ray.worker.default.
- 2021-05-28 11:42:58,143 INFO autoscaler.py:309 --
- ======== Autoscaler status: 2021-05-28 11:42:58.143289 ========
- Node status
- ---------------------------------------------------------------
- Healthy:
- 1 ray.head.default
- Pending:
- ray.worker.default, 1 launching
- 172.31.6.62: ray.head.default, uninitialized
- Recent failures:
- 172.31.7.210: ray.worker.default
- 172.31.3.75: ray.worker.default
- 172.31.5.6: ray.worker.default
- 172.31.19.55: ray.worker.default
- 172.31.24.19: ray.worker.default
- 172.31.17.106: ray.worker.default
- 172.31.32.125: ray.worker.default
- 172.31.39.243: ray.worker.default
- 172.31.37.219: ray.worker.default
- Resources
- ---------------------------------------------------------------
- Usage:
- 2.0/2.0 CPU
- 0.00/4.518 GiB memory
- 0.00/2.259 GiB object_store_memory
- Demands:
- {'CPU': 1.0}: 20+ pending tasks/actors
- 2021-05-28 11:42:58,201 INFO monitor.py:192 -- :event_summary:Adding 1 nodes of type ray.worker.default.
- 2021-05-28 11:42:58,201 INFO monitor.py:192 -- :event_summary:Removing 2 nodes of type ray.worker.default (launch failed).
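Each "Autoscaler status" block above is one snapshot of the node lifecycle: uninitialized -> setting-up -> update-failed -> terminated -> relaunched. Instead of grepping monitor.log, the same summary can usually be followed with the Ray CLI; `cluster.yaml` below is a placeholder for the config that launched this cluster, and `ray status` assumes a Ray version of roughly this vintage (1.3 or later):

    # Tail the autoscaler logs over SSH (roughly what produced this paste):
    ray monitor cluster.yaml

    # One-shot status summary, run on the head node:
    ray exec cluster.yaml 'ray status'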
- ==> /tmp/ray/session_latest/logs/monitor.out <==
- Setting up docutils-common (0.16+dfsg-2) ...
- Processing triggers for sgml-base (1.29.1) ...
- Setting up python3-docutils (0.16+dfsg-2) ...
- update-alternatives: using /usr/share/docutils/scripts/python3/rst-buildhtml to provide /usr/bin/rst-buildhtml (rst-buildhtml) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2html to provide /usr/bin/rst2html (rst2html) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2html4 to provide /usr/bin/rst2html4 (rst2html4) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2html5 to provide /usr/bin/rst2html5 (rst2html5) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2latex to provide /usr/bin/rst2latex (rst2latex) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2man to provide /usr/bin/rst2man (rst2man) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt to provide /usr/bin/rst2odt (rst2odt) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt_prepstyles to provide /usr/bin/rst2odt_prepstyles (rst2odt_prepstyles) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2pseudoxml to provide /usr/bin/rst2pseudoxml (rst2pseudoxml) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2s5 to provide /usr/bin/rst2s5 (rst2s5) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2xetex to provide /usr/bin/rst2xetex (rst2xetex) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2xml to provide /usr/bin/rst2xml (rst2xml) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rstpep2html to provide /usr/bin/rstpep2html (rstpep2html) in auto mode
- Setting up python3-botocore (1.16.19+repack-1ubuntu0.20.04.1) ...
- Setting up python3-s3transfer (0.3.3-1) ...
- Setting up awscli (1.18.69-1ubuntu0.20.04.1) ...
- Processing triggers for shared-mime-info (1.15-1) ...
- Processing triggers for shared-mime-info (1.15-1) ...
- Processing triggers for sgml-base (1.29.1) ...
- Setting up docutils-common (0.16+dfsg-2) ...
- Processing triggers for sgml-base (1.29.1) ...
- Setting up python3-docutils (0.16+dfsg-2) ...
- update-alternatives: using /usr/share/docutils/scripts/python3/rst-buildhtml to provide /usr/bin/rst-buildhtml (rst-buildhtml) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2html to provide /usr/bin/rst2html (rst2html) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2html4 to provide /usr/bin/rst2html4 (rst2html4) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2html5 to provide /usr/bin/rst2html5 (rst2html5) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2latex to provide /usr/bin/rst2latex (rst2latex) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2man to provide /usr/bin/rst2man (rst2man) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt to provide /usr/bin/rst2odt (rst2odt) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt_prepstyles to provide /usr/bin/rst2odt_prepstyles (rst2odt_prepstyles) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2pseudoxml to provide /usr/bin/rst2pseudoxml (rst2pseudoxml) in auto mode
- 2021-05-28 11:42:48,804 VINFO command_runner.py:509 -- Running `export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com`
- 2021-05-28 11:42:48,805 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=120s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com)'`
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2s5 to provide /usr/bin/rst2s5 (rst2s5) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2xetex to provide /usr/bin/rst2xetex (rst2xetex) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2xml to provide /usr/bin/rst2xml (rst2xml) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rstpep2html to provide /usr/bin/rstpep2html (rstpep2html) in auto mode
- Setting up python3-botocore (1.16.19+repack-1ubuntu0.20.04.1) ...
- Processing triggers for sgml-base (1.29.1) ...
- Setting up docutils-common (0.16+dfsg-2) ...
- Setting up python3-s3transfer (0.3.3-1) ...
- Processing triggers for sgml-base (1.29.1) ...
- Setting up python3-docutils (0.16+dfsg-2) ...
- update-alternatives: using /usr/share/docutils/scripts/python3/rst-buildhtml to provide /usr/bin/rst-buildhtml (rst-buildhtml) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2html to provide /usr/bin/rst2html (rst2html) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2html4 to provide /usr/bin/rst2html4 (rst2html4) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2html5 to provide /usr/bin/rst2html5 (rst2html5) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2latex to provide /usr/bin/rst2latex (rst2latex) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2man to provide /usr/bin/rst2man (rst2man) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt to provide /usr/bin/rst2odt (rst2odt) in auto mode
- Setting up awscli (1.18.69-1ubuntu0.20.04.1) ...
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt_prepstyles to provide /usr/bin/rst2odt_prepstyles (rst2odt_prepstyles) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2pseudoxml to provide /usr/bin/rst2pseudoxml (rst2pseudoxml) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2s5 to provide /usr/bin/rst2s5 (rst2s5) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2xetex to provide /usr/bin/rst2xetex (rst2xetex) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rst2xml to provide /usr/bin/rst2xml (rst2xml) in auto mode
- update-alternatives: using /usr/share/docutils/scripts/python3/rstpep2html to provide /usr/bin/rstpep2html (rstpep2html) in auto mode
- Unable to locate credentials. You can configure credentials by running "aws configure".
- Error: Cannot perform an interactive login from a non TTY device
- 2021-05-28 11:42:49,811 INFO log_timer.py:25 -- NodeUpdater: i-0c0fac5b7dd7133a4: Initialization commands failed [LogTimer=63522ms]
- 2021-05-28 11:42:49,811 INFO log_timer.py:25 -- NodeUpdater: i-0c0fac5b7dd7133a4: Applied config 15a70e450983425551f140b92c089dc940ec7759 [LogTimer=83210ms]
- Setting up python3-botocore (1.16.19+repack-1ubuntu0.20.04.1) ...
- Setting up python3-s3transfer (0.3.3-1) ...
- Setting up awscli (1.18.69-1ubuntu0.20.04.1) ...
- 2021-05-28 11:42:50,914 INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-node-status=update-failed on ['i-0c0fac5b7dd7133a4'] [LogTimer=101ms]
- 2021-05-28 11:42:50,914 ERR updater.py:132 -- New status: update-failed
- 2021-05-28 11:42:50,914 ERR updater.py:134 -- !!!
- 2021-05-28 11:42:50,915 VERR updater.py:140 -- {'message': 'SSH command failed.'}
- 2021-05-28 11:42:50,915 ERR updater.py:142 -- SSH command failed.
- 2021-05-28 11:42:50,915 ERR updater.py:144 -- !!!
- 2021-05-28 11:42:51,711 VINFO command_runner.py:509 -- Running `export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com`
- 2021-05-28 11:42:51,711 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=120s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com)'`
- 2021-05-28 11:42:51,956 INFO node_provider.py:462 -- Terminating instances i-0c0fac5b7dd7133a4 (cannot stop spot instances, only terminate)
- 2021-05-28 11:42:52,409 VINFO command_runner.py:509 -- Running `export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com`
- 2021-05-28 11:42:52,409 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=120s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com)'`
- Unable to locate credentials. You can configure credentials by running "aws configure".
- Error: Cannot perform an interactive login from a non TTY device
- 2021-05-28 11:42:52,762 INFO log_timer.py:25 -- NodeUpdater: i-01af7ca3cd041f2e0: Initialization commands failed [LogTimer=63262ms]
- 2021-05-28 11:42:52,763 INFO log_timer.py:25 -- NodeUpdater: i-01af7ca3cd041f2e0: Applied config 15a70e450983425551f140b92c089dc940ec7759 [LogTimer=86179ms]
- Unable to locate credentials. You can configure credentials by running "aws configure".
- Error: Cannot perform an interactive login from a non TTY device
- 2021-05-28 11:42:53,487 INFO log_timer.py:25 -- NodeUpdater: i-0cddf93ff7d9428df: Initialization commands failed [LogTimer=63983ms]
- 2021-05-28 11:42:53,488 INFO log_timer.py:25 -- NodeUpdater: i-0cddf93ff7d9428df: Applied config 15a70e450983425551f140b92c089dc940ec7759 [LogTimer=86893ms]
- 2021-05-28 11:42:53,877 INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-node-status=update-failed on ['i-01af7ca3cd041f2e0', 'i-0cddf93ff7d9428df'] [LogTimer=113ms]
- 2021-05-28 11:42:53,878 ERR updater.py:132 -- New status: update-failed
- 2021-05-28 11:42:53,878 ERR updater.py:134 -- !!!
- 2021-05-28 11:42:53,878 VERR updater.py:140 -- {'message': 'SSH command failed.'}
- 2021-05-28 11:42:53,878 ERR updater.py:142 -- SSH command failed.
- 2021-05-28 11:42:53,878 ERR updater.py:144 -- !!!
- 2021-05-28 11:42:53,877 ERR updater.py:132 -- New status: update-failed
- 2021-05-28 11:42:53,879 ERR updater.py:134 -- !!!
- 2021-05-28 11:42:53,879 VERR updater.py:140 -- {'message': 'SSH command failed.'}
- 2021-05-28 11:42:53,879 ERR updater.py:142 -- SSH command failed.
- 2021-05-28 11:42:53,879 ERR updater.py:144 -- !!!
- 2021-05-28 11:42:57,813 INFO node_provider.py:462 -- Terminating instances i-01af7ca3cd041f2e0, i-0cddf93ff7d9428df (cannot stop spot instances, only terminate)
- 2021-05-28 11:42:59,363 INFO node_provider.py:376 -- Launched 1 nodes [subnet_id=subnet-e04e7daa]
- 2021-05-28 11:42:59,364 INFO node_provider.py:393 -- Launched instance i-0e071357c46946603 [state=pending, info=pending]
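This monitor.out section contains the likely root cause of the "Recent failures" list. The per-node initialization command `aws ecr get-login-password ... | docker login --password-stdin ...` fails with "Unable to locate credentials", so no password reaches `docker login`, which then tries to prompt interactively and aborts with "Cannot perform an interactive login from a non TTY device". The updater marks each node update-failed, the autoscaler terminates the spot instance and launches a replacement, and the loop repeats. One plausible reading is that the worker instances have no AWS credentials at all, e.g. no IAM instance profile with ECR read access (such as AmazonEC2ContainerRegistryReadOnly) attached. A quick check from a shell on one of the workers; this is an interpretation of the log, not something verified here:

    # 1) Can the instance obtain AWS credentials at all?
    #    "Unable to locate credentials" here means no role/profile is attached.
    aws sts get-caller-identity

    # 2) The initialization command from the log, verbatim; it can only
    #    succeed once step 1 works.
    aws ecr get-login-password --region eu-central-1 \
        | docker login --username AWS --password-stdin \
            214830741341.dkr.ecr.eu-central-1.amazonaws.com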
- ==> /tmp/ray/session_latest/logs/monitor.log <==
- 2021-05-28 11:43:03,584 INFO autoscaler.py:705 -- StandardAutoscaler: Queue 2 new nodes for launch
- 2021-05-28 11:43:03,587 INFO node_launcher.py:78 -- NodeLauncher0: Got 2 nodes to launch.
- 2021-05-28 11:43:03,653 INFO node_launcher.py:78 -- NodeLauncher0: Launching 2 nodes, type ray.worker.default.
- 2021-05-28 11:43:03,656 INFO autoscaler.py:659 -- Creating new (spawn_updater) updater thread for node i-0e071357c46946603.
- 2021-05-28 11:43:03,730 INFO autoscaler.py:309 --
- ======== Autoscaler status: 2021-05-28 11:43:03.730037 ========
- Node status
- ---------------------------------------------------------------
- Healthy:
- 1 ray.head.default
- Pending:
- ray.worker.default, 2 launching
- 172.31.6.62: ray.head.default, uninitialized
- 172.31.44.105: ray.worker.default, waiting-for-ssh
- Recent failures:
- 172.31.7.210: ray.worker.default
- 172.31.3.75: ray.worker.default
- 172.31.5.6: ray.worker.default
- 172.31.19.55: ray.worker.default
- 172.31.24.19: ray.worker.default
- 172.31.17.106: ray.worker.default
- 172.31.32.125: ray.worker.default
- 172.31.39.243: ray.worker.default
- 172.31.37.219: ray.worker.default
- Resources
- ---------------------------------------------------------------
- Usage:
- 2.0/2.0 CPU
- 0.00/4.518 GiB memory
- 0.00/2.259 GiB object_store_memory
- Demands:
- {'CPU': 1.0}: 20+ pending tasks/actors
- 2021-05-28 11:43:03,802 INFO monitor.py:192 -- :event_summary:Adding 2 nodes of type ray.worker.default.
- ==> /tmp/ray/session_latest/logs/monitor.out <==
- 2021-05-28 11:43:04,778 INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-node-status=waiting-for-ssh on ['i-0e071357c46946603'] [LogTimer=110ms]
- 2021-05-28 11:43:04,779 INFO updater.py:286 -- New status: waiting-for-ssh
- 2021-05-28 11:43:04,779 INFO updater.py:232 -- [1/7] Waiting for SSH to become available
- 2021-05-28 11:43:04,779 INFO updater.py:237 -- Running `uptime` as a test.
- 2021-05-28 11:43:04,779 INFO command_runner.py:357 -- Fetched IP: 172.31.44.105
- 2021-05-28 11:43:04,779 INFO log_timer.py:25 -- NodeUpdater: i-0e071357c46946603: Got IP [LogTimer=0ms]
- 2021-05-28 11:43:04,779 VINFO command_runner.py:509 -- Running `uptime`
- 2021-05-28 11:43:04,779 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`
- 2021-05-28 11:43:05,273 INFO node_provider.py:376 -- Launched 2 nodes [subnet_id=subnet-1733896c]
- 2021-05-28 11:43:05,273 INFO node_provider.py:393 -- Launched instance i-06b78fab1fb000a25 [state=pending, info=pending]
- 2021-05-28 11:43:05,273 INFO node_provider.py:393 -- Launched instance i-0e068a78413d29afc [state=pending, info=pending]
- ==> /tmp/ray/session_latest/logs/monitor.log <==
- 2021-05-28 11:43:09,338 INFO autoscaler.py:659 -- Creating new (spawn_updater) updater thread for node i-0e068a78413d29afc.
- 2021-05-28 11:43:09,342 INFO autoscaler.py:659 -- Creating new (spawn_updater) updater thread for node i-06b78fab1fb000a25.
- 2021-05-28 11:43:09,416 INFO autoscaler.py:309 --
- ======== Autoscaler status: 2021-05-28 11:43:09.416880 ========
- Node status
- ---------------------------------------------------------------
- Healthy:
- 1 ray.head.default
- Pending:
- 172.31.6.62: ray.head.default, uninitialized
- 172.31.27.253: ray.worker.default, waiting-for-ssh
- 172.31.23.115: ray.worker.default, waiting-for-ssh
- 172.31.44.105: ray.worker.default, waiting-for-ssh
- Recent failures:
- 172.31.7.210: ray.worker.default
- 172.31.3.75: ray.worker.default
- 172.31.5.6: ray.worker.default
- 172.31.19.55: ray.worker.default
- 172.31.24.19: ray.worker.default
- 172.31.17.106: ray.worker.default
- 172.31.32.125: ray.worker.default
- 172.31.39.243: ray.worker.default
- 172.31.37.219: ray.worker.default
- Resources
- ---------------------------------------------------------------
- Usage:
- 2.0/2.0 CPU
- 0.00/4.518 GiB memory
- 0.00/2.259 GiB object_store_memory
- Demands:
- {'CPU': 1.0}: 20+ pending tasks/actors
- ==> /tmp/ray/session_latest/logs/monitor.err <==
- ssh: connect to host 172.31.44.105 port 22: Connection timed out
- ==> /tmp/ray/session_latest/logs/monitor.out <==
- 2021-05-28 11:43:09,805 INFO updater.py:274 -- SSH still not available (SSH command failed.), retrying in 5 seconds.
- 2021-05-28 11:43:10,503 INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-node-status=waiting-for-ssh on ['i-0e068a78413d29afc', 'i-06b78fab1fb000a25'] [LogTimer=163ms]
- 2021-05-28 11:43:10,504 INFO updater.py:286 -- New status: waiting-for-ssh
- 2021-05-28 11:43:10,504 INFO updater.py:286 -- New status: waiting-for-ssh
- 2021-05-28 11:43:10,504 INFO updater.py:232 -- [1/7] Waiting for SSH to become available
- 2021-05-28 11:43:10,504 INFO updater.py:232 -- [1/7] Waiting for SSH to become available
- 2021-05-28 11:43:10,504 INFO updater.py:237 -- Running `uptime` as a test.
- 2021-05-28 11:43:10,504 INFO updater.py:237 -- Running `uptime` as a test.
- 2021-05-28 11:43:10,505 INFO command_runner.py:357 -- Fetched IP: 172.31.27.253
- 2021-05-28 11:43:10,505 INFO command_runner.py:357 -- Fetched IP: 172.31.23.115
- 2021-05-28 11:43:10,505 INFO log_timer.py:25 -- NodeUpdater: i-0e068a78413d29afc: Got IP [LogTimer=0ms]
- 2021-05-28 11:43:10,505 INFO log_timer.py:25 -- NodeUpdater: i-06b78fab1fb000a25: Got IP [LogTimer=0ms]
- 2021-05-28 11:43:10,505 VINFO command_runner.py:509 -- Running `uptime`
- 2021-05-28 11:43:10,505 VINFO command_runner.py:509 -- Running `uptime`
- 2021-05-28 11:43:10,506 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`
- 2021-05-28 11:43:10,506 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`
- ==> /tmp/ray/session_latest/logs/monitor.err <==
- ssh: connect to host 172.31.27.253 port 22: Connection timed out
- ==> /tmp/ray/session_latest/logs/monitor.log <==
- 2021-05-28 11:43:15,258 INFO autoscaler.py:309 --
- ======== Autoscaler status: 2021-05-28 11:43:15.258314 ========
- Node status
- ---------------------------------------------------------------
- Healthy:
- 1 ray.head.default
- Pending:
- 172.31.6.62: ray.head.default, uninitialized
- 172.31.27.253: ray.worker.default, waiting-for-ssh
- 172.31.23.115: ray.worker.default, waiting-for-ssh
- 172.31.44.105: ray.worker.default, waiting-for-ssh
- Recent failures:
- 172.31.7.210: ray.worker.default
- 172.31.3.75: ray.worker.default
- 172.31.5.6: ray.worker.default
- 172.31.19.55: ray.worker.default
- 172.31.24.19: ray.worker.default
- 172.31.17.106: ray.worker.default
- 172.31.32.125: ray.worker.default
- 172.31.39.243: ray.worker.default
- 172.31.37.219: ray.worker.default
- Resources
- ---------------------------------------------------------------
- Usage:
- 2.0/2.0 CPU
- 0.00/4.518 GiB memory
- 0.00/2.259 GiB object_store_memory
- Demands:
- {'CPU': 1.0}: 20+ pending tasks/actors
- ==> /tmp/ray/session_latest/logs/monitor.out <==
- 2021-05-28 11:43:14,810 VINFO command_runner.py:509 -- Running `uptime`
- 2021-05-28 11:43:14,811 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`
- 2021-05-28 11:43:15,531 INFO updater.py:274 -- SSH still not available (SSH command failed.), retrying in 5 seconds.
- ==> /tmp/ray/session_latest/logs/monitor.err <==
- ssh: connect to host 172.31.23.115 port 22: Connection timed out
- ==> /tmp/ray/session_latest/logs/monitor.out <==
- 2021-05-28 11:43:15,543 INFO updater.py:274 -- SSH still not available (SSH command failed.), retrying in 5 seconds.
- ==> /tmp/ray/session_latest/logs/monitor.err <==
- ssh: connect to host 172.31.44.105 port 22: Connection refused
- ==> /tmp/ray/session_latest/logs/monitor.out <==
- 2021-05-28 11:43:17,888 INFO updater.py:274 -- SSH still not available (SSH command failed.), retrying in 5 seconds.
- 2021-05-28 11:43:20,538 VINFO command_runner.py:509 -- Running `uptime`
- 2021-05-28 11:43:20,538 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`^C
- Shared connection to 3.120.108.100 closed.
- Error: Command failed:
- ssh -tt -i /Users/mlubej/.ssh/ray-autoscaler_1_eu-central-1.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_570a62982a/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker exec -it ray_container /bin/bash -c '"'"'bash --login -c -i '"'"'"'"'"'"'"'"'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (tail -n 100 -f /tmp/ray/session_latest/logs/monitor*)'"'"'"'"'"'"'"'"''"'"' )'
- Loaded cached provider configuration
- If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
- Fetched IP: 3.120.108.100
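The trailing `^C` and "Error: Command failed" are not a new failure: they show the local log-tailing session (the `ssh ... docker exec -it ray_container ... tail -n 100 -f /tmp/ray/session_latest/logs/monitor*` command printed above) being interrupted by the user, after which the CLI reloaded the cached provider configuration and re-fetched the head IP. If that cached configuration is suspect, the hint above applies; `cluster.yaml` is again a placeholder:

    # Re-attach to the monitor logs, bypassing the cached provider config:
    ray monitor cluster.yaml --no-config-cache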