a guest, May 28th, 2021
==> /tmp/ray/session_latest/logs/monitor.err <==
Connection to 172.31.32.125 closed.
Warning: Permanently added '172.31.32.125' (ECDSA) to the list of known hosts.
Connection to 172.31.37.219 closed.
Connection to 172.31.32.125 closed.
ssh: connect to host 172.31.19.55 port 22: Connection timed out
ssh: connect to host 172.31.24.19 port 22: Connection timed out
ssh: connect to host 172.31.17.106 port 22: Connection timed out
ssh: connect to host 172.31.19.55 port 22: Connection refused
ssh: connect to host 172.31.17.106 port 22: Connection refused
ssh: connect to host 172.31.24.19 port 22: Connection refused
Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
Shared connection to 172.31.17.106 closed.
Shared connection to 172.31.19.55 closed.
Shared connection to 172.31.24.19 closed.
Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
Connection to 172.31.24.19 closed.
Connection to 172.31.17.106 closed.
Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
Connection to 172.31.19.55 closed.
Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
Connection to 172.31.17.106 closed.
Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
Connection to 172.31.24.19 closed.
Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
Connection to 172.31.17.106 closed.
Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
Connection to 172.31.24.19 closed.
Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
Connection to 172.31.19.55 closed.
Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
Connection to 172.31.19.55 closed.
Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
Connection to 172.31.17.106 closed.
Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
Connection to 172.31.24.19 closed.
Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
Connection to 172.31.19.55 closed.
Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
Connection to 172.31.24.19 closed.
Connection to 172.31.17.106 closed.
Warning: Permanently added '172.31.24.19' (ECDSA) to the list of known hosts.
Warning: Permanently added '172.31.17.106' (ECDSA) to the list of known hosts.
Connection to 172.31.17.106 closed.
Connection to 172.31.24.19 closed.
Connection to 172.31.19.55 closed.
Warning: Permanently added '172.31.19.55' (ECDSA) to the list of known hosts.
Connection to 172.31.19.55 closed.
ssh: connect to host 172.31.7.210 port 22: Connection timed out
ssh: connect to host 172.31.3.75 port 22: Connection timed out
ssh: connect to host 172.31.5.6 port 22: Connection timed out
ssh: connect to host 172.31.7.210 port 22: Connection refused
ssh: connect to host 172.31.5.6 port 22: Connection refused
ssh: connect to host 172.31.3.75 port 22: Connection refused
Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
Shared connection to 172.31.7.210 closed.
Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
Connection to 172.31.7.210 closed.
Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
Shared connection to 172.31.5.6 closed.
Shared connection to 172.31.3.75 closed.
Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
Connection to 172.31.3.75 closed.
Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
Connection to 172.31.5.6 closed.
Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
Connection to 172.31.7.210 closed.
Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
Connection to 172.31.7.210 closed.
Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
Connection to 172.31.7.210 closed.
Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
Connection to 172.31.3.75 closed.
Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
Connection to 172.31.5.6 closed.
Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
Connection to 172.31.3.75 closed.
Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
Connection to 172.31.5.6 closed.
Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
Connection to 172.31.3.75 closed.
Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
Connection to 172.31.5.6 closed.
Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
Connection to 172.31.7.210 closed.
Warning: Permanently added '172.31.7.210' (ECDSA) to the list of known hosts.
Connection to 172.31.7.210 closed.
Connection to 172.31.5.6 closed.
Warning: Permanently added '172.31.5.6' (ECDSA) to the list of known hosts.
Connection to 172.31.3.75 closed.
Warning: Permanently added '172.31.3.75' (ECDSA) to the list of known hosts.
Connection to 172.31.5.6 closed.
Connection to 172.31.3.75 closed.

==> /tmp/ray/session_latest/logs/monitor.log <==

Demands:
{'CPU': 1.0}: 20+ pending tasks/actors
2021-05-28 11:42:46,408 INFO autoscaler.py:309 --
======== Autoscaler status: 2021-05-28 11:42:46.408447 ========
Node status
---------------------------------------------------------------
Healthy:
1 ray.head.default
Pending:
172.31.6.62: ray.head.default, uninitialized
172.31.5.6: ray.worker.default, setting-up
172.31.3.75: ray.worker.default, setting-up
172.31.7.210: ray.worker.default, setting-up
Recent failures:
172.31.19.55: ray.worker.default
172.31.24.19: ray.worker.default
172.31.17.106: ray.worker.default
172.31.32.125: ray.worker.default
172.31.39.243: ray.worker.default
172.31.37.219: ray.worker.default

Resources
---------------------------------------------------------------

Usage:
2.0/2.0 CPU
0.00/4.518 GiB memory
0.00/2.259 GiB object_store_memory

Demands:
{'CPU': 1.0}: 20+ pending tasks/actors
2021-05-28 11:42:51,955 ERROR autoscaler.py:270 -- StandardAutoscaler: i-0c0fac5b7dd7133a4: Terminating failed to setup/initialize node.
2021-05-28 11:42:52,254 INFO autoscaler.py:309 --
======== Autoscaler status: 2021-05-28 11:42:52.254463 ========
Node status
---------------------------------------------------------------
Healthy:
1 ray.head.default
Pending:
172.31.6.62: ray.head.default, uninitialized
172.31.5.6: ray.worker.default, setting-up
172.31.3.75: ray.worker.default, setting-up
Recent failures:
172.31.7.210: ray.worker.default
172.31.19.55: ray.worker.default
172.31.24.19: ray.worker.default
172.31.17.106: ray.worker.default
172.31.32.125: ray.worker.default
172.31.39.243: ray.worker.default
172.31.37.219: ray.worker.default

Resources
---------------------------------------------------------------

Usage:
2.0/2.0 CPU
0.00/4.518 GiB memory
0.00/2.259 GiB object_store_memory

Demands:
{'CPU': 1.0}: 20+ pending tasks/actors
2021-05-28 11:42:52,338 INFO monitor.py:192 -- :event_summary:Removing 1 nodes of type ray.worker.default (launch failed).
2021-05-28 11:42:57,737 INFO autoscaler.py:705 -- StandardAutoscaler: Queue 1 new nodes for launch
2021-05-28 11:42:57,745 INFO node_launcher.py:78 -- NodeLauncher1: Got 1 nodes to launch.
2021-05-28 11:42:57,812 ERROR autoscaler.py:270 -- StandardAutoscaler: i-01af7ca3cd041f2e0: Terminating failed to setup/initialize node.
2021-05-28 11:42:57,812 ERROR autoscaler.py:270 -- StandardAutoscaler: i-0cddf93ff7d9428df: Terminating failed to setup/initialize node.
2021-05-28 11:42:57,837 INFO node_launcher.py:78 -- NodeLauncher1: Launching 1 nodes, type ray.worker.default.
2021-05-28 11:42:58,143 INFO autoscaler.py:309 --
======== Autoscaler status: 2021-05-28 11:42:58.143289 ========
Node status
---------------------------------------------------------------
Healthy:
1 ray.head.default
Pending:
ray.worker.default, 1 launching
172.31.6.62: ray.head.default, uninitialized
Recent failures:
172.31.7.210: ray.worker.default
172.31.3.75: ray.worker.default
172.31.5.6: ray.worker.default
172.31.19.55: ray.worker.default
172.31.24.19: ray.worker.default
172.31.17.106: ray.worker.default
172.31.32.125: ray.worker.default
172.31.39.243: ray.worker.default
172.31.37.219: ray.worker.default

Resources
---------------------------------------------------------------

Usage:
2.0/2.0 CPU
0.00/4.518 GiB memory
0.00/2.259 GiB object_store_memory

Demands:
{'CPU': 1.0}: 20+ pending tasks/actors
2021-05-28 11:42:58,201 INFO monitor.py:192 -- :event_summary:Adding 1 nodes of type ray.worker.default.
2021-05-28 11:42:58,201 INFO monitor.py:192 -- :event_summary:Removing 2 nodes of type ray.worker.default (launch failed).

==> /tmp/ray/session_latest/logs/monitor.out <==
Setting up docutils-common (0.16+dfsg-2) ...
Processing triggers for sgml-base (1.29.1) ...
Setting up python3-docutils (0.16+dfsg-2) ...
update-alternatives: using /usr/share/docutils/scripts/python3/rst-buildhtml to provide /usr/bin/rst-buildhtml (rst-buildhtml) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2html to provide /usr/bin/rst2html (rst2html) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2html4 to provide /usr/bin/rst2html4 (rst2html4) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2html5 to provide /usr/bin/rst2html5 (rst2html5) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2latex to provide /usr/bin/rst2latex (rst2latex) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2man to provide /usr/bin/rst2man (rst2man) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt to provide /usr/bin/rst2odt (rst2odt) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt_prepstyles to provide /usr/bin/rst2odt_prepstyles (rst2odt_prepstyles) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2pseudoxml to provide /usr/bin/rst2pseudoxml (rst2pseudoxml) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2s5 to provide /usr/bin/rst2s5 (rst2s5) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2xetex to provide /usr/bin/rst2xetex (rst2xetex) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2xml to provide /usr/bin/rst2xml (rst2xml) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rstpep2html to provide /usr/bin/rstpep2html (rstpep2html) in auto mode
Setting up python3-botocore (1.16.19+repack-1ubuntu0.20.04.1) ...
Setting up python3-s3transfer (0.3.3-1) ...
Setting up awscli (1.18.69-1ubuntu0.20.04.1) ...
Processing triggers for shared-mime-info (1.15-1) ...
Processing triggers for shared-mime-info (1.15-1) ...
Processing triggers for sgml-base (1.29.1) ...
Setting up docutils-common (0.16+dfsg-2) ...
Processing triggers for sgml-base (1.29.1) ...
Setting up python3-docutils (0.16+dfsg-2) ...
update-alternatives: using /usr/share/docutils/scripts/python3/rst-buildhtml to provide /usr/bin/rst-buildhtml (rst-buildhtml) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2html to provide /usr/bin/rst2html (rst2html) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2html4 to provide /usr/bin/rst2html4 (rst2html4) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2html5 to provide /usr/bin/rst2html5 (rst2html5) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2latex to provide /usr/bin/rst2latex (rst2latex) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2man to provide /usr/bin/rst2man (rst2man) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt to provide /usr/bin/rst2odt (rst2odt) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt_prepstyles to provide /usr/bin/rst2odt_prepstyles (rst2odt_prepstyles) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2pseudoxml to provide /usr/bin/rst2pseudoxml (rst2pseudoxml) in auto mode
2021-05-28 11:42:48,804 VINFO command_runner.py:509 -- Running `export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com`
2021-05-28 11:42:48,805 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=120s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com)'`
update-alternatives: using /usr/share/docutils/scripts/python3/rst2s5 to provide /usr/bin/rst2s5 (rst2s5) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2xetex to provide /usr/bin/rst2xetex (rst2xetex) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2xml to provide /usr/bin/rst2xml (rst2xml) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rstpep2html to provide /usr/bin/rstpep2html (rstpep2html) in auto mode
Setting up python3-botocore (1.16.19+repack-1ubuntu0.20.04.1) ...
Processing triggers for sgml-base (1.29.1) ...
Setting up docutils-common (0.16+dfsg-2) ...
Setting up python3-s3transfer (0.3.3-1) ...
Processing triggers for sgml-base (1.29.1) ...
Setting up python3-docutils (0.16+dfsg-2) ...
update-alternatives: using /usr/share/docutils/scripts/python3/rst-buildhtml to provide /usr/bin/rst-buildhtml (rst-buildhtml) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2html to provide /usr/bin/rst2html (rst2html) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2html4 to provide /usr/bin/rst2html4 (rst2html4) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2html5 to provide /usr/bin/rst2html5 (rst2html5) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2latex to provide /usr/bin/rst2latex (rst2latex) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2man to provide /usr/bin/rst2man (rst2man) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt to provide /usr/bin/rst2odt (rst2odt) in auto mode
Setting up awscli (1.18.69-1ubuntu0.20.04.1) ...
update-alternatives: using /usr/share/docutils/scripts/python3/rst2odt_prepstyles to provide /usr/bin/rst2odt_prepstyles (rst2odt_prepstyles) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2pseudoxml to provide /usr/bin/rst2pseudoxml (rst2pseudoxml) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2s5 to provide /usr/bin/rst2s5 (rst2s5) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2xetex to provide /usr/bin/rst2xetex (rst2xetex) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rst2xml to provide /usr/bin/rst2xml (rst2xml) in auto mode
update-alternatives: using /usr/share/docutils/scripts/python3/rstpep2html to provide /usr/bin/rstpep2html (rstpep2html) in auto mode
Unable to locate credentials. You can configure credentials by running "aws configure".
Error: Cannot perform an interactive login from a non TTY device
2021-05-28 11:42:49,811 INFO log_timer.py:25 -- NodeUpdater: i-0c0fac5b7dd7133a4: Initialization commands failed [LogTimer=63522ms]
2021-05-28 11:42:49,811 INFO log_timer.py:25 -- NodeUpdater: i-0c0fac5b7dd7133a4: Applied config 15a70e450983425551f140b92c089dc940ec7759 [LogTimer=83210ms]
Setting up python3-botocore (1.16.19+repack-1ubuntu0.20.04.1) ...
Setting up python3-s3transfer (0.3.3-1) ...
Setting up awscli (1.18.69-1ubuntu0.20.04.1) ...
2021-05-28 11:42:50,914 INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-node-status=update-failed on ['i-0c0fac5b7dd7133a4'] [LogTimer=101ms]
2021-05-28 11:42:50,914 ERR updater.py:132 -- New status: update-failed
2021-05-28 11:42:50,914 ERR updater.py:134 -- !!!
2021-05-28 11:42:50,915 VERR updater.py:140 -- {'message': 'SSH command failed.'}
2021-05-28 11:42:50,915 ERR updater.py:142 -- SSH command failed.
2021-05-28 11:42:50,915 ERR updater.py:144 -- !!!
2021-05-28 11:42:51,711 VINFO command_runner.py:509 -- Running `export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com`
2021-05-28 11:42:51,711 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=120s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com)'`
2021-05-28 11:42:51,956 INFO node_provider.py:462 -- Terminating instances i-0c0fac5b7dd7133a4 (cannot stop spot instances, only terminate)
2021-05-28 11:42:52,409 VINFO command_runner.py:509 -- Running `export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com`
2021-05-28 11:42:52,409 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=120s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_HEAD_IP=172.31.23.49; aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 214830741341.dkr.ecr.eu-central-1.amazonaws.com)'`
Unable to locate credentials. You can configure credentials by running "aws configure".
Error: Cannot perform an interactive login from a non TTY device
2021-05-28 11:42:52,762 INFO log_timer.py:25 -- NodeUpdater: i-01af7ca3cd041f2e0: Initialization commands failed [LogTimer=63262ms]
2021-05-28 11:42:52,763 INFO log_timer.py:25 -- NodeUpdater: i-01af7ca3cd041f2e0: Applied config 15a70e450983425551f140b92c089dc940ec7759 [LogTimer=86179ms]
Unable to locate credentials. You can configure credentials by running "aws configure".
Error: Cannot perform an interactive login from a non TTY device
2021-05-28 11:42:53,487 INFO log_timer.py:25 -- NodeUpdater: i-0cddf93ff7d9428df: Initialization commands failed [LogTimer=63983ms]
2021-05-28 11:42:53,488 INFO log_timer.py:25 -- NodeUpdater: i-0cddf93ff7d9428df: Applied config 15a70e450983425551f140b92c089dc940ec7759 [LogTimer=86893ms]
2021-05-28 11:42:53,877 INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-node-status=update-failed on ['i-01af7ca3cd041f2e0', 'i-0cddf93ff7d9428df'] [LogTimer=113ms]
2021-05-28 11:42:53,878 ERR updater.py:132 -- New status: update-failed
2021-05-28 11:42:53,878 ERR updater.py:134 -- !!!
2021-05-28 11:42:53,878 VERR updater.py:140 -- {'message': 'SSH command failed.'}
2021-05-28 11:42:53,878 ERR updater.py:142 -- SSH command failed.
2021-05-28 11:42:53,878 ERR updater.py:144 -- !!!
2021-05-28 11:42:53,877 ERR updater.py:132 -- New status: update-failed
2021-05-28 11:42:53,879 ERR updater.py:134 -- !!!
2021-05-28 11:42:53,879 VERR updater.py:140 -- {'message': 'SSH command failed.'}
2021-05-28 11:42:53,879 ERR updater.py:142 -- SSH command failed.
2021-05-28 11:42:53,879 ERR updater.py:144 -- !!!
2021-05-28 11:42:57,813 INFO node_provider.py:462 -- Terminating instances i-01af7ca3cd041f2e0, i-0cddf93ff7d9428df (cannot stop spot instances, only terminate)
2021-05-28 11:42:59,363 INFO node_provider.py:376 -- Launched 1 nodes [subnet_id=subnet-e04e7daa]
2021-05-28 11:42:59,364 INFO node_provider.py:393 -- Launched instance i-0e071357c46946603 [state=pending, info=pending]

==> /tmp/ray/session_latest/logs/monitor.log <==
2021-05-28 11:43:03,584 INFO autoscaler.py:705 -- StandardAutoscaler: Queue 2 new nodes for launch
2021-05-28 11:43:03,587 INFO node_launcher.py:78 -- NodeLauncher0: Got 2 nodes to launch.
2021-05-28 11:43:03,653 INFO node_launcher.py:78 -- NodeLauncher0: Launching 2 nodes, type ray.worker.default.
2021-05-28 11:43:03,656 INFO autoscaler.py:659 -- Creating new (spawn_updater) updater thread for node i-0e071357c46946603.
2021-05-28 11:43:03,730 INFO autoscaler.py:309 --
======== Autoscaler status: 2021-05-28 11:43:03.730037 ========
Node status
---------------------------------------------------------------
Healthy:
1 ray.head.default
Pending:
ray.worker.default, 2 launching
172.31.6.62: ray.head.default, uninitialized
172.31.44.105: ray.worker.default, waiting-for-ssh
Recent failures:
172.31.7.210: ray.worker.default
172.31.3.75: ray.worker.default
172.31.5.6: ray.worker.default
172.31.19.55: ray.worker.default
172.31.24.19: ray.worker.default
172.31.17.106: ray.worker.default
172.31.32.125: ray.worker.default
172.31.39.243: ray.worker.default
172.31.37.219: ray.worker.default

Resources
---------------------------------------------------------------

Usage:
2.0/2.0 CPU
0.00/4.518 GiB memory
0.00/2.259 GiB object_store_memory

Demands:
{'CPU': 1.0}: 20+ pending tasks/actors
2021-05-28 11:43:03,802 INFO monitor.py:192 -- :event_summary:Adding 2 nodes of type ray.worker.default.

==> /tmp/ray/session_latest/logs/monitor.out <==
2021-05-28 11:43:04,778 INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-node-status=waiting-for-ssh on ['i-0e071357c46946603'] [LogTimer=110ms]
2021-05-28 11:43:04,779 INFO updater.py:286 -- New status: waiting-for-ssh
2021-05-28 11:43:04,779 INFO updater.py:232 -- [1/7] Waiting for SSH to become available
2021-05-28 11:43:04,779 INFO updater.py:237 -- Running `uptime` as a test.
2021-05-28 11:43:04,779 INFO command_runner.py:357 -- Fetched IP: 172.31.44.105
2021-05-28 11:43:04,779 INFO log_timer.py:25 -- NodeUpdater: i-0e071357c46946603: Got IP [LogTimer=0ms]
2021-05-28 11:43:04,779 VINFO command_runner.py:509 -- Running `uptime`
2021-05-28 11:43:04,779 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`
2021-05-28 11:43:05,273 INFO node_provider.py:376 -- Launched 2 nodes [subnet_id=subnet-1733896c]
2021-05-28 11:43:05,273 INFO node_provider.py:393 -- Launched instance i-06b78fab1fb000a25 [state=pending, info=pending]
2021-05-28 11:43:05,273 INFO node_provider.py:393 -- Launched instance i-0e068a78413d29afc [state=pending, info=pending]

==> /tmp/ray/session_latest/logs/monitor.log <==
2021-05-28 11:43:09,338 INFO autoscaler.py:659 -- Creating new (spawn_updater) updater thread for node i-0e068a78413d29afc.
2021-05-28 11:43:09,342 INFO autoscaler.py:659 -- Creating new (spawn_updater) updater thread for node i-06b78fab1fb000a25.
2021-05-28 11:43:09,416 INFO autoscaler.py:309 --
======== Autoscaler status: 2021-05-28 11:43:09.416880 ========
Node status
---------------------------------------------------------------
Healthy:
1 ray.head.default
Pending:
172.31.6.62: ray.head.default, uninitialized
172.31.27.253: ray.worker.default, waiting-for-ssh
172.31.23.115: ray.worker.default, waiting-for-ssh
172.31.44.105: ray.worker.default, waiting-for-ssh
Recent failures:
172.31.7.210: ray.worker.default
172.31.3.75: ray.worker.default
172.31.5.6: ray.worker.default
172.31.19.55: ray.worker.default
172.31.24.19: ray.worker.default
172.31.17.106: ray.worker.default
172.31.32.125: ray.worker.default
172.31.39.243: ray.worker.default
172.31.37.219: ray.worker.default

Resources
---------------------------------------------------------------

Usage:
2.0/2.0 CPU
0.00/4.518 GiB memory
0.00/2.259 GiB object_store_memory

Demands:
{'CPU': 1.0}: 20+ pending tasks/actors

==> /tmp/ray/session_latest/logs/monitor.err <==
ssh: connect to host 172.31.44.105 port 22: Connection timed out

==> /tmp/ray/session_latest/logs/monitor.out <==
2021-05-28 11:43:09,805 INFO updater.py:274 -- SSH still not available (SSH command failed.), retrying in 5 seconds.
2021-05-28 11:43:10,503 INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-node-status=waiting-for-ssh on ['i-0e068a78413d29afc', 'i-06b78fab1fb000a25'] [LogTimer=163ms]
2021-05-28 11:43:10,504 INFO updater.py:286 -- New status: waiting-for-ssh
2021-05-28 11:43:10,504 INFO updater.py:286 -- New status: waiting-for-ssh
2021-05-28 11:43:10,504 INFO updater.py:232 -- [1/7] Waiting for SSH to become available
2021-05-28 11:43:10,504 INFO updater.py:232 -- [1/7] Waiting for SSH to become available
2021-05-28 11:43:10,504 INFO updater.py:237 -- Running `uptime` as a test.
2021-05-28 11:43:10,504 INFO updater.py:237 -- Running `uptime` as a test.
2021-05-28 11:43:10,505 INFO command_runner.py:357 -- Fetched IP: 172.31.27.253
2021-05-28 11:43:10,505 INFO command_runner.py:357 -- Fetched IP: 172.31.23.115
2021-05-28 11:43:10,505 INFO log_timer.py:25 -- NodeUpdater: i-0e068a78413d29afc: Got IP [LogTimer=0ms]
2021-05-28 11:43:10,505 INFO log_timer.py:25 -- NodeUpdater: i-06b78fab1fb000a25: Got IP [LogTimer=0ms]
2021-05-28 11:43:10,505 VINFO command_runner.py:509 -- Running `uptime`
2021-05-28 11:43:10,505 VINFO command_runner.py:509 -- Running `uptime`
2021-05-28 11:43:10,506 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`
2021-05-28 11:43:10,506 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`

==> /tmp/ray/session_latest/logs/monitor.err <==
ssh: connect to host 172.31.27.253 port 22: Connection timed out

  586.  
  587. ==> /tmp/ray/session_latest/logs/monitor.log <==
  588. 2021-05-28 11:43:15,258 INFO autoscaler.py:309 --
  589. ======== Autoscaler status: 2021-05-28 11:43:15.258314 ========
  590. Node status
  591. ---------------------------------------------------------------
  592. Healthy:
  593. 1 ray.head.default
  594. Pending:
  595. 172.31.6.62: ray.head.default, uninitialized
  596. 172.31.27.253: ray.worker.default, waiting-for-ssh
  597. 172.31.23.115: ray.worker.default, waiting-for-ssh
  598. 172.31.44.105: ray.worker.default, waiting-for-ssh
  599. Recent failures:
  600. 172.31.7.210: ray.worker.default
  601. 172.31.3.75: ray.worker.default
  602. 172.31.5.6: ray.worker.default
  603. 172.31.19.55: ray.worker.default
  604. 172.31.24.19: ray.worker.default
  605. 172.31.17.106: ray.worker.default
  606. 172.31.32.125: ray.worker.default
  607. 172.31.39.243: ray.worker.default
  608. 172.31.37.219: ray.worker.default
  609.  
  610. Resources
  611. ---------------------------------------------------------------
  612.  
  613. Usage:
  614. 2.0/2.0 CPU
  615. 0.00/4.518 GiB memory
  616. 0.00/2.259 GiB object_store_memory
  617.  
  618. Demands:
  619. {'CPU': 1.0}: 20+ pending tasks/actors
  620.  
  621. ==> /tmp/ray/session_latest/logs/monitor.out <==
  622. 2021-05-28 11:43:14,810 VINFO command_runner.py:509 -- Running `uptime`
  623. 2021-05-28 11:43:14,811 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`
  624. 2021-05-28 11:43:15,531 INFO updater.py:274 -- SSH still not available (SSH command failed.), retrying in 5 seconds.
  625.  
  626. ==> /tmp/ray/session_latest/logs/monitor.err <==
  627. ssh: connect to host 172.31.23.115 port 22: Connection timed out
  628.  
  629.  
  630. ==> /tmp/ray/session_latest/logs/monitor.out <==
  631. 2021-05-28 11:43:15,543 INFO updater.py:274 -- SSH still not available (SSH command failed.), retrying in 5 seconds.
  632.  
  633. ==> /tmp/ray/session_latest/logs/monitor.err <==
  634. ssh: connect to host 172.31.44.105 port 22: Connection refused
  635.  
  636.  
  637. ==> /tmp/ray/session_latest/logs/monitor.out <==
  638. 2021-05-28 11:43:17,888 INFO updater.py:274 -- SSH still not available (SSH command failed.), retrying in 5 seconds.
  639. 2021-05-28 11:43:20,538 VINFO command_runner.py:509 -- Running `uptime`
  640. 2021-05-28 11:43:20,538 VVINFO command_runner.py:511 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_1d41c853af/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=5s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`^C
  641. Shared connection to 3.120.108.100 closed.
  642. Error: Command failed:
  643.  
  644. ssh -tt -i /Users/mlubej/.ssh/ray-autoscaler_1_eu-central-1.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_570a62982a/ddec11ab83/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker exec -it ray_container /bin/bash -c '"'"'bash --login -c -i '"'"'"'"'"'"'"'"'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (tail -n 100 -f /tmp/ray/session_latest/logs/monitor*)'"'"'"'"'"'"'"'"''"'"' )'
  645.  
  646. Loaded cached provider configuration
  647. If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
  648. Fetched IP: 3.120.108.100
  649.  