out.log

a guest
Nov 24th, 2021
  1. [0]<stdout>:Running NICE version 3.16.1
  2. [2]<stdout>:Running NICE version 3.16.1
  3. [1]<stdout>:Running NICE version 3.16.1
  4. [3]<stdout>:Running NICE version 3.16.1
  5. [2]<stdout>:[2021-11-24 11:35:50.627693: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:107] Using GLOO to perform controller operations.
  6. [2]<stdout>:[2021-11-24 11:35:50.627724: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:73] Using GLOO to perform CPU operations.
  7. [2]<stdout>:[2021-11-24 11:35:50.627732: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.h:64] Gloo context enabled.
  8. [0]<stdout>:[2021-11-24 11:35:50.629344: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:107] Using GLOO to perform controller operations.
  9. [0]<stdout>:[2021-11-24 11:35:50.629380: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:73] Using GLOO to perform CPU operations.
  10. [0]<stdout>:[2021-11-24 11:35:50.629388: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.h:64] Gloo context enabled.
  11. [1]<stdout>:[2021-11-24 11:35:50.636743: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:107] Using GLOO to perform controller operations.
  12. [1]<stdout>:[2021-11-24 11:35:50.636771: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:73] Using GLOO to perform CPU operations.
  13. [1]<stdout>:[2021-11-24 11:35:50.636779: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.h:64] Gloo context enabled.
  14. [0]<stdout>:[2021-11-24 11:35:50.637571: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:163] rendezvous server address: 127.0.0.1
  15. [0]<stdout>:[2021-11-24 11:35:50.637618: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global rendezvous started for rank=0, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  16. [3]<stdout>:[2021-11-24 11:35:50.638981: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:107] Using GLOO to perform controller operations.
  17. [3]<stdout>:[2021-11-24 11:35:50.639012: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:73] Using GLOO to perform CPU operations.
  18. [3]<stdout>:[2021-11-24 11:35:50.639020: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.h:64] Gloo context enabled.
  19. [2]<stdout>:[2021-11-24 11:35:50.641513: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:163] rendezvous server address: 127.0.0.1
  20. [2]<stdout>:[2021-11-24 11:35:50.641553: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global rendezvous started for rank=2, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  21. [1]<stdout>:[2021-11-24 11:35:50.650301: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:163] rendezvous server address: 127.0.0.1
  22. [1]<stdout>:[2021-11-24 11:35:50.650333: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global rendezvous started for rank=1, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  23. [3]<stdout>:[2021-11-24 11:35:50.652021: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:163] rendezvous server address: 127.0.0.1
  24. [3]<stdout>:[2021-11-24 11:35:50.652053: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global rendezvous started for rank=3, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  25. [0]<stdout>:[2021-11-24 11:35:50.694936: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:219] Global Gloo context initialized.
  26. [3]<stdout>:[2021-11-24 11:35:50.695686: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:219] Global Gloo context initialized.
  27. [0]<stdout>:[2021-11-24 11:35:50.695029: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost rendezvous started for rank=0, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  28. [3]<stdout>:[2021-11-24 11:35:50.695771: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost rendezvous started for rank=3, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  29. [1]<stdout>:[2021-11-24 11:35:50.697220: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:219] Global Gloo context initialized.
  30. [2]<stdout>:[2021-11-24 11:35:50.694280: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:219] Global Gloo context initialized.
  31. [1]<stdout>:[2021-11-24 11:35:50.697304: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost rendezvous started for rank=1, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  32. [2]<stdout>:[2021-11-24 11:35:50.694382: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost rendezvous started for rank=2, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  33. [0]<stdout>:[2021-11-24 11:35:50.756601: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:226] Local Gloo context initialized.
  34. [2]<stdout>:[2021-11-24 11:35:50.757229: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:226] Local Gloo context initialized.
  35. [0]<stdout>:[2021-11-24 11:35:50.756675: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_0 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  36. [1]<stdout>:[2021-11-24 11:35:50.758288: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:226] Local Gloo context initialized.
  37. [2]<stdout>:[2021-11-24 11:35:50.757309: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_2 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  38. [1]<stdout>:[2021-11-24 11:35:50.758362: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_1 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  39. [3]<stdout>:[2021-11-24 11:35:50.759489: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:226] Local Gloo context initialized.
  40. [3]<stdout>:[2021-11-24 11:35:50.759554: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_3 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  41. [0]<stdout>:[2021-11-24 11:35:50.766677: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:231] Cross-node Gloo context initialized.
  42. [0]<stdout>:[2021-11-24 11:35:50.766742: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:276] rendezvous server address: 127.0.0.1
  43. [0]<stdout>:[2021-11-24 11:35:50.767786: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:306] Initializing GlooContext for process set: [0,1,2,3,], hash: 1e9d548172549547
  44. [2]<stdout>:[2021-11-24 11:35:50.767701: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:231] Cross-node Gloo context initialized.
  45. [2]<stdout>:[2021-11-24 11:35:50.767765: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:276] rendezvous server address: 127.0.0.1
  46. [2]<stdout>:[2021-11-24 11:35:50.769017: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:306] Initializing GlooContext for process set: [0,1,2,3,], hash: 1e9d548172549547
  47. [3]<stdout>:[2021-11-24 11:35:50.769256: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:231] Cross-node Gloo context initialized.
  48. [3]<stdout>:[2021-11-24 11:35:50.769326: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:276] rendezvous server address: 127.0.0.1
  49. [1]<stdout>:[2021-11-24 11:35:50.769327: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:231] Cross-node Gloo context initialized.
  50. [1]<stdout>:[2021-11-24 11:35:50.769392: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:276] rendezvous server address: 127.0.0.1
  51. [1]<stdout>:[2021-11-24 11:35:50.770957: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:306] Initializing GlooContext for process set: [0,1,2,3,], hash: 1e9d548172549547
  52. [3]<stdout>:[2021-11-24 11:35:50.770975: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:306] Initializing GlooContext for process set: [0,1,2,3,], hash: 1e9d548172549547
  53. [1]<stdout>:[2021-11-24 11:35:50.771670: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:327] Global Gloo context for process set with rank: 1, size: 4
  54. [0]<stdout>:[2021-11-24 11:35:50.771607: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:327] Global Gloo context for process set with rank: 0, size: 4
  55. [2]<stdout>:[2021-11-24 11:35:50.771682: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:327] Global Gloo context for process set with rank: 2, size: 4
  56. [3]<stdout>:[2021-11-24 11:35:50.771705: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:327] Global Gloo context for process set with rank: 3, size: 4
  57. [1]<stdout>:[2021-11-24 11:35:50.771746: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global_process_set_hash_1e9d548172549547 rendezvous started for rank=1, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  58. [0]<stdout>:[2021-11-24 11:35:50.771679: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  59. [2]<stdout>:[2021-11-24 11:35:50.771747: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global_process_set_hash_1e9d548172549547 rendezvous started for rank=2, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  60. [3]<stdout>:[2021-11-24 11:35:50.771788: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global_process_set_hash_1e9d548172549547 rendezvous started for rank=3, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  61. [0]<stdout>:[2021-11-24 11:35:50.825247: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:336] Global Gloo context initialized for process set with hash 1e9d548172549547.
  62. [0]<stdout>:[2021-11-24 11:35:50.825308: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:350] Local Gloo context for process set with rank: 0, size: 4
  63. [2]<stdout>:[2021-11-24 11:35:50.826045: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:336] Global Gloo context initialized for process set with hash 1e9d548172549547.
  64. [2]<stdout>:[2021-11-24 11:35:50.826106: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:350] Local Gloo context for process set with rank: 2, size: 4
  65. [2]<stdout>:[2021-11-24 11:35:50.826167: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost_process_set_hash_1e9d548172549547 rendezvous started for rank=2, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  66. [1]<stdout>:[2021-11-24 11:35:50.828122: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:336] Global Gloo context initialized for process set with hash 1e9d548172549547.
  67. [3]<stdout>:[2021-11-24 11:35:50.827199: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:336] Global Gloo context initialized for process set with hash 1e9d548172549547.
  68. [0]<stdout>:[2021-11-24 11:35:50.825350: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  69. [1]<stdout>:[2021-11-24 11:35:50.828181: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:350] Local Gloo context for process set with rank: 1, size: 4
  70. [3]<stdout>:[2021-11-24 11:35:50.827270: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:350] Local Gloo context for process set with rank: 3, size: 4
  71. [1]<stdout>:[2021-11-24 11:35:50.828224: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost_process_set_hash_1e9d548172549547 rendezvous started for rank=1, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  72. [3]<stdout>:[2021-11-24 11:35:50.827330: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost_process_set_hash_1e9d548172549547 rendezvous started for rank=3, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  73. [2]<stdout>:[2021-11-24 11:35:50.897175: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:357] Local Gloo context initialized for process set with hash 1e9d548172549547.
  74. [2]<stdout>:[2021-11-24 11:35:50.897237: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:372] Cross Gloo context for process set with rank: 0, size: 1
  75. [2]<stdout>:[2021-11-24 11:35:50.897285: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_2_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  76. [0]<stdout>:[2021-11-24 11:35:50.898058: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:357] Local Gloo context initialized for process set with hash 1e9d548172549547.
  77. [0]<stdout>:[2021-11-24 11:35:50.898112: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:372] Cross Gloo context for process set with rank: 0, size: 1
  78. [0]<stdout>:[2021-11-24 11:35:50.898152: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_0_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  79. [1]<stdout>:[2021-11-24 11:35:50.898921: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:357] Local Gloo context initialized for process set with hash 1e9d548172549547.
  80. [1]<stdout>:[2021-11-24 11:35:50.898997: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:372] Cross Gloo context for process set with rank: 0, size: 1
  81. [1]<stdout>:[2021-11-24 11:35:50.899046: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_1_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  82. [3]<stdout>:[2021-11-24 11:35:50.900167: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:357] Local Gloo context initialized for process set with hash 1e9d548172549547.
  83. [3]<stdout>:[2021-11-24 11:35:50.900228: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:372] Cross Gloo context for process set with rank: 0, size: 1
  84. [3]<stdout>:[2021-11-24 11:35:50.900273: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_3_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
  85. [0]<stdout>:[2021-11-24 11:35:50.907806: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:379] Cross-node Gloo context for process set with hash 1e9d548172549547.
  86. [0]<stdout>:[2021-11-24 11:35:50.907870: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:40] Started Horovod with 4 processes
  87. [2]<stdout>:[2021-11-24 11:35:50.908552: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:379] Cross-node Gloo context for process set with hash 1e9d548172549547.
  88. [1]<stdout>:[2021-11-24 11:35:50.909282: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:379] Cross-node Gloo context for process set with hash 1e9d548172549547.
  89. [3]<stdout>:[2021-11-24 11:35:50.910208: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:379] Cross-node Gloo context for process set with hash 1e9d548172549547.
  90. [1]<stdout>:[2021-11-24 11:35:50.911948: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:104] Gloo controller initialized.
  91. [3]<stdout>:[2021-11-24 11:35:50.911934: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:104] Gloo controller initialized.
  92. [0]<stdout>:[2021-11-24 11:35:50.911887: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:104] Gloo controller initialized.
  93. [2]<stdout>:[2021-11-24 11:35:50.911954: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:104] Gloo controller initialized.
  94. [1]<stdout>:[2021-11-24 11:35:50.912755: I /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:642] [1]: Horovod initialized
  95. [0]<stdout>:[2021-11-24 11:35:50.912735: I /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:642] [0]: Horovod initialized
  96. [2]<stdout>:[2021-11-24 11:35:50.912761: I /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:642] [2]: Horovod initialized
  97. [3]<stdout>:[2021-11-24 11:35:50.912732: I /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:642] [3]: Horovod initialized
  98. [2]<stdout>:[2021-11-24 11:35:50.913147: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:856] Background thread init done
  99. [1]<stdout>:[2021-11-24 11:35:50.913418: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:856] Background thread init done
  100. [2]<stdout>:Horovod local rank : 2
  101. [1]<stdout>:Horovod local rank : 1
  102. [2]<stdout>:Is master : False
  103. [3]<stdout>:[2021-11-24 11:35:50.913718: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:856] Background thread init done
  104. [1]<stdout>:Is master : False
  105. [0]<stdout>:[2021-11-24 11:35:50.913793: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:856] Background thread init done
  106. [2]<stdout>:Setting gpu visibility PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')
  107. [3]<stdout>:Horovod local rank : 3
  108. [1]<stdout>:Setting gpu visibility PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')
  109. [0]<stdout>:Horovod local rank : 0
  110. [2]<stdout>:Training directory : /opt/mt/local_data/gen_enfr
  111. [3]<stdout>:Is master : False
  112. [1]<stdout>:Training directory : /opt/mt/local_data/gen_enfr
  113. [3]<stdout>:Setting gpu visibility PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')
  114. [0]<stdout>:Is master : True
  115. [3]<stdout>:Training directory : /opt/mt/local_data/gen_enfr
  116. [0]<stdout>:Setting gpu visibility PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
  117. [0]<stdout>:Training directory : /opt/mt/local_data/gen_enfr
  118. [3]<stdout>:INFO Create configuration files started...
  119. [3]<stdout>:DEBUG Training properties file: gen_enfr/training_properties.json already exists.
  120. [3]<stdout>:DEBUG Creating OpenNMT config file: gen_enfr/onmt-config.yml
  121. [0]<stdout>:INFO 'Create configuration files' finished. (< 1s)
  122. [2]<stdout>:INFO 'Create configuration files' finished. (< 1s)
  123. [1]<stdout>:INFO 'Create configuration files' finished. (< 1s)
  124. [3]<stdout>:INFO 'Create configuration files' finished. (< 1s)
  125. [3]<stdout>:INFO Training started...
  126. [3]<stdout>:INFO Training model at dir gen_enfr
  127. [3]<stdout>:DEBUG Initializing summary file: gen_enfr/summary.json
  128. [3]<stdout>:DEBUG Reading ONMT config from file: gen_enfr/onmt-config.yml
  129. [2]<stdout>:INFO Training started...
  130. [2]<stdout>:INFO Training model at dir gen_enfr
  131. [2]<stdout>:DEBUG Initializing summary file: gen_enfr/summary.json
  132. [2]<stdout>:DEBUG Reading ONMT config from file: gen_enfr/onmt-config.yml
  133. [1]<stdout>:INFO Training started...
  134. [1]<stdout>:INFO Training model at dir gen_enfr
  135. [1]<stdout>:DEBUG Initializing summary file: gen_enfr/summary.json
  136. [0]<stdout>:INFO Training started...
  137. [0]<stdout>:INFO Training model at dir gen_enfr
  138. [0]<stdout>:DEBUG Initializing summary file: gen_enfr/summary.json
  139. [1]<stdout>:DEBUG Reading ONMT config from file: gen_enfr/onmt-config.yml
  140. [0]<stdout>:DEBUG Reading ONMT config from file: gen_enfr/onmt-config.yml
  141. [3]<stdout>:INFO Training with 1 devices.
  142. [3]<stdout>:ERROR Traceback (most recent call last):
  143. [3]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/common/pipeline.py", line 30, in wrapped_f
  144. [3]<stdout>:    function_return_value = f(*args, **kwargs)
  145. [3]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/train/train.py", line 48, in run
  146. [3]<stdout>:    final_model_dir, train_summary = runner.train(
  147. [3]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/runner.py", line 199, in train
  148. [3]<stdout>:    devices = misc.get_devices(count=num_devices, fallback_to_cpu=fallback_to_cpu)
  149. [3]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/utils/misc.py", line 33, in get_devices
  150. [3]<stdout>:    devices = tf.config.list_logical_devices(device_type=device_type)
  151. [3]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/framework/config.py", line 452, in list_logical_devices
  152. [3]<stdout>:    return context.context().list_logical_devices(device_type=device_type)
  153. [3]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 1395, in list_logical_devices
  154. [3]<stdout>:    self.ensure_initialized()
  155. [3]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 525, in ensure_initialized
  156. [3]<stdout>:    context_handle = pywrap_tfe.TFE_NewContext(opts)
  157. [3]<stdout>:tensorflow.python.framework.errors_impl.AlreadyExistsError: TensorFlow device (GPU:0) is being mapped to multiple devices (3 now, and 0 previously), which is not supported. This may be the result of providing different GPU configurations (ConfigProto.gpu_options, for example different visible_device_list) when creating multiple Sessions in the same process. This is not currently supported, see https://github.com/tensorflow/tensorflow/issues/19083
  158. [3]<stdout>:
  159. [0]<stdout>:[2021-11-24 11:35:54.799402: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:659] [0]: Shutting down background thread
  160. [3]<stdout>:[2021-11-24 11:35:54.799456: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:659] [3]: Shutting down background thread
  161. [2]<stdout>:[2021-11-24 11:35:54.799503: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:659] [2]: Shutting down background thread
  162. [1]<stdout>:[2021-11-24 11:35:54.799520: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:659] [1]: Shutting down background thread
  163. [2]<stdout>:INFO Training with 1 devices.
  164. [2]<stdout>:ERROR Traceback (most recent call last):
  165. [2]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/common/pipeline.py", line 30, in wrapped_f
  166. [2]<stdout>:    function_return_value = f(*args, **kwargs)
  167. [2]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/train/train.py", line 48, in run
  168. [2]<stdout>:    final_model_dir, train_summary = runner.train(
  169. [2]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/runner.py", line 199, in train
  170. [2]<stdout>:    devices = misc.get_devices(count=num_devices, fallback_to_cpu=fallback_to_cpu)
  171. [2]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/utils/misc.py", line 33, in get_devices
  172. [2]<stdout>:    devices = tf.config.list_logical_devices(device_type=device_type)
  173. [2]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/framework/config.py", line 452, in list_logical_devices
  174. [2]<stdout>:    return context.context().list_logical_devices(device_type=device_type)
  175. [2]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 1395, in list_logical_devices
  176. [2]<stdout>:    self.ensure_initialized()
  177. [2]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 525, in ensure_initialized
  178. [2]<stdout>:    context_handle = pywrap_tfe.TFE_NewContext(opts)
  179. [2]<stdout>:tensorflow.python.framework.errors_impl.AlreadyExistsError: TensorFlow device (GPU:0) is being mapped to multiple devices (2 now, and 0 previously), which is not supported. This may be the result of providing different GPU configurations (ConfigProto.gpu_options, for example different visible_device_list) when creating multiple Sessions in the same process. This is not currently supported, see https://github.com/tensorflow/tensorflow/issues/19083
  180. [2]<stdout>:
  181. [1]<stdout>:INFO Training with 1 devices.
  182. [0]<stdout>:INFO Training with 1 devices.
  183. [1]<stdout>:ERROR Traceback (most recent call last):
  184. [1]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/common/pipeline.py", line 30, in wrapped_f
  185. [1]<stdout>:    function_return_value = f(*args, **kwargs)
  186. [1]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/train/train.py", line 48, in run
  187. [1]<stdout>:    final_model_dir, train_summary = runner.train(
  188. [1]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/runner.py", line 199, in train
  189. [1]<stdout>:    devices = misc.get_devices(count=num_devices, fallback_to_cpu=fallback_to_cpu)
  190. [1]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/utils/misc.py", line 33, in get_devices
  191. [1]<stdout>:    devices = tf.config.list_logical_devices(device_type=device_type)
  192. [1]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/framework/config.py", line 452, in list_logical_devices
  193. [1]<stdout>:    return context.context().list_logical_devices(device_type=device_type)
  194. [1]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 1395, in list_logical_devices
  195. [1]<stdout>:    self.ensure_initialized()
  196. [1]<stdout>:  File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 525, in ensure_initialized
  197. [1]<stdout>:    context_handle = pywrap_tfe.TFE_NewContext(opts)
  198. [1]<stdout>:tensorflow.python.framework.errors_impl.AlreadyExistsError: TensorFlow device (GPU:0) is being mapped to multiple devices (1 now, and 0 previously), which is not supported. This may be the result of providing different GPU configurations (ConfigProto.gpu_options, for example different visible_device_list) when creating multiple Sessions in the same process. This is not currently supported, see https://github.com/tensorflow/tensorflow/issues/19083
  199. [1]<stdout>:
  200. Process 2 exit with status code 1.
  201. Terminating remaining workers after failure of Process 2.
  202. Process 1 exit with status code 1.
  203. Process 3 exit with status code 1.
  204. Process 0 exit with status code 143.
  205.  
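Because the four workers write interleaved output, the per-rank sequence of events is hard to follow in the raw log. The sketch below is one possible way to split such a log by rank and pull out the first exception line for each worker. It assumes only the `[rank]<stdout>:` prefix format visible in this log, optionally preceded by a paste line number. The names `split_by_rank` and `first_errors` are hypothetical helpers and not part of Horovod, OpenNMT, or TensorFlow.

```python
import re
from collections import defaultdict

# Optional paste line number ("  12. "), then "[rank]<stream>:" and the message.
# This pattern is an assumption based on the prefix format seen in this log.
LINE_RE = re.compile(r"^\s*(?:\d+\.\s)?\[(\d+)\]<(stdout|stderr)>:(.*)$")


def split_by_rank(lines):
    """Group log messages by worker rank, preserving their original order."""
    per_rank = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # launcher lines such as "Process 2 exit ..." are skipped
            per_rank[int(match.group(1))].append(match.group(3))
    return dict(per_rank)


def first_errors(per_rank, marker="Error:"):
    """Return, per rank, the first message containing an exception-style marker."""
    found = {}
    for rank, messages in per_rank.items():
        for message in messages:
            if marker in message:
                found[rank] = message
                break
    return found
```

Applied to this log with `split_by_rank(open("out.log"))`, the grouping would place each worker's traceback next to its own "Setting gpu visibility" line, which makes it easier to compare what each rank did before failing.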