- [0]<stdout>:Running NICE version 3.16.1
- [2]<stdout>:Running NICE version 3.16.1
- [1]<stdout>:Running NICE version 3.16.1
- [3]<stdout>:Running NICE version 3.16.1
- [2]<stdout>:[2021-11-24 11:35:50.627693: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:107] Using GLOO to perform controller operations.
- [2]<stdout>:[2021-11-24 11:35:50.627724: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:73] Using GLOO to perform CPU operations.
- [2]<stdout>:[2021-11-24 11:35:50.627732: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.h:64] Gloo context enabled.
- [0]<stdout>:[2021-11-24 11:35:50.629344: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:107] Using GLOO to perform controller operations.
- [0]<stdout>:[2021-11-24 11:35:50.629380: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:73] Using GLOO to perform CPU operations.
- [0]<stdout>:[2021-11-24 11:35:50.629388: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.h:64] Gloo context enabled.
- [1]<stdout>:[2021-11-24 11:35:50.636743: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:107] Using GLOO to perform controller operations.
- [1]<stdout>:[2021-11-24 11:35:50.636771: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:73] Using GLOO to perform CPU operations.
- [1]<stdout>:[2021-11-24 11:35:50.636779: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.h:64] Gloo context enabled.
- [0]<stdout>:[2021-11-24 11:35:50.637571: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:163] rendezvous server address: 127.0.0.1
- [0]<stdout>:[2021-11-24 11:35:50.637618: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global rendezvous started for rank=0, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [3]<stdout>:[2021-11-24 11:35:50.638981: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:107] Using GLOO to perform controller operations.
- [3]<stdout>:[2021-11-24 11:35:50.639012: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/utils/env_parser.cc:73] Using GLOO to perform CPU operations.
- [3]<stdout>:[2021-11-24 11:35:50.639020: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.h:64] Gloo context enabled.
- [2]<stdout>:[2021-11-24 11:35:50.641513: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:163] rendezvous server address: 127.0.0.1
- [2]<stdout>:[2021-11-24 11:35:50.641553: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global rendezvous started for rank=2, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [1]<stdout>:[2021-11-24 11:35:50.650301: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:163] rendezvous server address: 127.0.0.1
- [1]<stdout>:[2021-11-24 11:35:50.650333: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global rendezvous started for rank=1, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [3]<stdout>:[2021-11-24 11:35:50.652021: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:163] rendezvous server address: 127.0.0.1
- [3]<stdout>:[2021-11-24 11:35:50.652053: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global rendezvous started for rank=3, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [0]<stdout>:[2021-11-24 11:35:50.694936: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:219] Global Gloo context initialized.
- [3]<stdout>:[2021-11-24 11:35:50.695686: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:219] Global Gloo context initialized.
- [0]<stdout>:[2021-11-24 11:35:50.695029: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost rendezvous started for rank=0, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [3]<stdout>:[2021-11-24 11:35:50.695771: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost rendezvous started for rank=3, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [1]<stdout>:[2021-11-24 11:35:50.697220: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:219] Global Gloo context initialized.
- [2]<stdout>:[2021-11-24 11:35:50.694280: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:219] Global Gloo context initialized.
- [1]<stdout>:[2021-11-24 11:35:50.697304: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost rendezvous started for rank=1, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [2]<stdout>:[2021-11-24 11:35:50.694382: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost rendezvous started for rank=2, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [0]<stdout>:[2021-11-24 11:35:50.756601: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:226] Local Gloo context initialized.
- [2]<stdout>:[2021-11-24 11:35:50.757229: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:226] Local Gloo context initialized.
- [0]<stdout>:[2021-11-24 11:35:50.756675: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_0 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [1]<stdout>:[2021-11-24 11:35:50.758288: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:226] Local Gloo context initialized.
- [2]<stdout>:[2021-11-24 11:35:50.757309: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_2 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [1]<stdout>:[2021-11-24 11:35:50.758362: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_1 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [3]<stdout>:[2021-11-24 11:35:50.759489: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:226] Local Gloo context initialized.
- [3]<stdout>:[2021-11-24 11:35:50.759554: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_3 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [0]<stdout>:[2021-11-24 11:35:50.766677: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:231] Cross-node Gloo context initialized.
- [0]<stdout>:[2021-11-24 11:35:50.766742: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:276] rendezvous server address: 127.0.0.1
- [0]<stdout>:[2021-11-24 11:35:50.767786: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:306] Initializing GlooContext for process set: [0,1,2,3,], hash: 1e9d548172549547
- [2]<stdout>:[2021-11-24 11:35:50.767701: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:231] Cross-node Gloo context initialized.
- [2]<stdout>:[2021-11-24 11:35:50.767765: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:276] rendezvous server address: 127.0.0.1
- [2]<stdout>:[2021-11-24 11:35:50.769017: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:306] Initializing GlooContext for process set: [0,1,2,3,], hash: 1e9d548172549547
- [3]<stdout>:[2021-11-24 11:35:50.769256: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:231] Cross-node Gloo context initialized.
- [3]<stdout>:[2021-11-24 11:35:50.769326: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:276] rendezvous server address: 127.0.0.1
- [1]<stdout>:[2021-11-24 11:35:50.769327: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:231] Cross-node Gloo context initialized.
- [1]<stdout>:[2021-11-24 11:35:50.769392: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:276] rendezvous server address: 127.0.0.1
- [1]<stdout>:[2021-11-24 11:35:50.770957: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:306] Initializing GlooContext for process set: [0,1,2,3,], hash: 1e9d548172549547
- [3]<stdout>:[2021-11-24 11:35:50.770975: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:306] Initializing GlooContext for process set: [0,1,2,3,], hash: 1e9d548172549547
- [1]<stdout>:[2021-11-24 11:35:50.771670: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:327] Global Gloo context for process set with rank: 1, size: 4
- [0]<stdout>:[2021-11-24 11:35:50.771607: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:327] Global Gloo context for process set with rank: 0, size: 4
- [2]<stdout>:[2021-11-24 11:35:50.771682: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:327] Global Gloo context for process set with rank: 2, size: 4
- [3]<stdout>:[2021-11-24 11:35:50.771705: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:327] Global Gloo context for process set with rank: 3, size: 4
- [1]<stdout>:[2021-11-24 11:35:50.771746: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global_process_set_hash_1e9d548172549547 rendezvous started for rank=1, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [0]<stdout>:[2021-11-24 11:35:50.771679: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [2]<stdout>:[2021-11-24 11:35:50.771747: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global_process_set_hash_1e9d548172549547 rendezvous started for rank=2, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [3]<stdout>:[2021-11-24 11:35:50.771788: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] global_process_set_hash_1e9d548172549547 rendezvous started for rank=3, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [0]<stdout>:[2021-11-24 11:35:50.825247: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:336] Global Gloo context initialized for process set with hash 1e9d548172549547.
- [0]<stdout>:[2021-11-24 11:35:50.825308: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:350] Local Gloo context for process set with rank: 0, size: 4
- [2]<stdout>:[2021-11-24 11:35:50.826045: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:336] Global Gloo context initialized for process set with hash 1e9d548172549547.
- [2]<stdout>:[2021-11-24 11:35:50.826106: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:350] Local Gloo context for process set with rank: 2, size: 4
- [2]<stdout>:[2021-11-24 11:35:50.826167: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost_process_set_hash_1e9d548172549547 rendezvous started for rank=2, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [1]<stdout>:[2021-11-24 11:35:50.828122: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:336] Global Gloo context initialized for process set with hash 1e9d548172549547.
- [3]<stdout>:[2021-11-24 11:35:50.827199: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:336] Global Gloo context initialized for process set with hash 1e9d548172549547.
- [0]<stdout>:[2021-11-24 11:35:50.825350: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [1]<stdout>:[2021-11-24 11:35:50.828181: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:350] Local Gloo context for process set with rank: 1, size: 4
- [3]<stdout>:[2021-11-24 11:35:50.827270: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:350] Local Gloo context for process set with rank: 3, size: 4
- [1]<stdout>:[2021-11-24 11:35:50.828224: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost_process_set_hash_1e9d548172549547 rendezvous started for rank=1, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [3]<stdout>:[2021-11-24 11:35:50.827330: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] local_localhost_process_set_hash_1e9d548172549547 rendezvous started for rank=3, size=4, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [2]<stdout>:[2021-11-24 11:35:50.897175: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:357] Local Gloo context initialized for process set with hash 1e9d548172549547.
- [2]<stdout>:[2021-11-24 11:35:50.897237: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:372] Cross Gloo context for process set with rank: 0, size: 1
- [2]<stdout>:[2021-11-24 11:35:50.897285: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_2_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [0]<stdout>:[2021-11-24 11:35:50.898058: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:357] Local Gloo context initialized for process set with hash 1e9d548172549547.
- [0]<stdout>:[2021-11-24 11:35:50.898112: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:372] Cross Gloo context for process set with rank: 0, size: 1
- [0]<stdout>:[2021-11-24 11:35:50.898152: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_0_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [1]<stdout>:[2021-11-24 11:35:50.898921: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:357] Local Gloo context initialized for process set with hash 1e9d548172549547.
- [1]<stdout>:[2021-11-24 11:35:50.898997: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:372] Cross Gloo context for process set with rank: 0, size: 1
- [1]<stdout>:[2021-11-24 11:35:50.899046: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_1_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [3]<stdout>:[2021-11-24 11:35:50.900167: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:357] Local Gloo context initialized for process set with hash 1e9d548172549547.
- [3]<stdout>:[2021-11-24 11:35:50.900228: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:372] Cross Gloo context for process set with rank: 0, size: 1
- [3]<stdout>:[2021-11-24 11:35:50.900273: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:79] cross_3_process_set_hash_1e9d548172549547 rendezvous started for rank=0, size=1, dev={tcp, pci=, iface=lo, speed=-1, addr=[127.0.0.1]}, timeout=30
- [0]<stdout>:[2021-11-24 11:35:50.907806: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:379] Cross-node Gloo context for process set with hash 1e9d548172549547.
- [0]<stdout>:[2021-11-24 11:35:50.907870: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:40] Started Horovod with 4 processes
- [2]<stdout>:[2021-11-24 11:35:50.908552: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:379] Cross-node Gloo context for process set with hash 1e9d548172549547.
- [1]<stdout>:[2021-11-24 11:35:50.909282: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:379] Cross-node Gloo context for process set with hash 1e9d548172549547.
- [3]<stdout>:[2021-11-24 11:35:50.910208: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_context.cc:379] Cross-node Gloo context for process set with hash 1e9d548172549547.
- [1]<stdout>:[2021-11-24 11:35:50.911948: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:104] Gloo controller initialized.
- [3]<stdout>:[2021-11-24 11:35:50.911934: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:104] Gloo controller initialized.
- [0]<stdout>:[2021-11-24 11:35:50.911887: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:104] Gloo controller initialized.
- [2]<stdout>:[2021-11-24 11:35:50.911954: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/gloo/gloo_controller.cc:104] Gloo controller initialized.
- [1]<stdout>:[2021-11-24 11:35:50.912755: I /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:642] [1]: Horovod initialized
- [0]<stdout>:[2021-11-24 11:35:50.912735: I /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:642] [0]: Horovod initialized
- [2]<stdout>:[2021-11-24 11:35:50.912761: I /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:642] [2]: Horovod initialized
- [3]<stdout>:[2021-11-24 11:35:50.912732: I /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:642] [3]: Horovod initialized
- [2]<stdout>:[2021-11-24 11:35:50.913147: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:856] Background thread init done
- [1]<stdout>:[2021-11-24 11:35:50.913418: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:856] Background thread init done
- [2]<stdout>:Horovod local rank : 2
- [1]<stdout>:Horovod local rank : 1
- [2]<stdout>:Is master : False
- [3]<stdout>:[2021-11-24 11:35:50.913718: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:856] Background thread init done
- [1]<stdout>:Is master : False
- [0]<stdout>:[2021-11-24 11:35:50.913793: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:856] Background thread init done
- [2]<stdout>:Setting gpu visibility PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')
- [3]<stdout>:Horovod local rank : 3
- [1]<stdout>:Setting gpu visibility PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')
- [0]<stdout>:Horovod local rank : 0
- [2]<stdout>:Training directory : /opt/mt/local_data/gen_enfr
- [3]<stdout>:Is master : False
- [1]<stdout>:Training directory : /opt/mt/local_data/gen_enfr
- [3]<stdout>:Setting gpu visibility PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')
- [0]<stdout>:Is master : True
- [3]<stdout>:Training directory : /opt/mt/local_data/gen_enfr
- [0]<stdout>:Setting gpu visibility PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
- [0]<stdout>:Training directory : /opt/mt/local_data/gen_enfr
- [3]<stdout>:INFO Create configuration files started...
- [3]<stdout>:DEBUG Training properties file: gen_enfr/training_properties.json already exists.
- [3]<stdout>:DEBUG Creating OpenNMT config file: gen_enfr/onmt-config.yml
- [0]<stdout>:INFO 'Create configuration files' finished. (< 1s)
- [2]<stdout>:INFO 'Create configuration files' finished. (< 1s)
- [1]<stdout>:INFO 'Create configuration files' finished. (< 1s)
- [3]<stdout>:INFO 'Create configuration files' finished. (< 1s)
- [3]<stdout>:INFO Training started...
- [3]<stdout>:INFO Training model at dir gen_enfr
- [3]<stdout>:DEBUG Initializing summary file: gen_enfr/summary.json
- [3]<stdout>:DEBUG Reading ONMT config from file: gen_enfr/onmt-config.yml
- [2]<stdout>:INFO Training started...
- [2]<stdout>:INFO Training model at dir gen_enfr
- [2]<stdout>:DEBUG Initializing summary file: gen_enfr/summary.json
- [2]<stdout>:DEBUG Reading ONMT config from file: gen_enfr/onmt-config.yml
- [1]<stdout>:INFO Training started...
- [1]<stdout>:INFO Training model at dir gen_enfr
- [1]<stdout>:DEBUG Initializing summary file: gen_enfr/summary.json
- [0]<stdout>:INFO Training started...
- [0]<stdout>:INFO Training model at dir gen_enfr
- [0]<stdout>:DEBUG Initializing summary file: gen_enfr/summary.json
- [1]<stdout>:DEBUG Reading ONMT config from file: gen_enfr/onmt-config.yml
- [0]<stdout>:DEBUG Reading ONMT config from file: gen_enfr/onmt-config.yml
- [3]<stdout>:INFO Training with 1 devices.
- [3]<stdout>:ERROR Traceback (most recent call last):
- [3]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/common/pipeline.py", line 30, in wrapped_f
- [3]<stdout>: function_return_value = f(*args, **kwargs)
- [3]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/train/train.py", line 48, in run
- [3]<stdout>: final_model_dir, train_summary = runner.train(
- [3]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/runner.py", line 199, in train
- [3]<stdout>: devices = misc.get_devices(count=num_devices, fallback_to_cpu=fallback_to_cpu)
- [3]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/utils/misc.py", line 33, in get_devices
- [3]<stdout>: devices = tf.config.list_logical_devices(device_type=device_type)
- [3]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/framework/config.py", line 452, in list_logical_devices
- [3]<stdout>: return context.context().list_logical_devices(device_type=device_type)
- [3]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 1395, in list_logical_devices
- [3]<stdout>: self.ensure_initialized()
- [3]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 525, in ensure_initialized
- [3]<stdout>: context_handle = pywrap_tfe.TFE_NewContext(opts)
- [3]<stdout>:tensorflow.python.framework.errors_impl.AlreadyExistsError: TensorFlow device (GPU:0) is being mapped to multiple devices (3 now, and 0 previously), which is not supported. This may be the result of providing different GPU configurations (ConfigProto.gpu_options, for example different visible_device_list) when creating multiple Sessions in the same process. This is not currently supported, see https://github.com/tensorflow/tensorflow/issues/19083
- [3]<stdout>:
- [0]<stdout>:[2021-11-24 11:35:54.799402: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:659] [0]: Shutting down background thread
- [3]<stdout>:[2021-11-24 11:35:54.799456: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:659] [3]: Shutting down background thread
- [2]<stdout>:[2021-11-24 11:35:54.799503: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:659] [2]: Shutting down background thread
- [1]<stdout>:[2021-11-24 11:35:54.799520: D /tmp/pip-install-e0jnkwxz/horovod_78c4138cc9634fec9614ce0ced733dc7/horovod/common/operations.cc:659] [1]: Shutting down background thread
- [2]<stdout>:INFO Training with 1 devices.
- [2]<stdout>:ERROR Traceback (most recent call last):
- [2]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/common/pipeline.py", line 30, in wrapped_f
- [2]<stdout>: function_return_value = f(*args, **kwargs)
- [2]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/train/train.py", line 48, in run
- [2]<stdout>: final_model_dir, train_summary = runner.train(
- [2]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/runner.py", line 199, in train
- [2]<stdout>: devices = misc.get_devices(count=num_devices, fallback_to_cpu=fallback_to_cpu)
- [2]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/utils/misc.py", line 33, in get_devices
- [2]<stdout>: devices = tf.config.list_logical_devices(device_type=device_type)
- [2]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/framework/config.py", line 452, in list_logical_devices
- [2]<stdout>: return context.context().list_logical_devices(device_type=device_type)
- [2]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 1395, in list_logical_devices
- [2]<stdout>: self.ensure_initialized()
- [2]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 525, in ensure_initialized
- [2]<stdout>: context_handle = pywrap_tfe.TFE_NewContext(opts)
- [2]<stdout>:tensorflow.python.framework.errors_impl.AlreadyExistsError: TensorFlow device (GPU:0) is being mapped to multiple devices (2 now, and 0 previously), which is not supported. This may be the result of providing different GPU configurations (ConfigProto.gpu_options, for example different visible_device_list) when creating multiple Sessions in the same process. This is not currently supported, see https://github.com/tensorflow/tensorflow/issues/19083
- [2]<stdout>:
- [1]<stdout>:INFO Training with 1 devices.
- [0]<stdout>:INFO Training with 1 devices.
- [1]<stdout>:ERROR Traceback (most recent call last):
- [1]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/common/pipeline.py", line 30, in wrapped_f
- [1]<stdout>: function_return_value = f(*args, **kwargs)
- [1]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/cdtnice/train/train.py", line 48, in run
- [1]<stdout>: final_model_dir, train_summary = runner.train(
- [1]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/runner.py", line 199, in train
- [1]<stdout>: devices = misc.get_devices(count=num_devices, fallback_to_cpu=fallback_to_cpu)
- [1]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/opennmt/utils/misc.py", line 33, in get_devices
- [1]<stdout>: devices = tf.config.list_logical_devices(device_type=device_type)
- [1]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/framework/config.py", line 452, in list_logical_devices
- [1]<stdout>: return context.context().list_logical_devices(device_type=device_type)
- [1]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 1395, in list_logical_devices
- [1]<stdout>: self.ensure_initialized()
- [1]<stdout>: File "/opt/mt/miniconda3/envs/horovod/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 525, in ensure_initialized
- [1]<stdout>: context_handle = pywrap_tfe.TFE_NewContext(opts)
- [1]<stdout>:tensorflow.python.framework.errors_impl.AlreadyExistsError: TensorFlow device (GPU:0) is being mapped to multiple devices (1 now, and 0 previously), which is not supported. This may be the result of providing different GPU configurations (ConfigProto.gpu_options, for example different visible_device_list) when creating multiple Sessions in the same process. This is not currently supported, see https://github.com/tensorflow/tensorflow/issues/19083
- [1]<stdout>:
- Process 2 exit with status code 1.
- Terminating remaining workers after failure of Process 2.
- Process 1 exit with status code 1.
- Process 3 exit with status code 1.
- Process 0 exit with status code 143.