Untitled

a guest, Jan 14th, 2020
2020-01-14 23:40:32,422 - INFO - Starting epoch 0
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 333, in reduce_storage
    fd, size = storage._share_fd_()
RuntimeError: unable to write to file </torch_176_1132937539>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 911, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/signal_handling.py", line 63, in handler
    def handler(signum, frame):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 177) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "NeMo/jasper.py", line 342, in <module>
    main()
  File "NeMo/jasper.py", line 338, in main
    stop_on_nan_loss=args.stop_on_nan_loss)
  File "/home/jovyan/libs/nemo/core/neural_factory.py", line 616, in train
    gradient_predivide=gradient_predivide)
  File "/home/jovyan/libs/nemo/backends/pytorch/actions.py", line 1405, in train
    for _, data in enumerate(train_dataloader, 0):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 771, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 737, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 177, 178, 180) exited unexpectedly
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', 'NeMo/jasper.py', '--local_rank=0', '--max_steps', '200000', '--model_config', 'NeMo/jasper10x5_ru.yaml']' returned non-zero exit status 1.
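
The log above points at shared memory running out while DataLoader workers pass tensors to the main process. A minimal sketch for checking how much room the shared-memory mount has before launching training is below. It assumes a Linux host where shared memory is mounted at /dev/shm; the helper names `shm_usage` and `has_headroom` and the 1 GiB threshold are hypothetical and not taken from NeMo or PyTorch.

```python
import os
import shutil


def shm_usage(path="/dev/shm"):
    """Return total/used/free bytes for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


def has_headroom(free_bytes, needed_bytes):
    """True if the free space covers the (user-estimated) requirement."""
    return free_bytes >= needed_bytes


if __name__ == "__main__":
    # Assumption: /dev/shm is the shared-memory mount (typical on Linux).
    if os.path.exists("/dev/shm"):
        info = shm_usage("/dev/shm")
        needed = 1 * 1024 ** 3  # hypothetical 1 GiB estimate, tune per workload
        print("shm total MiB:", info["total"] // 2 ** 20)
        print("shm free MiB:", info["free"] // 2 ** 20)
        print("enough headroom:", has_headroom(info["free"], needed))
    else:
        print("/dev/shm not found on this system")
```

If the reported size is small (Docker containers default to 64 MB of shared memory), the usual options are to start the container with a larger `--shm-size` or with `--ipc=host`, or to lower `num_workers` on the DataLoader; `num_workers=0` loads data in the main process and avoids passing tensors through shared memory at all. Whether this run was inside a container is an assumption, since the log does not say.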