drbom

Llama2 not running

Jul 23rd, 2023
D:\llama-main>python -m torch.distributed.launch example_chat_completion.py
NOTE: Redirects are currently not supported in Windows or MacOs.
C:\Python311\Lib\site-packages\torch\distributed\launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

  warnings.warn(
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
Traceback (most recent call last):
  File "D:\llama-main\example_chat_completion.py", line 73, in <module>
    main(
  File "D:\llama-main\example_chat_completion.py", line 20, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "D:\llama-main\llama\generation.py", line 62, in build
    torch.distributed.init_process_group("nccl")
  File "C:\Python311\Lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11956) of binary: C:\Python311\python.exe
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Python311\Lib\site-packages\torch\distributed\launch.py", line 196, in <module>
    main()
  File "C:\Python311\Lib\site-packages\torch\distributed\launch.py", line 192, in main
    launch(args)
  File "C:\Python311\Lib\site-packages\torch\distributed\launch.py", line 177, in launch
    run(args)
  File "C:\Python311\Lib\site-packages\torch\distributed\run.py", line 785, in run
    elastic_launch(
  File "C:\Python311\Lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-23_18:28:19
  host      : Blyatus
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 11956)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
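
The root cause in the log is the RuntimeError raised from torch.distributed.init_process_group("nccl") at line 62 of llama\generation.py: the PyTorch build in use reports that NCCL is not compiled in, which is the usual situation on Windows. One possible workaround, sketched below, is to pick the backend at runtime and fall back to "gloo" when NCCL is missing. This is only a sketch under assumptions: choose_backend and detect_nccl are hypothetical helper names that do not exist in the llama repository, and whether the rest of the script then runs correctly on Windows is not verified here.

```python
import sys


def choose_backend(nccl_available: bool, platform: str = sys.platform) -> str:
    """Return "nccl" only when it is compiled in and the OS is not Windows.

    Hypothetical helper for illustration; "gloo" is used as the fallback.
    """
    if nccl_available and not platform.startswith("win"):
        return "nccl"
    return "gloo"


def detect_nccl() -> bool:
    """Ask the installed PyTorch build whether NCCL support is present."""
    try:
        import torch.distributed as dist
    except ImportError:
        # PyTorch is not installed at all, so NCCL cannot be available.
        return False
    # is_available() is checked first: without distributed support,
    # the NCCL query is not meaningful.
    return dist.is_available() and dist.is_nccl_available()


if __name__ == "__main__":
    # Prints the backend name this machine would use.
    print(choose_backend(detect_nccl()))
```

In a local copy of generation.py, the literal "nccl" passed to torch.distributed.init_process_group could then be replaced with the value returned by choose_backend(detect_nccl()). The FutureWarning at the top of the log also recommends launching with torchrun instead of python -m torch.distributed.launch.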