D:\llama-main>python -m torch.distributed.launch example_chat_completion.py
NOTE: Redirects are currently not supported in Windows or MacOs.
C:\Python311\Lib\site-packages\torch\distributed\launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
  warnings.warn(
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
Traceback (most recent call last):
  File "D:\llama-main\example_chat_completion.py", line 73, in <module>
    main(
  File "D:\llama-main\example_chat_completion.py", line 20, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "D:\llama-main\llama\generation.py", line 62, in build
    torch.distributed.init_process_group("nccl")
  File "C:\Python311\Lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11956) of binary: C:\Python311\python.exe
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Python311\Lib\site-packages\torch\distributed\launch.py", line 196, in <module>
    main()
  File "C:\Python311\Lib\site-packages\torch\distributed\launch.py", line 192, in main
    launch(args)
  File "C:\Python311\Lib\site-packages\torch\distributed\launch.py", line 177, in launch
    run(args)
  File "C:\Python311\Lib\site-packages\torch\distributed\run.py", line 785, in run
    elastic_launch(
  File "C:\Python311\Lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-23_18:28:19
  host      : Blyatus
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 11956)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
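
The root cause is the RuntimeError above: Windows builds of PyTorch ship without the NCCL backend (NCCL is Linux/GPU-only), while llama/generation.py hard-codes torch.distributed.init_process_group("nccl") at line 62. Below is a minimal sketch of one common workaround, assuming a single-process CPU or single-GPU run where the built-in gloo backend suffices; this is an illustrative edit to generation.py, not an official fix:

import torch.distributed as dist

# NCCL is not compiled into the Windows wheels; fall back to gloo,
# which PyTorch bundles on all platforms.
backend = "nccl" if dist.is_nccl_available() else "gloo"
dist.init_process_group(backend)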
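
The four [c10d] socket warnings are a separate, non-fatal symptom: the rendezvous master address resolved to kubernetes.docker.internal, a hostname Docker Desktop registers on Windows, and Winsock error 10049 means that address is not valid in this context. Pinning the rendezvous to loopback typically silences them; in cmd, for example (port 29500 taken from the log above):

set MASTER_ADDR=127.0.0.1
set MASTER_PORT=29500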
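
Finally, the FutureWarning at the top says python -m torch.distributed.launch is deprecated in favor of torchrun, which exports LOCAL_RANK into each worker's environment instead of passing a --local-rank argument. The equivalent single-node invocation for this session would be:

D:\llama-main>torchrun --nproc_per_node 1 example_chat_completion.py

and a script that previously expected --local-rank reads the rank like this:

import os

# torchrun sets LOCAL_RANK per worker; this replaces the old --local-rank flag.
local_rank = int(os.environ["LOCAL_RANK"])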