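The non-default arguments reported on line 3 of the log (model, download directory, tensor parallelism) describe how the engine was configured; the exact launch command is not included in the paste. As a rough, hypothetical reconstruction of that configuration using vLLM's offline Python API (the log itself comes from the OpenAI-compatible API server), a sketch like the following would build an equivalent engine:

```python
# Hypothetical sketch only: rebuilds the engine configuration visible in the log
# (meta-llama/Llama-2-70b-chat-hf, TP=4, float16, max_model_len=4096) via the
# offline vLLM Python API rather than the API server process that produced this log.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",
    download_dir="/root/.cache/huggingface",  # from the log's non-default args
    tensor_parallel_size=4,                   # 4-way tensor parallelism, as logged
    dtype="float16",                          # dtype resolved in the engine config
    max_model_len=4096,                       # "Using max model len 4096"
)
```

On the hardware that produced this paste, the same configuration fails during the memory-profiling run with `CUBLAS_STATUS_ALLOC_FAILED`, as shown further down in the log.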
```
  1. INFO 09-13 09:49:49 [__init__.py:241] Automatically detected platform cuda.
  2. (APIServer pid=1) INFO 09-13 09:49:51 [api_server.py:1805] vLLM API server version 0.10.1.1
  3. (APIServer pid=1) INFO 09-13 09:49:51 [utils.py:326] non-default args: {'model': 'meta-llama/Llama-2-70b-chat-hf', 'download_dir': '/root/.cache/huggingface', 'tensor_parallel_size': 4}
  4. (APIServer pid=1) INFO 09-13 09:49:56 [__init__.py:711] Resolved architecture: LlamaForCausalLM
  5. (APIServer pid=1) INFO 09-13 09:49:56 [__init__.py:1750] Using max model len 4096
  6. (APIServer pid=1) INFO 09-13 09:49:56 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=8192.
  7. INFO 09-13 09:50:00 [__init__.py:241] Automatically detected platform cuda.
  8. (EngineCore_0 pid=270) INFO 09-13 09:50:01 [core.py:636] Waiting for init message from front-end.
  9. (EngineCore_0 pid=270) INFO 09-13 09:50:01 [core.py:74] Initializing a V1 LLM engine (v0.10.1.1) with config: model='meta-llama/Llama-2-70b-chat-hf', speculative_config=None, tokenizer='meta-llama/Llama-2-70b-chat-hf', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir='/root/.cache/huggingface', load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=meta-llama/Llama-2-70b-chat-hf, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
  10. (EngineCore_0 pid=270) WARNING 09-13 09:50:01 [multiproc_worker_utils.py:273] Reducing Torch parallelism from 48 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
  11. (EngineCore_0 pid=270) INFO 09-13 09:50:01 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3], buffer_handle=(4, 16777216, 10, 'psm_ce29d5b4'), local_subscribe_addr='ipc:///tmp/6bb50716-e0bf-4447-8722-037fdcb01d89', remote_subscribe_addr=None, remote_addr_ipv6=False)
  12. INFO 09-13 09:50:03 [__init__.py:241] Automatically detected platform cuda.
  13. INFO 09-13 09:50:03 [__init__.py:241] Automatically detected platform cuda.
  14. INFO 09-13 09:50:04 [__init__.py:241] Automatically detected platform cuda.
  15. INFO 09-13 09:50:04 [__init__.py:241] Automatically detected platform cuda.
  16. (VllmWorker TP0 pid=404) INFO 09-13 09:50:06 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_ddf599bb'), local_subscribe_addr='ipc:///tmp/39f8b153-8786-4c4e-b4a1-56d440039898', remote_subscribe_addr=None, remote_addr_ipv6=False)
  17. (VllmWorker TP2 pid=406) INFO 09-13 09:50:06 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_db502135'), local_subscribe_addr='ipc:///tmp/a40199d2-2072-401d-b5e0-1dab9c842432', remote_subscribe_addr=None, remote_addr_ipv6=False)
  18. (VllmWorker TP3 pid=407) INFO 09-13 09:50:06 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_21510043'), local_subscribe_addr='ipc:///tmp/dbeac1ac-25e3-4650-8917-68aaa5d0016a', remote_subscribe_addr=None, remote_addr_ipv6=False)
  19. (VllmWorker TP1 pid=405) INFO 09-13 09:50:06 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_83241dcf'), local_subscribe_addr='ipc:///tmp/4acf2061-fa03-449e-922b-9170920263b5', remote_subscribe_addr=None, remote_addr_ipv6=False)
  20. (VllmWorker TP0 pid=404) INFO 09-13 09:50:07 [__init__.py:1418] Found nccl from library libnccl.so.2
  21. (VllmWorker TP0 pid=404) INFO 09-13 09:50:07 [pynccl.py:70] vLLM is using nccl==2.26.2
  22. (VllmWorker TP1 pid=405) INFO 09-13 09:50:07 [__init__.py:1418] Found nccl from library libnccl.so.2
  23. (VllmWorker TP3 pid=407) INFO 09-13 09:50:07 [__init__.py:1418] Found nccl from library libnccl.so.2
  24. (VllmWorker TP2 pid=406) INFO 09-13 09:50:07 [__init__.py:1418] Found nccl from library libnccl.so.2
  25. (VllmWorker TP1 pid=405) INFO 09-13 09:50:07 [pynccl.py:70] vLLM is using nccl==2.26.2
  26. (VllmWorker TP3 pid=407) INFO 09-13 09:50:07 [pynccl.py:70] vLLM is using nccl==2.26.2
  27. (VllmWorker TP2 pid=406) INFO 09-13 09:50:07 [pynccl.py:70] vLLM is using nccl==2.26.2
  28. (VllmWorker TP3 pid=407) WARNING 09-13 09:55:29 [custom_all_reduce.py:137] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
  29. (VllmWorker TP1 pid=405) WARNING 09-13 09:55:29 [custom_all_reduce.py:137] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
  30. (VllmWorker TP0 pid=404) WARNING 09-13 09:55:29 [custom_all_reduce.py:137] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
  31. (VllmWorker TP2 pid=406) WARNING 09-13 09:55:29 [custom_all_reduce.py:137] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
  32. (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_a9b6d09f'), local_subscribe_addr='ipc:///tmp/5ca9f5b7-b70e-4963-9312-f410020ebb81', remote_subscribe_addr=None, remote_addr_ipv6=False)
  33. (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [parallel_state.py:1134] rank 2 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
  34. (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [parallel_state.py:1134] rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
  35. (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [parallel_state.py:1134] rank 3 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
  36. (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [parallel_state.py:1134] rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
  37. (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
  38. (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
  39. (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
  40. (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
  41. (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [gpu_model_runner.py:1953] Starting to load model meta-llama/Llama-2-70b-chat-hf...
  42. (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [gpu_model_runner.py:1953] Starting to load model meta-llama/Llama-2-70b-chat-hf...
  43. (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [gpu_model_runner.py:1953] Starting to load model meta-llama/Llama-2-70b-chat-hf...
  44. (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [gpu_model_runner.py:1953] Starting to load model meta-llama/Llama-2-70b-chat-hf...
  45. (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [gpu_model_runner.py:1985] Loading model from scratch...
  46. (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [gpu_model_runner.py:1985] Loading model from scratch...
  47. (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [cuda.py:328] Using Flash Attention backend on V1 engine.
  48. (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [cuda.py:328] Using Flash Attention backend on V1 engine.
  49. (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [gpu_model_runner.py:1985] Loading model from scratch...
  50. (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [gpu_model_runner.py:1985] Loading model from scratch...
  51. (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [cuda.py:328] Using Flash Attention backend on V1 engine.
  52. (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [cuda.py:328] Using Flash Attention backend on V1 engine.
  53. (VllmWorker TP2 pid=406) INFO 09-13 09:55:30 [weight_utils.py:296] Using model weights format ['*.safetensors']
  54. (VllmWorker TP0 pid=404) INFO 09-13 09:55:30 [weight_utils.py:296] Using model weights format ['*.safetensors']
  55. (VllmWorker TP1 pid=405) INFO 09-13 09:55:30 [weight_utils.py:296] Using model weights format ['*.safetensors']
  56. (VllmWorker TP3 pid=407) INFO 09-13 09:55:30 [weight_utils.py:296] Using model weights format ['*.safetensors']
  57. Loading safetensors checkpoint shards: 0% Completed | 0/15 [00:00<?, ?it/s]
  58. Loading safetensors checkpoint shards: 7% Completed | 1/15 [00:00<00:10, 1.37it/s]
  59. Loading safetensors checkpoint shards: 13% Completed | 2/15 [00:01<00:09, 1.37it/s]
  60. Loading safetensors checkpoint shards: 20% Completed | 3/15 [00:02<00:08, 1.37it/s]
  61. Loading safetensors checkpoint shards: 27% Completed | 4/15 [00:02<00:08, 1.34it/s]
  62. Loading safetensors checkpoint shards: 40% Completed | 6/15 [00:03<00:05, 1.79it/s]
  63. Loading safetensors checkpoint shards: 47% Completed | 7/15 [00:04<00:04, 1.67it/s]
  64. Loading safetensors checkpoint shards: 53% Completed | 8/15 [00:05<00:04, 1.60it/s]
  65. Loading safetensors checkpoint shards: 60% Completed | 9/15 [00:05<00:03, 1.51it/s]
  66. Loading safetensors checkpoint shards: 67% Completed | 10/15 [00:06<00:03, 1.47it/s]
  67. Loading safetensors checkpoint shards: 73% Completed | 11/15 [00:07<00:02, 1.43it/s]
  68. Loading safetensors checkpoint shards: 80% Completed | 12/15 [00:08<00:02, 1.40it/s]
  69. Loading safetensors checkpoint shards: 87% Completed | 13/15 [00:08<00:01, 1.41it/s]
  70. Loading safetensors checkpoint shards: 93% Completed | 14/15 [00:09<00:00, 1.39it/s]
  71. (VllmWorker TP2 pid=406) INFO 09-13 09:55:40 [default_loader.py:262] Loading weights took 10.33 seconds
  72. Loading safetensors checkpoint shards: 100% Completed | 15/15 [00:10<00:00, 1.41it/s]
  73. Loading safetensors checkpoint shards: 100% Completed | 15/15 [00:10<00:00, 1.46it/s]
  74. (VllmWorker TP0 pid=404)
  75. (VllmWorker TP2 pid=406) INFO 09-13 09:55:40 [gpu_model_runner.py:2007] Model loading took 32.1248 GiB and 10.945041 seconds
  76. (VllmWorker TP0 pid=404) INFO 09-13 09:55:40 [default_loader.py:262] Loading weights took 10.32 seconds
  77. (VllmWorker TP0 pid=404) INFO 09-13 09:55:41 [gpu_model_runner.py:2007] Model loading took 32.1248 GiB and 11.342089 seconds
  78. (VllmWorker TP3 pid=407) INFO 09-13 09:55:41 [default_loader.py:262] Loading weights took 10.86 seconds
  79. (VllmWorker TP3 pid=407) INFO 09-13 09:55:42 [gpu_model_runner.py:2007] Model loading took 32.1248 GiB and 11.959376 seconds
  80. (VllmWorker TP1 pid=405) INFO 09-13 09:55:43 [default_loader.py:262] Loading weights took 13.35 seconds
  81. (VllmWorker TP1 pid=405) INFO 09-13 09:55:44 [gpu_model_runner.py:2007] Model loading took 32.1248 GiB and 14.052152 seconds
  82. (VllmWorker TP2 pid=406) INFO 09-13 09:55:54 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_2_0/backbone for vLLM's torch.compile
  83. (VllmWorker TP2 pid=406) INFO 09-13 09:55:54 [backends.py:559] Dynamo bytecode transform time: 9.66 s
  84. (VllmWorker TP0 pid=404) INFO 09-13 09:55:54 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_0_0/backbone for vLLM's torch.compile
  85. (VllmWorker TP0 pid=404) INFO 09-13 09:55:54 [backends.py:559] Dynamo bytecode transform time: 9.79 s
  86. (VllmWorker TP1 pid=405) INFO 09-13 09:55:54 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_1_0/backbone for vLLM's torch.compile
  87. (VllmWorker TP1 pid=405) INFO 09-13 09:55:54 [backends.py:559] Dynamo bytecode transform time: 10.03 s
  88. (VllmWorker TP3 pid=407) INFO 09-13 09:55:54 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_3_0/backbone for vLLM's torch.compile
  89. (VllmWorker TP3 pid=407) INFO 09-13 09:55:54 [backends.py:559] Dynamo bytecode transform time: 10.29 s
  90. (VllmWorker TP2 pid=406) INFO 09-13 09:55:59 [backends.py:194] Cache the graph for dynamic shape for later use
  91. (VllmWorker TP0 pid=404) INFO 09-13 09:55:59 [backends.py:194] Cache the graph for dynamic shape for later use
  92. (VllmWorker TP1 pid=405) INFO 09-13 09:55:59 [backends.py:194] Cache the graph for dynamic shape for later use
  93. (VllmWorker TP3 pid=407) INFO 09-13 09:55:59 [backends.py:194] Cache the graph for dynamic shape for later use
  94. (VllmWorker TP0 pid=404) INFO 09-13 09:56:34 [backends.py:215] Compiling a graph for dynamic shape takes 39.38 s
  95. (VllmWorker TP1 pid=405) INFO 09-13 09:56:34 [backends.py:215] Compiling a graph for dynamic shape takes 39.55 s
  96. (VllmWorker TP2 pid=406) INFO 09-13 09:56:34 [backends.py:215] Compiling a graph for dynamic shape takes 39.90 s
  97. (VllmWorker TP3 pid=407) INFO 09-13 09:56:35 [backends.py:215] Compiling a graph for dynamic shape takes 39.90 s
  98. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] WorkerProc hit an exception.
  99. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
  100. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
  101. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
  102. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  103. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  104. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  105. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  106. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
  107. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
  108. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
  109. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
  110. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  111. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  112. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  113. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  114. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
  115. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
  116. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
  117. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  118. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  119. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  120. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  121. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  122. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  123. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
  124. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
  125. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  126. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
  127. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
  128. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  129. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
  130. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  131. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  132. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
  133. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
  134. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  135. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  136. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  137. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  138. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  139. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  140. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  141. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  142. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  143. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
  144. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
  145. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  146. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
  147. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
  148. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
  149. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  150. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  151. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  152. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  153. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  154. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  155. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  156. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  157. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
  158. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
  159. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  160. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
  161. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
  162. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  163. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
  164. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
  165. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  166. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  167. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  168. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  169. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
  170. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
  171. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
  172. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
  173. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
  174. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  175. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
  176. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
  177. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
  178. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
  179. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
  180. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  181. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
  182. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
  183. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
  184. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
  185. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
  186. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  187. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
  188. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
  189. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  190. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_0_0/inductor_cache/3c/c3cxs4zht2j76bwda7wpi64t6efmxyfdk4bsgojwgwodvkgaxycx.py", line 425, in call
  191. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
  192. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
  193. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
  194. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
  195. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
  196. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  197. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  198. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  199. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  200. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
  201. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
  202. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
  203. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
  204. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  205. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  206. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  207. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  208. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
  209. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
  210. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
  211. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  212. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  213. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  214. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  215. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  216. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  217. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
  218. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
  219. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  220. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
  221. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
  222. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  223. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
  224. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  225. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  226. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
  227. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
  228. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  229. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  230. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  231. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  232. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  233. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  234. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  235. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  236. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  237. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
  238. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
  239. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  240. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
  241. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
  242. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
  243. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  244. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  245. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  246. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  247. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  248. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  249. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  250. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  251. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
  252. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
  253. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  254. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
  255. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
  256. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  257. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
  258. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
  259. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  260. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  261. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  262. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  263. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
  264. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
  265. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
  266. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
  267. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
  268. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  269. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
  270. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
  271. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
  272. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
  273. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
  274. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  275. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
  276. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
  277. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
  278. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
  279. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
  280. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  281. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
  282. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
  283. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  284. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_0_0/inductor_cache/3c/c3cxs4zht2j76bwda7wpi64t6efmxyfdk4bsgojwgwodvkgaxycx.py", line 425, in call
  285. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
  286. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
  287. (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596]
  288. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] EngineCore failed to start.
  289. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] Traceback (most recent call last):
  290. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
  291. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] engine_core = EngineCoreProc(*args, **kwargs)
  292. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  293. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 492, in __init__
  294. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] super().__init__(vllm_config, executor_class, log_stats,
  295. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 89, in __init__
  296. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] self._initialize_kv_caches(vllm_config)
  297. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 179, in _initialize_kv_caches
  298. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] self.model_executor.determine_available_memory())
  299. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  300. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
  301. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] output = self.collective_rpc("determine_available_memory")
  302. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  303. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 243, in collective_rpc
  304. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] result = get_response(w, dequeue_timeout)
  305. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  306. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 230, in get_response
  307. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] raise RuntimeError(
  308. (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] RuntimeError: Worker failed with error 'CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`', please check the stack trace above for the root cause
  309. (VllmWorker TP2 pid=406) Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7a48c4224540>
  310. (VllmWorker TP2 pid=406) Traceback (most recent call last):
  311. (VllmWorker TP2 pid=406) File "/usr/lib/python3.12/weakref.py", line 105, in remove
  312. (VllmWorker TP2 pid=406) def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):
  313. (VllmWorker TP2 pid=406)
  314. (VllmWorker TP2 pid=406) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 500, in signal_handler
  315. (VllmWorker TP2 pid=406) raise SystemExit()
  316. (VllmWorker TP2 pid=406) SystemExit:
  317. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] WorkerProc hit an exception.
  318. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
  319. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
  320. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
  321. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  322. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  323. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  324. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  325. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
  326. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
  327. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
  328. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
  329. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  330. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  331. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  332. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  333. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
  334. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
  335. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
  336. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  337. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  338. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  339. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  340. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  341. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  342. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
  343. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
  344. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  345. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
  346. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
  347. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  348. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
  349. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  350. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  351. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
  352. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
  353. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  354. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  355. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  356. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  357. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  358. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  359. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  360. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  361. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  362. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
  363. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
  364. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  365. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
  366. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
  367. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
  368. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  369. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  370. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  371. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  372. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  373. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  374. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  375. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  376. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
  377. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
  378. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  379. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
  380. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
  381. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  382. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
  383. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
  384. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  385. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  386. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  387. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  388. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
  389. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
  390. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
  391. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
  392. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
  393. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  394. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
  395. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
  396. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
  397. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
  398. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
  399. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  400. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
  401. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
  402. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
  403. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
  404. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
  405. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  406. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
  407. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
  408. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  409. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_1_0/inductor_cache/ie/ciezok7spsxdmm4usxwuuqqblbiybh6rysy2j7u5dpkirjtglq5v.py", line 425, in call
  410. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
  411. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
  412. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
  413. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
  414. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
  415. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  416. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  417. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  418. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  419. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
  420. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
  421. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
  422. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
  423. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  424. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  425. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  426. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  427. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
  428. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
  429. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
  430. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  431. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  432. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  433. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  434. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  435. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  436. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
  437. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
  438. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  439. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
  440. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
  441. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  442. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
  443. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  444. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  445. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
  446. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
  447. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  448. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  449. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  450. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  451. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  452. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  453. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  454. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  455. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  456. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
  457. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
  458. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  459. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
  460. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
  461. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
  462. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  463. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  464. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  465. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  466. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  467. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  468. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  469. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  470. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
  471. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
  472. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  473. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
  474. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
  475. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  476. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
  477. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
  478. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  479. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  480. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  481. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  482. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
  483. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
  484. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
  485. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
  486. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
  487. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  488. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
  489. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
  490. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
  491. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
  492. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
  493. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  494. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
  495. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
  496. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
  497. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
  498. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
  499. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  500. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
  501. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
  502. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  503. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_1_0/inductor_cache/ie/ciezok7spsxdmm4usxwuuqqblbiybh6rysy2j7u5dpkirjtglq5v.py", line 425, in call
  504. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
  505. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
  506. (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596]
  507. [rank1]:[W913 09:56:36.100649162 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): failed to recv, got 0 bytes
  508. Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:678 (most recent call first):
  509. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  510. frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  511. frame #2: <unknown function> + 0x5baaf40 (0x7d1c01cdef40 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  512. frame #3: <unknown function> + 0x5bab84a (0x7d1c01cdf84a in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  513. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x2a9 (0x7d1c01cd92a9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  514. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  515. frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  516. frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  517. frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  518.  
  519. [rank1]:[W913 09:56:36.105026078 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: failed to recv, got 0 bytes
  520. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] WorkerProc hit an exception.
  521. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
  522. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
  523. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
  524. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  525. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  526. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  527. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  528. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
  529. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
  530. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
  531. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
  532. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  533. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  534. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  535. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  536. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
  537. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
  538. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
  539. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  540. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  541. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  542. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  543. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  544. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  545. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
  546. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
  547. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  548. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
  549. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
  550. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  551. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
  552. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  553. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  554. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
  555. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
  556. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  557. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  558. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  559. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  560. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  561. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  562. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  563. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  564. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  565. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
  566. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
  567. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  568. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
  569. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
  570. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
  571. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  572. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  573. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  574. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  575. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  576. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  577. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  578. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  579. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
  580. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
  581. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  582. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
  583. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
  584. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  585. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
  586. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
  587. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  588. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  589. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  590. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  591. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
  592. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
  593. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
  594. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
  595. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
  596. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  597. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
  598. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
  599. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
  600. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
  601. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
  602. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  603. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
  604. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
  605. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
  606. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
  607. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
  608. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  609. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
  610. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
  611. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  612. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_2_0/inductor_cache/u5/cu5v4fuorjdwtclvndouq4kmjxc5i5y3kusgekcpb7z7m66rhapp.py", line 425, in call
  613. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
  614. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
  615. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
  616. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
  617. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
  618. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  619. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  620. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  621. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  622. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
  623. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
  624. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
  625. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
  626. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  627. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
  628. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
  629. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
  630. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
  631. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
  632. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
  633. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  634. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  635. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  636. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  637. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  638. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  639. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
  640. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
  641. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  642. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
  643. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
  644. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  645. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
  646. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  647. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  648. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
  649. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
  650. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  651. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  652. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  653. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  654. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  655. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  656. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  657. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  658. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  659. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
  660. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
  661. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  662. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
  663. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
  664. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
  665. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  666. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  667. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
  668. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
  669. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  670. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
  671. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
  672. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  673. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
  674. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
  675. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  676. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
  677. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
  678. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  679. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
  680. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
  681. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  682. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
  683. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
  684. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
  685. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
  686. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
  687. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
  688. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
  689. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
  690. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  691. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
  692. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
  693. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
  694. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
  695. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
  696. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  697. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
  698. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
  699. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
  700. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
  701. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
  702. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  703. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
  704. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
  705. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
  706. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_2_0/inductor_cache/u5/cu5v4fuorjdwtclvndouq4kmjxc5i5y3kusgekcpb7z7m66rhapp.py", line 425, in call
  707. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
  708. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
  709. (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596]
  710. [rank2]:[W913 09:56:37.749986675 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): failed to recv, got 0 bytes
  711. Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:678 (most recent call first):
  712. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  713. frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  714. frame #2: <unknown function> + 0x5baaf40 (0x7a49b18def40 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  715. frame #3: <unknown function> + 0x5bab84a (0x7a49b18df84a in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  716. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x2a9 (0x7a49b18d92a9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  717. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  718. frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  719. frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  720. frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  721.  
  722. [rank2]:[W913 09:56:37.763949114 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: failed to recv, got 0 bytes
  723. [rank1]:[W913 09:56:37.105308347 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
  724. Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
  725. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  726. frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  727. frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  728. frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  729. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  730. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  731. frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  732. frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  733. frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  734.  
  735. [rank1]:[W913 09:56:37.119108427 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
  736. (EngineCore_0 pid=270) ERROR 09-13 09:56:37 [multiproc_executor.py:146] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
  737. (VllmWorker TP1 pid=405) INFO 09-13 09:56:37 [multiproc_executor.py:520] Parent process exited, terminating worker
  738. (VllmWorker TP2 pid=406) INFO 09-13 09:56:37 [multiproc_executor.py:520] Parent process exited, terminating worker
  739. [rank2]:[W913 09:56:38.764217990 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): Broken pipe
  740. Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
  741. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  742. frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  743. frame #2: <unknown function> + 0x5baa458 (0x7a49b18de458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  744. frame #3: <unknown function> + 0x5babc3e (0x7a49b18dfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  745. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7a49b18d9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  746. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  747. frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  748. frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  749. frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  750.  
  751. [rank2]:[W913 09:56:38.777772745 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
  752. [rank1]:[W913 09:56:38.119383763 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
  753. Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
  754. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  755. frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  756. frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  757. frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  758. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  759. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  760. frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  761. frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  762. frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  763.  
  764. [rank1]:[W913 09:56:38.133034505 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
  765. [rank2]:[W913 09:56:39.777992842 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): Broken pipe
  766. Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
  767. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  768. frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  769. frame #2: <unknown function> + 0x5baa458 (0x7a49b18de458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  770. frame #3: <unknown function> + 0x5babc3e (0x7a49b18dfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  771. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7a49b18d9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  772. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  773. frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  774. frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  775. frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  776.  
  777. [rank2]:[W913 09:56:39.791498628 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
  778. [rank1]:[W913 09:56:39.133254134 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
  779. Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
  780. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  781. frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  782. frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  783. frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  784. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  785. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  786. frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  787. frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  788. frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  789.  
  790. [rank1]:[W913 09:56:39.146566590 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
  791. [rank2]:[W913 09:56:40.791735080 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): Broken pipe
  792. Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
  793. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  794. frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  795. frame #2: <unknown function> + 0x5baa458 (0x7a49b18de458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  796. frame #3: <unknown function> + 0x5babc3e (0x7a49b18dfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  797. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7a49b18d9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  798. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  799. frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  800. frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  801. frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  802.  
  803. [rank2]:[W913 09:56:40.805202286 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
  804. [rank1]:[W913 09:56:40.146829547 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
  805. Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
  806. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  807. frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  808. frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  809. frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  810. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  811. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  812. frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  813. frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  814. frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  815.  
  816. [rank1]:[W913 09:56:40.160393881 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
  817. [rank2]:[W913 09:56:41.805410942 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): Broken pipe
  818. Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
  819. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  820. frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  821. frame #2: <unknown function> + 0x5baa458 (0x7a49b18de458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  822. frame #3: <unknown function> + 0x5babc3e (0x7a49b18dfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  823. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7a49b18d9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  824. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  825. frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  826. frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  827. frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  828.  
  829. [rank2]:[W913 09:56:41.818811019 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
  830. [rank1]:[W913 09:56:41.160617985 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
  831. Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
  832. frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
  833. frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  834. frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  835. frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  836. frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
  837. frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
  838. frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
  839. frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  840. frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
  841.  
  842. [rank1]:[W913 09:56:41.174031968 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
  843. (EngineCore_0 pid=270) Process EngineCore_0:
  844. (EngineCore_0 pid=270) Traceback (most recent call last):
  845. (EngineCore_0 pid=270) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
  846. (EngineCore_0 pid=270) self.run()
  847. (EngineCore_0 pid=270) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
  848. (EngineCore_0 pid=270) self._target(*self._args, **self._kwargs)
  849. (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 704, in run_engine_core
  850. (EngineCore_0 pid=270) raise e
  851. (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
  852. (EngineCore_0 pid=270) engine_core = EngineCoreProc(*args, **kwargs)
  853. (EngineCore_0 pid=270) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  854. (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 492, in __init__
  855. (EngineCore_0 pid=270) super().__init__(vllm_config, executor_class, log_stats,
  856. (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 89, in __init__
  857. (EngineCore_0 pid=270) self._initialize_kv_caches(vllm_config)
  858. (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 179, in _initialize_kv_caches
  859. (EngineCore_0 pid=270) self.model_executor.determine_available_memory())
  860. (EngineCore_0 pid=270) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  861. (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
  862. (EngineCore_0 pid=270) output = self.collective_rpc("determine_available_memory")
  863. (EngineCore_0 pid=270) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  864. (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 243, in collective_rpc
  865. (EngineCore_0 pid=270) result = get_response(w, dequeue_timeout)
  866. (EngineCore_0 pid=270) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  867. (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 230, in get_response
  868. (EngineCore_0 pid=270) raise RuntimeError(
  869. (EngineCore_0 pid=270) RuntimeError: Worker failed with error 'CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`', please check the stack trace above for the root cause
  870. (APIServer pid=1) Traceback (most recent call last):
  871. (APIServer pid=1) File "<frozen runpy>", line 198, in _run_module_as_main
  872. (APIServer pid=1) File "<frozen runpy>", line 88, in _run_code
  873. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1920, in <module>
  874. (APIServer pid=1) uvloop.run(run_server(args))
  875. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
  876. (APIServer pid=1) return __asyncio.run(
  877. (APIServer pid=1) ^^^^^^^^^^^^^^
  878. (APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
  879. (APIServer pid=1) return runner.run(main)
  880. (APIServer pid=1) ^^^^^^^^^^^^^^^^
  881. (APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
  882. (APIServer pid=1) return self._loop.run_until_complete(task)
  883. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  884. (APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  885. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
  886. (APIServer pid=1) return await main
  887. (APIServer pid=1) ^^^^^^^^^^
  888. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1850, in run_server
  889. (APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
  890. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1870, in run_server_worker
  891. (APIServer pid=1) async with build_async_engine_client(
  892. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
  893. (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
  894. (APIServer pid=1) return await anext(self.gen)
  895. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
  896. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 178, in build_async_engine_client
  897. (APIServer pid=1) async with build_async_engine_client_from_engine_args(
  898. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  899. (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
  900. (APIServer pid=1) return await anext(self.gen)
  901. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
  902. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 220, in build_async_engine_client_from_engine_args
  903. (APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
  904. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
  905. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 1557, in inner
  906. (APIServer pid=1) return fn(*args, **kwargs)
  907. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^
  908. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 174, in from_vllm_config
  909. (APIServer pid=1) return cls(
  910. (APIServer pid=1) ^^^^
  911. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 120, in __init__
  912. (APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
  913. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  914. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
  915. (APIServer pid=1) return AsyncMPClient(*client_args)
  916. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  917. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 767, in __init__
  918. (APIServer pid=1) super().__init__(
  919. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 446, in __init__
  920. (APIServer pid=1) with launch_core_engines(vllm_config, executor_class,
  921. (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  922. (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
  923. (APIServer pid=1) next(self.gen)
  924. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 706, in launch_core_engines
  925. (APIServer pid=1) wait_for_engine_startup(
  926. (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 759, in wait_for_engine_startup
  927. (APIServer pid=1) raise RuntimeError("Engine core initialization failed. "
  928. (APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': 1}
  929. /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
  930. warnings.warn('resource_tracker: There appear to be %d '
  931. /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
  932. warnings.warn('resource_tracker: There appear to be %d '
  933. ```