- INFO 09-13 09:49:49 [__init__.py:241] Automatically detected platform cuda.
- (APIServer pid=1) INFO 09-13 09:49:51 [api_server.py:1805] vLLM API server version 0.10.1.1
- (APIServer pid=1) INFO 09-13 09:49:51 [utils.py:326] non-default args: {'model': 'meta-llama/Llama-2-70b-chat-hf', 'download_dir': '/root/.cache/huggingface', 'tensor_parallel_size': 4}
- (APIServer pid=1) INFO 09-13 09:49:56 [__init__.py:711] Resolved architecture: LlamaForCausalLM
- (APIServer pid=1) INFO 09-13 09:49:56 [__init__.py:1750] Using max model len 4096
- (APIServer pid=1) INFO 09-13 09:49:56 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=8192.
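
For context, the non-default arguments logged above (model, download_dir, tensor_parallel_size=4) plus the resolved max model len of 4096 describe the engine configuration in play. A minimal sketch of an equivalent setup through vLLM's offline `LLM` API is shown below; the actual launch command is not part of this paste, so every value is inferred from the log lines above rather than copied from the real deployment.

```python
# Sketch only: reconstructs the logged engine configuration via vLLM's offline API.
# The original run used the OpenAI-compatible API server; its exact command line
# does not appear in this paste, so these values are inferred from the log.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",   # logged model
    download_dir="/root/.cache/huggingface",  # logged download_dir
    tensor_parallel_size=4,                   # logged tensor_parallel_size
    dtype="float16",                          # engine config reports dtype=torch.float16
    max_model_len=4096,                       # "Using max model len 4096"
)
```
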
- INFO 09-13 09:50:00 [__init__.py:241] Automatically detected platform cuda.
- (EngineCore_0 pid=270) INFO 09-13 09:50:01 [core.py:636] Waiting for init message from front-end.
- (EngineCore_0 pid=270) INFO 09-13 09:50:01 [core.py:74] Initializing a V1 LLM engine (v0.10.1.1) with config: model='meta-llama/Llama-2-70b-chat-hf', speculative_config=None, tokenizer='meta-llama/Llama-2-70b-chat-hf', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir='/root/.cache/huggingface', load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=meta-llama/Llama-2-70b-chat-hf, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
- (EngineCore_0 pid=270) WARNING 09-13 09:50:01 [multiproc_worker_utils.py:273] Reducing Torch parallelism from 48 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
- (EngineCore_0 pid=270) INFO 09-13 09:50:01 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3], buffer_handle=(4, 16777216, 10, 'psm_ce29d5b4'), local_subscribe_addr='ipc:///tmp/6bb50716-e0bf-4447-8722-037fdcb01d89', remote_subscribe_addr=None, remote_addr_ipv6=False)
- INFO 09-13 09:50:03 [__init__.py:241] Automatically detected platform cuda.
- INFO 09-13 09:50:03 [__init__.py:241] Automatically detected platform cuda.
- INFO 09-13 09:50:04 [__init__.py:241] Automatically detected platform cuda.
- INFO 09-13 09:50:04 [__init__.py:241] Automatically detected platform cuda.
- (VllmWorker TP0 pid=404) INFO 09-13 09:50:06 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_ddf599bb'), local_subscribe_addr='ipc:///tmp/39f8b153-8786-4c4e-b4a1-56d440039898', remote_subscribe_addr=None, remote_addr_ipv6=False)
- (VllmWorker TP2 pid=406) INFO 09-13 09:50:06 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_db502135'), local_subscribe_addr='ipc:///tmp/a40199d2-2072-401d-b5e0-1dab9c842432', remote_subscribe_addr=None, remote_addr_ipv6=False)
- (VllmWorker TP3 pid=407) INFO 09-13 09:50:06 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_21510043'), local_subscribe_addr='ipc:///tmp/dbeac1ac-25e3-4650-8917-68aaa5d0016a', remote_subscribe_addr=None, remote_addr_ipv6=False)
- (VllmWorker TP1 pid=405) INFO 09-13 09:50:06 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_83241dcf'), local_subscribe_addr='ipc:///tmp/4acf2061-fa03-449e-922b-9170920263b5', remote_subscribe_addr=None, remote_addr_ipv6=False)
- (VllmWorker TP0 pid=404) INFO 09-13 09:50:07 [__init__.py:1418] Found nccl from library libnccl.so.2
- (VllmWorker TP0 pid=404) INFO 09-13 09:50:07 [pynccl.py:70] vLLM is using nccl==2.26.2
- (VllmWorker TP1 pid=405) INFO 09-13 09:50:07 [__init__.py:1418] Found nccl from library libnccl.so.2
- (VllmWorker TP3 pid=407) INFO 09-13 09:50:07 [__init__.py:1418] Found nccl from library libnccl.so.2
- (VllmWorker TP2 pid=406) INFO 09-13 09:50:07 [__init__.py:1418] Found nccl from library libnccl.so.2
- (VllmWorker TP1 pid=405) INFO 09-13 09:50:07 [pynccl.py:70] vLLM is using nccl==2.26.2
- (VllmWorker TP3 pid=407) INFO 09-13 09:50:07 [pynccl.py:70] vLLM is using nccl==2.26.2
- (VllmWorker TP2 pid=406) INFO 09-13 09:50:07 [pynccl.py:70] vLLM is using nccl==2.26.2
- (VllmWorker TP3 pid=407) WARNING 09-13 09:55:29 [custom_all_reduce.py:137] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
- (VllmWorker TP1 pid=405) WARNING 09-13 09:55:29 [custom_all_reduce.py:137] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
- (VllmWorker TP0 pid=404) WARNING 09-13 09:55:29 [custom_all_reduce.py:137] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
- (VllmWorker TP2 pid=406) WARNING 09-13 09:55:29 [custom_all_reduce.py:137] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
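
The four warnings above come from a PCIe-only 4-GPU topology, where vLLM's custom allreduce kernel is unsupported and tensor-parallel reductions fall back to NCCL. As the message itself suggests, the warning can be silenced by passing the option explicitly; a minimal sketch under the same assumptions as above:

```python
# Silences the custom-allreduce warning as the log message suggests.
# disable_custom_all_reduce is a standard vLLM engine argument; other values
# are carried over from the inferred configuration above, not from a real command.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",
    tensor_parallel_size=4,
    disable_custom_all_reduce=True,  # explicit opt-out on PCIe-only multi-GPU hosts
)
```
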
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_a9b6d09f'), local_subscribe_addr='ipc:///tmp/5ca9f5b7-b70e-4963-9312-f410020ebb81', remote_subscribe_addr=None, remote_addr_ipv6=False)
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [parallel_state.py:1134] rank 2 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [parallel_state.py:1134] rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [parallel_state.py:1134] rank 3 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [parallel_state.py:1134] rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [gpu_model_runner.py:1953] Starting to load model meta-llama/Llama-2-70b-chat-hf...
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [gpu_model_runner.py:1953] Starting to load model meta-llama/Llama-2-70b-chat-hf...
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [gpu_model_runner.py:1953] Starting to load model meta-llama/Llama-2-70b-chat-hf...
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [gpu_model_runner.py:1953] Starting to load model meta-llama/Llama-2-70b-chat-hf...
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [gpu_model_runner.py:1985] Loading model from scratch...
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [gpu_model_runner.py:1985] Loading model from scratch...
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:29 [cuda.py:328] Using Flash Attention backend on V1 engine.
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:29 [cuda.py:328] Using Flash Attention backend on V1 engine.
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [gpu_model_runner.py:1985] Loading model from scratch...
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [gpu_model_runner.py:1985] Loading model from scratch...
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:29 [cuda.py:328] Using Flash Attention backend on V1 engine.
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:29 [cuda.py:328] Using Flash Attention backend on V1 engine.
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:30 [weight_utils.py:296] Using model weights format ['*.safetensors']
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:30 [weight_utils.py:296] Using model weights format ['*.safetensors']
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:30 [weight_utils.py:296] Using model weights format ['*.safetensors']
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:30 [weight_utils.py:296] Using model weights format ['*.safetensors']
- Loading safetensors checkpoint shards: 0% Completed | 0/15 [00:00<?, ?it/s]
- Loading safetensors checkpoint shards: 7% Completed | 1/15 [00:00<00:10, 1.37it/s]
- Loading safetensors checkpoint shards: 13% Completed | 2/15 [00:01<00:09, 1.37it/s]
- Loading safetensors checkpoint shards: 20% Completed | 3/15 [00:02<00:08, 1.37it/s]
- Loading safetensors checkpoint shards: 27% Completed | 4/15 [00:02<00:08, 1.34it/s]
- Loading safetensors checkpoint shards: 40% Completed | 6/15 [00:03<00:05, 1.79it/s]
- Loading safetensors checkpoint shards: 47% Completed | 7/15 [00:04<00:04, 1.67it/s]
- Loading safetensors checkpoint shards: 53% Completed | 8/15 [00:05<00:04, 1.60it/s]
- Loading safetensors checkpoint shards: 60% Completed | 9/15 [00:05<00:03, 1.51it/s]
- Loading safetensors checkpoint shards: 67% Completed | 10/15 [00:06<00:03, 1.47it/s]
- Loading safetensors checkpoint shards: 73% Completed | 11/15 [00:07<00:02, 1.43it/s]
- Loading safetensors checkpoint shards: 80% Completed | 12/15 [00:08<00:02, 1.40it/s]
- Loading safetensors checkpoint shards: 87% Completed | 13/15 [00:08<00:01, 1.41it/s]
- Loading safetensors checkpoint shards: 93% Completed | 14/15 [00:09<00:00, 1.39it/s]
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:40 [default_loader.py:262] Loading weights took 10.33 seconds
- Loading safetensors checkpoint shards: 100% Completed | 15/15 [00:10<00:00, 1.41it/s]
- Loading safetensors checkpoint shards: 100% Completed | 15/15 [00:10<00:00, 1.46it/s]
- (VllmWorker TP0 pid=404)
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:40 [gpu_model_runner.py:2007] Model loading took 32.1248 GiB and 10.945041 seconds
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:40 [default_loader.py:262] Loading weights took 10.32 seconds
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:41 [gpu_model_runner.py:2007] Model loading took 32.1248 GiB and 11.342089 seconds
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:41 [default_loader.py:262] Loading weights took 10.86 seconds
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:42 [gpu_model_runner.py:2007] Model loading took 32.1248 GiB and 11.959376 seconds
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:43 [default_loader.py:262] Loading weights took 13.35 seconds
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:44 [gpu_model_runner.py:2007] Model loading took 32.1248 GiB and 14.052152 seconds
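
Each of the four tensor-parallel ranks reports 32.1248 GiB of weights, so the shards total roughly 4 × 32.12 GiB ≈ 128.5 GiB, which is about what 2 bytes per parameter implies for a ~70B fp16 checkpoint (≈ 129–130 GiB). The weights themselves therefore loaded as expected; the failure below happens later, during memory profiling.
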
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:54 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_2_0/backbone for vLLM's torch.compile
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:54 [backends.py:559] Dynamo bytecode transform time: 9.66 s
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:54 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_0_0/backbone for vLLM's torch.compile
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:54 [backends.py:559] Dynamo bytecode transform time: 9.79 s
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:54 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_1_0/backbone for vLLM's torch.compile
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:54 [backends.py:559] Dynamo bytecode transform time: 10.03 s
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:54 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_3_0/backbone for vLLM's torch.compile
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:54 [backends.py:559] Dynamo bytecode transform time: 10.29 s
- (VllmWorker TP2 pid=406) INFO 09-13 09:55:59 [backends.py:194] Cache the graph for dynamic shape for later use
- (VllmWorker TP0 pid=404) INFO 09-13 09:55:59 [backends.py:194] Cache the graph for dynamic shape for later use
- (VllmWorker TP1 pid=405) INFO 09-13 09:55:59 [backends.py:194] Cache the graph for dynamic shape for later use
- (VllmWorker TP3 pid=407) INFO 09-13 09:55:59 [backends.py:194] Cache the graph for dynamic shape for later use
- (VllmWorker TP0 pid=404) INFO 09-13 09:56:34 [backends.py:215] Compiling a graph for dynamic shape takes 39.38 s
- (VllmWorker TP1 pid=405) INFO 09-13 09:56:34 [backends.py:215] Compiling a graph for dynamic shape takes 39.55 s
- (VllmWorker TP2 pid=406) INFO 09-13 09:56:34 [backends.py:215] Compiling a graph for dynamic shape takes 39.90 s
- (VllmWorker TP3 pid=407) INFO 09-13 09:56:35 [backends.py:215] Compiling a graph for dynamic shape takes 39.90 s
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] WorkerProc hit an exception.
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_0_0/inductor_cache/3c/c3cxs4zht2j76bwda7wpi64t6efmxyfdk4bsgojwgwodvkgaxycx.py", line 425, in call
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_0_0/inductor_cache/3c/c3cxs4zht2j76bwda7wpi64t6efmxyfdk4bsgojwgwodvkgaxycx.py", line 425, in call
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
- (VllmWorker TP0 pid=404) ERROR 09-13 09:56:36 [multiproc_executor.py:596]
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] EngineCore failed to start.
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] Traceback (most recent call last):
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] engine_core = EngineCoreProc(*args, **kwargs)
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 492, in __init__
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] super().__init__(vllm_config, executor_class, log_stats,
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 89, in __init__
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] self._initialize_kv_caches(vllm_config)
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 179, in _initialize_kv_caches
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] self.model_executor.determine_available_memory())
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] output = self.collective_rpc("determine_available_memory")
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 243, in collective_rpc
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] result = get_response(w, dequeue_timeout)
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 230, in get_response
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] raise RuntimeError(
- (EngineCore_0 pid=270) ERROR 09-13 09:56:36 [core.py:700] RuntimeError: Worker failed with error 'CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`', please check the stack trace above for the root cause
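
The root cause reported by the failing workers is `CUBLAS_STATUS_ALLOC_FAILED` from `cublasCreate(handle)` during `profile_run`, the dummy forward pass vLLM uses to measure free GPU memory before sizing the KV cache. That status from cuBLAS handle creation generally means the GPU was already out of memory at that point, so the usual remedies are to free memory on the device (check `nvidia-smi` for other processes, given the weights alone take ~32 GiB per rank here) or to lower vLLM's own footprint. A hedged sketch of the common knobs, with illustrative values rather than the original configuration:

```python
# Illustrative mitigation sketch for CUBLAS_STATUS_ALLOC_FAILED during profile_run.
# cublasCreate failing with ALLOC_FAILED generally indicates the GPU had no free
# memory left when the cuBLAS handle was created; these settings trade some
# throughput for extra headroom. Values are examples, not the original config.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",
    tensor_parallel_size=4,
    max_model_len=4096,
    gpu_memory_utilization=0.85,   # reserve more headroom than the 0.90 default
    max_num_batched_tokens=4096,   # smaller chunked-prefill batch than the logged 8192
    enforce_eager=True,            # skip CUDA graph capture, reducing extra memory use
)
```
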
- (VllmWorker TP2 pid=406) Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7a48c4224540>
- (VllmWorker TP2 pid=406) Traceback (most recent call last):
- (VllmWorker TP2 pid=406) File "/usr/lib/python3.12/weakref.py", line 105, in remove
- (VllmWorker TP2 pid=406) def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):
- (VllmWorker TP2 pid=406)
- (VllmWorker TP2 pid=406) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 500, in signal_handler
- (VllmWorker TP2 pid=406) raise SystemExit()
- (VllmWorker TP2 pid=406) SystemExit:
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] WorkerProc hit an exception.
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_1_0/inductor_cache/ie/ciezok7spsxdmm4usxwuuqqblbiybh6rysy2j7u5dpkirjtglq5v.py", line 425, in call
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_1_0/inductor_cache/ie/ciezok7spsxdmm4usxwuuqqblbiybh6rysy2j7u5dpkirjtglq5v.py", line 425, in call
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
- (VllmWorker TP1 pid=405) ERROR 09-13 09:56:36 [multiproc_executor.py:596]
- [rank1]:[W913 09:56:36.100649162 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): failed to recv, got 0 bytes
- Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:678 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baaf40 (0x7d1c01cdef40 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5bab84a (0x7d1c01cdf84a in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x2a9 (0x7d1c01cd92a9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank1]:[W913 09:56:36.105026078 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: failed to recv, got 0 bytes
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] WorkerProc hit an exception.
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_2_0/inductor_cache/u5/cu5v4fuorjdwtclvndouq4kmjxc5i5y3kusgekcpb7z7m66rhapp.py", line 425, in call
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] Traceback (most recent call last):
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = func(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 244, in determine_available_memory
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] self.model_runner.profile_run()
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2622, in profile_run
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] = self._dummy_run(self.max_num_tokens, is_profile=True)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return func(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2395, in _dummy_run
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outputs = self.model(
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 577, in forward
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] model_output = self.model(input_ids, positions, intermediate_tensors,
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 272, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] output = self.compiled_callable(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 361, in forward
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] def forward(
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 830, in call_wrapped
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._wrapped_call(self, *args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 406, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] raise e
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 393, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self._call_impl(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return forward_call(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "<eval_with_key>.162", line 490, in forward
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 119, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.runnable(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_piecewise_backend.py", line 90, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.compiled_graph_for_general_shape(*args)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return fn(*args, **kwargs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(full_args)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] all_outs = call_func_at_runtime_with_args(
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] out = normalize_as_list(f(args))
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] outs = compiled_fn(args)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return compiled_fn(runtime_args)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 460, in __call__
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return self.current_callable(inputs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2404, in run
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] return model(new_inputs)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] File "/root/.cache/vllm/torch_compile_cache/9bc7bd544a/rank_2_0/inductor_cache/u5/cu5v4fuorjdwtclvndouq4kmjxc5i5y3kusgekcpb7z7m66rhapp.py", line 425, in call
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] extern_kernels.mm(buf4, reinterpret_tensor(arg4_1, (8192, 2560), (1, 8192), 0), out=buf5)
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
- (VllmWorker TP2 pid=406) ERROR 09-13 09:56:36 [multiproc_executor.py:596]
- [rank2]:[W913 09:56:37.749986675 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): failed to recv, got 0 bytes
- Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:678 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baaf40 (0x7a49b18def40 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5bab84a (0x7a49b18df84a in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x2a9 (0x7a49b18d92a9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank2]:[W913 09:56:37.763949114 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: failed to recv, got 0 bytes
- [rank1]:[W913 09:56:37.105308347 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
- Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank1]:[W913 09:56:37.119108427 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
- (EngineCore_0 pid=270) ERROR 09-13 09:56:37 [multiproc_executor.py:146] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
- (VllmWorker TP1 pid=405) INFO 09-13 09:56:37 [multiproc_executor.py:520] Parent process exited, terminating worker
- (VllmWorker TP2 pid=406) INFO 09-13 09:56:37 [multiproc_executor.py:520] Parent process exited, terminating worker
- [rank2]:[W913 09:56:38.764217990 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): Broken pipe
- Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baa458 (0x7a49b18de458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5babc3e (0x7a49b18dfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7a49b18d9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank2]:[W913 09:56:38.777772745 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
- [rank1]:[W913 09:56:38.119383763 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
- Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank1]:[W913 09:56:38.133034505 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
- [rank2]:[W913 09:56:39.777992842 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): Broken pipe
- Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baa458 (0x7a49b18de458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5babc3e (0x7a49b18dfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7a49b18d9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank2]:[W913 09:56:39.791498628 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
- [rank1]:[W913 09:56:39.133254134 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
- Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank1]:[W913 09:56:39.146566590 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
- [rank2]:[W913 09:56:40.791735080 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): Broken pipe
- Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baa458 (0x7a49b18de458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5babc3e (0x7a49b18dfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7a49b18d9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank2]:[W913 09:56:40.805202286 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
- [rank1]:[W913 09:56:40.146829547 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
- Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank1]:[W913 09:56:40.160393881 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
- [rank2]:[W913 09:56:41.805410942 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=88, addr=[::ffff:127.0.0.1]:55080, remote=[::ffff:127.0.0.1]:34719): Broken pipe
- Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7a49622005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7a49b18dcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baa458 (0x7a49b18de458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5babc3e (0x7a49b18dfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7a49b18d9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7a4962f7bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7a49531b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7a49cdf6dac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7a49cdffea04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank2]:[W913 09:56:41.818811019 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 2] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
- [rank1]:[W913 09:56:41.160617985 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=87, addr=[::ffff:127.0.0.1]:55056, remote=[::ffff:127.0.0.1]:34719): Broken pipe
- Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
- frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7d1bb26005e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
- frame #1: <unknown function> + 0x5ba8bfe (0x7d1c01cdcbfe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #2: <unknown function> + 0x5baa458 (0x7d1c01cde458 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #3: <unknown function> + 0x5babc3e (0x7d1c01cdfc3e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x298 (0x7d1c01cd9298 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
- frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7d1bb337bad9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
- frame #6: <unknown function> + 0xdc253 (0x7d1ba35b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
- frame #7: <unknown function> + 0x94ac3 (0x7d1c1e332ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- frame #8: clone + 0x44 (0x7d1c1e3c3a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
- [rank1]:[W913 09:56:41.174031968 ProcessGroupNCCL.cpp:1662] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
- (EngineCore_0 pid=270) Process EngineCore_0:
- (EngineCore_0 pid=270) Traceback (most recent call last):
- (EngineCore_0 pid=270) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
- (EngineCore_0 pid=270) self.run()
- (EngineCore_0 pid=270) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
- (EngineCore_0 pid=270) self._target(*self._args, **self._kwargs)
- (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 704, in run_engine_core
- (EngineCore_0 pid=270) raise e
- (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
- (EngineCore_0 pid=270) engine_core = EngineCoreProc(*args, **kwargs)
- (EngineCore_0 pid=270) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 492, in __init__
- (EngineCore_0 pid=270) super().__init__(vllm_config, executor_class, log_stats,
- (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 89, in __init__
- (EngineCore_0 pid=270) self._initialize_kv_caches(vllm_config)
- (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 179, in _initialize_kv_caches
- (EngineCore_0 pid=270) self.model_executor.determine_available_memory())
- (EngineCore_0 pid=270) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
- (EngineCore_0 pid=270) output = self.collective_rpc("determine_available_memory")
- (EngineCore_0 pid=270) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 243, in collective_rpc
- (EngineCore_0 pid=270) result = get_response(w, dequeue_timeout)
- (EngineCore_0 pid=270) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore_0 pid=270) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 230, in get_response
- (EngineCore_0 pid=270) raise RuntimeError(
- (EngineCore_0 pid=270) RuntimeError: Worker failed with error 'CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`', please check the stack trace above for the root cause
- (APIServer pid=1) Traceback (most recent call last):
- (APIServer pid=1) File "<frozen runpy>", line 198, in _run_module_as_main
- (APIServer pid=1) File "<frozen runpy>", line 88, in _run_code
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1920, in <module>
- (APIServer pid=1) uvloop.run(run_server(args))
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
- (APIServer pid=1) return __asyncio.run(
- (APIServer pid=1) ^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
- (APIServer pid=1) return runner.run(main)
- (APIServer pid=1) ^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
- (APIServer pid=1) return self._loop.run_until_complete(task)
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
- (APIServer pid=1) return await main
- (APIServer pid=1) ^^^^^^^^^^
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1850, in run_server
- (APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1870, in run_server_worker
- (APIServer pid=1) async with build_async_engine_client(
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
- (APIServer pid=1) return await anext(self.gen)
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 178, in build_async_engine_client
- (APIServer pid=1) async with build_async_engine_client_from_engine_args(
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
- (APIServer pid=1) return await anext(self.gen)
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 220, in build_async_engine_client_from_engine_args
- (APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 1557, in inner
- (APIServer pid=1) return fn(*args, **kwargs)
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 174, in from_vllm_config
- (APIServer pid=1) return cls(
- (APIServer pid=1) ^^^^
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 120, in __init__
- (APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
- (APIServer pid=1) return AsyncMPClient(*client_args)
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 767, in __init__
- (APIServer pid=1) super().__init__(
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 446, in __init__
- (APIServer pid=1) with launch_core_engines(vllm_config, executor_class,
- (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
- (APIServer pid=1) next(self.gen)
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 706, in launch_core_engines
- (APIServer pid=1) wait_for_engine_startup(
- (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 759, in wait_for_engine_startup
- (APIServer pid=1) raise RuntimeError("Engine core initialization failed. "
- (APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': 1}
- /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
- warnings.warn('resource_tracker: There appear to be %d '
- /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
- warnings.warn('resource_tracker: There appear to be %d '
- ```
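The root cause buried in the trace above is `RuntimeError: Worker failed with error 'CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)'`, raised while the engine core was running `determine_available_memory()`. That error from cuBLAS typically means a worker could not allocate GPU memory when creating its handle, which is common when the GPUs are too small or already partly occupied for Llama-2-70b in fp16 across tensor_parallel_size=4; the repeated TCPStore "Broken pipe" warnings are follow-on noise after the failed worker tore the process group down. Below is a minimal pre-flight sketch (not part of vLLM) for checking per-GPU free memory before launching; the 70B parameter count, fp16 byte size, and even TP=4 split are illustrative assumptions, not values taken from this log.

```python
# Hedged pre-flight sketch: report free memory per GPU and compare it to a rough
# estimate of the fp16 weight shard size. Assumes torch with CUDA is installed.
import torch

PARAMS = 70e9          # assumed parameter count for Llama-2-70b
BYTES_PER_PARAM = 2    # fp16 weights
TP_SIZE = 4            # tensor_parallel_size used in this launch

# Rough per-GPU weight footprint under an even tensor-parallel split (GiB).
weights_per_gpu_gib = PARAMS * BYTES_PER_PARAM / TP_SIZE / 2**30

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # returns (free_bytes, total_bytes)
    print(f"GPU {i}: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB "
          f"(need roughly {weights_per_gpu_gib:.1f} GiB just for the fp16 weight shard, "
          f"plus KV cache and activations)")
```

If free memory is near or below the weight-shard estimate, commonly suggested mitigations include freeing stray processes seen in `nvidia-smi`, lowering `--gpu-memory-utilization`, reducing `--max-model-len`, using a quantized checkpoint, or moving to more or larger GPUs; which of these applies depends on the actual hardware, which this log does not show.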