- (APIServer pid=5356) INFO 05-02 22:27:08 [utils.py:299]
- (APIServer pid=5356) INFO 05-02 22:27:08 [utils.py:299] █ █ █▄ ▄█
- (APIServer pid=5356) INFO 05-02 22:27:08 [utils.py:299] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.19.0
- (APIServer pid=5356) INFO 05-02 22:27:08 [utils.py:299] █▄█▀ █ █ █ █ model H:\qwen3.6-windows-server\models\Qwen3.6-27B-int4-AutoRound
- (APIServer pid=5356) INFO 05-02 22:27:08 [utils.py:299] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
- (APIServer pid=5356) INFO 05-02 22:27:08 [utils.py:299]
- (APIServer pid=5356) INFO 05-02 22:27:08 [utils.py:233] non-default args: {'model_tag': 'H:\\qwen3.6-windows-server\\models\\Qwen3.6-27B-int4-AutoRound', 'chat_template': 'H:\\qwen3.6-windows-server\\templates\\qwen3.5-enhanced.jinja', 'default_chat_template_kwargs': {'preserve_thinking': False}, 'enable_auto_tool_choice': True, 'tool_call_parser': 'qwen3_coder', 'host': '0.0.0.0', 'port': 5001, 'model': 'H:\\qwen3.6-windows-server\\models\\Qwen3.6-27B-int4-AutoRound', 'trust_remote_code': True, 'max_model_len': 120000, 'quantization': 'auto-round', 'served_model_name': ['qwen3.6-27b-autoround'], 'use_tqdm_on_load': False, 'attention_backend': 'TRITON_ATTN', 'reasoning_parser': 'qwen3', 'block_size': 32, 'gpu_memory_utilization': 0.948, 'kv_cache_dtype': 'fp8_e4m3', 'enable_prefix_caching': True, 'limit_mm_per_prompt': {'image': 0, 'video': 0}, 'max_num_batched_tokens': 4128, 'max_num_seqs': 1, 'enable_chunked_prefill': True, 'speculative_config': {'method': 'mtp', 'num_speculative_tokens': 4}}
- (APIServer pid=5356) INFO 05-02 22:27:08 [system_utils.py:279] Windows detected, skipping ulimit adjustment.
- (APIServer pid=5356) WARNING 05-02 22:27:08 [envs.py:1786] Unknown vLLM environment variable detected: VLLM_ATTENTION_BACKEND
- (APIServer pid=5356) WARNING 05-02 22:27:08 [envs.py:1786] Unknown vLLM environment variable detected: VLLM_MODEL_DIR
- (APIServer pid=5356) WARNING 05-02 22:27:08 [envs.py:1786] Unknown vLLM environment variable detected: VLLM_SLEEP_WHEN_IDLE
- (APIServer pid=5356) INFO 05-02 22:27:08 [model.py:554] Resolved architecture: Qwen3_5ForConditionalGeneration
- (APIServer pid=5356) INFO 05-02 22:27:08 [model.py:1685] Using max model len 120000
- (APIServer pid=5356) INFO 05-02 22:27:09 [cache.py:253] Using fp8_e4m3 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor
- (APIServer pid=5356) INFO 05-02 22:27:09 [model.py:554] Resolved architecture: Qwen3_5MTP
- (APIServer pid=5356) INFO 05-02 22:27:09 [model.py:1685] Using max model len 262144
- (APIServer pid=5356) WARNING 05-02 22:27:09 [speculative.py:521] Enabling num_speculative_tokens > 1 will run multiple times of forward on same MTP layer,which may result in lower acceptance rate
- (APIServer pid=5356) INFO 05-02 22:27:09 [scheduler.py:238] Chunked prefill is enabled with max_num_batched_tokens=4128.
- (APIServer pid=5356) WARNING 05-02 22:27:09 [config.py:306] Mamba cache mode is set to 'align' for Qwen3_5ForConditionalGeneration by default when prefix caching is enabled
- (APIServer pid=5356) INFO 05-02 22:27:09 [config.py:326] Warning: Prefix caching in Mamba cache 'align' mode is currently enabled. Its support for Mamba layers is experimental. Please report any issues you may observe.
- (APIServer pid=5356) INFO 05-02 22:27:09 [vllm.py:799] Asynchronous scheduling is enabled.
- (APIServer pid=5356) INFO 05-02 22:27:09 [kernel.py:199] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
- (APIServer pid=5356) WARNING 05-02 22:27:09 [vllm.py:1362] max_num_scheduled_tokens is set to 4128 based on the speculative decoding settings. This may lead to suboptimal performance. Consider increasing max_num_batched_tokens to accommodate the additional draft token slots, or decrease num_speculative_tokens or max_num_seqs.
- (APIServer pid=5356) INFO 05-02 22:27:10 [registry.py:126] All limits of multimodal modalities supported by the model are set to 0, running in text-only mode.
- (EngineCore pid=25592) INFO 05-02 22:27:19 [core.py:105] Initializing a V1 LLM engine (v0.19.0) with config: model='H:\\qwen3.6-windows-server\\models\\Qwen3.6-27B-int4-AutoRound', speculative_config=SpeculativeConfig(method='mtp', model='H:\\qwen3.6-windows-server\\models\\Qwen3.6-27B-int4-AutoRound', num_spec_tokens=4), tokenizer='H:\\qwen3.6-windows-server\\models\\Qwen3.6-27B-int4-AutoRound', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=120000, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=inc, quantization_config=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=fp8_e4m3, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='qwen3', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=qwen3.6-27b-autoround, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'ir_enable_torch_wrap': True, 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 
'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [4128], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 8, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': False, 'static_all_moe_layers': []}, kernel_config=KernelConfig(ir_op_priority=IrOpPriorityConfig(rms_norm=['native']), enable_flashinfer_autotune=True, moe_backend='auto')
- (EngineCore pid=25592) INFO 05-02 22:27:20 [registry.py:126] All limits of multimodal modalities supported by the model are set to 0, running in text-only mode.
- (EngineCore pid=25592) INFO 05-02 22:27:20 [parallel_state.py:1455] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.19.0.1:55065 backend=gloo
- [W502 22:27:20.000000000 socket.cpp:764] [c10d] The client socket has failed to connect to [Shush]:55065 (system error: 10049 - .).
- [W502 22:27:20.000000000 socket.cpp:764] [c10d] The client socket has failed to connect to [Shush]:55065 (system error: 10049 - .).
- (EngineCore pid=25592) INFO 05-02 22:27:21 [parallel_state.py:1767] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] EngineCore failed to start.
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] Traceback (most recent call last):
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 1082, in run_engine_core
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] return func(*args, **kwargs)
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 848, in __init__
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] super().__init__(
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 114, in __init__
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] self.model_executor = executor_class(vllm_config)
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] return func(*args, **kwargs)
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\executor\abstract.py", line 109, in __init__
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] self._init_executor()
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\executor\uniproc_executor.py", line 47, in _init_executor
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] self.driver_worker.init_device()
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\worker_base.py", line 312, in init_device
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] self.worker.init_device() # type: ignore
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] return func(*args, **kwargs)
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\gpu_worker.py", line 311, in init_device
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] self.model_runner = GPUModelRunnerV1(self.vllm_config, self.device)
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\gpu_model_runner.py", line 480, in __init__
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] self.sampler = Sampler(logprobs_mode=self.model_config.logprobs_mode)
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\sample\sampler.py", line 64, in __init__
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] self.topk_topp_sampler = TopKTopPSampler(logprobs_mode)
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\sample\ops\topk_topp_sampler.py", line 40, in __init__
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] from vllm.v1.attention.backends.flashinfer import FlashInferBackend
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\attention\backends\flashinfer.py", line 11, in <module>
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] from flashinfer import (
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\flashinfer\__init__.py", line 24, in <module>
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] from . import jit as jit
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] File "H:\qwen3.6-windows-server\python\Lib\site-packages\flashinfer\jit\__init__.py", line 125, in <module>
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] raise ValueError(
- (EngineCore pid=25592) ERROR 05-02 22:27:21 [core.py:1108] ValueError: CUDA_LIB_PATH is not set. CUDA_LIB_PATH need to be set with the absolute path to CUDA root folder on Windows (for example, set CUDA_LIB_PATH=C:\CUDA\v12.4)
- (EngineCore pid=25592) Process EngineCore:
- (EngineCore pid=25592) Traceback (most recent call last):
- (EngineCore pid=25592) File "multiprocessing\process.py", line 314, in _bootstrap
- (EngineCore pid=25592) File "multiprocessing\process.py", line 108, in run
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 1112, in run_engine_core
- (EngineCore pid=25592) raise e
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 1082, in run_engine_core
- (EngineCore pid=25592) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
- (EngineCore pid=25592) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
- (EngineCore pid=25592) return func(*args, **kwargs)
- (EngineCore pid=25592) ^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 848, in __init__
- (EngineCore pid=25592) super().__init__(
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 114, in __init__
- (EngineCore pid=25592) self.model_executor = executor_class(vllm_config)
- (EngineCore pid=25592) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
- (EngineCore pid=25592) return func(*args, **kwargs)
- (EngineCore pid=25592) ^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\executor\abstract.py", line 109, in __init__
- (EngineCore pid=25592) self._init_executor()
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\executor\uniproc_executor.py", line 47, in _init_executor
- (EngineCore pid=25592) self.driver_worker.init_device()
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\worker_base.py", line 312, in init_device
- (EngineCore pid=25592) self.worker.init_device() # type: ignore
- (EngineCore pid=25592) ^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
- (EngineCore pid=25592) return func(*args, **kwargs)
- (EngineCore pid=25592) ^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\gpu_worker.py", line 311, in init_device
- (EngineCore pid=25592) self.model_runner = GPUModelRunnerV1(self.vllm_config, self.device)
- (EngineCore pid=25592) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\gpu_model_runner.py", line 480, in __init__
- (EngineCore pid=25592) self.sampler = Sampler(logprobs_mode=self.model_config.logprobs_mode)
- (EngineCore pid=25592) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\sample\sampler.py", line 64, in __init__
- (EngineCore pid=25592) self.topk_topp_sampler = TopKTopPSampler(logprobs_mode)
- (EngineCore pid=25592) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\sample\ops\topk_topp_sampler.py", line 40, in __init__
- (EngineCore pid=25592) from vllm.v1.attention.backends.flashinfer import FlashInferBackend
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\attention\backends\flashinfer.py", line 11, in <module>
- (EngineCore pid=25592) from flashinfer import (
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\flashinfer\__init__.py", line 24, in <module>
- (EngineCore pid=25592) from . import jit as jit
- (EngineCore pid=25592) File "H:\qwen3.6-windows-server\python\Lib\site-packages\flashinfer\jit\__init__.py", line 125, in <module>
- (EngineCore pid=25592) raise ValueError(
- (EngineCore pid=25592) ValueError: CUDA_LIB_PATH is not set. CUDA_LIB_PATH need to be set with the absolute path to CUDA root folder on Windows (for example, set CUDA_LIB_PATH=C:\CUDA\v12.4)
- forrtl: error (200): program aborting due to window-CLOSE event
- Image PC Routine Line Source
- KERNELBASE.dll 00007FFDD586A3CD Unknown Unknown Unknown
- KERNEL32.DLL 00007FFDD794E8D7 Unknown Unknown Unknown
- ntdll.dll 00007FFDD828C3FC Unknown Unknown Unknown