2024-02-16 22:15:28.531 [2akn5byerrxpel] [info] Finished running generator.
2024-02-16 22:15:28.531 [2akn5byerrxpel] [info] --- Starting Serverless Worker | Version 1.5.3 ---
2024-02-16 22:15:27.699 [2akn5byerrxpel] [info] INFO 02-17 03:15:27 model_runner.py:689] Graph capturing finished in 4 secs.
2024-02-16 22:15:27.699 [2akn5byerrxpel] [info] INFO 02-17 03:15:22 model_runner.py:629] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
2024-02-16 22:15:27.699 [2akn5byerrxpel] [info] INFO 02-17 03:15:22 model_runner.py:625] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
2024-02-16 22:15:27.699 [2akn5byerrxpel] [info] INFO 02-17 03:15:20 llm_engine.py:316] # GPU blocks: 2214, # CPU blocks: 2048
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] ValueError: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (24144). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] raise ValueError(
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] File "/vllm-installation/vllm/engine/llm_engine.py", line 325, in _init_cache
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] self._init_cache()
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] File "/vllm-installation/vllm/engine/llm_engine.py", line 112, in __init__
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] return engine_class(*args, **kwargs)
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] File "/vllm-installation/vllm/engine/async_llm_engine.py", line 366, in _init_engine
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] self.engine = self._init_engine(*args, **kwargs)
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] File "/vllm-installation/vllm/engine/async_llm_engine.py", line 321, in __init__
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] engine = cls(parallel_config.worker_use_ray,
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] File "/vllm-installation/vllm/engine/async_llm_engine.py", line 617, in from_engine_args
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] return AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**self.config))
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] File "/src/engine.py", line 192, in _initialize_llm
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] raise e
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] File "/src/engine.py", line 195, in _initialize_llm
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] self.llm = self._initialize_llm() if engine is None else engine
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] File "/src/engine.py", line 45, in __init__
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] vllm_engine = vLLMEngine()
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] File "/src/handler.py", line 5, in <module>
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] Traceback (most recent call last):
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] engine.py :194 2024-02-17 03:09:59,747 Error initializing vLLM engine: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (24144). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
2024-02-16 22:09:59.747 [dj6at93clwgje7] [info] INFO 02-17 03:09:59 llm_engine.py:316] # GPU blocks: 1509, # CPU blocks: 2048
2024-02-16 22:09:48.879 [dj6at93clwgje7] [info] INFO 02-17 03:09:48 llm_engine.py:72] Initializing an LLM engine with config: model='/models/huggingface-cache/hub/models--mistralai--Mistral-7B-v0.1/snapshots/26bca36bde8333b5d7f72e9ed20ccda6a618af24', tokenizer='/models/huggingface-cache/hub/models--mistralai--Mistral-7B-v0.1/snapshots/26bca36bde8333b5d7f72e9ed20ccda6a618af24', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
2024-02-16 22:09:48.835 [dj6at93clwgje7] [info] engine.py :43 2024-02-17 03:09:48,835 vLLM config: {'model': '/models/huggingface-cache/hub/models--mistralai--Mistral-7B-v0.1/snapshots/26bca36bde8333b5d7f72e9ed20ccda6a618af24', 'revision': None, 'download_dir': None, 'quantization': None, 'load_format': 'auto', 'dtype': 'auto', 'tokenizer': '/models/huggingface-cache/hub/models--mistralai--Mistral-7B-v0.1/snapshots/26bca36bde8333b5d7f72e9ed20ccda6a618af24', 'tokenizer_revision': None, 'disable_log_stats': True, 'disable_log_requests': True, 'trust_remote_code': False, 'gpu_memory_utilization': 0.95, 'max_parallel_loading_workers': 24, 'max_model_len': None, 'tensor_parallel_size': 1}
2024-02-16 22:09:48.834 [dj6at93clwgje7] [info] engine.py :212 2024-02-17 03:09:48,834 Using local model at /models/huggingface-cache/hub/models--mistralai--Mistral-7B-v0.1/snapshots/26bca36bde8333b5d7f72e9ed20ccda6a618af24
2024-02-16 22:09:46.804 [dj6at93clwgje7] [info] ValueError: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (24144). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
2024-02-16 22:09:46.804 [dj6at93clwgje7] [info] raise ValueError(
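The ValueError in this log is a simple capacity check: vLLM stores the KV cache in fixed-size token blocks, and the cache must hold at least `max_model_len` tokens. The block size is not printed in the log; a block size of 16 tokens (vLLM's default) is inferred here from 24144 / 1509 = 16. A minimal sketch of the arithmetic behind the failing and succeeding runs:

```python
# Reproduce the capacity check behind the ValueError in the log above.
BLOCK_SIZE = 16  # tokens per KV-cache block; inferred from 24144 / 1509 in the log


def kv_cache_capacity(num_gpu_blocks: int, block_size: int = BLOCK_SIZE) -> int:
    """Total number of tokens the GPU KV cache can hold."""
    return num_gpu_blocks * block_size


# Failing run (pod dj6at93clwgje7): 1509 GPU blocks cannot hold
# Mistral-7B-v0.1's 32768-token context window.
assert kv_cache_capacity(1509) == 24144  # the number quoted in the error
assert kv_cache_capacity(1509) < 32768   # so llm_engine.py raises ValueError

# Succeeding run (pod 2akn5byerrxpel): 2214 GPU blocks clear the same check.
assert kv_cache_capacity(2214) >= 32768  # 35424 tokens, engine starts
```

Since the logged config already sets `gpu_memory_utilization` to 0.95, the remaining knobs the error message points at are capping `max_model_len` (anything up to 24144 would fit the 1509-block cache) or running on a GPU with more free memory, as the second pod evidently had.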