[2025-06-09 10:45:27,211]::[InvokeAI]::INFO --> Executing queue item 532, session 9523b9bf-1d9b-423c-ac4d-874cd211e386
[2025-06-09 10:45:31,389]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '531c0e81-9165-42e3-97f3-9eb7ee890093:text_encoder_2' (T5EncoderModel) onto cuda device in 3.96s. Total model size: 4667.39MB, VRAM: 4667.39MB (100.0%)
[2025-06-09 10:45:31,532]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '531c0e81-9165-42e3-97f3-9eb7ee890093:tokenizer_2' (T5Tokenizer) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
/opt/venv/lib/python3.12/site-packages/bitsandbytes/autograd/_functions.py:315: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
[2025-06-09 10:45:32,541]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'fff14f82-ca21-486f-90b5-27c224ac4e59:text_encoder' (CLIPTextModel) onto cuda device in 0.11s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[2025-06-09 10:45:32,603]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'fff14f82-ca21-486f-90b5-27c224ac4e59:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
[2025-06-09 10:45:50,174]::[ModelManagerService]::WARNING --> [MODEL CACHE] Insufficient GPU memory to load model. Aborting
[2025-06-09 10:45:50,179]::[ModelManagerService]::WARNING --> [MODEL CACHE] Insufficient GPU memory to load model. Aborting
[2025-06-09 10:45:50,211]::[InvokeAI]::ERROR --> Error while invoking session 9523b9bf-1d9b-423c-ac4d-874cd211e386, invocation b1c4de60-6b49-4a0a-bb10-862154b16d74 (flux_denoise): CUDA out of memory. Tried to allocate 126.00 MiB. GPU 0 has a total capacity of 23.65 GiB of which 67.50 MiB is free. Process 2287 has 258.00 MiB memory in use. Process 1850797 has 554.22 MiB memory in use. Process 1853540 has 21.97 GiB memory in use. Of the allocated memory 21.63 GiB is allocated by PyTorch, and 31.44 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2025-06-09 10:45:50,211]::[InvokeAI]::ERROR --> Traceback (most recent call last):
  File "/opt/invokeai/invokeai/app/services/session_processor/session_processor_default.py", line 129, in run_node
    output = invocation.invoke_internal(context=context, services=self._services)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/invokeai/invokeai/app/invocations/baseinvocation.py", line 241, in invoke_internal
    output = self.invoke(context)
             ^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/invokeai/invokeai/app/invocations/flux_denoise.py", line 155, in invoke
    latents = self._run_diffusion(context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/invokeai/invokeai/app/invocations/flux_denoise.py", line 335, in _run_diffusion
    (cached_weights, transformer) = exit_stack.enter_context(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 526, in enter_context
    result = _enter(cm)
             ^^^^^^^^^^
  File "/root/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/opt/invokeai/invokeai/backend/model_manager/load/load_base.py", line 74, in model_on_device
    self._cache.lock(self._cache_record, working_mem_bytes)
  File "/opt/invokeai/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 53, in wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/invokeai/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 336, in lock
    self._load_locked_model(cache_entry, working_mem_bytes)
  File "/opt/invokeai/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 408, in _load_locked_model
    model_bytes_loaded = self._move_model_to_vram(cache_entry, vram_available + MB)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/invokeai/invokeai/backend/model_manager/load/model_cache/model_cache.py", line 432, in _move_model_to_vram
    return cache_entry.cached_model.full_load_to_vram()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/invokeai/invokeai/backend/model_manager/load/model_cache/cached_model/cached_model_only_full_load.py", line 79, in full_load_to_vram
    new_state_dict[k] = v.to(self._compute_device, copy=True)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 126.00 MiB. GPU 0 has a total capacity of 23.65 GiB of which 67.50 MiB is free. Process 2287 has 258.00 MiB memory in use. Process 1850797 has 554.22 MiB memory in use. Process 1853540 has 21.97 GiB memory in use. Of the allocated memory 21.63 GiB is allocated by PyTorch, and 31.44 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2025-06-09 10:45:51,961]::[InvokeAI]::INFO --> Graph stats: 9523b9bf-1d9b-423c-ac4d-874cd211e386
                 Node   Calls   Seconds  VRAM Used
    flux_model_loader       1    0.008s     0.000G
    flux_text_encoder       1    5.487s     5.038G
              collect       1    0.000s     5.034G
         flux_denoise       1   17.466s    21.628G
TOTAL GRAPH EXECUTION TIME: 22.961s
TOTAL GRAPH WALL TIME: 22.965s
RAM used by InvokeAI process: 22.91G (+22.289G)
RAM used to load models: 27.18G
VRAM in use: 0.012G
RAM cache statistics:
    Model cache hits: 5
    Model cache misses: 5
    Models cached: 1
    Models cleared from cache: 3
    Cache high water mark: 22.17/0.00G
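The OutOfMemoryError above includes PyTorch's own hint: set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce allocator fragmentation. A minimal sketch of applying it, assuming the server is launched from a shell (the `invokeai-web` launch command shown in the comment is an assumption about how this instance is started, not taken from the log):

```shell
# Enable PyTorch's expandable-segments allocator, as suggested by the OOM
# message. This must be exported in the environment BEFORE the Python
# process starts; it has no effect on an already-running server.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Hypothetical launch command -- substitute however this instance is run:
# invokeai-web --root /opt/invokeai

# Confirm the variable is visible to child processes.
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Note this only helps when "reserved but unallocated" memory is large; here it is 31.44 MiB, so the more likely fix is freeing VRAM held by the other processes the error lists (21.97 GiB in process 1853540) or lowering the model cache's VRAM budget.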