- :~/llama.cpp/build/bin# ./llama-server \
- -m ./phi-4-Q8_0.gguf \
- -c 16384 \
- -np 64 \
- -ngl 99 \
- -fa \
- -t 8 \
- --host 0.0.0.0 --port 8000
- ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
- ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
- ggml_cuda_init: found 1 CUDA devices:
- Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
- build: 5501 (cdf94a18) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
- system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
- system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 1200 | F16 = 1 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | FA_ALL_QUANTS = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
- main: binding port with default address family
- main: HTTP server is listening, hostname: 0.0.0.0, port: 8000, http threads: 66
- main: loading model
- srv load_model: loading model './phi-4-Q8_0.gguf'
- llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) - 30843 MiB free
- llama_model_loader: loaded meta data with 40 key-value pairs and 363 tensors from ./phi-4-Q8_0.gguf (version GGUF V3 (latest))
- llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
- llama_model_loader: - kv 0: general.architecture str = llama
- llama_model_loader: - kv 1: general.type str = model
- llama_model_loader: - kv 2: general.name str = Phi 4
- llama_model_loader: - kv 3: general.version str = 4
- llama_model_loader: - kv 4: general.basename str = phi
- llama_model_loader: - kv 5: general.size_label str = 15B
- llama_model_loader: - kv 6: general.license str = mit
- llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/microsoft/phi-...
- llama_model_loader: - kv 8: general.base_model.count u32 = 1
- llama_model_loader: - kv 9: general.base_model.0.name str = Phi 4
- llama_model_loader: - kv 10: general.base_model.0.version str = 4
- llama_model_loader: - kv 11: general.base_model.0.organization str = Microsoft
- llama_model_loader: - kv 12: general.base_model.0.repo_url str = https://huggingface.co/microsoft/phi-4
- llama_model_loader: - kv 13: general.tags arr[str,9] = ["phi", "phi4", "unsloth", "nlp", "ma...
- llama_model_loader: - kv 14: general.languages arr[str,1] = ["en"]
- llama_model_loader: - kv 15: llama.block_count u32 = 40
- llama_model_loader: - kv 16: llama.context_length u32 = 16384
- llama_model_loader: - kv 17: llama.embedding_length u32 = 5120
- llama_model_loader: - kv 18: llama.feed_forward_length u32 = 17920
- llama_model_loader: - kv 19: llama.attention.head_count u32 = 40
- llama_model_loader: - kv 20: llama.attention.head_count_kv u32 = 10
- llama_model_loader: - kv 21: llama.rope.freq_base f32 = 250000.000000
- llama_model_loader: - kv 22: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
- llama_model_loader: - kv 23: llama.attention.key_length u32 = 128
- llama_model_loader: - kv 24: llama.attention.value_length u32 = 128
- llama_model_loader: - kv 25: general.file_type u32 = 7
- llama_model_loader: - kv 26: llama.vocab_size u32 = 100352
- llama_model_loader: - kv 27: llama.rope.dimension_count u32 = 128
- llama_model_loader: - kv 28: tokenizer.ggml.model str = gpt2
- llama_model_loader: - kv 29: tokenizer.ggml.pre str = dbrx
- llama_model_loader: - kv 30: tokenizer.ggml.tokens arr[str,100352] = ["!", "\"", "#", "$", "%", "&", "'", ...
- llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,100352] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
- llama_model_loader: - kv 32: tokenizer.ggml.merges arr[str,100000] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
- llama_model_loader: - kv 33: tokenizer.ggml.bos_token_id u32 = 100257
- llama_model_loader: - kv 34: tokenizer.ggml.eos_token_id u32 = 100265
- llama_model_loader: - kv 35: tokenizer.ggml.unknown_token_id u32 = 5809
- llama_model_loader: - kv 36: tokenizer.ggml.padding_token_id u32 = 100351
- llama_model_loader: - kv 37: tokenizer.chat_template str = {% for message in messages %}{% if (m...
- llama_model_loader: - kv 38: tokenizer.ggml.add_space_prefix bool = false
- llama_model_loader: - kv 39: general.quantization_version u32 = 2
- llama_model_loader: - type f32: 81 tensors
- llama_model_loader: - type q8_0: 282 tensors
- print_info: file format = GGUF V3 (latest)
- print_info: file type = Q8_0
- print_info: file size = 14.51 GiB (8.50 BPW)
- load: special tokens cache size = 97
- load: token to piece cache size = 0.6151 MB
- print_info: arch = llama
- print_info: vocab_only = 0
- print_info: n_ctx_train = 16384
- print_info: n_embd = 5120
- print_info: n_layer = 40
- print_info: n_head = 40
- print_info: n_head_kv = 10
- print_info: n_rot = 128
- print_info: n_swa = 0
- print_info: is_swa_any = 0
- print_info: n_embd_head_k = 128
- print_info: n_embd_head_v = 128
- print_info: n_gqa = 4
- print_info: n_embd_k_gqa = 1280
- print_info: n_embd_v_gqa = 1280
- print_info: f_norm_eps = 0.0e+00
- print_info: f_norm_rms_eps = 1.0e-05
- print_info: f_clamp_kqv = 0.0e+00
- print_info: f_max_alibi_bias = 0.0e+00
- print_info: f_logit_scale = 0.0e+00
- print_info: f_attn_scale = 0.0e+00
- print_info: n_ff = 17920
- print_info: n_expert = 0
- print_info: n_expert_used = 0
- print_info: causal attn = 1
- print_info: pooling type = 0
- print_info: rope type = 0
- print_info: rope scaling = linear
- print_info: freq_base_train = 250000.0
- print_info: freq_scale_train = 1
- print_info: n_ctx_orig_yarn = 16384
- print_info: rope_finetuned = unknown
- print_info: ssm_d_conv = 0
- print_info: ssm_d_inner = 0
- print_info: ssm_d_state = 0
- print_info: ssm_dt_rank = 0
- print_info: ssm_dt_b_c_rms = 0
- print_info: model type = 13B
- print_info: model params = 14.66 B
- print_info: general.name = Phi 4
- print_info: vocab type = BPE
- print_info: n_vocab = 100352
- print_info: n_merges = 100000
- print_info: BOS token = 100257 '<|endoftext|>'
- print_info: EOS token = 100265 '<|im_end|>'
- print_info: EOT token = 100265 '<|im_end|>'
- print_info: UNK token = 5809 '�'
- print_info: PAD token = 100351 '<|dummy_87|>'
- print_info: LF token = 198 'Ċ'
- print_info: FIM PRE token = 100258 '<|fim_prefix|>'
- print_info: FIM SUF token = 100260 '<|fim_suffix|>'
- print_info: FIM MID token = 100259 '<|fim_middle|>'
- print_info: EOG token = 100257 '<|endoftext|>'
- print_info: EOG token = 100265 '<|im_end|>'
- print_info: max token length = 256
- load_tensors: loading model tensors, this can take a while... (mmap = true)
- load_tensors: offloading 40 repeating layers to GPU
- load_tensors: offloading output layer to GPU
- load_tensors: offloaded 41/41 layers to GPU
- load_tensors: CUDA0 model buffer size = 14334.71 MiB
- load_tensors: CPU_Mapped model buffer size = 520.62 MiB
- ...............................................................................................
- llama_context: constructing llama_context
- llama_context: n_seq_max = 64
- llama_context: n_ctx = 16384
- llama_context: n_ctx_per_seq = 256
- llama_context: n_batch = 2048
- llama_context: n_ubatch = 512
- llama_context: causal_attn = 1
- llama_context: flash_attn = 1
- llama_context: freq_base = 250000.0
- llama_context: freq_scale = 1
- llama_context: n_ctx_per_seq (256) < n_ctx_train (16384) -- the full capacity of the model will not be utilized
- llama_context: CUDA_Host output buffer size = 24.50 MiB
- llama_kv_cache_unified: CUDA0 KV buffer size = 3200.00 MiB
- llama_kv_cache_unified: size = 3200.00 MiB ( 16384 cells, 40 layers, 64 seqs), K (f16): 1600.00 MiB, V (f16): 1600.00 MiB
- llama_context: CUDA0 compute buffer size = 206.00 MiB
- llama_context: CUDA_Host compute buffer size = 42.01 MiB
- llama_context: graph nodes = 1287
- llama_context: graph splits = 2
- common_init_from_params: setting dry_penalty_last_n to ctx_size = 16384
- common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
- srv init: initializing slots, n_slots = 64
- slot init: id 0 | task -1 | new slot n_ctx_slot = 256
- slot init: id 1 | task -1 | new slot n_ctx_slot = 256
- slot init: id 2 | task -1 | new slot n_ctx_slot = 256
- slot init: id 3 | task -1 | new slot n_ctx_slot = 256
- slot init: id 4 | task -1 | new slot n_ctx_slot = 256
- slot init: id 5 | task -1 | new slot n_ctx_slot = 256
- slot init: id 6 | task -1 | new slot n_ctx_slot = 256
- slot init: id 7 | task -1 | new slot n_ctx_slot = 256
- slot init: id 8 | task -1 | new slot n_ctx_slot = 256
- slot init: id 9 | task -1 | new slot n_ctx_slot = 256
- slot init: id 10 | task -1 | new slot n_ctx_slot = 256
- slot init: id 11 | task -1 | new slot n_ctx_slot = 256
- slot init: id 12 | task -1 | new slot n_ctx_slot = 256
- slot init: id 13 | task -1 | new slot n_ctx_slot = 256
- slot init: id 14 | task -1 | new slot n_ctx_slot = 256
- slot init: id 15 | task -1 | new slot n_ctx_slot = 256
- slot init: id 16 | task -1 | new slot n_ctx_slot = 256
- slot init: id 17 | task -1 | new slot n_ctx_slot = 256
- slot init: id 18 | task -1 | new slot n_ctx_slot = 256
- slot init: id 19 | task -1 | new slot n_ctx_slot = 256
- slot init: id 20 | task -1 | new slot n_ctx_slot = 256
- slot init: id 21 | task -1 | new slot n_ctx_slot = 256
- slot init: id 22 | task -1 | new slot n_ctx_slot = 256
- slot init: id 23 | task -1 | new slot n_ctx_slot = 256
- slot init: id 24 | task -1 | new slot n_ctx_slot = 256
- slot init: id 25 | task -1 | new slot n_ctx_slot = 256
- slot init: id 26 | task -1 | new slot n_ctx_slot = 256
- slot init: id 27 | task -1 | new slot n_ctx_slot = 256
- slot init: id 28 | task -1 | new slot n_ctx_slot = 256
- slot init: id 29 | task -1 | new slot n_ctx_slot = 256
- slot init: id 30 | task -1 | new slot n_ctx_slot = 256
- slot init: id 31 | task -1 | new slot n_ctx_slot = 256
- slot init: id 32 | task -1 | new slot n_ctx_slot = 256
- slot init: id 33 | task -1 | new slot n_ctx_slot = 256
- slot init: id 34 | task -1 | new slot n_ctx_slot = 256
- slot init: id 35 | task -1 | new slot n_ctx_slot = 256
- slot init: id 36 | task -1 | new slot n_ctx_slot = 256
- slot init: id 37 | task -1 | new slot n_ctx_slot = 256
- slot init: id 38 | task -1 | new slot n_ctx_slot = 256
- slot init: id 39 | task -1 | new slot n_ctx_slot = 256
- slot init: id 40 | task -1 | new slot n_ctx_slot = 256
- slot init: id 41 | task -1 | new slot n_ctx_slot = 256
- slot init: id 42 | task -1 | new slot n_ctx_slot = 256
- slot init: id 43 | task -1 | new slot n_ctx_slot = 256
- slot init: id 44 | task -1 | new slot n_ctx_slot = 256
- slot init: id 45 | task -1 | new slot n_ctx_slot = 256
- slot init: id 46 | task -1 | new slot n_ctx_slot = 256
- slot init: id 47 | task -1 | new slot n_ctx_slot = 256
- slot init: id 48 | task -1 | new slot n_ctx_slot = 256
- slot init: id 49 | task -1 | new slot n_ctx_slot = 256
- slot init: id 50 | task -1 | new slot n_ctx_slot = 256
- slot init: id 51 | task -1 | new slot n_ctx_slot = 256
- slot init: id 52 | task -1 | new slot n_ctx_slot = 256
- slot init: id 53 | task -1 | new slot n_ctx_slot = 256
- slot init: id 54 | task -1 | new slot n_ctx_slot = 256
- slot init: id 55 | task -1 | new slot n_ctx_slot = 256
- slot init: id 56 | task -1 | new slot n_ctx_slot = 256
- slot init: id 57 | task -1 | new slot n_ctx_slot = 256
- slot init: id 58 | task -1 | new slot n_ctx_slot = 256
- slot init: id 59 | task -1 | new slot n_ctx_slot = 256
- slot init: id 60 | task -1 | new slot n_ctx_slot = 256
- slot init: id 61 | task -1 | new slot n_ctx_slot = 256
- slot init: id 62 | task -1 | new slot n_ctx_slot = 256
- slot init: id 63 | task -1 | new slot n_ctx_slot = 256
- main: model loaded
- main: chat template, chat_template: {% for message in messages %}{% if (message['role'] == 'system') %}{{'<|im_start|>system<|im_sep|>' + message['content'] + '<|im_end|>'}}{% elif (message['role'] == 'user') %}{{'<|im_start|>user<|im_sep|>' + message['content'] + '<|im_end|>'}}{% elif (message['role'] == 'assistant') %}{{'<|im_start|>assistant<|im_sep|>' + message['content'] + '<|im_end|>'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant<|im_sep|>' }}{% endif %}, example_format: '<|im_start|>system<|im_sep|>You are a helpful assistant<|im_end|><|im_start|>user<|im_sep|>Hello<|im_end|><|im_start|>assistant<|im_sep|>Hi there<|im_end|><|im_start|>user<|im_sep|>How are you?<|im_end|><|im_start|>assistant<|im_sep|>'
- main: server is listening on http://0.0.0.0:8000 - starting the main loop
- srv update_slots: all slots are idle
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 0 | task 0 | processing task
- slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 0 | task 0 | kv cache rm [0, end)
- slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 0 | task 0 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 0 | task 0 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 0 | task 0 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 0 | task 0 | stop processing: n_past = 137, truncated = 1
- slot print_timing: id 0 | task 0 |
- prompt eval time = 181.10 ms / 207 tokens ( 0.87 ms per token, 1143.03 tokens per second)
- eval time = 2203.93 ms / 185 tokens ( 11.91 ms per token, 83.94 tokens per second)
- total time = 2385.03 ms / 392 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
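Note on the request above: the two "slot context shift" lines and the final n_past = 137 with truncated = 1 fit a simple model of a 256-token slot overflowing. The sketch below is a simplified model inferred from the numbers in this log, not taken from the llama.cpp source. It assumes that a shift fires once n_past reaches n_ctx_slot - 1, that it discards half of n_left = n_past - n_keep, and that the first generated token does not advance n_past. The function name `simulate_slot` is hypothetical.

```python
# Simplified model of per-slot context shifting, inferred from the log.
# Assumptions (not verified against llama.cpp source):
#   - a shift fires when n_past >= n_ctx_slot - 1
#   - n_left = n_past - n_keep, n_discard = n_left // 2
#   - the first generated token comes from the prompt pass and does not
#     advance n_past, so only n_decoded - 1 tokens do

def simulate_slot(n_prompt, n_decoded, n_ctx_slot=256, n_keep=0):
    """Return (final n_past, number of context shifts)."""
    n_past = n_prompt
    shifts = 0
    for _ in range(n_decoded - 1):
        if n_past >= n_ctx_slot - 1:
            n_left = n_past - n_keep      # 255 in the log
            n_discard = n_left // 2       # 127 in the log
            n_past -= n_discard
            shifts += 1
        n_past += 1
    return n_past, shifts

# Task 0 in the log: 207 prompt tokens, 185 generated tokens.
print(simulate_slot(207, 185))  # (137, 2)
```

Under these assumptions the model matches the logged n_past for task 0, and the same holds for the later sequential requests (for example 199 generated tokens gives n_past = 151). Every shift throws away the oldest 127 tokens, so most of the 207-token prompt is gone by the time generation stops.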
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 1 | task 186 | processing task
- slot update_slots: id 1 | task 186 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 1 | task 186 | kv cache rm [0, end)
- slot update_slots: id 1 | task 186 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 1 | task 186 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 1 | task 186 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 1 | task 186 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 1 | task 186 | stop processing: n_past = 151, truncated = 1
- slot print_timing: id 1 | task 186 |
- prompt eval time = 41.55 ms / 207 tokens ( 0.20 ms per token, 4981.95 tokens per second)
- eval time = 2352.12 ms / 199 tokens ( 11.82 ms per token, 84.60 tokens per second)
- total time = 2393.67 ms / 406 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 2 | task 386 | processing task
- slot update_slots: id 2 | task 386 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 2 | task 386 | kv cache rm [0, end)
- slot update_slots: id 2 | task 386 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 2 | task 386 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 2 | task 386 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 2 | task 386 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 2 | task 386 | stop processing: n_past = 157, truncated = 1
- slot print_timing: id 2 | task 386 |
- prompt eval time = 41.76 ms / 207 tokens ( 0.20 ms per token, 4957.25 tokens per second)
- eval time = 2435.10 ms / 205 tokens ( 11.88 ms per token, 84.19 tokens per second)
- total time = 2476.85 ms / 412 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 3 | task 592 | processing task
- slot update_slots: id 3 | task 592 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 3 | task 592 | kv cache rm [0, end)
- slot update_slots: id 3 | task 592 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 3 | task 592 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 3 | task 592 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 3 | task 592 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 3 | task 592 | stop processing: n_past = 153, truncated = 1
- slot print_timing: id 3 | task 592 |
- prompt eval time = 41.20 ms / 207 tokens ( 0.20 ms per token, 5024.27 tokens per second)
- eval time = 2402.52 ms / 201 tokens ( 11.95 ms per token, 83.66 tokens per second)
- total time = 2443.72 ms / 408 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 4 | task 794 | processing task
- slot update_slots: id 4 | task 794 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 4 | task 794 | kv cache rm [0, end)
- slot update_slots: id 4 | task 794 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 4 | task 794 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 4 | task 794 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 4 | task 794 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 4 | task 794 | stop processing: n_past = 161, truncated = 1
- slot print_timing: id 4 | task 794 |
- prompt eval time = 41.64 ms / 207 tokens ( 0.20 ms per token, 4971.66 tokens per second)
- eval time = 2510.65 ms / 209 tokens ( 12.01 ms per token, 83.25 tokens per second)
- total time = 2552.29 ms / 416 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 5 | task 1004 | processing task
- slot update_slots: id 5 | task 1004 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 5 | task 1004 | kv cache rm [0, end)
- slot update_slots: id 5 | task 1004 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 5 | task 1004 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 5 | task 1004 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 5 | task 1004 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 5 | task 1004 | stop processing: n_past = 140, truncated = 1
- slot print_timing: id 5 | task 1004 |
- prompt eval time = 41.84 ms / 207 tokens ( 0.20 ms per token, 4947.42 tokens per second)
- eval time = 2263.88 ms / 188 tokens ( 12.04 ms per token, 83.04 tokens per second)
- total time = 2305.72 ms / 395 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 6 | task 1193 | processing task
- slot update_slots: id 6 | task 1193 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 6 | task 1193 | kv cache rm [0, end)
- slot update_slots: id 6 | task 1193 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 6 | task 1193 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 6 | task 1193 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 6 | task 1193 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 6 | task 1193 | stop processing: n_past = 161, truncated = 1
- slot print_timing: id 6 | task 1193 |
- prompt eval time = 43.14 ms / 207 tokens ( 0.21 ms per token, 4798.00 tokens per second)
- eval time = 2503.51 ms / 209 tokens ( 11.98 ms per token, 83.48 tokens per second)
- total time = 2546.65 ms / 416 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 7 | task 1403 | processing task
- slot update_slots: id 7 | task 1403 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 7 | task 1403 | kv cache rm [0, end)
- slot update_slots: id 7 | task 1403 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 7 | task 1403 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 7 | task 1403 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 7 | task 1403 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 7 | task 1403 | stop processing: n_past = 144, truncated = 1
- slot print_timing: id 7 | task 1403 |
- prompt eval time = 45.63 ms / 207 tokens ( 0.22 ms per token, 4536.89 tokens per second)
- eval time = 2275.58 ms / 192 tokens ( 11.85 ms per token, 84.37 tokens per second)
- total time = 2321.21 ms / 399 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 8 | task 1596 | processing task
- slot update_slots: id 8 | task 1596 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 8 | task 1596 | kv cache rm [0, end)
- slot update_slots: id 8 | task 1596 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 8 | task 1596 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 8 | task 1596 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 8 | task 1596 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 8 | task 1596 | stop processing: n_past = 155, truncated = 1
- slot print_timing: id 8 | task 1596 |
- prompt eval time = 42.49 ms / 207 tokens ( 0.21 ms per token, 4871.62 tokens per second)
- eval time = 2412.59 ms / 203 tokens ( 11.88 ms per token, 84.14 tokens per second)
- total time = 2455.08 ms / 410 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 9 | task 1800 | processing task
- slot update_slots: id 9 | task 1800 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 9 | task 1800 | kv cache rm [0, end)
- slot update_slots: id 9 | task 1800 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 9 | task 1800 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 9 | task 1800 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 9 | task 1800 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 9 | task 1800 | stop processing: n_past = 171, truncated = 1
- slot print_timing: id 9 | task 1800 |
- prompt eval time = 42.85 ms / 207 tokens ( 0.21 ms per token, 4830.58 tokens per second)
- eval time = 2594.06 ms / 219 tokens ( 11.85 ms per token, 84.42 tokens per second)
- total time = 2636.91 ms / 426 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 10 | task 2020 | processing task
- slot update_slots: id 10 | task 2020 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 10 | task 2020 | kv cache rm [0, end)
- slot update_slots: id 10 | task 2020 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 10 | task 2020 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 10 | task 2020 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 10 | task 2020 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 10 | task 2020 | stop processing: n_past = 160, truncated = 1
- slot print_timing: id 10 | task 2020 |
- prompt eval time = 42.82 ms / 207 tokens ( 0.21 ms per token, 4834.64 tokens per second)
- eval time = 2471.02 ms / 208 tokens ( 11.88 ms per token, 84.18 tokens per second)
- total time = 2513.83 ms / 415 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 11 | task 2229 | processing task
- slot update_slots: id 11 | task 2229 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 11 | task 2229 | kv cache rm [0, end)
- slot update_slots: id 11 | task 2229 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 11 | task 2229 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 11 | task 2229 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 11 | task 2229 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 11 | task 2229 | stop processing: n_past = 156, truncated = 1
- slot print_timing: id 11 | task 2229 |
- prompt eval time = 42.50 ms / 207 tokens ( 0.21 ms per token, 4870.82 tokens per second)
- eval time = 2422.91 ms / 204 tokens ( 11.88 ms per token, 84.20 tokens per second)
- total time = 2465.41 ms / 411 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 12 | task 2434 | processing task
- slot update_slots: id 12 | task 2434 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 12 | task 2434 | kv cache rm [0, end)
- slot update_slots: id 12 | task 2434 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 12 | task 2434 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 12 | task 2434 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 12 | task 2434 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 12 | task 2434 | stop processing: n_past = 146, truncated = 1
- slot print_timing: id 12 | task 2434 |
- prompt eval time = 42.82 ms / 207 tokens ( 0.21 ms per token, 4833.96 tokens per second)
- eval time = 2301.92 ms / 194 tokens ( 11.87 ms per token, 84.28 tokens per second)
- total time = 2344.74 ms / 401 tokens
- srv update_slots: all slots are idle
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 13 | task 2629 | processing task
- slot update_slots: id 13 | task 2629 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 13 | task 2629 | kv cache rm [0, end)
- slot update_slots: id 13 | task 2629 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
- slot update_slots: id 13 | task 2629 | prompt done, n_past = 207, n_tokens = 207
- slot update_slots: id 13 | task 2629 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 14 | task 2712 | processing task
- slot update_slots: id 14 | task 2712 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 14 | task 2712 | kv cache rm [0, end)
- slot update_slots: id 14 | task 2712 | prompt processing progress, n_past = 207, n_tokens = 208, progress = 1.000000
- slot update_slots: id 14 | task 2712 | prompt done, n_past = 207, n_tokens = 208
- slot update_slots: id 14 | task 2712 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 15 | task 2772 | processing task
- slot update_slots: id 15 | task 2772 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 15 | task 2772 | kv cache rm [0, end)
- slot update_slots: id 15 | task 2772 | prompt processing progress, n_past = 207, n_tokens = 209, progress = 1.000000
- slot update_slots: id 15 | task 2772 | prompt done, n_past = 207, n_tokens = 209
- slot update_slots: id 13 | task 2629 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 13 | task 2629 | stop processing: n_past = 130, truncated = 1
- slot print_timing: id 13 | task 2629 |
- prompt eval time = 43.16 ms / 207 tokens ( 0.21 ms per token, 4795.89 tokens per second)
- eval time = 2608.10 ms / 178 tokens ( 14.65 ms per token, 68.25 tokens per second)
- total time = 2651.26 ms / 385 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 15 | task 2772 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 16 | task 2831 | processing task
- slot update_slots: id 16 | task 2831 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 16 | task 2831 | kv cache rm [0, end)
- slot update_slots: id 16 | task 2831 | prompt processing progress, n_past = 207, n_tokens = 209, progress = 1.000000
- slot update_slots: id 16 | task 2831 | prompt done, n_past = 207, n_tokens = 209
- slot update_slots: id 16 | task 2831 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 17 | task 2890 | processing task
- slot update_slots: id 17 | task 2890 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 17 | task 2890 | kv cache rm [0, end)
- slot update_slots: id 17 | task 2890 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 17 | task 2890 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 14 | task 2712 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 14 | task 2712 | stop processing: n_past = 146, truncated = 1
- slot print_timing: id 14 | task 2712 |
- prompt eval time = 42.85 ms / 207 tokens ( 0.21 ms per token, 4830.58 tokens per second)
- eval time = 3306.65 ms / 194 tokens ( 17.04 ms per token, 58.67 tokens per second)
- total time = 3349.51 ms / 401 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 17 | task 2890 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 18 | task 2948 | processing task
- slot update_slots: id 18 | task 2948 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 18 | task 2948 | kv cache rm [0, end)
- slot update_slots: id 18 | task 2948 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 18 | task 2948 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 15 | task 2772 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 15 | task 2772 | stop processing: n_past = 155, truncated = 1
- slot print_timing: id 15 | task 2772 |
- prompt eval time = 43.00 ms / 207 tokens ( 0.21 ms per token, 4813.62 tokens per second)
- eval time = 3615.56 ms / 203 tokens ( 17.81 ms per token, 56.15 tokens per second)
- total time = 3658.56 ms / 410 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 19 | task 2998 | processing task
- slot update_slots: id 18 | task 2948 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 19 | task 2998 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 19 | task 2998 | kv cache rm [0, end)
- slot update_slots: id 19 | task 2998 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 19 | task 2998 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 16 | task 2831 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 16 | task 2831 | stop processing: n_past = 153, truncated = 1
- slot print_timing: id 16 | task 2831 |
- prompt eval time = 43.28 ms / 207 tokens ( 0.21 ms per token, 4782.37 tokens per second)
- eval time = 3638.73 ms / 201 tokens ( 18.10 ms per token, 55.24 tokens per second)
- total time = 3682.01 ms / 408 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 19 | task 2998 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 20 | task 3055 | processing task
- slot update_slots: id 20 | task 3055 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 20 | task 3055 | kv cache rm [0, end)
- slot update_slots: id 20 | task 3055 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 20 | task 3055 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 17 | task 2890 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 17 | task 2890 | stop processing: n_past = 133, truncated = 1
- slot print_timing: id 17 | task 2890 |
- prompt eval time = 44.34 ms / 207 tokens ( 0.21 ms per token, 4667.94 tokens per second)
- eval time = 3326.14 ms / 181 tokens ( 18.38 ms per token, 54.42 tokens per second)
- total time = 3370.49 ms / 388 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 20 | task 3055 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 21 | task 3112 | processing task
- slot update_slots: id 21 | task 3112 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 21 | task 3112 | kv cache rm [0, end)
- slot update_slots: id 21 | task 3112 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 21 | task 3112 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 18 | task 2948 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 21 | task 3112 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 18 | task 2948 | stop processing: n_past = 166, truncated = 1
- slot print_timing: id 18 | task 2948 |
- prompt eval time = 44.65 ms / 207 tokens ( 0.22 ms per token, 4636.27 tokens per second)
- eval time = 3943.99 ms / 214 tokens ( 18.43 ms per token, 54.26 tokens per second)
- total time = 3988.64 ms / 421 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 22 | task 3167 | processing task
- slot update_slots: id 22 | task 3167 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 22 | task 3167 | kv cache rm [0, end)
- slot update_slots: id 22 | task 3167 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 22 | task 3167 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 19 | task 2998 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 19 | task 2998 | stop processing: n_past = 140, truncated = 1
- slot print_timing: id 19 | task 2998 |
- prompt eval time = 47.97 ms / 207 tokens ( 0.23 ms per token, 4315.56 tokens per second)
- eval time = 3382.29 ms / 188 tokens ( 17.99 ms per token, 55.58 tokens per second)
- total time = 3430.25 ms / 395 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 22 | task 3167 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 23 | task 3222 | processing task
- slot update_slots: id 23 | task 3222 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 23 | task 3222 | kv cache rm [0, end)
- slot update_slots: id 23 | task 3222 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 23 | task 3222 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 20 | task 3055 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 20 | task 3055 | stop processing: n_past = 156, truncated = 1
- slot print_timing: id 20 | task 3055 |
- prompt eval time = 44.29 ms / 207 tokens ( 0.21 ms per token, 4673.64 tokens per second)
- eval time = 3700.86 ms / 204 tokens ( 18.14 ms per token, 55.12 tokens per second)
- total time = 3745.15 ms / 411 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 23 | task 3222 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 24 | task 3277 | processing task
- slot update_slots: id 24 | task 3277 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 24 | task 3277 | kv cache rm [0, end)
- slot update_slots: id 24 | task 3277 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 24 | task 3277 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 21 | task 3112 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 21 | task 3112 | stop processing: n_past = 149, truncated = 1
- slot print_timing: id 21 | task 3112 |
- prompt eval time = 45.40 ms / 207 tokens ( 0.22 ms per token, 4559.97 tokens per second)
- eval time = 3891.56 ms / 197 tokens ( 19.75 ms per token, 50.62 tokens per second)
- total time = 3936.96 ms / 404 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 25 | task 3316 | processing task
- slot update_slots: id 25 | task 3316 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 25 | task 3316 | kv cache rm [0, end)
- slot update_slots: id 25 | task 3316 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 25 | task 3316 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 24 | task 3277 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 22 | task 3167 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 22 | task 3167 | stop processing: n_past = 143, truncated = 1
- slot print_timing: id 22 | task 3167 |
- prompt eval time = 44.68 ms / 207 tokens ( 0.22 ms per token, 4632.95 tokens per second)
- eval time = 3772.56 ms / 191 tokens ( 19.75 ms per token, 50.63 tokens per second)
- total time = 3817.24 ms / 398 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 25 | task 3316 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 26 | task 3372 | processing task
- slot update_slots: id 26 | task 3372 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 26 | task 3372 | kv cache rm [0, end)
- slot update_slots: id 26 | task 3372 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 26 | task 3372 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 23 | task 3222 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 26 | task 3372 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 27 | task 3429 | processing task
- slot update_slots: id 27 | task 3429 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 27 | task 3429 | kv cache rm [0, end)
- slot update_slots: id 27 | task 3429 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 27 | task 3429 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 23 | task 3222 | stop processing: n_past = 172, truncated = 1
- slot print_timing: id 23 | task 3222 |
- prompt eval time = 45.63 ms / 207 tokens ( 0.22 ms per token, 4536.69 tokens per second)
- eval time = 4322.02 ms / 220 tokens ( 19.65 ms per token, 50.90 tokens per second)
- total time = 4367.64 ms / 427 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 24 | task 3277 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 24 | task 3277 | stop processing: n_past = 139, truncated = 1
- slot print_timing: id 24 | task 3277 |
- prompt eval time = 46.62 ms / 207 tokens ( 0.23 ms per token, 4440.15 tokens per second)
- eval time = 3687.68 ms / 187 tokens ( 19.72 ms per token, 50.71 tokens per second)
- total time = 3734.30 ms / 394 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 27 | task 3429 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 28 | task 3481 | processing task
- slot update_slots: id 28 | task 3481 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 28 | task 3481 | kv cache rm [0, end)
- slot update_slots: id 28 | task 3481 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 28 | task 3481 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 25 | task 3316 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 25 | task 3316 | stop processing: n_past = 153, truncated = 1
- slot print_timing: id 25 | task 3316 |
- prompt eval time = 45.28 ms / 207 tokens ( 0.22 ms per token, 4571.45 tokens per second)
- eval time = 3932.11 ms / 201 tokens ( 19.56 ms per token, 51.12 tokens per second)
- total time = 3977.39 ms / 408 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 29 | task 3522 | processing task
- slot update_slots: id 29 | task 3522 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 29 | task 3522 | kv cache rm [0, end)
- slot update_slots: id 29 | task 3522 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 29 | task 3522 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 28 | task 3481 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 26 | task 3372 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 29 | task 3522 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 30 | task 3576 | processing task
- slot update_slots: id 30 | task 3576 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 30 | task 3576 | kv cache rm [0, end)
- slot update_slots: id 30 | task 3576 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 30 | task 3576 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 26 | task 3372 | stop processing: n_past = 157, truncated = 1
- slot print_timing: id 26 | task 3372 |
- prompt eval time = 46.45 ms / 207 tokens ( 0.22 ms per token, 4456.50 tokens per second)
- eval time = 4071.36 ms / 205 tokens ( 19.86 ms per token, 50.35 tokens per second)
- total time = 4117.81 ms / 412 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 27 | task 3429 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 27 | task 3429 | stop processing: n_past = 138, truncated = 1
- slot print_timing: id 27 | task 3429 |
- prompt eval time = 47.07 ms / 207 tokens ( 0.23 ms per token, 4397.33 tokens per second)
- eval time = 3728.56 ms / 186 tokens ( 20.05 ms per token, 49.89 tokens per second)
- total time = 3775.64 ms / 393 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 30 | task 3576 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 31 | task 3628 | processing task
- slot update_slots: id 31 | task 3628 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 31 | task 3628 | kv cache rm [0, end)
- slot update_slots: id 31 | task 3628 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 31 | task 3628 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 28 | task 3481 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 32 | task 3673 | processing task
- slot update_slots: id 32 | task 3673 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 32 | task 3673 | kv cache rm [0, end)
- slot update_slots: id 32 | task 3673 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 32 | task 3673 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 31 | task 3628 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 28 | task 3481 | stop processing: n_past = 160, truncated = 1
- slot print_timing: id 28 | task 3481 |
- prompt eval time = 46.00 ms / 207 tokens ( 0.22 ms per token, 4500.29 tokens per second)
- eval time = 4134.51 ms / 208 tokens ( 19.88 ms per token, 50.31 tokens per second)
- total time = 4180.50 ms / 415 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 29 | task 3522 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 32 | task 3673 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 33 | task 3727 | processing task
- slot update_slots: id 33 | task 3727 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 33 | task 3727 | kv cache rm [0, end)
- slot update_slots: id 33 | task 3727 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 33 | task 3727 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 29 | task 3522 | stop processing: n_past = 154, truncated = 1
- slot print_timing: id 29 | task 3522 |
- prompt eval time = 45.46 ms / 207 tokens ( 0.22 ms per token, 4553.45 tokens per second)
- eval time = 3992.98 ms / 202 tokens ( 19.77 ms per token, 50.59 tokens per second)
- total time = 4038.44 ms / 409 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 30 | task 3576 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 33 | task 3727 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 34 | task 3782 | processing task
- slot update_slots: id 34 | task 3782 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 34 | task 3782 | kv cache rm [0, end)
- slot update_slots: id 34 | task 3782 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 34 | task 3782 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 30 | task 3576 | stop processing: n_past = 158, truncated = 1
- slot print_timing: id 30 | task 3576 |
- prompt eval time = 46.67 ms / 207 tokens ( 0.23 ms per token, 4435.78 tokens per second)
- eval time = 4064.30 ms / 206 tokens ( 19.73 ms per token, 50.69 tokens per second)
- total time = 4110.97 ms / 413 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 31 | task 3628 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 31 | task 3628 | stop processing: n_past = 151, truncated = 1
- slot print_timing: id 31 | task 3628 |
- prompt eval time = 45.52 ms / 207 tokens ( 0.22 ms per token, 4547.55 tokens per second)
- eval time = 3669.50 ms / 199 tokens ( 18.44 ms per token, 54.23 tokens per second)
- total time = 3715.02 ms / 406 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 34 | task 3782 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 35 | task 3837 | processing task
- slot update_slots: id 35 | task 3837 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 35 | task 3837 | kv cache rm [0, end)
- slot update_slots: id 35 | task 3837 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 35 | task 3837 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 32 | task 3673 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 32 | task 3673 | stop processing: n_past = 151, truncated = 1
- slot print_timing: id 32 | task 3673 |
- prompt eval time = 46.65 ms / 207 tokens ( 0.23 ms per token, 4437.58 tokens per second)
- eval time = 3654.16 ms / 199 tokens ( 18.36 ms per token, 54.46 tokens per second)
- total time = 3700.80 ms / 406 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 35 | task 3837 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 36 | task 3893 | processing task
- slot update_slots: id 36 | task 3893 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 36 | task 3893 | kv cache rm [0, end)
- slot update_slots: id 36 | task 3893 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 36 | task 3893 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 33 | task 3727 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 33 | task 3727 | stop processing: n_past = 146, truncated = 1
- slot print_timing: id 33 | task 3727 |
- prompt eval time = 46.41 ms / 207 tokens ( 0.22 ms per token, 4460.73 tokens per second)
- eval time = 3717.80 ms / 194 tokens ( 19.16 ms per token, 52.18 tokens per second)
- total time = 3764.21 ms / 401 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 37 | task 3939 | processing task
- slot update_slots: id 37 | task 3939 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 37 | task 3939 | kv cache rm [0, end)
- slot update_slots: id 37 | task 3939 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 37 | task 3939 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 36 | task 3893 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 34 | task 3782 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 34 | task 3782 | stop processing: n_past = 151, truncated = 1
- slot print_timing: id 34 | task 3782 |
- prompt eval time = 47.33 ms / 207 tokens ( 0.23 ms per token, 4373.36 tokens per second)
- eval time = 3811.68 ms / 199 tokens ( 19.15 ms per token, 52.21 tokens per second)
- total time = 3859.01 ms / 406 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 37 | task 3939 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 38 | task 3993 | processing task
- slot update_slots: id 38 | task 3993 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 38 | task 3993 | kv cache rm [0, end)
- slot update_slots: id 38 | task 3993 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 38 | task 3993 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 35 | task 3837 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 35 | task 3837 | stop processing: n_past = 132, truncated = 1
- slot print_timing: id 35 | task 3837 |
- prompt eval time = 47.18 ms / 207 tokens ( 0.23 ms per token, 4387.17 tokens per second)
- eval time = 3452.44 ms / 180 tokens ( 19.18 ms per token, 52.14 tokens per second)
- total time = 3499.62 ms / 387 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 38 | task 3993 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 39 | task 4045 | processing task
- slot update_slots: id 39 | task 4045 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 39 | task 4045 | kv cache rm [0, end)
- slot update_slots: id 39 | task 4045 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 39 | task 4045 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 36 | task 3893 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 36 | task 3893 | stop processing: n_past = 140, truncated = 1
- slot print_timing: id 36 | task 3893 |
- prompt eval time = 46.98 ms / 207 tokens ( 0.23 ms per token, 4406.32 tokens per second)
- eval time = 3875.64 ms / 188 tokens ( 20.62 ms per token, 48.51 tokens per second)
- total time = 3922.62 ms / 395 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 40 | task 4089 | processing task
- slot update_slots: id 40 | task 4089 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 40 | task 4089 | kv cache rm [0, end)
- slot update_slots: id 40 | task 4089 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 40 | task 4089 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 39 | task 4045 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 37 | task 3939 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 40 | task 4089 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 41 | task 4144 | processing task
- slot update_slots: id 41 | task 4144 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 41 | task 4144 | kv cache rm [0, end)
- slot update_slots: id 41 | task 4144 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 41 | task 4144 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 37 | task 3939 | stop processing: n_past = 161, truncated = 1
- slot print_timing: id 37 | task 3939 |
- prompt eval time = 47.40 ms / 207 tokens ( 0.23 ms per token, 4366.72 tokens per second)
- eval time = 4136.98 ms / 209 tokens ( 19.79 ms per token, 50.52 tokens per second)
- total time = 4184.38 ms / 416 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 38 | task 3993 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 38 | task 3993 | stop processing: n_past = 143, truncated = 1
- slot print_timing: id 38 | task 3993 |
- prompt eval time = 47.26 ms / 207 tokens ( 0.23 ms per token, 4379.75 tokens per second)
- eval time = 3824.10 ms / 191 tokens ( 20.02 ms per token, 49.95 tokens per second)
- total time = 3871.36 ms / 398 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 41 | task 4144 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 42 | task 4195 | processing task
- slot update_slots: id 42 | task 4195 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 42 | task 4195 | kv cache rm [0, end)
- slot update_slots: id 42 | task 4195 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 42 | task 4195 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 39 | task 4045 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 43 | task 4237 | processing task
- slot update_slots: id 43 | task 4237 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 43 | task 4237 | kv cache rm [0, end)
- slot update_slots: id 43 | task 4237 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 43 | task 4237 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 39 | task 4045 | stop processing: n_past = 144, truncated = 1
- slot print_timing: id 39 | task 4045 |
- prompt eval time = 47.00 ms / 207 tokens ( 0.23 ms per token, 4404.07 tokens per second)
- eval time = 3866.72 ms / 192 tokens ( 20.14 ms per token, 49.65 tokens per second)
- total time = 3913.72 ms / 399 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 42 | task 4195 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 40 | task 4089 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 43 | task 4237 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 44 | task 4289 | processing task
- slot update_slots: id 44 | task 4289 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 44 | task 4289 | kv cache rm [0, end)
- slot update_slots: id 44 | task 4289 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 44 | task 4289 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 40 | task 4089 | stop processing: n_past = 166, truncated = 1
- slot print_timing: id 40 | task 4089 |
- prompt eval time = 48.32 ms / 207 tokens ( 0.23 ms per token, 4284.03 tokens per second)
- eval time = 4326.17 ms / 214 tokens ( 20.22 ms per token, 49.47 tokens per second)
- total time = 4374.49 ms / 421 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 41 | task 4144 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 41 | task 4144 | stop processing: n_past = 132, truncated = 1
- slot print_timing: id 41 | task 4144 |
- prompt eval time = 48.13 ms / 207 tokens ( 0.23 ms per token, 4300.58 tokens per second)
- eval time = 3688.72 ms / 180 tokens ( 20.49 ms per token, 48.80 tokens per second)
- total time = 3736.85 ms / 387 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 44 | task 4289 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 45 | task 4341 | processing task
- slot update_slots: id 45 | task 4341 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 45 | task 4341 | kv cache rm [0, end)
- slot update_slots: id 45 | task 4341 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 45 | task 4341 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 42 | task 4195 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 46 | task 4384 | processing task
- slot update_slots: id 46 | task 4384 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 46 | task 4384 | kv cache rm [0, end)
- slot update_slots: id 46 | task 4384 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 46 | task 4384 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 45 | task 4341 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 42 | task 4195 | stop processing: n_past = 151, truncated = 1
- slot print_timing: id 42 | task 4195 |
- prompt eval time = 291.92 ms / 207 tokens ( 1.41 ms per token, 709.09 tokens per second)
- eval time = 3998.35 ms / 199 tokens ( 20.09 ms per token, 49.77 tokens per second)
- total time = 4290.28 ms / 406 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 43 | task 4237 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 43 | task 4237 | stop processing: n_past = 143, truncated = 1
- slot print_timing: id 43 | task 4237 |
- prompt eval time = 49.39 ms / 207 tokens ( 0.24 ms per token, 4191.22 tokens per second)
- eval time = 3843.95 ms / 191 tokens ( 20.13 ms per token, 49.69 tokens per second)
- total time = 3893.33 ms / 398 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 46 | task 4384 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 47 | task 4438 | processing task
- slot update_slots: id 47 | task 4438 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 47 | task 4438 | kv cache rm [0, end)
- slot update_slots: id 47 | task 4438 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 47 | task 4438 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 44 | task 4289 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 47 | task 4438 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 48 | task 4490 | processing task
- slot update_slots: id 48 | task 4490 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 48 | task 4490 | kv cache rm [0, end)
- slot update_slots: id 48 | task 4490 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 48 | task 4490 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 44 | task 4289 | stop processing: n_past = 169, truncated = 1
- slot print_timing: id 44 | task 4289 |
- prompt eval time = 49.38 ms / 207 tokens ( 0.24 ms per token, 4192.07 tokens per second)
- eval time = 4477.58 ms / 217 tokens ( 20.63 ms per token, 48.46 tokens per second)
- total time = 4526.95 ms / 424 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 45 | task 4341 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 49 | task 4537 | processing task
- slot update_slots: id 49 | task 4537 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 49 | task 4537 | kv cache rm [0, end)
- slot update_slots: id 49 | task 4537 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 49 | task 4537 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 48 | task 4490 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 45 | task 4341 | stop processing: n_past = 156, truncated = 1
- slot print_timing: id 45 | task 4341 |
- prompt eval time = 47.99 ms / 207 tokens ( 0.23 ms per token, 4313.22 tokens per second)
- eval time = 3997.14 ms / 204 tokens ( 19.59 ms per token, 51.04 tokens per second)
- total time = 4045.13 ms / 411 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 46 | task 4384 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 46 | task 4384 | stop processing: n_past = 135, truncated = 1
- slot print_timing: id 46 | task 4384 |
- prompt eval time = 48.47 ms / 207 tokens ( 0.23 ms per token, 4270.59 tokens per second)
- eval time = 3596.67 ms / 183 tokens ( 19.65 ms per token, 50.88 tokens per second)
- total time = 3645.14 ms / 390 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 49 | task 4537 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 50 | task 4591 | processing task
- slot update_slots: id 50 | task 4591 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 50 | task 4591 | kv cache rm [0, end)
- slot update_slots: id 50 | task 4591 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 50 | task 4591 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 47 | task 4438 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 47 | task 4438 | stop processing: n_past = 147, truncated = 1
- slot print_timing: id 47 | task 4438 |
- prompt eval time = 48.46 ms / 207 tokens ( 0.23 ms per token, 4271.30 tokens per second)
- eval time = 3780.99 ms / 195 tokens ( 19.39 ms per token, 51.57 tokens per second)
- total time = 3829.45 ms / 402 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 50 | task 4591 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 51 | task 4646 | processing task
- slot update_slots: id 51 | task 4646 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 51 | task 4646 | kv cache rm [0, end)
- slot update_slots: id 51 | task 4646 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 51 | task 4646 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 48 | task 4490 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 48 | task 4490 | stop processing: n_past = 149, truncated = 1
- slot print_timing: id 48 | task 4490 |
- prompt eval time = 48.46 ms / 207 tokens ( 0.23 ms per token, 4271.92 tokens per second)
- eval time = 3662.46 ms / 197 tokens ( 18.59 ms per token, 53.79 tokens per second)
- total time = 3710.91 ms / 404 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 51 | task 4646 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 52 | task 4700 | processing task
- slot update_slots: id 52 | task 4700 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 52 | task 4700 | kv cache rm [0, end)
- slot update_slots: id 52 | task 4700 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 52 | task 4700 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 49 | task 4537 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 49 | task 4537 | stop processing: n_past = 143, truncated = 1
- slot print_timing: id 49 | task 4537 |
- prompt eval time = 49.42 ms / 207 tokens ( 0.24 ms per token, 4188.50 tokens per second)
- eval time = 3549.68 ms / 191 tokens ( 18.58 ms per token, 53.81 tokens per second)
- total time = 3599.11 ms / 398 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 52 | task 4700 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 53 | task 4754 | processing task
- slot update_slots: id 53 | task 4754 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 53 | task 4754 | kv cache rm [0, end)
- slot update_slots: id 53 | task 4754 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 53 | task 4754 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 50 | task 4591 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 54 | task 4797 | processing task
- slot update_slots: id 54 | task 4797 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 54 | task 4797 | kv cache rm [0, end)
- slot update_slots: id 54 | task 4797 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 54 | task 4797 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 50 | task 4591 | stop processing: n_past = 158, truncated = 1
- slot print_timing: id 50 | task 4591 |
- prompt eval time = 49.73 ms / 207 tokens ( 0.24 ms per token, 4162.06 tokens per second)
- eval time = 4055.47 ms / 206 tokens ( 19.69 ms per token, 50.80 tokens per second)
- total time = 4105.20 ms / 413 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 53 | task 4754 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 51 | task 4646 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 51 | task 4646 | stop processing: n_past = 149, truncated = 1
- slot print_timing: id 51 | task 4646 |
- prompt eval time = 49.98 ms / 207 tokens ( 0.24 ms per token, 4141.41 tokens per second)
- eval time = 3863.28 ms / 197 tokens ( 19.61 ms per token, 50.99 tokens per second)
- total time = 3913.27 ms / 404 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 54 | task 4797 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 55 | task 4851 | processing task
- slot update_slots: id 55 | task 4851 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 55 | task 4851 | kv cache rm [0, end)
- slot update_slots: id 55 | task 4851 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 55 | task 4851 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 52 | task 4700 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 55 | task 4851 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 56 | task 4903 | processing task
- slot update_slots: id 56 | task 4903 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 56 | task 4903 | kv cache rm [0, end)
- slot update_slots: id 56 | task 4903 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 56 | task 4903 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 52 | task 4700 | stop processing: n_past = 153, truncated = 1
- slot print_timing: id 52 | task 4700 |
- prompt eval time = 49.42 ms / 207 tokens ( 0.24 ms per token, 4188.67 tokens per second)
- eval time = 4132.56 ms / 201 tokens ( 20.56 ms per token, 48.64 tokens per second)
- total time = 4181.97 ms / 408 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 53 | task 4754 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 57 | task 4950 | processing task
- slot update_slots: id 57 | task 4950 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 57 | task 4950 | kv cache rm [0, end)
- slot update_slots: id 57 | task 4950 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 57 | task 4950 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 56 | task 4903 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 53 | task 4754 | stop processing: n_past = 152, truncated = 1
- slot print_timing: id 53 | task 4754 |
- prompt eval time = 50.06 ms / 207 tokens ( 0.24 ms per token, 4135.12 tokens per second)
- eval time = 4135.98 ms / 200 tokens ( 20.68 ms per token, 48.36 tokens per second)
- total time = 4186.04 ms / 407 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 54 | task 4797 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 57 | task 4950 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 54 | task 4797 | stop processing: n_past = 153, truncated = 1
- slot print_timing: id 54 | task 4797 |
- prompt eval time = 51.09 ms / 207 tokens ( 0.25 ms per token, 4051.36 tokens per second)
- eval time = 3937.85 ms / 201 tokens ( 19.59 ms per token, 51.04 tokens per second)
- total time = 3988.95 ms / 408 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 58 | task 5003 | processing task
- slot update_slots: id 58 | task 5003 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 58 | task 5003 | kv cache rm [0, end)
- slot update_slots: id 58 | task 5003 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 58 | task 5003 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 55 | task 4851 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 58 | task 5003 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 59 | task 5055 | processing task
- slot update_slots: id 59 | task 5055 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 59 | task 5055 | kv cache rm [0, end)
- slot update_slots: id 59 | task 5055 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 59 | task 5055 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 55 | task 4851 | stop processing: n_past = 160, truncated = 1
- slot print_timing: id 55 | task 4851 |
- prompt eval time = 50.16 ms / 207 tokens ( 0.24 ms per token, 4126.55 tokens per second)
- eval time = 4295.02 ms / 208 tokens ( 20.65 ms per token, 48.43 tokens per second)
- total time = 4345.18 ms / 415 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 56 | task 4903 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 60 | task 5100 | processing task
- slot update_slots: id 60 | task 5100 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 60 | task 5100 | kv cache rm [0, end)
- slot update_slots: id 60 | task 5100 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 60 | task 5100 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 59 | task 5055 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 56 | task 4903 | stop processing: n_past = 165, truncated = 1
- slot print_timing: id 56 | task 4903 |
- prompt eval time = 50.73 ms / 207 tokens ( 0.25 ms per token, 4080.35 tokens per second)
- eval time = 4258.50 ms / 213 tokens ( 19.99 ms per token, 50.02 tokens per second)
- total time = 4309.23 ms / 420 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 57 | task 4950 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 60 | task 5100 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 61 | task 5152 | processing task
- slot update_slots: id 61 | task 5152 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 61 | task 5152 | kv cache rm [0, end)
- slot update_slots: id 61 | task 5152 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 61 | task 5152 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 57 | task 4950 | stop processing: n_past = 172, truncated = 1
- slot print_timing: id 57 | task 4950 |
- prompt eval time = 51.16 ms / 207 tokens ( 0.25 ms per token, 4046.45 tokens per second)
- eval time = 4419.22 ms / 220 tokens ( 20.09 ms per token, 49.78 tokens per second)
- total time = 4470.38 ms / 427 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 58 | task 5003 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 61 | task 5152 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 62 | task 5203 | processing task
- slot update_slots: id 62 | task 5203 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 62 | task 5203 | kv cache rm [0, end)
- slot update_slots: id 62 | task 5203 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 62 | task 5203 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 58 | task 5003 | stop processing: n_past = 156, truncated = 1
- slot print_timing: id 58 | task 5003 |
- prompt eval time = 51.13 ms / 207 tokens ( 0.25 ms per token, 4048.11 tokens per second)
- eval time = 4135.32 ms / 204 tokens ( 20.27 ms per token, 49.33 tokens per second)
- total time = 4186.46 ms / 411 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 59 | task 5055 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 62 | task 5203 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 63 | task 5254 | processing task
- slot update_slots: id 63 | task 5254 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 63 | task 5254 | kv cache rm [0, end)
- slot update_slots: id 63 | task 5254 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 63 | task 5254 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 59 | task 5055 | stop processing: n_past = 148, truncated = 1
- slot print_timing: id 59 | task 5055 |
- prompt eval time = 50.93 ms / 207 tokens ( 0.25 ms per token, 4064.48 tokens per second)
- eval time = 4040.08 ms / 196 tokens ( 20.61 ms per token, 48.51 tokens per second)
- total time = 4091.01 ms / 403 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 60 | task 5100 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 0 | task 5296 | processing task
- slot update_slots: id 0 | task 5296 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 0 | task 5296 | kv cache rm [0, end)
- slot update_slots: id 0 | task 5296 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 0 | task 5296 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 63 | task 5254 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 60 | task 5100 | stop processing: n_past = 162, truncated = 1
- slot print_timing: id 60 | task 5100 |
- prompt eval time = 51.10 ms / 207 tokens ( 0.25 ms per token, 4050.56 tokens per second)
- eval time = 4320.03 ms / 210 tokens ( 20.57 ms per token, 48.61 tokens per second)
- total time = 4371.14 ms / 417 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 61 | task 5152 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 0 | task 5296 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 1 | task 5348 | processing task
- slot update_slots: id 1 | task 5348 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 1 | task 5348 | kv cache rm [0, end)
- slot update_slots: id 1 | task 5348 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 1 | task 5348 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 61 | task 5152 | stop processing: n_past = 155, truncated = 1
- slot print_timing: id 61 | task 5152 |
- prompt eval time = 52.04 ms / 207 tokens ( 0.25 ms per token, 3977.40 tokens per second)
- eval time = 4389.28 ms / 203 tokens ( 21.62 ms per token, 46.25 tokens per second)
- total time = 4441.32 ms / 410 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 62 | task 5203 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 2 | task 5389 | processing task
- slot update_slots: id 2 | task 5389 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 2 | task 5389 | kv cache rm [0, end)
- slot update_slots: id 2 | task 5389 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 2 | task 5389 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 62 | task 5203 | stop processing: n_past = 143, truncated = 1
- slot print_timing: id 62 | task 5203 |
- prompt eval time = 51.37 ms / 207 tokens ( 0.25 ms per token, 4029.98 tokens per second)
- eval time = 4154.91 ms / 191 tokens ( 21.75 ms per token, 45.97 tokens per second)
- total time = 4206.27 ms / 398 tokens
- slot update_slots: id 1 | task 5348 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 63 | task 5254 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 63 | task 5254 | stop processing: n_past = 132, truncated = 1
- slot print_timing: id 63 | task 5254 |
- prompt eval time = 247.45 ms / 207 tokens ( 1.20 ms per token, 836.54 tokens per second)
- eval time = 3698.53 ms / 180 tokens ( 20.55 ms per token, 48.67 tokens per second)
- total time = 3945.98 ms / 387 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 2 | task 5389 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 3 | task 5441 | processing task
- slot update_slots: id 3 | task 5441 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 3 | task 5441 | kv cache rm [0, end)
- slot update_slots: id 3 | task 5441 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 3 | task 5441 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 0 | task 5296 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 4 | task 5484 | processing task
- slot update_slots: id 4 | task 5484 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 4 | task 5484 | kv cache rm [0, end)
- slot update_slots: id 4 | task 5484 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 4 | task 5484 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 3 | task 5441 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 0 | task 5296 | stop processing: n_past = 165, truncated = 1
- slot print_timing: id 0 | task 5296 |
- prompt eval time = 50.73 ms / 207 tokens ( 0.25 ms per token, 4080.02 tokens per second)
- eval time = 4561.13 ms / 213 tokens ( 21.41 ms per token, 46.70 tokens per second)
- total time = 4611.87 ms / 420 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 1 | task 5348 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 4 | task 5484 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 5 | task 5535 | processing task
- slot update_slots: id 5 | task 5535 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 5 | task 5535 | kv cache rm [0, end)
- slot update_slots: id 5 | task 5535 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 5 | task 5535 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 1 | task 5348 | stop processing: n_past = 153, truncated = 1
- slot print_timing: id 1 | task 5348 |
- prompt eval time = 51.11 ms / 207 tokens ( 0.25 ms per token, 4050.25 tokens per second)
- eval time = 4508.44 ms / 201 tokens ( 22.43 ms per token, 44.58 tokens per second)
- total time = 4559.55 ms / 408 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 2 | task 5389 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 2 | task 5389 | stop processing: n_past = 132, truncated = 1
- slot print_timing: id 2 | task 5389 |
- prompt eval time = 52.52 ms / 207 tokens ( 0.25 ms per token, 3940.98 tokens per second)
- eval time = 3860.96 ms / 180 tokens ( 21.45 ms per token, 46.62 tokens per second)
- total time = 3913.49 ms / 387 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 6 | task 5578 | processing task
- slot update_slots: id 6 | task 5578 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 6 | task 5578 | kv cache rm [0, end)
- slot update_slots: id 6 | task 5578 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 6 | task 5578 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 5 | task 5535 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 3 | task 5441 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 3 | task 5441 | stop processing: n_past = 132, truncated = 1
- slot print_timing: id 3 | task 5441 |
- prompt eval time = 52.12 ms / 207 tokens ( 0.25 ms per token, 3971.45 tokens per second)
- eval time = 3842.86 ms / 180 tokens ( 21.35 ms per token, 46.84 tokens per second)
- total time = 3894.98 ms / 387 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 6 | task 5578 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 7 | task 5630 | processing task
- slot update_slots: id 7 | task 5630 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 7 | task 5630 | kv cache rm [0, end)
- slot update_slots: id 7 | task 5630 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 7 | task 5630 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 4 | task 5484 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 4 | task 5484 | stop processing: n_past = 140, truncated = 1
- slot print_timing: id 4 | task 5484 |
- prompt eval time = 51.50 ms / 207 tokens ( 0.25 ms per token, 4019.34 tokens per second)
- eval time = 3820.95 ms / 188 tokens ( 20.32 ms per token, 49.20 tokens per second)
- total time = 3872.45 ms / 395 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 7 | task 5630 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 8 | task 5683 | processing task
- slot update_slots: id 8 | task 5683 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 8 | task 5683 | kv cache rm [0, end)
- slot update_slots: id 8 | task 5683 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 8 | task 5683 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 5 | task 5535 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 8 | task 5683 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 9 | task 5735 | processing task
- slot update_slots: id 9 | task 5735 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 9 | task 5735 | kv cache rm [0, end)
- slot update_slots: id 9 | task 5735 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 9 | task 5735 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 5 | task 5535 | stop processing: n_past = 156, truncated = 1
- slot print_timing: id 5 | task 5535 |
- prompt eval time = 52.86 ms / 207 tokens ( 0.26 ms per token, 3915.93 tokens per second)
- eval time = 4383.39 ms / 204 tokens ( 21.49 ms per token, 46.54 tokens per second)
- total time = 4436.25 ms / 411 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 6 | task 5578 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 10 | task 5774 | processing task
- slot update_slots: id 10 | task 5774 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 10 | task 5774 | kv cache rm [0, end)
- slot update_slots: id 10 | task 5774 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 10 | task 5774 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 9 | task 5735 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 6 | task 5578 | stop processing: n_past = 172, truncated = 1
- slot print_timing: id 6 | task 5578 |
- prompt eval time = 52.64 ms / 207 tokens ( 0.25 ms per token, 3932.67 tokens per second)
- eval time = 4530.05 ms / 220 tokens ( 20.59 ms per token, 48.56 tokens per second)
- total time = 4582.69 ms / 427 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 7 | task 5630 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 10 | task 5774 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 11 | task 5825 | processing task
- slot update_slots: id 11 | task 5825 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 11 | task 5825 | kv cache rm [0, end)
- slot update_slots: id 11 | task 5825 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 11 | task 5825 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 7 | task 5630 | stop processing: n_past = 151, truncated = 1
- slot print_timing: id 7 | task 5630 |
- prompt eval time = 51.52 ms / 207 tokens ( 0.25 ms per token, 4017.62 tokens per second)
- eval time = 4154.40 ms / 199 tokens ( 20.88 ms per token, 47.90 tokens per second)
- total time = 4205.93 ms / 406 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 8 | task 5683 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 11 | task 5825 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 12 | task 5876 | processing task
- slot update_slots: id 12 | task 5876 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 12 | task 5876 | kv cache rm [0, end)
- slot update_slots: id 12 | task 5876 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 12 | task 5876 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 8 | task 5683 | stop processing: n_past = 155, truncated = 1
- slot print_timing: id 8 | task 5683 |
- prompt eval time = 53.90 ms / 207 tokens ( 0.26 ms per token, 3840.52 tokens per second)
- eval time = 4448.20 ms / 203 tokens ( 21.91 ms per token, 45.64 tokens per second)
- total time = 4502.10 ms / 410 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 9 | task 5735 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 13 | task 5918 | processing task
- slot update_slots: id 13 | task 5918 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 13 | task 5918 | kv cache rm [0, end)
- slot update_slots: id 13 | task 5918 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 13 | task 5918 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 9 | task 5735 | stop processing: n_past = 139, truncated = 1
- slot print_timing: id 9 | task 5735 |
- prompt eval time = 51.43 ms / 207 tokens ( 0.25 ms per token, 4025.12 tokens per second)
- eval time = 4147.37 ms / 187 tokens ( 22.18 ms per token, 45.09 tokens per second)
- total time = 4198.80 ms / 394 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 12 | task 5876 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 10 | task 5774 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 10 | task 5774 | stop processing: n_past = 142, truncated = 1
- slot print_timing: id 10 | task 5774 |
- prompt eval time = 51.05 ms / 207 tokens ( 0.25 ms per token, 4054.53 tokens per second)
- eval time = 3913.70 ms / 190 tokens ( 20.60 ms per token, 48.55 tokens per second)
- total time = 3964.76 ms / 397 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 13 | task 5918 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 14 | task 5969 | processing task
- slot update_slots: id 14 | task 5969 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 14 | task 5969 | kv cache rm [0, end)
- slot update_slots: id 14 | task 5969 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
- slot update_slots: id 14 | task 5969 | prompt done, n_past = 207, n_tokens = 210
- slot update_slots: id 11 | task 5825 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 14 | task 5969 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 15 | task 6021 | processing task
- slot update_slots: id 15 | task 6021 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 15 | task 6021 | kv cache rm [0, end)
- slot update_slots: id 15 | task 6021 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 15 | task 6021 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 11 | task 5825 | stop processing: n_past = 172, truncated = 1
- slot print_timing: id 11 | task 5825 |
- prompt eval time = 51.23 ms / 207 tokens ( 0.25 ms per token, 4040.68 tokens per second)
- eval time = 4770.88 ms / 220 tokens ( 21.69 ms per token, 46.11 tokens per second)
- total time = 4822.11 ms / 427 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 12 | task 5876 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 16 | task 6059 | processing task
- slot update_slots: id 16 | task 6059 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 16 | task 6059 | kv cache rm [0, end)
- slot update_slots: id 16 | task 6059 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 16 | task 6059 | prompt done, n_past = 207, n_tokens = 211
- slot release: id 12 | task 5876 | stop processing: n_past = 137, truncated = 1
- slot print_timing: id 12 | task 5876 |
- prompt eval time = 54.24 ms / 207 tokens ( 0.26 ms per token, 3816.02 tokens per second)
- eval time = 4269.31 ms / 185 tokens ( 23.08 ms per token, 43.33 tokens per second)
- total time = 4323.56 ms / 392 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 15 | task 6021 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 13 | task 5918 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 17 | task 6102 | processing task
- slot update_slots: id 17 | task 6102 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 17 | task 6102 | kv cache rm [0, end)
- slot update_slots: id 17 | task 6102 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 17 | task 6102 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 16 | task 6059 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 13 | task 5918 | stop processing: n_past = 155, truncated = 1
- slot print_timing: id 13 | task 5918 |
- prompt eval time = 51.06 ms / 207 tokens ( 0.25 ms per token, 4053.82 tokens per second)
- eval time = 4628.33 ms / 203 tokens ( 22.80 ms per token, 43.86 tokens per second)
- total time = 4679.39 ms / 410 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 18 | task 6143 | processing task
- slot update_slots: id 18 | task 6143 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 18 | task 6143 | kv cache rm [0, end)
- slot update_slots: id 18 | task 6143 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 18 | task 6143 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 14 | task 5969 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 17 | task 6102 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 14 | task 5969 | stop processing: n_past = 149, truncated = 1
- slot print_timing: id 14 | task 5969 |
- prompt eval time = 53.08 ms / 207 tokens ( 0.26 ms per token, 3900.14 tokens per second)
- eval time = 4702.05 ms / 197 tokens ( 23.87 ms per token, 41.90 tokens per second)
- total time = 4755.13 ms / 404 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 19 | task 6185 | processing task
- slot update_slots: id 19 | task 6185 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 19 | task 6185 | kv cache rm [0, end)
- slot update_slots: id 19 | task 6185 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 19 | task 6185 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 18 | task 6143 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 15 | task 6021 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 20 | task 6226 | processing task
- slot update_slots: id 20 | task 6226 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 20 | task 6226 | kv cache rm [0, end)
- slot update_slots: id 20 | task 6226 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
- slot update_slots: id 20 | task 6226 | prompt done, n_past = 207, n_tokens = 212
- slot release: id 15 | task 6021 | stop processing: n_past = 158, truncated = 1
- slot print_timing: id 15 | task 6021 |
- prompt eval time = 51.60 ms / 207 tokens ( 0.25 ms per token, 4011.24 tokens per second)
- eval time = 5098.82 ms / 206 tokens ( 24.75 ms per token, 40.40 tokens per second)
- total time = 5150.43 ms / 413 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 19 | task 6185 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 16 | task 6059 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 16 | task 6059 | stop processing: n_past = 152, truncated = 1
- slot print_timing: id 16 | task 6059 |
- prompt eval time = 51.38 ms / 207 tokens ( 0.25 ms per token, 4029.12 tokens per second)
- eval time = 4892.63 ms / 200 tokens ( 24.46 ms per token, 40.88 tokens per second)
- total time = 4944.01 ms / 407 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 21 | task 6267 | processing task
- slot update_slots: id 21 | task 6267 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 21 | task 6267 | kv cache rm [0, end)
- slot update_slots: id 21 | task 6267 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 21 | task 6267 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 20 | task 6226 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 17 | task 6102 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 17 | task 6102 | stop processing: n_past = 150, truncated = 1
- slot print_timing: id 17 | task 6102 |
- prompt eval time = 52.07 ms / 207 tokens ( 0.25 ms per token, 3975.19 tokens per second)
- eval time = 4883.65 ms / 198 tokens ( 24.66 ms per token, 40.54 tokens per second)
- total time = 4935.73 ms / 405 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 22 | task 6309 | processing task
- slot update_slots: id 22 | task 6309 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 22 | task 6309 | kv cache rm [0, end)
- slot update_slots: id 22 | task 6309 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 22 | task 6309 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 21 | task 6267 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 18 | task 6143 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 23 | task 6350 | processing task
- slot update_slots: id 23 | task 6350 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 23 | task 6350 | kv cache rm [0, end)
- slot update_slots: id 23 | task 6350 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
- slot update_slots: id 23 | task 6350 | prompt done, n_past = 207, n_tokens = 212
- slot release: id 18 | task 6143 | stop processing: n_past = 158, truncated = 1
- slot print_timing: id 18 | task 6143 |
- prompt eval time = 52.10 ms / 207 tokens ( 0.25 ms per token, 3972.90 tokens per second)
- eval time = 5059.27 ms / 206 tokens ( 24.56 ms per token, 40.72 tokens per second)
- total time = 5111.37 ms / 413 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 22 | task 6309 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 19 | task 6185 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 24 | task 6390 | processing task
- slot update_slots: id 24 | task 6390 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 24 | task 6390 | kv cache rm [0, end)
- slot update_slots: id 24 | task 6390 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
- slot update_slots: id 24 | task 6390 | prompt done, n_past = 207, n_tokens = 212
- slot release: id 19 | task 6185 | stop processing: n_past = 154, truncated = 1
- slot print_timing: id 19 | task 6185 |
- prompt eval time = 51.67 ms / 207 tokens ( 0.25 ms per token, 4006.04 tokens per second)
- eval time = 5024.85 ms / 202 tokens ( 24.88 ms per token, 40.20 tokens per second)
- total time = 5076.53 ms / 409 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 23 | task 6350 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 20 | task 6226 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 20 | task 6226 | stop processing: n_past = 146, truncated = 1
- slot print_timing: id 20 | task 6226 |
- prompt eval time = 52.53 ms / 207 tokens ( 0.25 ms per token, 3940.68 tokens per second)
- eval time = 4832.27 ms / 194 tokens ( 24.91 ms per token, 40.15 tokens per second)
- total time = 4884.80 ms / 401 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 25 | task 6432 | processing task
- slot update_slots: id 25 | task 6432 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 25 | task 6432 | kv cache rm [0, end)
- slot update_slots: id 25 | task 6432 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
- slot update_slots: id 25 | task 6432 | prompt done, n_past = 207, n_tokens = 211
- slot update_slots: id 24 | task 6390 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 21 | task 6267 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 26 | task 6473 | processing task
- slot update_slots: id 26 | task 6473 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 26 | task 6473 | kv cache rm [0, end)
- slot update_slots: id 26 | task 6473 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
- slot update_slots: id 26 | task 6473 | prompt done, n_past = 207, n_tokens = 212
- slot release: id 21 | task 6267 | stop processing: n_past = 157, truncated = 1
- slot print_timing: id 21 | task 6267 |
- prompt eval time = 52.66 ms / 207 tokens ( 0.25 ms per token, 3930.88 tokens per second)
- eval time = 5056.33 ms / 205 tokens ( 24.67 ms per token, 40.54 tokens per second)
- total time = 5108.99 ms / 412 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 25 | task 6432 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 22 | task 6309 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 27 | task 6514 | processing task
- slot update_slots: id 27 | task 6514 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 27 | task 6514 | kv cache rm [0, end)
- slot update_slots: id 27 | task 6514 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
- slot update_slots: id 27 | task 6514 | prompt done, n_past = 207, n_tokens = 212
- slot update_slots: id 26 | task 6473 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 22 | task 6309 | stop processing: n_past = 168, truncated = 1
- slot print_timing: id 22 | task 6309 |
- prompt eval time = 51.56 ms / 207 tokens ( 0.25 ms per token, 4015.05 tokens per second)
- eval time = 5314.41 ms / 216 tokens ( 24.60 ms per token, 40.64 tokens per second)
- total time = 5365.96 ms / 423 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 23 | task 6350 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 28 | task 6553 | processing task
- slot update_slots: id 28 | task 6553 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 28 | task 6553 | kv cache rm [0, end)
- slot update_slots: id 28 | task 6553 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
- slot update_slots: id 28 | task 6553 | prompt done, n_past = 207, n_tokens = 212
- slot release: id 23 | task 6350 | stop processing: n_past = 152, truncated = 1
- slot print_timing: id 23 | task 6350 |
- prompt eval time = 51.83 ms / 207 tokens ( 0.25 ms per token, 3993.90 tokens per second)
- eval time = 5005.01 ms / 200 tokens ( 25.03 ms per token, 39.96 tokens per second)
- total time = 5056.84 ms / 407 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- slot update_slots: id 27 | task 6514 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 24 | task 6390 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 24 | task 6390 | stop processing: n_past = 132, truncated = 1
- slot print_timing: id 24 | task 6390 |
- prompt eval time = 54.62 ms / 207 tokens ( 0.26 ms per token, 3789.96 tokens per second)
- eval time = 4561.86 ms / 180 tokens ( 25.34 ms per token, 39.46 tokens per second)
- total time = 4616.47 ms / 387 tokens
- srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
- srv params_from_: Chat format: Content-only
- slot launch_slot_: id 29 | task 6595 | processing task
- slot update_slots: id 29 | task 6595 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 29 | task 6595 | kv cache rm [0, end)
- slot update_slots: id 29 | task 6595 | prompt processing progress, n_
- ...
- task 17150 | processing task
- slot update_slots: id 22 | task 17150 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
- slot update_slots: id 22 | task 17150 | kv cache rm [0, end)
- slot update_slots: id 22 | task 17150 | prompt processing progress, n_past = 207, n_tokens = 270, progress = 1.000000
- slot update_slots: id 22 | task 17150 | prompt done, n_past = 207, n_tokens = 270
- slot update_slots: id 19 | task 18377 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 2 | task 18349 | stop processing: n_past = 147, truncated = 1
- slot print_timing: id 2 | task 18349 |
- prompt eval time = 100.38 ms / 207 tokens ( 0.48 ms per token, 2062.25 tokens per second)
- eval time = 21899.89 ms / 195 tokens ( 112.31 ms per token, 8.90 tokens per second)
- total time = 22000.26 ms / 402 tokens
- slot release: id 56 | task 18352 | stop processing: n_past = 147, truncated = 1
- slot print_timing: id 56 | task 18352 |
- prompt eval time = 104.38 ms / 207 tokens ( 0.50 ms per token, 1983.21 tokens per second)
- eval time = 21900.07 ms / 195 tokens ( 112.31 ms per token, 8.90 tokens per second)
- total time = 22004.45 ms / 402 tokens
- slot release: id 58 | task 18347 | stop processing: n_past = 156, truncated = 1
- slot print_timing: id 58 | task 18347 |
- prompt eval time = 72.15 ms / 207 tokens ( 0.35 ms per token, 2869.10 tokens per second)
- eval time = 23572.45 ms / 204 tokens ( 115.55 ms per token, 8.65 tokens per second)
- total time = 23644.60 ms / 411 tokens
- slot update_slots: id 39 | task 18378 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 19 | task 18377 | stop processing: n_past = 132, truncated = 1
- slot print_timing: id 19 | task 18377 |
- prompt eval time = 69.09 ms / 207 tokens ( 0.33 ms per token, 2995.96 tokens per second)
- eval time = 20104.56 ms / 180 tokens ( 111.69 ms per token, 8.95 tokens per second)
- total time = 20173.66 ms / 387 tokens
- slot update_slots: id 26 | task 18380 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 47 | task 18381 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 12 | task 18345 | stop processing: n_past = 157, truncated = 1
- slot print_timing: id 12 | task 18345 |
- prompt eval time = 239.03 ms / 207 tokens ( 1.15 ms per token, 866.01 tokens per second)
- eval time = 22917.47 ms / 205 tokens ( 111.79 ms per token, 8.95 tokens per second)
- total time = 23156.49 ms / 412 tokens
- slot update_slots: id 28 | task 18382 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 6 | task 18384 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 24 | task 18386 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 0 | task 18387 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 27 | task 18388 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 63 | task 18350 | stop processing: n_past = 159, truncated = 1
- slot print_timing: id 63 | task 18350 |
- prompt eval time = 332.26 ms / 207 tokens ( 1.61 ms per token, 623.01 tokens per second)
- eval time = 23043.08 ms / 207 tokens ( 111.32 ms per token, 8.98 tokens per second)
- total time = 23375.34 ms / 414 tokens
- slot update_slots: id 15 | task 18394 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 33 | task 18395 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 16 | task 18423 | stop processing: n_past = 201, truncated = 1
- slot print_timing: id 16 | task 18423 |
- prompt eval time = 68.05 ms / 207 tokens ( 0.33 ms per token, 3042.10 tokens per second)
- eval time = 11507.20 ms / 122 tokens ( 94.32 ms per token, 10.60 tokens per second)
- total time = 11575.24 ms / 329 tokens
- slot release: id 23 | task 18357 | stop processing: n_past = 154, truncated = 1
- slot print_timing: id 23 | task 18357 |
- prompt eval time = 69.08 ms / 207 tokens ( 0.33 ms per token, 2996.53 tokens per second)
- eval time = 22361.60 ms / 202 tokens ( 110.70 ms per token, 9.03 tokens per second)
- total time = 22430.68 ms / 409 tokens
- slot release: id 57 | task 18346 | stop processing: n_past = 161, truncated = 1
- slot print_timing: id 57 | task 18346 |
- prompt eval time = 100.42 ms / 207 tokens ( 0.49 ms per token, 2061.34 tokens per second)
- eval time = 23587.38 ms / 209 tokens ( 112.86 ms per token, 8.86 tokens per second)
- total time = 23687.80 ms / 416 tokens
- slot update_slots: id 1 | task 16657 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 31 | task 18396 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 26 | task 18380 | stop processing: n_past = 139, truncated = 1
- slot print_timing: id 26 | task 18380 |
- prompt eval time = 272.44 ms / 207 tokens ( 1.32 ms per token, 759.81 tokens per second)
- eval time = 20368.85 ms / 187 tokens ( 108.92 ms per token, 9.18 tokens per second)
- total time = 20641.28 ms / 394 tokens
- slot release: id 61 | task 18351 | stop processing: n_past = 163, truncated = 1
- slot print_timing: id 61 | task 18351 |
- prompt eval time = 250.36 ms / 207 tokens ( 1.21 ms per token, 826.80 tokens per second)
- eval time = 23318.32 ms / 211 tokens ( 110.51 ms per token, 9.05 tokens per second)
- total time = 23568.68 ms / 418 tokens
- slot release: id 55 | task 18364 | stop processing: n_past = 158, truncated = 1
- slot print_timing: id 55 | task 18364 |
- prompt eval time = 68.06 ms / 207 tokens ( 0.33 ms per token, 3041.61 tokens per second)
- eval time = 22641.48 ms / 206 tokens ( 109.91 ms per token, 9.10 tokens per second)
- total time = 22709.54 ms / 413 tokens
- slot release: id 62 | task 18372 | stop processing: n_past = 156, truncated = 1
- slot print_timing: id 62 | task 18372 |
- prompt eval time = 72.45 ms / 207 tokens ( 0.35 ms per token, 2857.10 tokens per second)
- eval time = 22298.42 ms / 204 tokens ( 109.31 ms per token, 9.15 tokens per second)
- total time = 22370.87 ms / 411 tokens
- slot update_slots: id 49 | task 18473 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 28 | task 18382 | stop processing: n_past = 141, truncated = 1
- slot print_timing: id 28 | task 18382 |
- prompt eval time = 69.78 ms / 207 tokens ( 0.34 ms per token, 2966.34 tokens per second)
- eval time = 20223.10 ms / 189 tokens ( 107.00 ms per token, 9.35 tokens per second)
- total time = 20292.89 ms / 396 tokens
- slot release: id 32 | task 18371 | stop processing: n_past = 159, truncated = 1
- slot print_timing: id 32 | task 18371 |
- prompt eval time = 261.29 ms / 207 tokens ( 1.26 ms per token, 792.23 tokens per second)
- eval time = 22440.94 ms / 207 tokens ( 108.41 ms per token, 9.22 tokens per second)
- total time = 22702.23 ms / 414 tokens
- slot update_slots: id 10 | task 18401 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 25 | task 18400 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 45 | task 18402 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 27 | task 18388 | stop processing: n_past = 141, truncated = 1
- slot print_timing: id 27 | task 18388 |
- prompt eval time = 237.58 ms / 207 tokens ( 1.15 ms per token, 871.29 tokens per second)
- eval time = 20084.16 ms / 189 tokens ( 106.27 ms per token, 9.41 tokens per second)
- total time = 20321.74 ms / 396 tokens
- slot update_slots: id 5 | task 18404 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 4 | task 18405 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 35 | task 18476 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 46 | task 18406 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 31 | task 18396 | stop processing: n_past = 142, truncated = 1
- slot print_timing: id 31 | task 18396 |
- prompt eval time = 65.92 ms / 207 tokens ( 0.32 ms per token, 3140.31 tokens per second)
- eval time = 19701.24 ms / 190 tokens ( 103.69 ms per token, 9.64 tokens per second)
- total time = 19767.16 ms / 397 tokens
- slot release: id 15 | task 18394 | stop processing: n_past = 146, truncated = 1
- slot print_timing: id 15 | task 18394 |
- prompt eval time = 271.41 ms / 207 tokens ( 1.31 ms per token, 762.67 tokens per second)
- eval time = 20220.92 ms / 194 tokens ( 104.23 ms per token, 9.59 tokens per second)
- total time = 20492.33 ms / 401 tokens
- slot release: id 47 | task 18381 | stop processing: n_past = 153, truncated = 1
- slot print_timing: id 47 | task 18381 |
- prompt eval time = 273.90 ms / 207 tokens ( 1.32 ms per token, 755.74 tokens per second)
- eval time = 21396.34 ms / 201 tokens ( 106.45 ms per token, 9.39 tokens per second)
- total time = 21670.24 ms / 408 tokens
- slot update_slots: id 8 | task 18407 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 24 | task 18386 | stop processing: n_past = 155, truncated = 1
- slot print_timing: id 24 | task 18386 |
- prompt eval time = 99.02 ms / 207 tokens ( 0.48 ms per token, 2090.47 tokens per second)
- eval time = 21166.37 ms / 203 tokens ( 104.27 ms per token, 9.59 tokens per second)
- total time = 21265.39 ms / 410 tokens
- slot update_slots: id 50 | task 18408 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 33 | task 18395 | stop processing: n_past = 153, truncated = 1
- slot print_timing: id 33 | task 18395 |
- prompt eval time = 273.11 ms / 207 tokens ( 1.32 ms per token, 757.94 tokens per second)
- eval time = 20440.12 ms / 201 tokens ( 101.69 ms per token, 9.83 tokens per second)
- total time = 20713.23 ms / 408 tokens
- slot release: id 45 | task 18402 | stop processing: n_past = 143, truncated = 1
- slot print_timing: id 45 | task 18402 |
- prompt eval time = 100.71 ms / 207 tokens ( 0.49 ms per token, 2055.47 tokens per second)
- eval time = 19371.28 ms / 191 tokens ( 101.42 ms per token, 9.86 tokens per second)
- total time = 19471.99 ms / 398 tokens
- slot update_slots: id 14 | task 18478 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 17 | task 18409 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 6 | task 18384 | stop processing: n_past = 158, truncated = 1
- slot print_timing: id 6 | task 18384 |
- prompt eval time = 97.75 ms / 207 tokens ( 0.47 ms per token, 2117.56 tokens per second)
- eval time = 21437.71 ms / 206 tokens ( 104.07 ms per token, 9.61 tokens per second)
- total time = 21535.47 ms / 413 tokens
- slot release: id 8 | task 18407 | stop processing: n_past = 132, truncated = 1
- slot print_timing: id 8 | task 18407 |
- prompt eval time = 242.06 ms / 207 tokens ( 1.17 ms per token, 855.15 tokens per second)
- eval time = 17675.97 ms / 180 tokens ( 98.20 ms per token, 10.18 tokens per second)
- total time = 17918.03 ms / 387 tokens
- slot update_slots: id 40 | task 18410 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 50 | task 18408 | stop processing: n_past = 132, truncated = 1
- slot print_timing: id 50 | task 18408 |
- prompt eval time = 71.55 ms / 207 tokens ( 0.35 ms per token, 2893.12 tokens per second)
- eval time = 17634.19 ms / 180 tokens ( 97.97 ms per token, 10.21 tokens per second)
- total time = 17705.74 ms / 387 tokens
- slot update_slots: id 3 | task 18411 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 0 | task 18387 | stop processing: n_past = 160, truncated = 1
- slot print_timing: id 0 | task 18387 |
- prompt eval time = 237.39 ms / 207 tokens ( 1.15 ms per token, 871.97 tokens per second)
- eval time = 21465.11 ms / 208 tokens ( 103.20 ms per token, 9.69 tokens per second)
- total time = 21702.50 ms / 415 tokens
- slot release: id 25 | task 18400 | stop processing: n_past = 148, truncated = 1
- slot print_timing: id 25 | task 18400 |
- prompt eval time = 99.29 ms / 207 tokens ( 0.48 ms per token, 2084.72 tokens per second)
- eval time = 19878.37 ms / 196 tokens ( 101.42 ms per token, 9.86 tokens per second)
- total time = 19977.66 ms / 403 tokens
- slot release: id 39 | task 18378 | stop processing: n_past = 167, truncated = 1
- slot print_timing: id 39 | task 18378 |
- prompt eval time = 71.45 ms / 207 tokens ( 0.35 ms per token, 2897.21 tokens per second)
- eval time = 22721.10 ms / 215 tokens ( 105.68 ms per token, 9.46 tokens per second)
- total time = 22792.55 ms / 422 tokens
- slot update_slots: id 38 | task 17128 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 42 | task 18389 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 7 | task 18413 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 22 | task 17150 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 5 | task 18404 | stop processing: n_past = 149, truncated = 1
- slot print_timing: id 5 | task 18404 |
- prompt eval time = 68.72 ms / 207 tokens ( 0.33 ms per token, 3012.40 tokens per second)
- eval time = 19660.99 ms / 197 tokens ( 99.80 ms per token, 10.02 tokens per second)
- total time = 19729.70 ms / 404 tokens
- slot update_slots: id 21 | task 18414 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 4 | task 18405 | stop processing: n_past = 152, truncated = 1
- slot print_timing: id 4 | task 18405 |
- prompt eval time = 303.48 ms / 207 tokens ( 1.47 ms per token, 682.08 tokens per second)
- eval time = 19689.79 ms / 200 tokens ( 98.45 ms per token, 10.16 tokens per second)
- total time = 19993.27 ms / 407 tokens
- slot release: id 10 | task 18401 | stop processing: n_past = 158, truncated = 1
- slot print_timing: id 10 | task 18401 |
- prompt eval time = 64.93 ms / 207 tokens ( 0.31 ms per token, 3188.00 tokens per second)
- eval time = 20475.26 ms / 206 tokens ( 99.39 ms per token, 10.06 tokens per second)
- total time = 20540.19 ms / 413 tokens
- slot release: id 7 | task 18413 | stop processing: n_past = 137, truncated = 1
- slot print_timing: id 7 | task 18413 |
- prompt eval time = 260.85 ms / 207 tokens ( 1.26 ms per token, 793.57 tokens per second)
- eval time = 17028.36 ms / 185 tokens ( 92.05 ms per token, 10.86 tokens per second)
- total time = 17289.20 ms / 392 tokens
- slot release: id 46 | task 18406 | stop processing: n_past = 155, truncated = 1
- slot print_timing: id 46 | task 18406 |
- prompt eval time = 72.82 ms / 207 tokens ( 0.35 ms per token, 2842.51 tokens per second)
- eval time = 19721.20 ms / 203 tokens ( 97.15 ms per token, 10.29 tokens per second)
- total time = 19794.02 ms / 410 tokens
- slot update_slots: id 37 | task 18416 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 3 | task 18411 | stop processing: n_past = 148, truncated = 1
- slot print_timing: id 3 | task 18411 |
- prompt eval time = 239.54 ms / 207 tokens ( 1.16 ms per token, 864.15 tokens per second)
- eval time = 18005.35 ms / 196 tokens ( 91.86 ms per token, 10.89 tokens per second)
- total time = 18244.89 ms / 403 tokens
- slot update_slots: id 43 | task 18418 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 36 | task 18417 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 53 | task 18422 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 42 | task 18389 | stop processing: n_past = 148, truncated = 1
- slot print_timing: id 42 | task 18389 |
- prompt eval time = 67.36 ms / 207 tokens ( 0.33 ms per token, 3073.27 tokens per second)
- eval time = 17777.27 ms / 196 tokens ( 90.70 ms per token, 11.03 tokens per second)
- total time = 17844.62 ms / 403 tokens
- slot release: id 17 | task 18409 | stop processing: n_past = 162, truncated = 1
- slot print_timing: id 17 | task 18409 |
- prompt eval time = 285.31 ms / 207 tokens ( 1.38 ms per token, 725.53 tokens per second)
- eval time = 18993.04 ms / 210 tokens ( 90.44 ms per token, 11.06 tokens per second)
- total time = 19278.35 ms / 417 tokens
- slot release: id 37 | task 18416 | stop processing: n_past = 145, truncated = 1
- slot print_timing: id 37 | task 18416 |
- prompt eval time = 66.82 ms / 207 tokens ( 0.32 ms per token, 3097.78 tokens per second)
- eval time = 16902.05 ms / 193 tokens ( 87.58 ms per token, 11.42 tokens per second)
- total time = 16968.87 ms / 400 tokens
- slot update_slots: id 29 | task 18424 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 40 | task 18410 | stop processing: n_past = 164, truncated = 1
- slot print_timing: id 40 | task 18410 |
- prompt eval time = 243.07 ms / 207 tokens ( 1.17 ms per token, 851.59 tokens per second)
- eval time = 18815.54 ms / 212 tokens ( 88.75 ms per token, 11.27 tokens per second)
- total time = 19058.61 ms / 419 tokens
- slot update_slots: id 60 | task 18430 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 21 | task 18414 | stop processing: n_past = 158, truncated = 1
- slot print_timing: id 21 | task 18414 |
- prompt eval time = 65.36 ms / 207 tokens ( 0.32 ms per token, 3166.88 tokens per second)
- eval time = 17701.99 ms / 206 tokens ( 85.93 ms per token, 11.64 tokens per second)
- total time = 17767.35 ms / 413 tokens
- slot update_slots: id 13 | task 18431 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 18 | task 18434 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 53 | task 18422 | stop processing: n_past = 150, truncated = 1
- slot print_timing: id 53 | task 18422 |
- prompt eval time = 266.26 ms / 207 tokens ( 1.29 ms per token, 777.45 tokens per second)
- eval time = 16377.97 ms / 198 tokens ( 82.72 ms per token, 12.09 tokens per second)
- total time = 16644.22 ms / 405 tokens
- slot update_slots: id 44 | task 18432 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 11 | task 18433 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 52 | task 18444 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 29 | task 18424 | stop processing: n_past = 144, truncated = 1
- slot print_timing: id 29 | task 18424 |
- prompt eval time = 68.72 ms / 207 tokens ( 0.33 ms per token, 3012.31 tokens per second)
- eval time = 15358.76 ms / 192 tokens ( 79.99 ms per token, 12.50 tokens per second)
- total time = 15427.48 ms / 399 tokens
- slot update_slots: id 34 | task 18445 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 59 | task 18446 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 36 | task 18417 | stop processing: n_past = 160, truncated = 1
- slot print_timing: id 36 | task 18417 |
- prompt eval time = 265.01 ms / 207 tokens ( 1.28 ms per token, 781.11 tokens per second)
- eval time = 17001.68 ms / 208 tokens ( 81.74 ms per token, 12.23 tokens per second)
- total time = 17266.68 ms / 415 tokens
- slot release: id 60 | task 18430 | stop processing: n_past = 149, truncated = 1
- slot print_timing: id 60 | task 18430 |
- prompt eval time = 249.04 ms / 207 tokens ( 1.20 ms per token, 831.21 tokens per second)
- eval time = 15252.59 ms / 197 tokens ( 77.42 ms per token, 12.92 tokens per second)
- total time = 15501.63 ms / 404 tokens
- slot update_slots: id 30 | task 18447 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 43 | task 18418 | stop processing: n_past = 165, truncated = 1
- slot print_timing: id 43 | task 18418 |
- prompt eval time = 70.08 ms / 207 tokens ( 0.34 ms per token, 2953.81 tokens per second)
- eval time = 17370.91 ms / 213 tokens ( 81.55 ms per token, 12.26 tokens per second)
- total time = 17440.99 ms / 420 tokens
- slot release: id 44 | task 18432 | stop processing: n_past = 142, truncated = 1
- slot print_timing: id 44 | task 18432 |
- prompt eval time = 70.02 ms / 207 tokens ( 0.34 ms per token, 2956.34 tokens per second)
- eval time = 14485.95 ms / 190 tokens ( 76.24 ms per token, 13.12 tokens per second)
- total time = 14555.97 ms / 397 tokens
- slot update_slots: id 48 | task 18448 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 18 | task 18434 | stop processing: n_past = 152, truncated = 1
- slot print_timing: id 18 | task 18434 |
- prompt eval time = 94.44 ms / 207 tokens ( 0.46 ms per token, 2191.91 tokens per second)
- eval time = 14830.41 ms / 200 tokens ( 74.15 ms per token, 13.49 tokens per second)
- total time = 14924.85 ms / 407 tokens
- slot update_slots: id 51 | task 18449 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 52 | task 18444 | stop processing: n_past = 145, truncated = 1
- slot print_timing: id 52 | task 18444 |
- prompt eval time = 70.81 ms / 207 tokens ( 0.34 ms per token, 2923.36 tokens per second)
- eval time = 14295.37 ms / 193 tokens ( 74.07 ms per token, 13.50 tokens per second)
- total time = 14366.18 ms / 400 tokens
- slot update_slots: id 54 | task 18450 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 13 | task 18431 | stop processing: n_past = 157, truncated = 1
- slot print_timing: id 13 | task 18431 |
- prompt eval time = 94.05 ms / 207 tokens ( 0.45 ms per token, 2201.07 tokens per second)
- eval time = 15157.72 ms / 205 tokens ( 73.94 ms per token, 13.52 tokens per second)
- total time = 15251.76 ms / 412 tokens
- slot release: id 59 | task 18446 | stop processing: n_past = 146, truncated = 1
- slot print_timing: id 59 | task 18446 |
- prompt eval time = 274.99 ms / 207 tokens ( 1.33 ms per token, 752.74 tokens per second)
- eval time = 14071.08 ms / 194 tokens ( 72.53 ms per token, 13.79 tokens per second)
- total time = 14346.07 ms / 401 tokens
- slot release: id 11 | task 18433 | stop processing: n_past = 150, truncated = 1
- slot print_timing: id 11 | task 18433 |
- prompt eval time = 239.94 ms / 207 tokens ( 1.16 ms per token, 862.73 tokens per second)
- eval time = 14656.67 ms / 198 tokens ( 74.02 ms per token, 13.51 tokens per second)
- total time = 14896.60 ms / 405 tokens
- slot release: id 34 | task 18445 | stop processing: n_past = 148, truncated = 1
- slot print_timing: id 34 | task 18445 |
- prompt eval time = 273.03 ms / 207 tokens ( 1.32 ms per token, 758.16 tokens per second)
- eval time = 14118.00 ms / 196 tokens ( 72.03 ms per token, 13.88 tokens per second)
- total time = 14391.03 ms / 403 tokens
- slot update_slots: id 9 | task 18451 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 41 | task 18452 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 51 | task 18449 | stop processing: n_past = 142, truncated = 1
- slot print_timing: id 51 | task 18449 |
- prompt eval time = 66.77 ms / 207 tokens ( 0.32 ms per token, 3100.43 tokens per second)
- eval time = 13196.81 ms / 190 tokens ( 69.46 ms per token, 14.40 tokens per second)
- total time = 13263.58 ms / 397 tokens
- slot release: id 30 | task 18447 | stop processing: n_past = 150, truncated = 1
- slot print_timing: id 30 | task 18447 |
- prompt eval time = 65.08 ms / 207 tokens ( 0.31 ms per token, 3180.51 tokens per second)
- eval time = 13884.45 ms / 198 tokens ( 70.12 ms per token, 14.26 tokens per second)
- total time = 13949.53 ms / 405 tokens
- slot update_slots: id 20 | task 18471 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 48 | task 18448 | stop processing: n_past = 155, truncated = 1
- slot print_timing: id 48 | task 18448 |
- prompt eval time = 66.36 ms / 207 tokens ( 0.32 ms per token, 3119.58 tokens per second)
- eval time = 13915.74 ms / 203 tokens ( 68.55 ms per token, 14.59 tokens per second)
- total time = 13982.10 ms / 410 tokens
- slot release: id 20 | task 18471 | stop processing: n_past = 140, truncated = 1
- slot print_timing: id 20 | task 18471 |
- prompt eval time = 64.84 ms / 207 tokens ( 0.31 ms per token, 3192.57 tokens per second)
- eval time = 12019.93 ms / 188 tokens ( 63.94 ms per token, 15.64 tokens per second)
- total time = 12084.77 ms / 395 tokens
- slot release: id 9 | task 18451 | stop processing: n_past = 152, truncated = 1
- slot print_timing: id 9 | task 18451 |
- prompt eval time = 67.95 ms / 207 tokens ( 0.33 ms per token, 3046.27 tokens per second)
- eval time = 13274.74 ms / 200 tokens ( 66.37 ms per token, 15.07 tokens per second)
- total time = 13342.69 ms / 407 tokens
- slot release: id 54 | task 18450 | stop processing: n_past = 163, truncated = 1
- slot print_timing: id 54 | task 18450 |
- prompt eval time = 67.28 ms / 207 tokens ( 0.33 ms per token, 3076.92 tokens per second)
- eval time = 13594.90 ms / 211 tokens ( 64.43 ms per token, 15.52 tokens per second)
- total time = 13662.18 ms / 418 tokens
- slot update_slots: id 1 | task 16657 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 49 | task 18473 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 41 | task 18452 | stop processing: n_past = 162, truncated = 1
- slot print_timing: id 41 | task 18452 |
- prompt eval time = 70.42 ms / 207 tokens ( 0.34 ms per token, 2939.59 tokens per second)
- eval time = 13007.56 ms / 210 tokens ( 61.94 ms per token, 16.14 tokens per second)
- total time = 13077.97 ms / 417 tokens
- slot release: id 1 | task 16657 | stop processing: n_past = 135, truncated = 1
- slot print_timing: id 1 | task 16657 |
- prompt eval time = 63.73 ms / 207 tokens ( 0.31 ms per token, 3248.23 tokens per second)
- eval time = 10734.93 ms / 183 tokens ( 58.66 ms per token, 17.05 tokens per second)
- total time = 10798.66 ms / 390 tokens
- slot update_slots: id 35 | task 18476 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 49 | task 18473 | stop processing: n_past = 141, truncated = 1
- slot print_timing: id 49 | task 18473 |
- prompt eval time = 67.33 ms / 207 tokens ( 0.33 ms per token, 3074.18 tokens per second)
- eval time = 10574.60 ms / 189 tokens ( 55.95 ms per token, 17.87 tokens per second)
- total time = 10641.93 ms / 396 tokens
- slot release: id 35 | task 18476 | stop processing: n_past = 138, truncated = 1
- slot print_timing: id 35 | task 18476 |
- prompt eval time = 66.50 ms / 207 tokens ( 0.32 ms per token, 3112.83 tokens per second)
- eval time = 10172.91 ms / 186 tokens ( 54.69 ms per token, 18.28 tokens per second)
- total time = 10239.40 ms / 393 tokens
- slot update_slots: id 14 | task 18478 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 38 | task 17128 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot update_slots: id 22 | task 17150 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
- slot release: id 38 | task 17128 | stop processing: n_past = 140, truncated = 1
- slot print_timing: id 38 | task 17128 |
- prompt eval time = 244.46 ms / 207 tokens ( 1.18 ms per token, 846.75 tokens per second)
- eval time = 8892.84 ms / 188 tokens ( 47.30 ms per token, 21.14 tokens per second)
- total time = 9137.31 ms / 395 tokens
- slot release: id 14 | task 18478 | stop processing: n_past = 155, truncated = 1
- slot print_timing: id 14 | task 18478 |
- prompt eval time = 267.84 ms / 207 tokens ( 1.29 ms per token, 772.86 tokens per second)
- eval time = 9704.56 ms / 203 tokens ( 47.81 ms per token, 20.92 tokens per second)
- total time = 9972.40 ms / 410 tokens
- slot release: id 22 | task 17150 | stop processing: n_past = 158, truncated = 1
- slot print_timing: id 22 | task 17150 |
- prompt eval time = 64.90 ms / 207 tokens ( 0.31 ms per token, 3189.52 tokens per second)
- eval time = 9116.47 ms / 206 tokens ( 44.25 ms per token, 22.60 tokens per second)
- total time = 9181.37 ms / 413 tokens
- srv update_slots: all slots are idle