llama with fa

a guest
May 27th, 2025
:~/llama.cpp/build/bin# ./llama-server \
  -m ./phi-4-Q8_0.gguf \
  -c 16384 \
  -np 64 \
  -ngl 99 \
  -fa \
  -t 8 \
  --host 0.0.0.0 --port 8000
  9. ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
  10. ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
  11. ggml_cuda_init: found 1 CUDA devices:
  12. Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
  13. build: 5501 (cdf94a18) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
  14. system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
  15.  
  16. system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 1200 | F16 = 1 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | FA_ALL_QUANTS = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
  17.  
  18. main: binding port with default address family
  19. main: HTTP server is listening, hostname: 0.0.0.0, port: 8000, http threads: 66
  20. main: loading model
  21. srv load_model: loading model './phi-4-Q8_0.gguf'
  22. llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) - 30843 MiB free
  23. llama_model_loader: loaded meta data with 40 key-value pairs and 363 tensors from ./phi-4-Q8_0.gguf (version GGUF V3 (latest))
  24. llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
  25. llama_model_loader: - kv 0: general.architecture str = llama
  26. llama_model_loader: - kv 1: general.type str = model
  27. llama_model_loader: - kv 2: general.name str = Phi 4
  28. llama_model_loader: - kv 3: general.version str = 4
  29. llama_model_loader: - kv 4: general.basename str = phi
  30. llama_model_loader: - kv 5: general.size_label str = 15B
  31. llama_model_loader: - kv 6: general.license str = mit
  32. llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/microsoft/phi-...
  33. llama_model_loader: - kv 8: general.base_model.count u32 = 1
  34. llama_model_loader: - kv 9: general.base_model.0.name str = Phi 4
  35. llama_model_loader: - kv 10: general.base_model.0.version str = 4
  36. llama_model_loader: - kv 11: general.base_model.0.organization str = Microsoft
  37. llama_model_loader: - kv 12: general.base_model.0.repo_url str = https://huggingface.co/microsoft/phi-4
  38. llama_model_loader: - kv 13: general.tags arr[str,9] = ["phi", "phi4", "unsloth", "nlp", "ma...
  39. llama_model_loader: - kv 14: general.languages arr[str,1] = ["en"]
  40. llama_model_loader: - kv 15: llama.block_count u32 = 40
  41. llama_model_loader: - kv 16: llama.context_length u32 = 16384
  42. llama_model_loader: - kv 17: llama.embedding_length u32 = 5120
  43. llama_model_loader: - kv 18: llama.feed_forward_length u32 = 17920
  44. llama_model_loader: - kv 19: llama.attention.head_count u32 = 40
  45. llama_model_loader: - kv 20: llama.attention.head_count_kv u32 = 10
  46. llama_model_loader: - kv 21: llama.rope.freq_base f32 = 250000.000000
  47. llama_model_loader: - kv 22: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
  48. llama_model_loader: - kv 23: llama.attention.key_length u32 = 128
  49. llama_model_loader: - kv 24: llama.attention.value_length u32 = 128
  50. llama_model_loader: - kv 25: general.file_type u32 = 7
  51. llama_model_loader: - kv 26: llama.vocab_size u32 = 100352
  52. llama_model_loader: - kv 27: llama.rope.dimension_count u32 = 128
  53. llama_model_loader: - kv 28: tokenizer.ggml.model str = gpt2
  54. llama_model_loader: - kv 29: tokenizer.ggml.pre str = dbrx
  55. llama_model_loader: - kv 30: tokenizer.ggml.tokens arr[str,100352] = ["!", "\"", "#", "$", "%", "&", "'", ...
  56. llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,100352] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
  57. llama_model_loader: - kv 32: tokenizer.ggml.merges arr[str,100000] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
  58. llama_model_loader: - kv 33: tokenizer.ggml.bos_token_id u32 = 100257
  59. llama_model_loader: - kv 34: tokenizer.ggml.eos_token_id u32 = 100265
  60. llama_model_loader: - kv 35: tokenizer.ggml.unknown_token_id u32 = 5809
  61. llama_model_loader: - kv 36: tokenizer.ggml.padding_token_id u32 = 100351
  62. llama_model_loader: - kv 37: tokenizer.chat_template str = {% for message in messages %}{% if (m...
  63. llama_model_loader: - kv 38: tokenizer.ggml.add_space_prefix bool = false
  64. llama_model_loader: - kv 39: general.quantization_version u32 = 2
  65. llama_model_loader: - type f32: 81 tensors
  66. llama_model_loader: - type q8_0: 282 tensors
  67. print_info: file format = GGUF V3 (latest)
  68. print_info: file type = Q8_0
  69. print_info: file size = 14.51 GiB (8.50 BPW)
  70. load: special tokens cache size = 97
  71. load: token to piece cache size = 0.6151 MB
  72. print_info: arch = llama
  73. print_info: vocab_only = 0
  74. print_info: n_ctx_train = 16384
  75. print_info: n_embd = 5120
  76. print_info: n_layer = 40
  77. print_info: n_head = 40
  78. print_info: n_head_kv = 10
  79. print_info: n_rot = 128
  80. print_info: n_swa = 0
  81. print_info: is_swa_any = 0
  82. print_info: n_embd_head_k = 128
  83. print_info: n_embd_head_v = 128
  84. print_info: n_gqa = 4
  85. print_info: n_embd_k_gqa = 1280
  86. print_info: n_embd_v_gqa = 1280
  87. print_info: f_norm_eps = 0.0e+00
  88. print_info: f_norm_rms_eps = 1.0e-05
  89. print_info: f_clamp_kqv = 0.0e+00
  90. print_info: f_max_alibi_bias = 0.0e+00
  91. print_info: f_logit_scale = 0.0e+00
  92. print_info: f_attn_scale = 0.0e+00
  93. print_info: n_ff = 17920
  94. print_info: n_expert = 0
  95. print_info: n_expert_used = 0
  96. print_info: causal attn = 1
  97. print_info: pooling type = 0
  98. print_info: rope type = 0
  99. print_info: rope scaling = linear
  100. print_info: freq_base_train = 250000.0
  101. print_info: freq_scale_train = 1
  102. print_info: n_ctx_orig_yarn = 16384
  103. print_info: rope_finetuned = unknown
  104. print_info: ssm_d_conv = 0
  105. print_info: ssm_d_inner = 0
  106. print_info: ssm_d_state = 0
  107. print_info: ssm_dt_rank = 0
  108. print_info: ssm_dt_b_c_rms = 0
  109. print_info: model type = 13B
  110. print_info: model params = 14.66 B
  111. print_info: general.name = Phi 4
  112. print_info: vocab type = BPE
  113. print_info: n_vocab = 100352
  114. print_info: n_merges = 100000
  115. print_info: BOS token = 100257 '<|endoftext|>'
  116. print_info: EOS token = 100265 '<|im_end|>'
  117. print_info: EOT token = 100265 '<|im_end|>'
  118. print_info: UNK token = 5809 '�'
  119. print_info: PAD token = 100351 '<|dummy_87|>'
  120. print_info: LF token = 198 'Ċ'
  121. print_info: FIM PRE token = 100258 '<|fim_prefix|>'
  122. print_info: FIM SUF token = 100260 '<|fim_suffix|>'
  123. print_info: FIM MID token = 100259 '<|fim_middle|>'
  124. print_info: EOG token = 100257 '<|endoftext|>'
  125. print_info: EOG token = 100265 '<|im_end|>'
  126. print_info: max token length = 256
  127. load_tensors: loading model tensors, this can take a while... (mmap = true)
  128. load_tensors: offloading 40 repeating layers to GPU
  129. load_tensors: offloading output layer to GPU
  130. load_tensors: offloaded 41/41 layers to GPU
  131. load_tensors: CUDA0 model buffer size = 14334.71 MiB
  132. load_tensors: CPU_Mapped model buffer size = 520.62 MiB
  133. ...............................................................................................
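Note on the size figures in the load log above: they can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not something llama.cpp runs. It assumes the usual Q8_0 block layout (32 int8 weights plus one 2-byte f16 scale, 34 bytes per block) and that BPW means file bits divided by parameter count. All constant names are made up for this note.

```python
# Values copied from the log above.
FILE_SIZE_GIB = 14.51      # print_info: file size
N_PARAMS = 14.66e9         # print_info: model params
CUDA_BUF_MIB = 14334.71    # load_tensors: CUDA0 model buffer size
CPU_BUF_MIB = 520.62       # load_tensors: CPU_Mapped model buffer size

# Assumption: a Q8_0 block stores 32 int8 weights plus one 2-byte f16 scale.
Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BLOCK_BYTES = 32 + 2


def bits_per_weight(file_size_gib: float, n_params: float) -> float:
    """File size in bits divided by the number of parameters."""
    return file_size_gib * 2**30 * 8 / n_params


q8_0_bpw = Q8_0_BLOCK_BYTES * 8 / Q8_0_BLOCK_WEIGHTS
file_bpw = bits_per_weight(FILE_SIZE_GIB, N_PARAMS)
total_buf_gib = (CUDA_BUF_MIB + CPU_BUF_MIB) / 1024

print(f"Q8_0 block:    {q8_0_bpw:.2f} bits per weight")
print(f"file-level:    {file_bpw:.2f} bits per weight")
print(f"model buffers: {total_buf_gib:.2f} GiB")
```

Under these assumptions the per-block and file-level figures both come out near the logged 8.50 BPW, and the two model buffers together add up to roughly the logged 14.51 GiB file size.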
  134. llama_context: constructing llama_context
  135. llama_context: n_seq_max = 64
  136. llama_context: n_ctx = 16384
  137. llama_context: n_ctx_per_seq = 256
  138. llama_context: n_batch = 2048
  139. llama_context: n_ubatch = 512
  140. llama_context: causal_attn = 1
  141. llama_context: flash_attn = 1
  142. llama_context: freq_base = 250000.0
  143. llama_context: freq_scale = 1
  144. llama_context: n_ctx_per_seq (256) < n_ctx_train (16384) -- the full capacity of the model will not be utilized
  145. llama_context: CUDA_Host output buffer size = 24.50 MiB
  146. llama_kv_cache_unified: CUDA0 KV buffer size = 3200.00 MiB
  147. llama_kv_cache_unified: size = 3200.00 MiB ( 16384 cells, 40 layers, 64 seqs), K (f16): 1600.00 MiB, V (f16): 1600.00 MiB
  148. llama_context: CUDA0 compute buffer size = 206.00 MiB
  149. llama_context: CUDA_Host compute buffer size = 42.01 MiB
  150. llama_context: graph nodes = 1287
  151. llama_context: graph splits = 2
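Note on the KV buffer size logged above: the 3200 MiB figure can be reproduced from the model dimensions. This is a rough sketch, assuming each KV cell stores one row of n_embd_k_gqa (or n_embd_v_gqa) f16 values per layer and that n_embd_k_gqa is n_head_kv times the head size. The function name cache_mib is hypothetical.

```python
# Values copied from the log above.
N_CTX = 16384          # llama_context: n_ctx (total KV cells)
N_LAYER = 40           # print_info: n_layer
N_HEAD_KV = 10         # print_info: n_head_kv
N_EMBD_HEAD = 128      # print_info: n_embd_head_k / n_embd_head_v
BYTES_F16 = 2          # K and V are both logged as f16

# Assumption: per-layer K (or V) row width is n_head_kv * head size.
n_embd_gqa = N_HEAD_KV * N_EMBD_HEAD  # logged as n_embd_k_gqa = 1280


def cache_mib(n_ctx: int, n_layer: int, row_width: int, bytes_per_elem: int) -> float:
    """Size in MiB of one cache (K or V) across all layers."""
    return n_ctx * n_layer * row_width * bytes_per_elem / 2**20


k_mib = cache_mib(N_CTX, N_LAYER, n_embd_gqa, BYTES_F16)
v_mib = cache_mib(N_CTX, N_LAYER, n_embd_gqa, BYTES_F16)

print(f"K: {k_mib:.2f} MiB, V: {v_mib:.2f} MiB, total: {k_mib + v_mib:.2f} MiB")
```

The result agrees with the logged K (f16): 1600.00 MiB, V (f16): 1600.00 MiB and the 3200.00 MiB total.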
  152. common_init_from_params: setting dry_penalty_last_n to ctx_size = 16384
  153. common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
  154. srv init: initializing slots, n_slots = 64
  155. slot init: id 0 | task -1 | new slot n_ctx_slot = 256
  156. slot init: id 1 | task -1 | new slot n_ctx_slot = 256
  157. slot init: id 2 | task -1 | new slot n_ctx_slot = 256
  158. slot init: id 3 | task -1 | new slot n_ctx_slot = 256
  159. slot init: id 4 | task -1 | new slot n_ctx_slot = 256
  160. slot init: id 5 | task -1 | new slot n_ctx_slot = 256
  161. slot init: id 6 | task -1 | new slot n_ctx_slot = 256
  162. slot init: id 7 | task -1 | new slot n_ctx_slot = 256
  163. slot init: id 8 | task -1 | new slot n_ctx_slot = 256
  164. slot init: id 9 | task -1 | new slot n_ctx_slot = 256
  165. slot init: id 10 | task -1 | new slot n_ctx_slot = 256
  166. slot init: id 11 | task -1 | new slot n_ctx_slot = 256
  167. slot init: id 12 | task -1 | new slot n_ctx_slot = 256
  168. slot init: id 13 | task -1 | new slot n_ctx_slot = 256
  169. slot init: id 14 | task -1 | new slot n_ctx_slot = 256
  170. slot init: id 15 | task -1 | new slot n_ctx_slot = 256
  171. slot init: id 16 | task -1 | new slot n_ctx_slot = 256
  172. slot init: id 17 | task -1 | new slot n_ctx_slot = 256
  173. slot init: id 18 | task -1 | new slot n_ctx_slot = 256
  174. slot init: id 19 | task -1 | new slot n_ctx_slot = 256
  175. slot init: id 20 | task -1 | new slot n_ctx_slot = 256
  176. slot init: id 21 | task -1 | new slot n_ctx_slot = 256
  177. slot init: id 22 | task -1 | new slot n_ctx_slot = 256
  178. slot init: id 23 | task -1 | new slot n_ctx_slot = 256
  179. slot init: id 24 | task -1 | new slot n_ctx_slot = 256
  180. slot init: id 25 | task -1 | new slot n_ctx_slot = 256
  181. slot init: id 26 | task -1 | new slot n_ctx_slot = 256
  182. slot init: id 27 | task -1 | new slot n_ctx_slot = 256
  183. slot init: id 28 | task -1 | new slot n_ctx_slot = 256
  184. slot init: id 29 | task -1 | new slot n_ctx_slot = 256
  185. slot init: id 30 | task -1 | new slot n_ctx_slot = 256
  186. slot init: id 31 | task -1 | new slot n_ctx_slot = 256
  187. slot init: id 32 | task -1 | new slot n_ctx_slot = 256
  188. slot init: id 33 | task -1 | new slot n_ctx_slot = 256
  189. slot init: id 34 | task -1 | new slot n_ctx_slot = 256
  190. slot init: id 35 | task -1 | new slot n_ctx_slot = 256
  191. slot init: id 36 | task -1 | new slot n_ctx_slot = 256
  192. slot init: id 37 | task -1 | new slot n_ctx_slot = 256
  193. slot init: id 38 | task -1 | new slot n_ctx_slot = 256
  194. slot init: id 39 | task -1 | new slot n_ctx_slot = 256
  195. slot init: id 40 | task -1 | new slot n_ctx_slot = 256
  196. slot init: id 41 | task -1 | new slot n_ctx_slot = 256
  197. slot init: id 42 | task -1 | new slot n_ctx_slot = 256
  198. slot init: id 43 | task -1 | new slot n_ctx_slot = 256
  199. slot init: id 44 | task -1 | new slot n_ctx_slot = 256
  200. slot init: id 45 | task -1 | new slot n_ctx_slot = 256
  201. slot init: id 46 | task -1 | new slot n_ctx_slot = 256
  202. slot init: id 47 | task -1 | new slot n_ctx_slot = 256
  203. slot init: id 48 | task -1 | new slot n_ctx_slot = 256
  204. slot init: id 49 | task -1 | new slot n_ctx_slot = 256
  205. slot init: id 50 | task -1 | new slot n_ctx_slot = 256
  206. slot init: id 51 | task -1 | new slot n_ctx_slot = 256
  207. slot init: id 52 | task -1 | new slot n_ctx_slot = 256
  208. slot init: id 53 | task -1 | new slot n_ctx_slot = 256
  209. slot init: id 54 | task -1 | new slot n_ctx_slot = 256
  210. slot init: id 55 | task -1 | new slot n_ctx_slot = 256
  211. slot init: id 56 | task -1 | new slot n_ctx_slot = 256
  212. slot init: id 57 | task -1 | new slot n_ctx_slot = 256
  213. slot init: id 58 | task -1 | new slot n_ctx_slot = 256
  214. slot init: id 59 | task -1 | new slot n_ctx_slot = 256
  215. slot init: id 60 | task -1 | new slot n_ctx_slot = 256
  216. slot init: id 61 | task -1 | new slot n_ctx_slot = 256
  217. slot init: id 62 | task -1 | new slot n_ctx_slot = 256
  218. slot init: id 63 | task -1 | new slot n_ctx_slot = 256
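Note on n_ctx_slot = 256: the slot lines above are consistent with the total context (-c 16384) being divided evenly across the parallel slots (-np 64). The sketch below works through that split, plus a hypothetical estimate of what a full 16384-token context per slot would cost. It assumes an even split and a KV cache that scales linearly with the number of cells; neither is stated in the log.

```python
# Values copied from the command line and log above.
N_CTX = 16384            # -c 16384
N_PARALLEL = 64          # -np 64
N_CTX_TRAIN = 16384      # print_info: n_ctx_train
KV_MIB_LOGGED = 3200.0   # llama_kv_cache_unified: size
GPU_FREE_MIB = 30843     # free VRAM reported before loading the model

# Assumption: the context is split evenly across slots.
n_ctx_slot = N_CTX // N_PARALLEL

# Hypothetical: total context needed so every slot gets the full trained length.
n_ctx_full = N_CTX_TRAIN * N_PARALLEL

# Assumption: KV cache size scales linearly with the number of cells.
kv_mib_full = KV_MIB_LOGGED * n_ctx_full / N_CTX

print(f"per-slot context:          {n_ctx_slot} tokens")
print(f"-c for full context/slot:  {n_ctx_full}")
print(f"estimated KV cache:        {kv_mib_full:.0f} MiB (GPU had {GPU_FREE_MIB} MiB free)")
```

On these assumptions, giving all 64 slots the full trained context would need far more KV memory than this GPU has, so the practical choices are fewer slots or a shorter per-slot context.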
  219. main: model loaded
  220. main: chat template, chat_template: {% for message in messages %}{% if (message['role'] == 'system') %}{{'<|im_start|>system<|im_sep|>' + message['content'] + '<|im_end|>'}}{% elif (message['role'] == 'user') %}{{'<|im_start|>user<|im_sep|>' + message['content'] + '<|im_end|>'}}{% elif (message['role'] == 'assistant') %}{{'<|im_start|>assistant<|im_sep|>' + message['content'] + '<|im_end|>'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant<|im_sep|>' }}{% endif %}, example_format: '<|im_start|>system<|im_sep|>You are a helpful assistant<|im_end|><|im_start|>user<|im_sep|>Hello<|im_end|><|im_start|>assistant<|im_sep|>Hi there<|im_end|><|im_start|>user<|im_sep|>How are you?<|im_end|><|im_start|>assistant<|im_sep|>'
  221. main: server is listening on http://0.0.0.0:8000 - starting the main loop
  222. srv update_slots: all slots are idle
  223. srv params_from_: Chat format: Content-only
  224. slot launch_slot_: id 0 | task 0 | processing task
  225. slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  226. slot update_slots: id 0 | task 0 | kv cache rm [0, end)
  227. slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  228. slot update_slots: id 0 | task 0 | prompt done, n_past = 207, n_tokens = 207
  229. slot update_slots: id 0 | task 0 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  230. slot update_slots: id 0 | task 0 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  231. slot release: id 0 | task 0 | stop processing: n_past = 137, truncated = 1
  232. slot print_timing: id 0 | task 0 |
  233. prompt eval time = 181.10 ms / 207 tokens ( 0.87 ms per token, 1143.03 tokens per second)
  234. eval time = 2203.93 ms / 185 tokens ( 11.91 ms per token, 83.94 tokens per second)
  235. total time = 2385.03 ms / 392 tokens
  236. srv update_slots: all slots are idle
  237. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
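Note on the "slot context shift" lines in the request above: the logged values (n_keep = 0, n_left = 255, n_discard = 127) fit a simple rule in which, once the slot is about to fill up, half of the tokens after the kept prefix are dropped. The rule below is an assumption inferred from those numbers, not code taken from llama.cpp, and context_shift is a hypothetical helper.

```python
def context_shift(n_past: int, n_keep: int) -> tuple[int, int, int]:
    """Return (n_left, n_discard, n_past after the shift).

    Assumption: half of the tokens after the kept prefix are discarded,
    using integer division.
    """
    n_left = n_past - n_keep
    n_discard = n_left // 2
    return n_left, n_discard, n_past - n_discard


N_CTX_SLOT = 256   # from the slot init lines
N_KEEP = 0         # from the request log

# Assumption: the shift fires once n_past + 1 reaches n_ctx_slot.
n_past_at_shift = N_CTX_SLOT - 1

n_left, n_discard, n_past_after = context_shift(n_past_at_shift, N_KEEP)
print(f"n_left = {n_left}, n_discard = {n_discard}, n_past after shift = {n_past_after}")
```

With a 207-token prompt in a 256-token slot, a reply of roughly 185 tokens cannot fit without shifting, which matches the two shift lines and truncated = 1 in the log. Because n_keep is 0, the discarded tokens come from the start of the prompt.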
  238. srv params_from_: Chat format: Content-only
  239. slot launch_slot_: id 1 | task 186 | processing task
  240. slot update_slots: id 1 | task 186 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  241. slot update_slots: id 1 | task 186 | kv cache rm [0, end)
  242. slot update_slots: id 1 | task 186 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  243. slot update_slots: id 1 | task 186 | prompt done, n_past = 207, n_tokens = 207
  244. slot update_slots: id 1 | task 186 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  245. slot update_slots: id 1 | task 186 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  246. slot release: id 1 | task 186 | stop processing: n_past = 151, truncated = 1
  247. slot print_timing: id 1 | task 186 |
  248. prompt eval time = 41.55 ms / 207 tokens ( 0.20 ms per token, 4981.95 tokens per second)
  249. eval time = 2352.12 ms / 199 tokens ( 11.82 ms per token, 84.60 tokens per second)
  250. total time = 2393.67 ms / 406 tokens
  251. srv update_slots: all slots are idle
  252. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
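Note on the timing lines: the tokens-per-second figures can be recomputed from the logged milliseconds and token counts, which makes it easy to compare requests. The sketch below is a small, hypothetical parser for the "prompt eval time" and "eval time" lines (TIMING_RE and parse_timing are names invented for this note). Recomputed rates can differ slightly from the logged ones because the logged milliseconds are rounded.

```python
import re

# Matches both "prompt eval time = ..." and "eval time = ..." lines.
TIMING_RE = re.compile(
    r"(?P<label>prompt eval time|eval time)\s*=\s*"
    r"(?P<ms>[\d.]+) ms\s*/\s*(?P<tokens>\d+) tokens"
)


def parse_timing(line: str):
    """Return (label, ms, tokens, tokens_per_second), or None if the line does not match."""
    m = TIMING_RE.search(line)
    if m is None:
        return None
    ms = float(m.group("ms"))
    tokens = int(m.group("tokens"))
    return m.group("label"), ms, tokens, tokens / (ms / 1000.0)


# Prompt-eval lines copied from the first two requests in the log.
LINES = [
    "prompt eval time = 181.10 ms / 207 tokens ( 0.87 ms per token, 1143.03 tokens per second)",
    "prompt eval time = 41.55 ms / 207 tokens ( 0.20 ms per token, 4981.95 tokens per second)",
]

results = [parse_timing(line) for line in LINES]
for label, ms, tokens, rate in results:
    print(f"{label}: {tokens} tokens in {ms:.2f} ms -> {rate:.1f} tokens/s")

speedup = results[0][1] / results[1][1]
print(f"second request's prompt eval was {speedup:.1f}x faster than the first")
```

The first request processes the same 207-token prompt several times more slowly than the second. The log does not say why; one-time setup cost on the first request is a plausible guess, not a confirmed cause.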
  253. srv params_from_: Chat format: Content-only
  254. slot launch_slot_: id 2 | task 386 | processing task
  255. slot update_slots: id 2 | task 386 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  256. slot update_slots: id 2 | task 386 | kv cache rm [0, end)
  257. slot update_slots: id 2 | task 386 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  258. slot update_slots: id 2 | task 386 | prompt done, n_past = 207, n_tokens = 207
  259. slot update_slots: id 2 | task 386 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  260. slot update_slots: id 2 | task 386 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  261. slot release: id 2 | task 386 | stop processing: n_past = 157, truncated = 1
  262. slot print_timing: id 2 | task 386 |
  263. prompt eval time = 41.76 ms / 207 tokens ( 0.20 ms per token, 4957.25 tokens per second)
  264. eval time = 2435.10 ms / 205 tokens ( 11.88 ms per token, 84.19 tokens per second)
  265. total time = 2476.85 ms / 412 tokens
  266. srv update_slots: all slots are idle
  267. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  268. srv params_from_: Chat format: Content-only
  269. slot launch_slot_: id 3 | task 592 | processing task
  270. slot update_slots: id 3 | task 592 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  271. slot update_slots: id 3 | task 592 | kv cache rm [0, end)
  272. slot update_slots: id 3 | task 592 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  273. slot update_slots: id 3 | task 592 | prompt done, n_past = 207, n_tokens = 207
  274. slot update_slots: id 3 | task 592 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  275. slot update_slots: id 3 | task 592 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  276. slot release: id 3 | task 592 | stop processing: n_past = 153, truncated = 1
  277. slot print_timing: id 3 | task 592 |
  278. prompt eval time = 41.20 ms / 207 tokens ( 0.20 ms per token, 5024.27 tokens per second)
  279. eval time = 2402.52 ms / 201 tokens ( 11.95 ms per token, 83.66 tokens per second)
  280. total time = 2443.72 ms / 408 tokens
  281. srv update_slots: all slots are idle
  282. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  283. srv params_from_: Chat format: Content-only
  284. slot launch_slot_: id 4 | task 794 | processing task
  285. slot update_slots: id 4 | task 794 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  286. slot update_slots: id 4 | task 794 | kv cache rm [0, end)
  287. slot update_slots: id 4 | task 794 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  288. slot update_slots: id 4 | task 794 | prompt done, n_past = 207, n_tokens = 207
  289. slot update_slots: id 4 | task 794 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  290. slot update_slots: id 4 | task 794 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  291. slot release: id 4 | task 794 | stop processing: n_past = 161, truncated = 1
  292. slot print_timing: id 4 | task 794 |
  293. prompt eval time = 41.64 ms / 207 tokens ( 0.20 ms per token, 4971.66 tokens per second)
  294. eval time = 2510.65 ms / 209 tokens ( 12.01 ms per token, 83.25 tokens per second)
  295. total time = 2552.29 ms / 416 tokens
  296. srv update_slots: all slots are idle
  297. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  298. srv params_from_: Chat format: Content-only
  299. slot launch_slot_: id 5 | task 1004 | processing task
  300. slot update_slots: id 5 | task 1004 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  301. slot update_slots: id 5 | task 1004 | kv cache rm [0, end)
  302. slot update_slots: id 5 | task 1004 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  303. slot update_slots: id 5 | task 1004 | prompt done, n_past = 207, n_tokens = 207
  304. slot update_slots: id 5 | task 1004 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  305. slot update_slots: id 5 | task 1004 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  306. slot release: id 5 | task 1004 | stop processing: n_past = 140, truncated = 1
  307. slot print_timing: id 5 | task 1004 |
  308. prompt eval time = 41.84 ms / 207 tokens ( 0.20 ms per token, 4947.42 tokens per second)
  309. eval time = 2263.88 ms / 188 tokens ( 12.04 ms per token, 83.04 tokens per second)
  310. total time = 2305.72 ms / 395 tokens
  311. srv update_slots: all slots are idle
  312. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  313. srv params_from_: Chat format: Content-only
  314. slot launch_slot_: id 6 | task 1193 | processing task
  315. slot update_slots: id 6 | task 1193 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  316. slot update_slots: id 6 | task 1193 | kv cache rm [0, end)
  317. slot update_slots: id 6 | task 1193 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  318. slot update_slots: id 6 | task 1193 | prompt done, n_past = 207, n_tokens = 207
  319. slot update_slots: id 6 | task 1193 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  320. slot update_slots: id 6 | task 1193 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  321. slot release: id 6 | task 1193 | stop processing: n_past = 161, truncated = 1
  322. slot print_timing: id 6 | task 1193 |
  323. prompt eval time = 43.14 ms / 207 tokens ( 0.21 ms per token, 4798.00 tokens per second)
  324. eval time = 2503.51 ms / 209 tokens ( 11.98 ms per token, 83.48 tokens per second)
  325. total time = 2546.65 ms / 416 tokens
  326. srv update_slots: all slots are idle
  327. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  328. srv params_from_: Chat format: Content-only
  329. slot launch_slot_: id 7 | task 1403 | processing task
  330. slot update_slots: id 7 | task 1403 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  331. slot update_slots: id 7 | task 1403 | kv cache rm [0, end)
  332. slot update_slots: id 7 | task 1403 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  333. slot update_slots: id 7 | task 1403 | prompt done, n_past = 207, n_tokens = 207
  334. slot update_slots: id 7 | task 1403 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  335. slot update_slots: id 7 | task 1403 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  336. slot release: id 7 | task 1403 | stop processing: n_past = 144, truncated = 1
  337. slot print_timing: id 7 | task 1403 |
  338. prompt eval time = 45.63 ms / 207 tokens ( 0.22 ms per token, 4536.89 tokens per second)
  339. eval time = 2275.58 ms / 192 tokens ( 11.85 ms per token, 84.37 tokens per second)
  340. total time = 2321.21 ms / 399 tokens
  341. srv update_slots: all slots are idle
  342. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  343. srv params_from_: Chat format: Content-only
  344. slot launch_slot_: id 8 | task 1596 | processing task
  345. slot update_slots: id 8 | task 1596 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  346. slot update_slots: id 8 | task 1596 | kv cache rm [0, end)
  347. slot update_slots: id 8 | task 1596 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  348. slot update_slots: id 8 | task 1596 | prompt done, n_past = 207, n_tokens = 207
  349. slot update_slots: id 8 | task 1596 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  350. slot update_slots: id 8 | task 1596 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  351. slot release: id 8 | task 1596 | stop processing: n_past = 155, truncated = 1
  352. slot print_timing: id 8 | task 1596 |
  353. prompt eval time = 42.49 ms / 207 tokens ( 0.21 ms per token, 4871.62 tokens per second)
  354. eval time = 2412.59 ms / 203 tokens ( 11.88 ms per token, 84.14 tokens per second)
  355. total time = 2455.08 ms / 410 tokens
  356. srv update_slots: all slots are idle
  357. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  358. srv params_from_: Chat format: Content-only
  359. slot launch_slot_: id 9 | task 1800 | processing task
  360. slot update_slots: id 9 | task 1800 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  361. slot update_slots: id 9 | task 1800 | kv cache rm [0, end)
  362. slot update_slots: id 9 | task 1800 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  363. slot update_slots: id 9 | task 1800 | prompt done, n_past = 207, n_tokens = 207
  364. slot update_slots: id 9 | task 1800 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  365. slot update_slots: id 9 | task 1800 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  366. slot release: id 9 | task 1800 | stop processing: n_past = 171, truncated = 1
  367. slot print_timing: id 9 | task 1800 |
  368. prompt eval time = 42.85 ms / 207 tokens ( 0.21 ms per token, 4830.58 tokens per second)
  369. eval time = 2594.06 ms / 219 tokens ( 11.85 ms per token, 84.42 tokens per second)
  370. total time = 2636.91 ms / 426 tokens
  371. srv update_slots: all slots are idle
  372. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  373. srv params_from_: Chat format: Content-only
  374. slot launch_slot_: id 10 | task 2020 | processing task
  375. slot update_slots: id 10 | task 2020 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  376. slot update_slots: id 10 | task 2020 | kv cache rm [0, end)
  377. slot update_slots: id 10 | task 2020 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  378. slot update_slots: id 10 | task 2020 | prompt done, n_past = 207, n_tokens = 207
  379. slot update_slots: id 10 | task 2020 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  380. slot update_slots: id 10 | task 2020 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  381. slot release: id 10 | task 2020 | stop processing: n_past = 160, truncated = 1
  382. slot print_timing: id 10 | task 2020 |
  383. prompt eval time = 42.82 ms / 207 tokens ( 0.21 ms per token, 4834.64 tokens per second)
  384. eval time = 2471.02 ms / 208 tokens ( 11.88 ms per token, 84.18 tokens per second)
  385. total time = 2513.83 ms / 415 tokens
  386. srv update_slots: all slots are idle
  387. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  388. srv params_from_: Chat format: Content-only
  389. slot launch_slot_: id 11 | task 2229 | processing task
  390. slot update_slots: id 11 | task 2229 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  391. slot update_slots: id 11 | task 2229 | kv cache rm [0, end)
  392. slot update_slots: id 11 | task 2229 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  393. slot update_slots: id 11 | task 2229 | prompt done, n_past = 207, n_tokens = 207
  394. slot update_slots: id 11 | task 2229 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  395. slot update_slots: id 11 | task 2229 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  396. slot release: id 11 | task 2229 | stop processing: n_past = 156, truncated = 1
  397. slot print_timing: id 11 | task 2229 |
  398. prompt eval time = 42.50 ms / 207 tokens ( 0.21 ms per token, 4870.82 tokens per second)
  399. eval time = 2422.91 ms / 204 tokens ( 11.88 ms per token, 84.20 tokens per second)
  400. total time = 2465.41 ms / 411 tokens
  401. srv update_slots: all slots are idle
  402. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  403. srv params_from_: Chat format: Content-only
  404. slot launch_slot_: id 12 | task 2434 | processing task
  405. slot update_slots: id 12 | task 2434 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  406. slot update_slots: id 12 | task 2434 | kv cache rm [0, end)
  407. slot update_slots: id 12 | task 2434 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  408. slot update_slots: id 12 | task 2434 | prompt done, n_past = 207, n_tokens = 207
  409. slot update_slots: id 12 | task 2434 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  410. slot update_slots: id 12 | task 2434 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  411. slot release: id 12 | task 2434 | stop processing: n_past = 146, truncated = 1
  412. slot print_timing: id 12 | task 2434 |
  413. prompt eval time = 42.82 ms / 207 tokens ( 0.21 ms per token, 4833.96 tokens per second)
  414. eval time = 2301.92 ms / 194 tokens ( 11.87 ms per token, 84.28 tokens per second)
  415. total time = 2344.74 ms / 401 tokens
  416. srv update_slots: all slots are idle
  417. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  418. srv params_from_: Chat format: Content-only
  419. slot launch_slot_: id 13 | task 2629 | processing task
  420. slot update_slots: id 13 | task 2629 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  421. slot update_slots: id 13 | task 2629 | kv cache rm [0, end)
  422. slot update_slots: id 13 | task 2629 | prompt processing progress, n_past = 207, n_tokens = 207, progress = 1.000000
  423. slot update_slots: id 13 | task 2629 | prompt done, n_past = 207, n_tokens = 207
  424. slot update_slots: id 13 | task 2629 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  425. srv params_from_: Chat format: Content-only
  426. slot launch_slot_: id 14 | task 2712 | processing task
  427. slot update_slots: id 14 | task 2712 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  428. slot update_slots: id 14 | task 2712 | kv cache rm [0, end)
  429. slot update_slots: id 14 | task 2712 | prompt processing progress, n_past = 207, n_tokens = 208, progress = 1.000000
  430. slot update_slots: id 14 | task 2712 | prompt done, n_past = 207, n_tokens = 208
  431. slot update_slots: id 14 | task 2712 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  432. srv params_from_: Chat format: Content-only
  433. slot launch_slot_: id 15 | task 2772 | processing task
  434. slot update_slots: id 15 | task 2772 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  435. slot update_slots: id 15 | task 2772 | kv cache rm [0, end)
  436. slot update_slots: id 15 | task 2772 | prompt processing progress, n_past = 207, n_tokens = 209, progress = 1.000000
  437. slot update_slots: id 15 | task 2772 | prompt done, n_past = 207, n_tokens = 209
  438. slot update_slots: id 13 | task 2629 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  439. slot release: id 13 | task 2629 | stop processing: n_past = 130, truncated = 1
  440. slot print_timing: id 13 | task 2629 |
  441. prompt eval time = 43.16 ms / 207 tokens ( 0.21 ms per token, 4795.89 tokens per second)
  442. eval time = 2608.10 ms / 178 tokens ( 14.65 ms per token, 68.25 tokens per second)
  443. total time = 2651.26 ms / 385 tokens
  444. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  445. slot update_slots: id 15 | task 2772 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  446. srv params_from_: Chat format: Content-only
  447. slot launch_slot_: id 16 | task 2831 | processing task
  448. slot update_slots: id 16 | task 2831 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  449. slot update_slots: id 16 | task 2831 | kv cache rm [0, end)
  450. slot update_slots: id 16 | task 2831 | prompt processing progress, n_past = 207, n_tokens = 209, progress = 1.000000
  451. slot update_slots: id 16 | task 2831 | prompt done, n_past = 207, n_tokens = 209
  452. slot update_slots: id 16 | task 2831 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  453. srv params_from_: Chat format: Content-only
  454. slot launch_slot_: id 17 | task 2890 | processing task
  455. slot update_slots: id 17 | task 2890 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  456. slot update_slots: id 17 | task 2890 | kv cache rm [0, end)
  457. slot update_slots: id 17 | task 2890 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  458. slot update_slots: id 17 | task 2890 | prompt done, n_past = 207, n_tokens = 210
  459. slot update_slots: id 14 | task 2712 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  460. slot release: id 14 | task 2712 | stop processing: n_past = 146, truncated = 1
  461. slot print_timing: id 14 | task 2712 |
  462. prompt eval time = 42.85 ms / 207 tokens ( 0.21 ms per token, 4830.58 tokens per second)
  463. eval time = 3306.65 ms / 194 tokens ( 17.04 ms per token, 58.67 tokens per second)
  464. total time = 3349.51 ms / 401 tokens
  465. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  466. slot update_slots: id 17 | task 2890 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  467. srv params_from_: Chat format: Content-only
  468. slot launch_slot_: id 18 | task 2948 | processing task
  469. slot update_slots: id 18 | task 2948 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  470. slot update_slots: id 18 | task 2948 | kv cache rm [0, end)
  471. slot update_slots: id 18 | task 2948 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  472. slot update_slots: id 18 | task 2948 | prompt done, n_past = 207, n_tokens = 210
  473. slot update_slots: id 15 | task 2772 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  474. slot release: id 15 | task 2772 | stop processing: n_past = 155, truncated = 1
  475. slot print_timing: id 15 | task 2772 |
  476. prompt eval time = 43.00 ms / 207 tokens ( 0.21 ms per token, 4813.62 tokens per second)
  477. eval time = 3615.56 ms / 203 tokens ( 17.81 ms per token, 56.15 tokens per second)
  478. total time = 3658.56 ms / 410 tokens
  479. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  480. srv params_from_: Chat format: Content-only
  481. slot launch_slot_: id 19 | task 2998 | processing task
  482. slot update_slots: id 18 | task 2948 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  483. slot update_slots: id 19 | task 2998 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  484. slot update_slots: id 19 | task 2998 | kv cache rm [0, end)
  485. slot update_slots: id 19 | task 2998 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  486. slot update_slots: id 19 | task 2998 | prompt done, n_past = 207, n_tokens = 210
  487. slot update_slots: id 16 | task 2831 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  488. slot release: id 16 | task 2831 | stop processing: n_past = 153, truncated = 1
  489. slot print_timing: id 16 | task 2831 |
  490. prompt eval time = 43.28 ms / 207 tokens ( 0.21 ms per token, 4782.37 tokens per second)
  491. eval time = 3638.73 ms / 201 tokens ( 18.10 ms per token, 55.24 tokens per second)
  492. total time = 3682.01 ms / 408 tokens
  493. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  494. slot update_slots: id 19 | task 2998 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  495. srv params_from_: Chat format: Content-only
  496. slot launch_slot_: id 20 | task 3055 | processing task
  497. slot update_slots: id 20 | task 3055 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  498. slot update_slots: id 20 | task 3055 | kv cache rm [0, end)
  499. slot update_slots: id 20 | task 3055 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  500. slot update_slots: id 20 | task 3055 | prompt done, n_past = 207, n_tokens = 210
  501. slot update_slots: id 17 | task 2890 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  502. slot release: id 17 | task 2890 | stop processing: n_past = 133, truncated = 1
  503. slot print_timing: id 17 | task 2890 |
  504. prompt eval time = 44.34 ms / 207 tokens ( 0.21 ms per token, 4667.94 tokens per second)
  505. eval time = 3326.14 ms / 181 tokens ( 18.38 ms per token, 54.42 tokens per second)
  506. total time = 3370.49 ms / 388 tokens
  507. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  508. slot update_slots: id 20 | task 3055 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  509. srv params_from_: Chat format: Content-only
  510. slot launch_slot_: id 21 | task 3112 | processing task
  511. slot update_slots: id 21 | task 3112 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  512. slot update_slots: id 21 | task 3112 | kv cache rm [0, end)
  513. slot update_slots: id 21 | task 3112 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  514. slot update_slots: id 21 | task 3112 | prompt done, n_past = 207, n_tokens = 210
  515. slot update_slots: id 18 | task 2948 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  516. slot update_slots: id 21 | task 3112 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  517. slot release: id 18 | task 2948 | stop processing: n_past = 166, truncated = 1
  518. slot print_timing: id 18 | task 2948 |
  519. prompt eval time = 44.65 ms / 207 tokens ( 0.22 ms per token, 4636.27 tokens per second)
  520. eval time = 3943.99 ms / 214 tokens ( 18.43 ms per token, 54.26 tokens per second)
  521. total time = 3988.64 ms / 421 tokens
  522. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  523. srv params_from_: Chat format: Content-only
  524. slot launch_slot_: id 22 | task 3167 | processing task
  525. slot update_slots: id 22 | task 3167 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  526. slot update_slots: id 22 | task 3167 | kv cache rm [0, end)
  527. slot update_slots: id 22 | task 3167 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  528. slot update_slots: id 22 | task 3167 | prompt done, n_past = 207, n_tokens = 210
  529. slot update_slots: id 19 | task 2998 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  530. slot release: id 19 | task 2998 | stop processing: n_past = 140, truncated = 1
  531. slot print_timing: id 19 | task 2998 |
  532. prompt eval time = 47.97 ms / 207 tokens ( 0.23 ms per token, 4315.56 tokens per second)
  533. eval time = 3382.29 ms / 188 tokens ( 17.99 ms per token, 55.58 tokens per second)
  534. total time = 3430.25 ms / 395 tokens
  535. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  536. slot update_slots: id 22 | task 3167 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  537. srv params_from_: Chat format: Content-only
  538. slot launch_slot_: id 23 | task 3222 | processing task
  539. slot update_slots: id 23 | task 3222 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  540. slot update_slots: id 23 | task 3222 | kv cache rm [0, end)
  541. slot update_slots: id 23 | task 3222 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  542. slot update_slots: id 23 | task 3222 | prompt done, n_past = 207, n_tokens = 210
  543. slot update_slots: id 20 | task 3055 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  544. slot release: id 20 | task 3055 | stop processing: n_past = 156, truncated = 1
  545. slot print_timing: id 20 | task 3055 |
  546. prompt eval time = 44.29 ms / 207 tokens ( 0.21 ms per token, 4673.64 tokens per second)
  547. eval time = 3700.86 ms / 204 tokens ( 18.14 ms per token, 55.12 tokens per second)
  548. total time = 3745.15 ms / 411 tokens
  549. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  550. slot update_slots: id 23 | task 3222 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  551. srv params_from_: Chat format: Content-only
  552. slot launch_slot_: id 24 | task 3277 | processing task
  553. slot update_slots: id 24 | task 3277 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  554. slot update_slots: id 24 | task 3277 | kv cache rm [0, end)
  555. slot update_slots: id 24 | task 3277 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  556. slot update_slots: id 24 | task 3277 | prompt done, n_past = 207, n_tokens = 210
  557. slot update_slots: id 21 | task 3112 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  558. slot release: id 21 | task 3112 | stop processing: n_past = 149, truncated = 1
  559. slot print_timing: id 21 | task 3112 |
  560. prompt eval time = 45.40 ms / 207 tokens ( 0.22 ms per token, 4559.97 tokens per second)
  561. eval time = 3891.56 ms / 197 tokens ( 19.75 ms per token, 50.62 tokens per second)
  562. total time = 3936.96 ms / 404 tokens
  563. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  564. srv params_from_: Chat format: Content-only
  565. slot launch_slot_: id 25 | task 3316 | processing task
  566. slot update_slots: id 25 | task 3316 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  567. slot update_slots: id 25 | task 3316 | kv cache rm [0, end)
  568. slot update_slots: id 25 | task 3316 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  569. slot update_slots: id 25 | task 3316 | prompt done, n_past = 207, n_tokens = 210
  570. slot update_slots: id 24 | task 3277 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  571. slot update_slots: id 22 | task 3167 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  572. slot release: id 22 | task 3167 | stop processing: n_past = 143, truncated = 1
  573. slot print_timing: id 22 | task 3167 |
  574. prompt eval time = 44.68 ms / 207 tokens ( 0.22 ms per token, 4632.95 tokens per second)
  575. eval time = 3772.56 ms / 191 tokens ( 19.75 ms per token, 50.63 tokens per second)
  576. total time = 3817.24 ms / 398 tokens
  577. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  578. slot update_slots: id 25 | task 3316 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  579. srv params_from_: Chat format: Content-only
  580. slot launch_slot_: id 26 | task 3372 | processing task
  581. slot update_slots: id 26 | task 3372 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  582. slot update_slots: id 26 | task 3372 | kv cache rm [0, end)
  583. slot update_slots: id 26 | task 3372 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  584. slot update_slots: id 26 | task 3372 | prompt done, n_past = 207, n_tokens = 210
  585. slot update_slots: id 23 | task 3222 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  586. slot update_slots: id 26 | task 3372 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  587. srv params_from_: Chat format: Content-only
  588. slot launch_slot_: id 27 | task 3429 | processing task
  589. slot update_slots: id 27 | task 3429 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  590. slot update_slots: id 27 | task 3429 | kv cache rm [0, end)
  591. slot update_slots: id 27 | task 3429 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  592. slot update_slots: id 27 | task 3429 | prompt done, n_past = 207, n_tokens = 211
  593. slot release: id 23 | task 3222 | stop processing: n_past = 172, truncated = 1
  594. slot print_timing: id 23 | task 3222 |
  595. prompt eval time = 45.63 ms / 207 tokens ( 0.22 ms per token, 4536.69 tokens per second)
  596. eval time = 4322.02 ms / 220 tokens ( 19.65 ms per token, 50.90 tokens per second)
  597. total time = 4367.64 ms / 427 tokens
  598. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  599. slot update_slots: id 24 | task 3277 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  600. slot release: id 24 | task 3277 | stop processing: n_past = 139, truncated = 1
  601. slot print_timing: id 24 | task 3277 |
  602. prompt eval time = 46.62 ms / 207 tokens ( 0.23 ms per token, 4440.15 tokens per second)
  603. eval time = 3687.68 ms / 187 tokens ( 19.72 ms per token, 50.71 tokens per second)
  604. total time = 3734.30 ms / 394 tokens
  605. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  606. slot update_slots: id 27 | task 3429 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  607. srv params_from_: Chat format: Content-only
  608. slot launch_slot_: id 28 | task 3481 | processing task
  609. slot update_slots: id 28 | task 3481 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  610. slot update_slots: id 28 | task 3481 | kv cache rm [0, end)
  611. slot update_slots: id 28 | task 3481 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  612. slot update_slots: id 28 | task 3481 | prompt done, n_past = 207, n_tokens = 210
  613. slot update_slots: id 25 | task 3316 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  614. slot release: id 25 | task 3316 | stop processing: n_past = 153, truncated = 1
  615. slot print_timing: id 25 | task 3316 |
  616. prompt eval time = 45.28 ms / 207 tokens ( 0.22 ms per token, 4571.45 tokens per second)
  617. eval time = 3932.11 ms / 201 tokens ( 19.56 ms per token, 51.12 tokens per second)
  618. total time = 3977.39 ms / 408 tokens
  619. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  620. srv params_from_: Chat format: Content-only
  621. slot launch_slot_: id 29 | task 3522 | processing task
  622. slot update_slots: id 29 | task 3522 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  623. slot update_slots: id 29 | task 3522 | kv cache rm [0, end)
  624. slot update_slots: id 29 | task 3522 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  625. slot update_slots: id 29 | task 3522 | prompt done, n_past = 207, n_tokens = 210
  626. slot update_slots: id 28 | task 3481 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  627. slot update_slots: id 26 | task 3372 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  628. slot update_slots: id 29 | task 3522 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  629. srv params_from_: Chat format: Content-only
  630. slot launch_slot_: id 30 | task 3576 | processing task
  631. slot update_slots: id 30 | task 3576 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  632. slot update_slots: id 30 | task 3576 | kv cache rm [0, end)
  633. slot update_slots: id 30 | task 3576 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  634. slot update_slots: id 30 | task 3576 | prompt done, n_past = 207, n_tokens = 211
  635. slot release: id 26 | task 3372 | stop processing: n_past = 157, truncated = 1
  636. slot print_timing: id 26 | task 3372 |
  637. prompt eval time = 46.45 ms / 207 tokens ( 0.22 ms per token, 4456.50 tokens per second)
  638. eval time = 4071.36 ms / 205 tokens ( 19.86 ms per token, 50.35 tokens per second)
  639. total time = 4117.81 ms / 412 tokens
  640. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  641. slot update_slots: id 27 | task 3429 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  642. slot release: id 27 | task 3429 | stop processing: n_past = 138, truncated = 1
  643. slot print_timing: id 27 | task 3429 |
  644. prompt eval time = 47.07 ms / 207 tokens ( 0.23 ms per token, 4397.33 tokens per second)
  645. eval time = 3728.56 ms / 186 tokens ( 20.05 ms per token, 49.89 tokens per second)
  646. total time = 3775.64 ms / 393 tokens
  647. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  648. slot update_slots: id 30 | task 3576 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  649. srv params_from_: Chat format: Content-only
  650. slot launch_slot_: id 31 | task 3628 | processing task
  651. slot update_slots: id 31 | task 3628 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  652. slot update_slots: id 31 | task 3628 | kv cache rm [0, end)
  653. slot update_slots: id 31 | task 3628 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  654. slot update_slots: id 31 | task 3628 | prompt done, n_past = 207, n_tokens = 210
  655. slot update_slots: id 28 | task 3481 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  656. srv params_from_: Chat format: Content-only
  657. slot launch_slot_: id 32 | task 3673 | processing task
  658. slot update_slots: id 32 | task 3673 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  659. slot update_slots: id 32 | task 3673 | kv cache rm [0, end)
  660. slot update_slots: id 32 | task 3673 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  661. slot update_slots: id 32 | task 3673 | prompt done, n_past = 207, n_tokens = 211
  662. slot update_slots: id 31 | task 3628 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  663. slot release: id 28 | task 3481 | stop processing: n_past = 160, truncated = 1
  664. slot print_timing: id 28 | task 3481 |
  665. prompt eval time = 46.00 ms / 207 tokens ( 0.22 ms per token, 4500.29 tokens per second)
  666. eval time = 4134.51 ms / 208 tokens ( 19.88 ms per token, 50.31 tokens per second)
  667. total time = 4180.50 ms / 415 tokens
  668. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  669. slot update_slots: id 29 | task 3522 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  670. slot update_slots: id 32 | task 3673 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  671. srv params_from_: Chat format: Content-only
  672. slot launch_slot_: id 33 | task 3727 | processing task
  673. slot update_slots: id 33 | task 3727 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  674. slot update_slots: id 33 | task 3727 | kv cache rm [0, end)
  675. slot update_slots: id 33 | task 3727 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  676. slot update_slots: id 33 | task 3727 | prompt done, n_past = 207, n_tokens = 211
  677. slot release: id 29 | task 3522 | stop processing: n_past = 154, truncated = 1
  678. slot print_timing: id 29 | task 3522 |
  679. prompt eval time = 45.46 ms / 207 tokens ( 0.22 ms per token, 4553.45 tokens per second)
  680. eval time = 3992.98 ms / 202 tokens ( 19.77 ms per token, 50.59 tokens per second)
  681. total time = 4038.44 ms / 409 tokens
  682. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  683. slot update_slots: id 30 | task 3576 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  684. slot update_slots: id 33 | task 3727 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  685. srv params_from_: Chat format: Content-only
  686. slot launch_slot_: id 34 | task 3782 | processing task
  687. slot update_slots: id 34 | task 3782 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  688. slot update_slots: id 34 | task 3782 | kv cache rm [0, end)
  689. slot update_slots: id 34 | task 3782 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  690. slot update_slots: id 34 | task 3782 | prompt done, n_past = 207, n_tokens = 211
  691. slot release: id 30 | task 3576 | stop processing: n_past = 158, truncated = 1
  692. slot print_timing: id 30 | task 3576 |
  693. prompt eval time = 46.67 ms / 207 tokens ( 0.23 ms per token, 4435.78 tokens per second)
  694. eval time = 4064.30 ms / 206 tokens ( 19.73 ms per token, 50.69 tokens per second)
  695. total time = 4110.97 ms / 413 tokens
  696. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  697. slot update_slots: id 31 | task 3628 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  698. slot release: id 31 | task 3628 | stop processing: n_past = 151, truncated = 1
  699. slot print_timing: id 31 | task 3628 |
  700. prompt eval time = 45.52 ms / 207 tokens ( 0.22 ms per token, 4547.55 tokens per second)
  701. eval time = 3669.50 ms / 199 tokens ( 18.44 ms per token, 54.23 tokens per second)
  702. total time = 3715.02 ms / 406 tokens
  703. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  704. slot update_slots: id 34 | task 3782 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  705. srv params_from_: Chat format: Content-only
  706. slot launch_slot_: id 35 | task 3837 | processing task
  707. slot update_slots: id 35 | task 3837 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  708. slot update_slots: id 35 | task 3837 | kv cache rm [0, end)
  709. slot update_slots: id 35 | task 3837 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  710. slot update_slots: id 35 | task 3837 | prompt done, n_past = 207, n_tokens = 210
  711. slot update_slots: id 32 | task 3673 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  712. slot release: id 32 | task 3673 | stop processing: n_past = 151, truncated = 1
  713. slot print_timing: id 32 | task 3673 |
  714. prompt eval time = 46.65 ms / 207 tokens ( 0.23 ms per token, 4437.58 tokens per second)
  715. eval time = 3654.16 ms / 199 tokens ( 18.36 ms per token, 54.46 tokens per second)
  716. total time = 3700.80 ms / 406 tokens
  717. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  718. slot update_slots: id 35 | task 3837 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  719. srv params_from_: Chat format: Content-only
  720. slot launch_slot_: id 36 | task 3893 | processing task
  721. slot update_slots: id 36 | task 3893 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  722. slot update_slots: id 36 | task 3893 | kv cache rm [0, end)
  723. slot update_slots: id 36 | task 3893 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  724. slot update_slots: id 36 | task 3893 | prompt done, n_past = 207, n_tokens = 210
  725. slot update_slots: id 33 | task 3727 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  726. slot release: id 33 | task 3727 | stop processing: n_past = 146, truncated = 1
  727. slot print_timing: id 33 | task 3727 |
  728. prompt eval time = 46.41 ms / 207 tokens ( 0.22 ms per token, 4460.73 tokens per second)
  729. eval time = 3717.80 ms / 194 tokens ( 19.16 ms per token, 52.18 tokens per second)
  730. total time = 3764.21 ms / 401 tokens
  731. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  732. srv params_from_: Chat format: Content-only
  733. slot launch_slot_: id 37 | task 3939 | processing task
  734. slot update_slots: id 37 | task 3939 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  735. slot update_slots: id 37 | task 3939 | kv cache rm [0, end)
  736. slot update_slots: id 37 | task 3939 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  737. slot update_slots: id 37 | task 3939 | prompt done, n_past = 207, n_tokens = 210
  738. slot update_slots: id 36 | task 3893 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  739. slot update_slots: id 34 | task 3782 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  740. slot release: id 34 | task 3782 | stop processing: n_past = 151, truncated = 1
  741. slot print_timing: id 34 | task 3782 |
  742. prompt eval time = 47.33 ms / 207 tokens ( 0.23 ms per token, 4373.36 tokens per second)
  743. eval time = 3811.68 ms / 199 tokens ( 19.15 ms per token, 52.21 tokens per second)
  744. total time = 3859.01 ms / 406 tokens
  745. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  746. slot update_slots: id 37 | task 3939 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  747. srv params_from_: Chat format: Content-only
  748. slot launch_slot_: id 38 | task 3993 | processing task
  749. slot update_slots: id 38 | task 3993 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  750. slot update_slots: id 38 | task 3993 | kv cache rm [0, end)
  751. slot update_slots: id 38 | task 3993 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  752. slot update_slots: id 38 | task 3993 | prompt done, n_past = 207, n_tokens = 210
  753. slot update_slots: id 35 | task 3837 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  754. slot release: id 35 | task 3837 | stop processing: n_past = 132, truncated = 1
  755. slot print_timing: id 35 | task 3837 |
  756. prompt eval time = 47.18 ms / 207 tokens ( 0.23 ms per token, 4387.17 tokens per second)
  757. eval time = 3452.44 ms / 180 tokens ( 19.18 ms per token, 52.14 tokens per second)
  758. total time = 3499.62 ms / 387 tokens
  759. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  760. slot update_slots: id 38 | task 3993 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  761. srv params_from_: Chat format: Content-only
  762. slot launch_slot_: id 39 | task 4045 | processing task
  763. slot update_slots: id 39 | task 4045 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  764. slot update_slots: id 39 | task 4045 | kv cache rm [0, end)
  765. slot update_slots: id 39 | task 4045 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  766. slot update_slots: id 39 | task 4045 | prompt done, n_past = 207, n_tokens = 210
  767. slot update_slots: id 36 | task 3893 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  768. slot release: id 36 | task 3893 | stop processing: n_past = 140, truncated = 1
  769. slot print_timing: id 36 | task 3893 |
  770. prompt eval time = 46.98 ms / 207 tokens ( 0.23 ms per token, 4406.32 tokens per second)
  771. eval time = 3875.64 ms / 188 tokens ( 20.62 ms per token, 48.51 tokens per second)
  772. total time = 3922.62 ms / 395 tokens
  773. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  774. srv params_from_: Chat format: Content-only
  775. slot launch_slot_: id 40 | task 4089 | processing task
  776. slot update_slots: id 40 | task 4089 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  777. slot update_slots: id 40 | task 4089 | kv cache rm [0, end)
  778. slot update_slots: id 40 | task 4089 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  779. slot update_slots: id 40 | task 4089 | prompt done, n_past = 207, n_tokens = 210
  780. slot update_slots: id 39 | task 4045 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  781. slot update_slots: id 37 | task 3939 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  782. slot update_slots: id 40 | task 4089 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  783. srv params_from_: Chat format: Content-only
  784. slot launch_slot_: id 41 | task 4144 | processing task
  785. slot update_slots: id 41 | task 4144 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  786. slot update_slots: id 41 | task 4144 | kv cache rm [0, end)
  787. slot update_slots: id 41 | task 4144 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  788. slot update_slots: id 41 | task 4144 | prompt done, n_past = 207, n_tokens = 211
  789. slot release: id 37 | task 3939 | stop processing: n_past = 161, truncated = 1
  790. slot print_timing: id 37 | task 3939 |
  791. prompt eval time = 47.40 ms / 207 tokens ( 0.23 ms per token, 4366.72 tokens per second)
  792. eval time = 4136.98 ms / 209 tokens ( 19.79 ms per token, 50.52 tokens per second)
  793. total time = 4184.38 ms / 416 tokens
  794. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  795. slot update_slots: id 38 | task 3993 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  796. slot release: id 38 | task 3993 | stop processing: n_past = 143, truncated = 1
  797. slot print_timing: id 38 | task 3993 |
  798. prompt eval time = 47.26 ms / 207 tokens ( 0.23 ms per token, 4379.75 tokens per second)
  799. eval time = 3824.10 ms / 191 tokens ( 20.02 ms per token, 49.95 tokens per second)
  800. total time = 3871.36 ms / 398 tokens
  801. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  802. slot update_slots: id 41 | task 4144 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  803. srv params_from_: Chat format: Content-only
  804. slot launch_slot_: id 42 | task 4195 | processing task
  805. slot update_slots: id 42 | task 4195 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  806. slot update_slots: id 42 | task 4195 | kv cache rm [0, end)
  807. slot update_slots: id 42 | task 4195 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  808. slot update_slots: id 42 | task 4195 | prompt done, n_past = 207, n_tokens = 210
  809. slot update_slots: id 39 | task 4045 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  810. srv params_from_: Chat format: Content-only
  811. slot launch_slot_: id 43 | task 4237 | processing task
  812. slot update_slots: id 43 | task 4237 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  813. slot update_slots: id 43 | task 4237 | kv cache rm [0, end)
  814. slot update_slots: id 43 | task 4237 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  815. slot update_slots: id 43 | task 4237 | prompt done, n_past = 207, n_tokens = 211
  816. slot release: id 39 | task 4045 | stop processing: n_past = 144, truncated = 1
  817. slot print_timing: id 39 | task 4045 |
  818. prompt eval time = 47.00 ms / 207 tokens ( 0.23 ms per token, 4404.07 tokens per second)
  819. eval time = 3866.72 ms / 192 tokens ( 20.14 ms per token, 49.65 tokens per second)
  820. total time = 3913.72 ms / 399 tokens
  821. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  822. slot update_slots: id 42 | task 4195 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  823. slot update_slots: id 40 | task 4089 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  824. slot update_slots: id 43 | task 4237 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  825. srv params_from_: Chat format: Content-only
  826. slot launch_slot_: id 44 | task 4289 | processing task
  827. slot update_slots: id 44 | task 4289 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  828. slot update_slots: id 44 | task 4289 | kv cache rm [0, end)
  829. slot update_slots: id 44 | task 4289 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  830. slot update_slots: id 44 | task 4289 | prompt done, n_past = 207, n_tokens = 211
  831. slot release: id 40 | task 4089 | stop processing: n_past = 166, truncated = 1
  832. slot print_timing: id 40 | task 4089 |
  833. prompt eval time = 48.32 ms / 207 tokens ( 0.23 ms per token, 4284.03 tokens per second)
  834. eval time = 4326.17 ms / 214 tokens ( 20.22 ms per token, 49.47 tokens per second)
  835. total time = 4374.49 ms / 421 tokens
  836. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  837. slot update_slots: id 41 | task 4144 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  838. slot release: id 41 | task 4144 | stop processing: n_past = 132, truncated = 1
  839. slot print_timing: id 41 | task 4144 |
  840. prompt eval time = 48.13 ms / 207 tokens ( 0.23 ms per token, 4300.58 tokens per second)
  841. eval time = 3688.72 ms / 180 tokens ( 20.49 ms per token, 48.80 tokens per second)
  842. total time = 3736.85 ms / 387 tokens
  843. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  844. slot update_slots: id 44 | task 4289 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  845. srv params_from_: Chat format: Content-only
  846. slot launch_slot_: id 45 | task 4341 | processing task
  847. slot update_slots: id 45 | task 4341 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  848. slot update_slots: id 45 | task 4341 | kv cache rm [0, end)
  849. slot update_slots: id 45 | task 4341 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  850. slot update_slots: id 45 | task 4341 | prompt done, n_past = 207, n_tokens = 210
  851. slot update_slots: id 42 | task 4195 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  852. srv params_from_: Chat format: Content-only
  853. slot launch_slot_: id 46 | task 4384 | processing task
  854. slot update_slots: id 46 | task 4384 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  855. slot update_slots: id 46 | task 4384 | kv cache rm [0, end)
  856. slot update_slots: id 46 | task 4384 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  857. slot update_slots: id 46 | task 4384 | prompt done, n_past = 207, n_tokens = 211
  858. slot update_slots: id 45 | task 4341 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  859. slot release: id 42 | task 4195 | stop processing: n_past = 151, truncated = 1
  860. slot print_timing: id 42 | task 4195 |
  861. prompt eval time = 291.92 ms / 207 tokens ( 1.41 ms per token, 709.09 tokens per second)
  862. eval time = 3998.35 ms / 199 tokens ( 20.09 ms per token, 49.77 tokens per second)
  863. total time = 4290.28 ms / 406 tokens
  864. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  865. slot update_slots: id 43 | task 4237 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  866. slot release: id 43 | task 4237 | stop processing: n_past = 143, truncated = 1
  867. slot print_timing: id 43 | task 4237 |
  868. prompt eval time = 49.39 ms / 207 tokens ( 0.24 ms per token, 4191.22 tokens per second)
  869. eval time = 3843.95 ms / 191 tokens ( 20.13 ms per token, 49.69 tokens per second)
  870. total time = 3893.33 ms / 398 tokens
  871. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  872. slot update_slots: id 46 | task 4384 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  873. srv params_from_: Chat format: Content-only
  874. slot launch_slot_: id 47 | task 4438 | processing task
  875. slot update_slots: id 47 | task 4438 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  876. slot update_slots: id 47 | task 4438 | kv cache rm [0, end)
  877. slot update_slots: id 47 | task 4438 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  878. slot update_slots: id 47 | task 4438 | prompt done, n_past = 207, n_tokens = 210
  879. slot update_slots: id 44 | task 4289 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  880. slot update_slots: id 47 | task 4438 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  881. srv params_from_: Chat format: Content-only
  882. slot launch_slot_: id 48 | task 4490 | processing task
  883. slot update_slots: id 48 | task 4490 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  884. slot update_slots: id 48 | task 4490 | kv cache rm [0, end)
  885. slot update_slots: id 48 | task 4490 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  886. slot update_slots: id 48 | task 4490 | prompt done, n_past = 207, n_tokens = 211
  887. slot release: id 44 | task 4289 | stop processing: n_past = 169, truncated = 1
  888. slot print_timing: id 44 | task 4289 |
  889. prompt eval time = 49.38 ms / 207 tokens ( 0.24 ms per token, 4192.07 tokens per second)
  890. eval time = 4477.58 ms / 217 tokens ( 20.63 ms per token, 48.46 tokens per second)
  891. total time = 4526.95 ms / 424 tokens
  892. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  893. slot update_slots: id 45 | task 4341 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  894. srv params_from_: Chat format: Content-only
  895. slot launch_slot_: id 49 | task 4537 | processing task
  896. slot update_slots: id 49 | task 4537 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  897. slot update_slots: id 49 | task 4537 | kv cache rm [0, end)
  898. slot update_slots: id 49 | task 4537 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  899. slot update_slots: id 49 | task 4537 | prompt done, n_past = 207, n_tokens = 211
  900. slot update_slots: id 48 | task 4490 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  901. slot release: id 45 | task 4341 | stop processing: n_past = 156, truncated = 1
  902. slot print_timing: id 45 | task 4341 |
  903. prompt eval time = 47.99 ms / 207 tokens ( 0.23 ms per token, 4313.22 tokens per second)
  904. eval time = 3997.14 ms / 204 tokens ( 19.59 ms per token, 51.04 tokens per second)
  905. total time = 4045.13 ms / 411 tokens
  906. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  907. slot update_slots: id 46 | task 4384 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  908. slot release: id 46 | task 4384 | stop processing: n_past = 135, truncated = 1
  909. slot print_timing: id 46 | task 4384 |
  910. prompt eval time = 48.47 ms / 207 tokens ( 0.23 ms per token, 4270.59 tokens per second)
  911. eval time = 3596.67 ms / 183 tokens ( 19.65 ms per token, 50.88 tokens per second)
  912. total time = 3645.14 ms / 390 tokens
  913. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  914. slot update_slots: id 49 | task 4537 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  915. srv params_from_: Chat format: Content-only
  916. slot launch_slot_: id 50 | task 4591 | processing task
  917. slot update_slots: id 50 | task 4591 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  918. slot update_slots: id 50 | task 4591 | kv cache rm [0, end)
  919. slot update_slots: id 50 | task 4591 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  920. slot update_slots: id 50 | task 4591 | prompt done, n_past = 207, n_tokens = 210
  921. slot update_slots: id 47 | task 4438 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  922. slot release: id 47 | task 4438 | stop processing: n_past = 147, truncated = 1
  923. slot print_timing: id 47 | task 4438 |
  924. prompt eval time = 48.46 ms / 207 tokens ( 0.23 ms per token, 4271.30 tokens per second)
  925. eval time = 3780.99 ms / 195 tokens ( 19.39 ms per token, 51.57 tokens per second)
  926. total time = 3829.45 ms / 402 tokens
  927. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  928. slot update_slots: id 50 | task 4591 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  929. srv params_from_: Chat format: Content-only
  930. slot launch_slot_: id 51 | task 4646 | processing task
  931. slot update_slots: id 51 | task 4646 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  932. slot update_slots: id 51 | task 4646 | kv cache rm [0, end)
  933. slot update_slots: id 51 | task 4646 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  934. slot update_slots: id 51 | task 4646 | prompt done, n_past = 207, n_tokens = 210
  935. slot update_slots: id 48 | task 4490 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  936. slot release: id 48 | task 4490 | stop processing: n_past = 149, truncated = 1
  937. slot print_timing: id 48 | task 4490 |
  938. prompt eval time = 48.46 ms / 207 tokens ( 0.23 ms per token, 4271.92 tokens per second)
  939. eval time = 3662.46 ms / 197 tokens ( 18.59 ms per token, 53.79 tokens per second)
  940. total time = 3710.91 ms / 404 tokens
  941. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  942. slot update_slots: id 51 | task 4646 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  943. srv params_from_: Chat format: Content-only
  944. slot launch_slot_: id 52 | task 4700 | processing task
  945. slot update_slots: id 52 | task 4700 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  946. slot update_slots: id 52 | task 4700 | kv cache rm [0, end)
  947. slot update_slots: id 52 | task 4700 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  948. slot update_slots: id 52 | task 4700 | prompt done, n_past = 207, n_tokens = 210
  949. slot update_slots: id 49 | task 4537 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  950. slot release: id 49 | task 4537 | stop processing: n_past = 143, truncated = 1
  951. slot print_timing: id 49 | task 4537 |
  952. prompt eval time = 49.42 ms / 207 tokens ( 0.24 ms per token, 4188.50 tokens per second)
  953. eval time = 3549.68 ms / 191 tokens ( 18.58 ms per token, 53.81 tokens per second)
  954. total time = 3599.11 ms / 398 tokens
  955. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  956. slot update_slots: id 52 | task 4700 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  957. srv params_from_: Chat format: Content-only
  958. slot launch_slot_: id 53 | task 4754 | processing task
  959. slot update_slots: id 53 | task 4754 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  960. slot update_slots: id 53 | task 4754 | kv cache rm [0, end)
  961. slot update_slots: id 53 | task 4754 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  962. slot update_slots: id 53 | task 4754 | prompt done, n_past = 207, n_tokens = 210
  963. slot update_slots: id 50 | task 4591 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  964. srv params_from_: Chat format: Content-only
  965. slot launch_slot_: id 54 | task 4797 | processing task
  966. slot update_slots: id 54 | task 4797 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  967. slot update_slots: id 54 | task 4797 | kv cache rm [0, end)
  968. slot update_slots: id 54 | task 4797 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  969. slot update_slots: id 54 | task 4797 | prompt done, n_past = 207, n_tokens = 211
  970. slot release: id 50 | task 4591 | stop processing: n_past = 158, truncated = 1
  971. slot print_timing: id 50 | task 4591 |
  972. prompt eval time = 49.73 ms / 207 tokens ( 0.24 ms per token, 4162.06 tokens per second)
  973. eval time = 4055.47 ms / 206 tokens ( 19.69 ms per token, 50.80 tokens per second)
  974. total time = 4105.20 ms / 413 tokens
  975. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  976. slot update_slots: id 53 | task 4754 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  977. slot update_slots: id 51 | task 4646 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  978. slot release: id 51 | task 4646 | stop processing: n_past = 149, truncated = 1
  979. slot print_timing: id 51 | task 4646 |
  980. prompt eval time = 49.98 ms / 207 tokens ( 0.24 ms per token, 4141.41 tokens per second)
  981. eval time = 3863.28 ms / 197 tokens ( 19.61 ms per token, 50.99 tokens per second)
  982. total time = 3913.27 ms / 404 tokens
  983. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  984. slot update_slots: id 54 | task 4797 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  985. srv params_from_: Chat format: Content-only
  986. slot launch_slot_: id 55 | task 4851 | processing task
  987. slot update_slots: id 55 | task 4851 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  988. slot update_slots: id 55 | task 4851 | kv cache rm [0, end)
  989. slot update_slots: id 55 | task 4851 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  990. slot update_slots: id 55 | task 4851 | prompt done, n_past = 207, n_tokens = 210
  991. slot update_slots: id 52 | task 4700 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  992. slot update_slots: id 55 | task 4851 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  993. srv params_from_: Chat format: Content-only
  994. slot launch_slot_: id 56 | task 4903 | processing task
  995. slot update_slots: id 56 | task 4903 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  996. slot update_slots: id 56 | task 4903 | kv cache rm [0, end)
  997. slot update_slots: id 56 | task 4903 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  998. slot update_slots: id 56 | task 4903 | prompt done, n_past = 207, n_tokens = 211
  999. slot release: id 52 | task 4700 | stop processing: n_past = 153, truncated = 1
  1000. slot print_timing: id 52 | task 4700 |
  1001. prompt eval time = 49.42 ms / 207 tokens ( 0.24 ms per token, 4188.67 tokens per second)
  1002. eval time = 4132.56 ms / 201 tokens ( 20.56 ms per token, 48.64 tokens per second)
  1003. total time = 4181.97 ms / 408 tokens
  1004. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1005. slot update_slots: id 53 | task 4754 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1006. srv params_from_: Chat format: Content-only
  1007. slot launch_slot_: id 57 | task 4950 | processing task
  1008. slot update_slots: id 57 | task 4950 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1009. slot update_slots: id 57 | task 4950 | kv cache rm [0, end)
  1010. slot update_slots: id 57 | task 4950 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1011. slot update_slots: id 57 | task 4950 | prompt done, n_past = 207, n_tokens = 211
  1012. slot update_slots: id 56 | task 4903 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1013. slot release: id 53 | task 4754 | stop processing: n_past = 152, truncated = 1
  1014. slot print_timing: id 53 | task 4754 |
  1015. prompt eval time = 50.06 ms / 207 tokens ( 0.24 ms per token, 4135.12 tokens per second)
  1016. eval time = 4135.98 ms / 200 tokens ( 20.68 ms per token, 48.36 tokens per second)
  1017. total time = 4186.04 ms / 407 tokens
  1018. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1019. slot update_slots: id 54 | task 4797 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1020. slot update_slots: id 57 | task 4950 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1021. slot release: id 54 | task 4797 | stop processing: n_past = 153, truncated = 1
  1022. slot print_timing: id 54 | task 4797 |
  1023. prompt eval time = 51.09 ms / 207 tokens ( 0.25 ms per token, 4051.36 tokens per second)
  1024. eval time = 3937.85 ms / 201 tokens ( 19.59 ms per token, 51.04 tokens per second)
  1025. total time = 3988.95 ms / 408 tokens
  1026. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1027. srv params_from_: Chat format: Content-only
  1028. slot launch_slot_: id 58 | task 5003 | processing task
  1029. slot update_slots: id 58 | task 5003 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1030. slot update_slots: id 58 | task 5003 | kv cache rm [0, end)
  1031. slot update_slots: id 58 | task 5003 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  1032. slot update_slots: id 58 | task 5003 | prompt done, n_past = 207, n_tokens = 210
  1033. slot update_slots: id 55 | task 4851 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1034. slot update_slots: id 58 | task 5003 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1035. srv params_from_: Chat format: Content-only
  1036. slot launch_slot_: id 59 | task 5055 | processing task
  1037. slot update_slots: id 59 | task 5055 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1038. slot update_slots: id 59 | task 5055 | kv cache rm [0, end)
  1039. slot update_slots: id 59 | task 5055 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1040. slot update_slots: id 59 | task 5055 | prompt done, n_past = 207, n_tokens = 211
  1041. slot release: id 55 | task 4851 | stop processing: n_past = 160, truncated = 1
  1042. slot print_timing: id 55 | task 4851 |
  1043. prompt eval time = 50.16 ms / 207 tokens ( 0.24 ms per token, 4126.55 tokens per second)
  1044. eval time = 4295.02 ms / 208 tokens ( 20.65 ms per token, 48.43 tokens per second)
  1045. total time = 4345.18 ms / 415 tokens
  1046. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1047. slot update_slots: id 56 | task 4903 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1048. srv params_from_: Chat format: Content-only
  1049. slot launch_slot_: id 60 | task 5100 | processing task
  1050. slot update_slots: id 60 | task 5100 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1051. slot update_slots: id 60 | task 5100 | kv cache rm [0, end)
  1052. slot update_slots: id 60 | task 5100 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1053. slot update_slots: id 60 | task 5100 | prompt done, n_past = 207, n_tokens = 211
  1054. slot update_slots: id 59 | task 5055 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1055. slot release: id 56 | task 4903 | stop processing: n_past = 165, truncated = 1
  1056. slot print_timing: id 56 | task 4903 |
  1057. prompt eval time = 50.73 ms / 207 tokens ( 0.25 ms per token, 4080.35 tokens per second)
  1058. eval time = 4258.50 ms / 213 tokens ( 19.99 ms per token, 50.02 tokens per second)
  1059. total time = 4309.23 ms / 420 tokens
  1060. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1061. slot update_slots: id 57 | task 4950 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1062. slot update_slots: id 60 | task 5100 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1063. srv params_from_: Chat format: Content-only
  1064. slot launch_slot_: id 61 | task 5152 | processing task
  1065. slot update_slots: id 61 | task 5152 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1066. slot update_slots: id 61 | task 5152 | kv cache rm [0, end)
  1067. slot update_slots: id 61 | task 5152 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1068. slot update_slots: id 61 | task 5152 | prompt done, n_past = 207, n_tokens = 211
  1069. slot release: id 57 | task 4950 | stop processing: n_past = 172, truncated = 1
  1070. slot print_timing: id 57 | task 4950 |
  1071. prompt eval time = 51.16 ms / 207 tokens ( 0.25 ms per token, 4046.45 tokens per second)
  1072. eval time = 4419.22 ms / 220 tokens ( 20.09 ms per token, 49.78 tokens per second)
  1073. total time = 4470.38 ms / 427 tokens
  1074. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1075. slot update_slots: id 58 | task 5003 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1076. slot update_slots: id 61 | task 5152 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1077. srv params_from_: Chat format: Content-only
  1078. slot launch_slot_: id 62 | task 5203 | processing task
  1079. slot update_slots: id 62 | task 5203 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1080. slot update_slots: id 62 | task 5203 | kv cache rm [0, end)
  1081. slot update_slots: id 62 | task 5203 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1082. slot update_slots: id 62 | task 5203 | prompt done, n_past = 207, n_tokens = 211
  1083. slot release: id 58 | task 5003 | stop processing: n_past = 156, truncated = 1
  1084. slot print_timing: id 58 | task 5003 |
  1085. prompt eval time = 51.13 ms / 207 tokens ( 0.25 ms per token, 4048.11 tokens per second)
  1086. eval time = 4135.32 ms / 204 tokens ( 20.27 ms per token, 49.33 tokens per second)
  1087. total time = 4186.46 ms / 411 tokens
  1088. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1089. slot update_slots: id 59 | task 5055 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1090. slot update_slots: id 62 | task 5203 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1091. srv params_from_: Chat format: Content-only
  1092. slot launch_slot_: id 63 | task 5254 | processing task
  1093. slot update_slots: id 63 | task 5254 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1094. slot update_slots: id 63 | task 5254 | kv cache rm [0, end)
  1095. slot update_slots: id 63 | task 5254 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1096. slot update_slots: id 63 | task 5254 | prompt done, n_past = 207, n_tokens = 211
  1097. slot release: id 59 | task 5055 | stop processing: n_past = 148, truncated = 1
  1098. slot print_timing: id 59 | task 5055 |
  1099. prompt eval time = 50.93 ms / 207 tokens ( 0.25 ms per token, 4064.48 tokens per second)
  1100. eval time = 4040.08 ms / 196 tokens ( 20.61 ms per token, 48.51 tokens per second)
  1101. total time = 4091.01 ms / 403 tokens
  1102. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1103. slot update_slots: id 60 | task 5100 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1104. srv params_from_: Chat format: Content-only
  1105. slot launch_slot_: id 0 | task 5296 | processing task
  1106. slot update_slots: id 0 | task 5296 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1107. slot update_slots: id 0 | task 5296 | kv cache rm [0, end)
  1108. slot update_slots: id 0 | task 5296 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1109. slot update_slots: id 0 | task 5296 | prompt done, n_past = 207, n_tokens = 211
  1110. slot update_slots: id 63 | task 5254 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1111. slot release: id 60 | task 5100 | stop processing: n_past = 162, truncated = 1
  1112. slot print_timing: id 60 | task 5100 |
  1113. prompt eval time = 51.10 ms / 207 tokens ( 0.25 ms per token, 4050.56 tokens per second)
  1114. eval time = 4320.03 ms / 210 tokens ( 20.57 ms per token, 48.61 tokens per second)
  1115. total time = 4371.14 ms / 417 tokens
  1116. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1117. slot update_slots: id 61 | task 5152 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1118. slot update_slots: id 0 | task 5296 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1119. srv params_from_: Chat format: Content-only
  1120. slot launch_slot_: id 1 | task 5348 | processing task
  1121. slot update_slots: id 1 | task 5348 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1122. slot update_slots: id 1 | task 5348 | kv cache rm [0, end)
  1123. slot update_slots: id 1 | task 5348 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1124. slot update_slots: id 1 | task 5348 | prompt done, n_past = 207, n_tokens = 211
  1125. slot release: id 61 | task 5152 | stop processing: n_past = 155, truncated = 1
  1126. slot print_timing: id 61 | task 5152 |
  1127. prompt eval time = 52.04 ms / 207 tokens ( 0.25 ms per token, 3977.40 tokens per second)
  1128. eval time = 4389.28 ms / 203 tokens ( 21.62 ms per token, 46.25 tokens per second)
  1129. total time = 4441.32 ms / 410 tokens
  1130. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1131. slot update_slots: id 62 | task 5203 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1132. srv params_from_: Chat format: Content-only
  1133. slot launch_slot_: id 2 | task 5389 | processing task
  1134. slot update_slots: id 2 | task 5389 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1135. slot update_slots: id 2 | task 5389 | kv cache rm [0, end)
  1136. slot update_slots: id 2 | task 5389 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1137. slot update_slots: id 2 | task 5389 | prompt done, n_past = 207, n_tokens = 211
  1138. slot release: id 62 | task 5203 | stop processing: n_past = 143, truncated = 1
  1139. slot print_timing: id 62 | task 5203 |
  1140. prompt eval time = 51.37 ms / 207 tokens ( 0.25 ms per token, 4029.98 tokens per second)
  1141. eval time = 4154.91 ms / 191 tokens ( 21.75 ms per token, 45.97 tokens per second)
  1142. total time = 4206.27 ms / 398 tokens
  1143. slot update_slots: id 1 | task 5348 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1144. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1145. slot update_slots: id 63 | task 5254 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1146. slot release: id 63 | task 5254 | stop processing: n_past = 132, truncated = 1
  1147. slot print_timing: id 63 | task 5254 |
  1148. prompt eval time = 247.45 ms / 207 tokens ( 1.20 ms per token, 836.54 tokens per second)
  1149. eval time = 3698.53 ms / 180 tokens ( 20.55 ms per token, 48.67 tokens per second)
  1150. total time = 3945.98 ms / 387 tokens
  1151. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1152. slot update_slots: id 2 | task 5389 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1153. srv params_from_: Chat format: Content-only
  1154. slot launch_slot_: id 3 | task 5441 | processing task
  1155. slot update_slots: id 3 | task 5441 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1156. slot update_slots: id 3 | task 5441 | kv cache rm [0, end)
  1157. slot update_slots: id 3 | task 5441 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  1158. slot update_slots: id 3 | task 5441 | prompt done, n_past = 207, n_tokens = 210
  1159. slot update_slots: id 0 | task 5296 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1160. srv params_from_: Chat format: Content-only
  1161. slot launch_slot_: id 4 | task 5484 | processing task
  1162. slot update_slots: id 4 | task 5484 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1163. slot update_slots: id 4 | task 5484 | kv cache rm [0, end)
  1164. slot update_slots: id 4 | task 5484 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1165. slot update_slots: id 4 | task 5484 | prompt done, n_past = 207, n_tokens = 211
  1166. slot update_slots: id 3 | task 5441 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1167. slot release: id 0 | task 5296 | stop processing: n_past = 165, truncated = 1
  1168. slot print_timing: id 0 | task 5296 |
  1169. prompt eval time = 50.73 ms / 207 tokens ( 0.25 ms per token, 4080.02 tokens per second)
  1170. eval time = 4561.13 ms / 213 tokens ( 21.41 ms per token, 46.70 tokens per second)
  1171. total time = 4611.87 ms / 420 tokens
  1172. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1173. slot update_slots: id 1 | task 5348 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1174. slot update_slots: id 4 | task 5484 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1175. srv params_from_: Chat format: Content-only
  1176. slot launch_slot_: id 5 | task 5535 | processing task
  1177. slot update_slots: id 5 | task 5535 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1178. slot update_slots: id 5 | task 5535 | kv cache rm [0, end)
  1179. slot update_slots: id 5 | task 5535 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1180. slot update_slots: id 5 | task 5535 | prompt done, n_past = 207, n_tokens = 211
  1181. slot release: id 1 | task 5348 | stop processing: n_past = 153, truncated = 1
  1182. slot print_timing: id 1 | task 5348 |
  1183. prompt eval time = 51.11 ms / 207 tokens ( 0.25 ms per token, 4050.25 tokens per second)
  1184. eval time = 4508.44 ms / 201 tokens ( 22.43 ms per token, 44.58 tokens per second)
  1185. total time = 4559.55 ms / 408 tokens
  1186. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1187. slot update_slots: id 2 | task 5389 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1188. slot release: id 2 | task 5389 | stop processing: n_past = 132, truncated = 1
  1189. slot print_timing: id 2 | task 5389 |
  1190. prompt eval time = 52.52 ms / 207 tokens ( 0.25 ms per token, 3940.98 tokens per second)
  1191. eval time = 3860.96 ms / 180 tokens ( 21.45 ms per token, 46.62 tokens per second)
  1192. total time = 3913.49 ms / 387 tokens
  1193. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1194. srv params_from_: Chat format: Content-only
  1195. slot launch_slot_: id 6 | task 5578 | processing task
  1196. slot update_slots: id 6 | task 5578 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1197. slot update_slots: id 6 | task 5578 | kv cache rm [0, end)
  1198. slot update_slots: id 6 | task 5578 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  1199. slot update_slots: id 6 | task 5578 | prompt done, n_past = 207, n_tokens = 210
  1200. slot update_slots: id 5 | task 5535 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1201. slot update_slots: id 3 | task 5441 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1202. slot release: id 3 | task 5441 | stop processing: n_past = 132, truncated = 1
  1203. slot print_timing: id 3 | task 5441 |
  1204. prompt eval time = 52.12 ms / 207 tokens ( 0.25 ms per token, 3971.45 tokens per second)
  1205. eval time = 3842.86 ms / 180 tokens ( 21.35 ms per token, 46.84 tokens per second)
  1206. total time = 3894.98 ms / 387 tokens
  1207. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1208. slot update_slots: id 6 | task 5578 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1209. srv params_from_: Chat format: Content-only
  1210. slot launch_slot_: id 7 | task 5630 | processing task
  1211. slot update_slots: id 7 | task 5630 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1212. slot update_slots: id 7 | task 5630 | kv cache rm [0, end)
  1213. slot update_slots: id 7 | task 5630 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  1214. slot update_slots: id 7 | task 5630 | prompt done, n_past = 207, n_tokens = 210
  1215. slot update_slots: id 4 | task 5484 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1216. slot release: id 4 | task 5484 | stop processing: n_past = 140, truncated = 1
  1217. slot print_timing: id 4 | task 5484 |
  1218. prompt eval time = 51.50 ms / 207 tokens ( 0.25 ms per token, 4019.34 tokens per second)
  1219. eval time = 3820.95 ms / 188 tokens ( 20.32 ms per token, 49.20 tokens per second)
  1220. total time = 3872.45 ms / 395 tokens
  1221. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1222. slot update_slots: id 7 | task 5630 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1223. srv params_from_: Chat format: Content-only
  1224. slot launch_slot_: id 8 | task 5683 | processing task
  1225. slot update_slots: id 8 | task 5683 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1226. slot update_slots: id 8 | task 5683 | kv cache rm [0, end)
  1227. slot update_slots: id 8 | task 5683 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  1228. slot update_slots: id 8 | task 5683 | prompt done, n_past = 207, n_tokens = 210
  1229. slot update_slots: id 5 | task 5535 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1230. slot update_slots: id 8 | task 5683 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1231. srv params_from_: Chat format: Content-only
  1232. slot launch_slot_: id 9 | task 5735 | processing task
  1233. slot update_slots: id 9 | task 5735 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1234. slot update_slots: id 9 | task 5735 | kv cache rm [0, end)
  1235. slot update_slots: id 9 | task 5735 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1236. slot update_slots: id 9 | task 5735 | prompt done, n_past = 207, n_tokens = 211
  1237. slot release: id 5 | task 5535 | stop processing: n_past = 156, truncated = 1
  1238. slot print_timing: id 5 | task 5535 |
  1239. prompt eval time = 52.86 ms / 207 tokens ( 0.26 ms per token, 3915.93 tokens per second)
  1240. eval time = 4383.39 ms / 204 tokens ( 21.49 ms per token, 46.54 tokens per second)
  1241. total time = 4436.25 ms / 411 tokens
  1242. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1243. slot update_slots: id 6 | task 5578 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1244. srv params_from_: Chat format: Content-only
  1245. slot launch_slot_: id 10 | task 5774 | processing task
  1246. slot update_slots: id 10 | task 5774 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1247. slot update_slots: id 10 | task 5774 | kv cache rm [0, end)
  1248. slot update_slots: id 10 | task 5774 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1249. slot update_slots: id 10 | task 5774 | prompt done, n_past = 207, n_tokens = 211
  1250. slot update_slots: id 9 | task 5735 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1251. slot release: id 6 | task 5578 | stop processing: n_past = 172, truncated = 1
  1252. slot print_timing: id 6 | task 5578 |
  1253. prompt eval time = 52.64 ms / 207 tokens ( 0.25 ms per token, 3932.67 tokens per second)
  1254. eval time = 4530.05 ms / 220 tokens ( 20.59 ms per token, 48.56 tokens per second)
  1255. total time = 4582.69 ms / 427 tokens
  1256. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1257. slot update_slots: id 7 | task 5630 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1258. slot update_slots: id 10 | task 5774 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1259. srv params_from_: Chat format: Content-only
  1260. slot launch_slot_: id 11 | task 5825 | processing task
  1261. slot update_slots: id 11 | task 5825 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1262. slot update_slots: id 11 | task 5825 | kv cache rm [0, end)
  1263. slot update_slots: id 11 | task 5825 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1264. slot update_slots: id 11 | task 5825 | prompt done, n_past = 207, n_tokens = 211
  1265. slot release: id 7 | task 5630 | stop processing: n_past = 151, truncated = 1
  1266. slot print_timing: id 7 | task 5630 |
  1267. prompt eval time = 51.52 ms / 207 tokens ( 0.25 ms per token, 4017.62 tokens per second)
  1268. eval time = 4154.40 ms / 199 tokens ( 20.88 ms per token, 47.90 tokens per second)
  1269. total time = 4205.93 ms / 406 tokens
  1270. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1271. slot update_slots: id 8 | task 5683 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1272. slot update_slots: id 11 | task 5825 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1273. srv params_from_: Chat format: Content-only
  1274. slot launch_slot_: id 12 | task 5876 | processing task
  1275. slot update_slots: id 12 | task 5876 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1276. slot update_slots: id 12 | task 5876 | kv cache rm [0, end)
  1277. slot update_slots: id 12 | task 5876 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1278. slot update_slots: id 12 | task 5876 | prompt done, n_past = 207, n_tokens = 211
  1279. slot release: id 8 | task 5683 | stop processing: n_past = 155, truncated = 1
  1280. slot print_timing: id 8 | task 5683 |
  1281. prompt eval time = 53.90 ms / 207 tokens ( 0.26 ms per token, 3840.52 tokens per second)
  1282. eval time = 4448.20 ms / 203 tokens ( 21.91 ms per token, 45.64 tokens per second)
  1283. total time = 4502.10 ms / 410 tokens
  1284. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1285. slot update_slots: id 9 | task 5735 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1286. srv params_from_: Chat format: Content-only
  1287. slot launch_slot_: id 13 | task 5918 | processing task
  1288. slot update_slots: id 13 | task 5918 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1289. slot update_slots: id 13 | task 5918 | kv cache rm [0, end)
  1290. slot update_slots: id 13 | task 5918 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1291. slot update_slots: id 13 | task 5918 | prompt done, n_past = 207, n_tokens = 211
  1292. slot release: id 9 | task 5735 | stop processing: n_past = 139, truncated = 1
  1293. slot print_timing: id 9 | task 5735 |
  1294. prompt eval time = 51.43 ms / 207 tokens ( 0.25 ms per token, 4025.12 tokens per second)
  1295. eval time = 4147.37 ms / 187 tokens ( 22.18 ms per token, 45.09 tokens per second)
  1296. total time = 4198.80 ms / 394 tokens
  1297. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1298. slot update_slots: id 12 | task 5876 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1299. slot update_slots: id 10 | task 5774 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1300. slot release: id 10 | task 5774 | stop processing: n_past = 142, truncated = 1
  1301. slot print_timing: id 10 | task 5774 |
  1302. prompt eval time = 51.05 ms / 207 tokens ( 0.25 ms per token, 4054.53 tokens per second)
  1303. eval time = 3913.70 ms / 190 tokens ( 20.60 ms per token, 48.55 tokens per second)
  1304. total time = 3964.76 ms / 397 tokens
  1305. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1306. slot update_slots: id 13 | task 5918 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1307. srv params_from_: Chat format: Content-only
  1308. slot launch_slot_: id 14 | task 5969 | processing task
  1309. slot update_slots: id 14 | task 5969 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1310. slot update_slots: id 14 | task 5969 | kv cache rm [0, end)
  1311. slot update_slots: id 14 | task 5969 | prompt processing progress, n_past = 207, n_tokens = 210, progress = 1.000000
  1312. slot update_slots: id 14 | task 5969 | prompt done, n_past = 207, n_tokens = 210
  1313. slot update_slots: id 11 | task 5825 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1314. slot update_slots: id 14 | task 5969 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1315. srv params_from_: Chat format: Content-only
  1316. slot launch_slot_: id 15 | task 6021 | processing task
  1317. slot update_slots: id 15 | task 6021 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1318. slot update_slots: id 15 | task 6021 | kv cache rm [0, end)
  1319. slot update_slots: id 15 | task 6021 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1320. slot update_slots: id 15 | task 6021 | prompt done, n_past = 207, n_tokens = 211
  1321. slot release: id 11 | task 5825 | stop processing: n_past = 172, truncated = 1
  1322. slot print_timing: id 11 | task 5825 |
  1323. prompt eval time = 51.23 ms / 207 tokens ( 0.25 ms per token, 4040.68 tokens per second)
  1324. eval time = 4770.88 ms / 220 tokens ( 21.69 ms per token, 46.11 tokens per second)
  1325. total time = 4822.11 ms / 427 tokens
  1326. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1327. slot update_slots: id 12 | task 5876 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1328. srv params_from_: Chat format: Content-only
  1329. slot launch_slot_: id 16 | task 6059 | processing task
  1330. slot update_slots: id 16 | task 6059 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1331. slot update_slots: id 16 | task 6059 | kv cache rm [0, end)
  1332. slot update_slots: id 16 | task 6059 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1333. slot update_slots: id 16 | task 6059 | prompt done, n_past = 207, n_tokens = 211
  1334. slot release: id 12 | task 5876 | stop processing: n_past = 137, truncated = 1
  1335. slot print_timing: id 12 | task 5876 |
  1336. prompt eval time = 54.24 ms / 207 tokens ( 0.26 ms per token, 3816.02 tokens per second)
  1337. eval time = 4269.31 ms / 185 tokens ( 23.08 ms per token, 43.33 tokens per second)
  1338. total time = 4323.56 ms / 392 tokens
  1339. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1340. slot update_slots: id 15 | task 6021 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1341. slot update_slots: id 13 | task 5918 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1342. srv params_from_: Chat format: Content-only
  1343. slot launch_slot_: id 17 | task 6102 | processing task
  1344. slot update_slots: id 17 | task 6102 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1345. slot update_slots: id 17 | task 6102 | kv cache rm [0, end)
  1346. slot update_slots: id 17 | task 6102 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1347. slot update_slots: id 17 | task 6102 | prompt done, n_past = 207, n_tokens = 211
  1348. slot update_slots: id 16 | task 6059 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1349. slot release: id 13 | task 5918 | stop processing: n_past = 155, truncated = 1
  1350. slot print_timing: id 13 | task 5918 |
  1351. prompt eval time = 51.06 ms / 207 tokens ( 0.25 ms per token, 4053.82 tokens per second)
  1352. eval time = 4628.33 ms / 203 tokens ( 22.80 ms per token, 43.86 tokens per second)
  1353. total time = 4679.39 ms / 410 tokens
  1354. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1355. srv params_from_: Chat format: Content-only
  1356. slot launch_slot_: id 18 | task 6143 | processing task
  1357. slot update_slots: id 18 | task 6143 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1358. slot update_slots: id 18 | task 6143 | kv cache rm [0, end)
  1359. slot update_slots: id 18 | task 6143 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1360. slot update_slots: id 18 | task 6143 | prompt done, n_past = 207, n_tokens = 211
  1361. slot update_slots: id 14 | task 5969 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1362. slot update_slots: id 17 | task 6102 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1363. slot release: id 14 | task 5969 | stop processing: n_past = 149, truncated = 1
  1364. slot print_timing: id 14 | task 5969 |
  1365. prompt eval time = 53.08 ms / 207 tokens ( 0.26 ms per token, 3900.14 tokens per second)
  1366. eval time = 4702.05 ms / 197 tokens ( 23.87 ms per token, 41.90 tokens per second)
  1367. total time = 4755.13 ms / 404 tokens
  1368. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1369. srv params_from_: Chat format: Content-only
  1370. slot launch_slot_: id 19 | task 6185 | processing task
  1371. slot update_slots: id 19 | task 6185 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1372. slot update_slots: id 19 | task 6185 | kv cache rm [0, end)
  1373. slot update_slots: id 19 | task 6185 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1374. slot update_slots: id 19 | task 6185 | prompt done, n_past = 207, n_tokens = 211
  1375. slot update_slots: id 18 | task 6143 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1376. slot update_slots: id 15 | task 6021 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1377. srv params_from_: Chat format: Content-only
  1378. slot launch_slot_: id 20 | task 6226 | processing task
  1379. slot update_slots: id 20 | task 6226 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1380. slot update_slots: id 20 | task 6226 | kv cache rm [0, end)
  1381. slot update_slots: id 20 | task 6226 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
  1382. slot update_slots: id 20 | task 6226 | prompt done, n_past = 207, n_tokens = 212
  1383. slot release: id 15 | task 6021 | stop processing: n_past = 158, truncated = 1
  1384. slot print_timing: id 15 | task 6021 |
  1385. prompt eval time = 51.60 ms / 207 tokens ( 0.25 ms per token, 4011.24 tokens per second)
  1386. eval time = 5098.82 ms / 206 tokens ( 24.75 ms per token, 40.40 tokens per second)
  1387. total time = 5150.43 ms / 413 tokens
  1388. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1389. slot update_slots: id 19 | task 6185 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1390. slot update_slots: id 16 | task 6059 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1391. slot release: id 16 | task 6059 | stop processing: n_past = 152, truncated = 1
  1392. slot print_timing: id 16 | task 6059 |
  1393. prompt eval time = 51.38 ms / 207 tokens ( 0.25 ms per token, 4029.12 tokens per second)
  1394. eval time = 4892.63 ms / 200 tokens ( 24.46 ms per token, 40.88 tokens per second)
  1395. total time = 4944.01 ms / 407 tokens
  1396. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1397. srv params_from_: Chat format: Content-only
  1398. slot launch_slot_: id 21 | task 6267 | processing task
  1399. slot update_slots: id 21 | task 6267 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1400. slot update_slots: id 21 | task 6267 | kv cache rm [0, end)
  1401. slot update_slots: id 21 | task 6267 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1402. slot update_slots: id 21 | task 6267 | prompt done, n_past = 207, n_tokens = 211
  1403. slot update_slots: id 20 | task 6226 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1404. slot update_slots: id 17 | task 6102 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1405. slot release: id 17 | task 6102 | stop processing: n_past = 150, truncated = 1
  1406. slot print_timing: id 17 | task 6102 |
  1407. prompt eval time = 52.07 ms / 207 tokens ( 0.25 ms per token, 3975.19 tokens per second)
  1408. eval time = 4883.65 ms / 198 tokens ( 24.66 ms per token, 40.54 tokens per second)
  1409. total time = 4935.73 ms / 405 tokens
  1410. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1411. srv params_from_: Chat format: Content-only
  1412. slot launch_slot_: id 22 | task 6309 | processing task
  1413. slot update_slots: id 22 | task 6309 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1414. slot update_slots: id 22 | task 6309 | kv cache rm [0, end)
  1415. slot update_slots: id 22 | task 6309 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1416. slot update_slots: id 22 | task 6309 | prompt done, n_past = 207, n_tokens = 211
  1417. slot update_slots: id 21 | task 6267 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1418. slot update_slots: id 18 | task 6143 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1419. srv params_from_: Chat format: Content-only
  1420. slot launch_slot_: id 23 | task 6350 | processing task
  1421. slot update_slots: id 23 | task 6350 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1422. slot update_slots: id 23 | task 6350 | kv cache rm [0, end)
  1423. slot update_slots: id 23 | task 6350 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
  1424. slot update_slots: id 23 | task 6350 | prompt done, n_past = 207, n_tokens = 212
  1425. slot release: id 18 | task 6143 | stop processing: n_past = 158, truncated = 1
  1426. slot print_timing: id 18 | task 6143 |
  1427. prompt eval time = 52.10 ms / 207 tokens ( 0.25 ms per token, 3972.90 tokens per second)
  1428. eval time = 5059.27 ms / 206 tokens ( 24.56 ms per token, 40.72 tokens per second)
  1429. total time = 5111.37 ms / 413 tokens
  1430. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1431. slot update_slots: id 22 | task 6309 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1432. slot update_slots: id 19 | task 6185 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1433. srv params_from_: Chat format: Content-only
  1434. slot launch_slot_: id 24 | task 6390 | processing task
  1435. slot update_slots: id 24 | task 6390 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1436. slot update_slots: id 24 | task 6390 | kv cache rm [0, end)
  1437. slot update_slots: id 24 | task 6390 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
  1438. slot update_slots: id 24 | task 6390 | prompt done, n_past = 207, n_tokens = 212
  1439. slot release: id 19 | task 6185 | stop processing: n_past = 154, truncated = 1
  1440. slot print_timing: id 19 | task 6185 |
  1441. prompt eval time = 51.67 ms / 207 tokens ( 0.25 ms per token, 4006.04 tokens per second)
  1442. eval time = 5024.85 ms / 202 tokens ( 24.88 ms per token, 40.20 tokens per second)
  1443. total time = 5076.53 ms / 409 tokens
  1444. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1445. slot update_slots: id 23 | task 6350 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1446. slot update_slots: id 20 | task 6226 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1447. slot release: id 20 | task 6226 | stop processing: n_past = 146, truncated = 1
  1448. slot print_timing: id 20 | task 6226 |
  1449. prompt eval time = 52.53 ms / 207 tokens ( 0.25 ms per token, 3940.68 tokens per second)
  1450. eval time = 4832.27 ms / 194 tokens ( 24.91 ms per token, 40.15 tokens per second)
  1451. total time = 4884.80 ms / 401 tokens
  1452. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1453. srv params_from_: Chat format: Content-only
  1454. slot launch_slot_: id 25 | task 6432 | processing task
  1455. slot update_slots: id 25 | task 6432 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1456. slot update_slots: id 25 | task 6432 | kv cache rm [0, end)
  1457. slot update_slots: id 25 | task 6432 | prompt processing progress, n_past = 207, n_tokens = 211, progress = 1.000000
  1458. slot update_slots: id 25 | task 6432 | prompt done, n_past = 207, n_tokens = 211
  1459. slot update_slots: id 24 | task 6390 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1460. slot update_slots: id 21 | task 6267 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1461. srv params_from_: Chat format: Content-only
  1462. slot launch_slot_: id 26 | task 6473 | processing task
  1463. slot update_slots: id 26 | task 6473 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1464. slot update_slots: id 26 | task 6473 | kv cache rm [0, end)
  1465. slot update_slots: id 26 | task 6473 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
  1466. slot update_slots: id 26 | task 6473 | prompt done, n_past = 207, n_tokens = 212
  1467. slot release: id 21 | task 6267 | stop processing: n_past = 157, truncated = 1
  1468. slot print_timing: id 21 | task 6267 |
  1469. prompt eval time = 52.66 ms / 207 tokens ( 0.25 ms per token, 3930.88 tokens per second)
  1470. eval time = 5056.33 ms / 205 tokens ( 24.67 ms per token, 40.54 tokens per second)
  1471. total time = 5108.99 ms / 412 tokens
  1472. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1473. slot update_slots: id 25 | task 6432 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1474. slot update_slots: id 22 | task 6309 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1475. srv params_from_: Chat format: Content-only
  1476. slot launch_slot_: id 27 | task 6514 | processing task
  1477. slot update_slots: id 27 | task 6514 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1478. slot update_slots: id 27 | task 6514 | kv cache rm [0, end)
  1479. slot update_slots: id 27 | task 6514 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
  1480. slot update_slots: id 27 | task 6514 | prompt done, n_past = 207, n_tokens = 212
  1481. slot update_slots: id 26 | task 6473 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1482. slot release: id 22 | task 6309 | stop processing: n_past = 168, truncated = 1
  1483. slot print_timing: id 22 | task 6309 |
  1484. prompt eval time = 51.56 ms / 207 tokens ( 0.25 ms per token, 4015.05 tokens per second)
  1485. eval time = 5314.41 ms / 216 tokens ( 24.60 ms per token, 40.64 tokens per second)
  1486. total time = 5365.96 ms / 423 tokens
  1487. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1488. slot update_slots: id 23 | task 6350 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1489. srv params_from_: Chat format: Content-only
  1490. slot launch_slot_: id 28 | task 6553 | processing task
  1491. slot update_slots: id 28 | task 6553 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1492. slot update_slots: id 28 | task 6553 | kv cache rm [0, end)
  1493. slot update_slots: id 28 | task 6553 | prompt processing progress, n_past = 207, n_tokens = 212, progress = 1.000000
  1494. slot update_slots: id 28 | task 6553 | prompt done, n_past = 207, n_tokens = 212
  1495. slot release: id 23 | task 6350 | stop processing: n_past = 152, truncated = 1
  1496. slot print_timing: id 23 | task 6350 |
  1497. prompt eval time = 51.83 ms / 207 tokens ( 0.25 ms per token, 3993.90 tokens per second)
  1498. eval time = 5005.01 ms / 200 tokens ( 25.03 ms per token, 39.96 tokens per second)
  1499. total time = 5056.84 ms / 407 tokens
  1500. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1501. slot update_slots: id 27 | task 6514 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1502. slot update_slots: id 24 | task 6390 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1503. slot release: id 24 | task 6390 | stop processing: n_past = 132, truncated = 1
  1504. slot print_timing: id 24 | task 6390 |
  1505. prompt eval time = 54.62 ms / 207 tokens ( 0.26 ms per token, 3789.96 tokens per second)
  1506. eval time = 4561.86 ms / 180 tokens ( 25.34 ms per token, 39.46 tokens per second)
  1507. total time = 4616.47 ms / 387 tokens
  1508. srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
  1509. srv params_from_: Chat format: Content-only
  1510. slot launch_slot_: id 29 | task 6595 | processing task
  1511. slot update_slots: id 29 | task 6595 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1512. slot update_slots: id 29 | task 6595 | kv cache rm [0, end)
  1513. slot update_slots: id 29 | task 6595 | prompt processing progress, n_
  1514. ...
  1515. task 17150 | processing task
  1516. slot update_slots: id 22 | task 17150 | new prompt, n_ctx_slot = 256, n_keep = 0, n_prompt_tokens = 207
  1517. slot update_slots: id 22 | task 17150 | kv cache rm [0, end)
  1518. slot update_slots: id 22 | task 17150 | prompt processing progress, n_past = 207, n_tokens = 270, progress = 1.000000
  1519. slot update_slots: id 22 | task 17150 | prompt done, n_past = 207, n_tokens = 270
  1520. slot update_slots: id 19 | task 18377 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1521. slot release: id 2 | task 18349 | stop processing: n_past = 147, truncated = 1
  1522. slot print_timing: id 2 | task 18349 |
  1523. prompt eval time = 100.38 ms / 207 tokens ( 0.48 ms per token, 2062.25 tokens per second)
  1524. eval time = 21899.89 ms / 195 tokens ( 112.31 ms per token, 8.90 tokens per second)
  1525. total time = 22000.26 ms / 402 tokens
  1526. slot release: id 56 | task 18352 | stop processing: n_past = 147, truncated = 1
  1527. slot print_timing: id 56 | task 18352 |
  1528. prompt eval time = 104.38 ms / 207 tokens ( 0.50 ms per token, 1983.21 tokens per second)
  1529. eval time = 21900.07 ms / 195 tokens ( 112.31 ms per token, 8.90 tokens per second)
  1530. total time = 22004.45 ms / 402 tokens
  1531. slot release: id 58 | task 18347 | stop processing: n_past = 156, truncated = 1
  1532. slot print_timing: id 58 | task 18347 |
  1533. prompt eval time = 72.15 ms / 207 tokens ( 0.35 ms per token, 2869.10 tokens per second)
  1534. eval time = 23572.45 ms / 204 tokens ( 115.55 ms per token, 8.65 tokens per second)
  1535. total time = 23644.60 ms / 411 tokens
  1536. slot update_slots: id 39 | task 18378 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1537. slot release: id 19 | task 18377 | stop processing: n_past = 132, truncated = 1
  1538. slot print_timing: id 19 | task 18377 |
  1539. prompt eval time = 69.09 ms / 207 tokens ( 0.33 ms per token, 2995.96 tokens per second)
  1540. eval time = 20104.56 ms / 180 tokens ( 111.69 ms per token, 8.95 tokens per second)
  1541. total time = 20173.66 ms / 387 tokens
  1542. slot update_slots: id 26 | task 18380 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1543. slot update_slots: id 47 | task 18381 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1544. slot release: id 12 | task 18345 | stop processing: n_past = 157, truncated = 1
  1545. slot print_timing: id 12 | task 18345 |
  1546. prompt eval time = 239.03 ms / 207 tokens ( 1.15 ms per token, 866.01 tokens per second)
  1547. eval time = 22917.47 ms / 205 tokens ( 111.79 ms per token, 8.95 tokens per second)
  1548. total time = 23156.49 ms / 412 tokens
  1549. slot update_slots: id 28 | task 18382 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1550. slot update_slots: id 6 | task 18384 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1551. slot update_slots: id 24 | task 18386 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1552. slot update_slots: id 0 | task 18387 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1553. slot update_slots: id 27 | task 18388 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1554. slot release: id 63 | task 18350 | stop processing: n_past = 159, truncated = 1
  1555. slot print_timing: id 63 | task 18350 |
  1556. prompt eval time = 332.26 ms / 207 tokens ( 1.61 ms per token, 623.01 tokens per second)
  1557. eval time = 23043.08 ms / 207 tokens ( 111.32 ms per token, 8.98 tokens per second)
  1558. total time = 23375.34 ms / 414 tokens
  1559. slot update_slots: id 15 | task 18394 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1560. slot update_slots: id 33 | task 18395 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1561. slot release: id 16 | task 18423 | stop processing: n_past = 201, truncated = 1
  1562. slot print_timing: id 16 | task 18423 |
  1563. prompt eval time = 68.05 ms / 207 tokens ( 0.33 ms per token, 3042.10 tokens per second)
  1564. eval time = 11507.20 ms / 122 tokens ( 94.32 ms per token, 10.60 tokens per second)
  1565. total time = 11575.24 ms / 329 tokens
  1566. slot release: id 23 | task 18357 | stop processing: n_past = 154, truncated = 1
  1567. slot print_timing: id 23 | task 18357 |
  1568. prompt eval time = 69.08 ms / 207 tokens ( 0.33 ms per token, 2996.53 tokens per second)
  1569. eval time = 22361.60 ms / 202 tokens ( 110.70 ms per token, 9.03 tokens per second)
  1570. total time = 22430.68 ms / 409 tokens
  1571. slot release: id 57 | task 18346 | stop processing: n_past = 161, truncated = 1
  1572. slot print_timing: id 57 | task 18346 |
  1573. prompt eval time = 100.42 ms / 207 tokens ( 0.49 ms per token, 2061.34 tokens per second)
  1574. eval time = 23587.38 ms / 209 tokens ( 112.86 ms per token, 8.86 tokens per second)
  1575. total time = 23687.80 ms / 416 tokens
  1576. slot update_slots: id 1 | task 16657 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1577. slot update_slots: id 31 | task 18396 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1578. slot release: id 26 | task 18380 | stop processing: n_past = 139, truncated = 1
  1579. slot print_timing: id 26 | task 18380 |
  1580. prompt eval time = 272.44 ms / 207 tokens ( 1.32 ms per token, 759.81 tokens per second)
  1581. eval time = 20368.85 ms / 187 tokens ( 108.92 ms per token, 9.18 tokens per second)
  1582. total time = 20641.28 ms / 394 tokens
  1583. slot release: id 61 | task 18351 | stop processing: n_past = 163, truncated = 1
  1584. slot print_timing: id 61 | task 18351 |
  1585. prompt eval time = 250.36 ms / 207 tokens ( 1.21 ms per token, 826.80 tokens per second)
  1586. eval time = 23318.32 ms / 211 tokens ( 110.51 ms per token, 9.05 tokens per second)
  1587. total time = 23568.68 ms / 418 tokens
  1588. slot release: id 55 | task 18364 | stop processing: n_past = 158, truncated = 1
  1589. slot print_timing: id 55 | task 18364 |
  1590. prompt eval time = 68.06 ms / 207 tokens ( 0.33 ms per token, 3041.61 tokens per second)
  1591. eval time = 22641.48 ms / 206 tokens ( 109.91 ms per token, 9.10 tokens per second)
  1592. total time = 22709.54 ms / 413 tokens
  1593. slot release: id 62 | task 18372 | stop processing: n_past = 156, truncated = 1
  1594. slot print_timing: id 62 | task 18372 |
  1595. prompt eval time = 72.45 ms / 207 tokens ( 0.35 ms per token, 2857.10 tokens per second)
  1596. eval time = 22298.42 ms / 204 tokens ( 109.31 ms per token, 9.15 tokens per second)
  1597. total time = 22370.87 ms / 411 tokens
  1598. slot update_slots: id 49 | task 18473 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1599. slot release: id 28 | task 18382 | stop processing: n_past = 141, truncated = 1
  1600. slot print_timing: id 28 | task 18382 |
  1601. prompt eval time = 69.78 ms / 207 tokens ( 0.34 ms per token, 2966.34 tokens per second)
  1602. eval time = 20223.10 ms / 189 tokens ( 107.00 ms per token, 9.35 tokens per second)
  1603. total time = 20292.89 ms / 396 tokens
  1604. slot release: id 32 | task 18371 | stop processing: n_past = 159, truncated = 1
  1605. slot print_timing: id 32 | task 18371 |
  1606. prompt eval time = 261.29 ms / 207 tokens ( 1.26 ms per token, 792.23 tokens per second)
  1607. eval time = 22440.94 ms / 207 tokens ( 108.41 ms per token, 9.22 tokens per second)
  1608. total time = 22702.23 ms / 414 tokens
  1609. slot update_slots: id 10 | task 18401 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1610. slot update_slots: id 25 | task 18400 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1611. slot update_slots: id 45 | task 18402 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1612. slot release: id 27 | task 18388 | stop processing: n_past = 141, truncated = 1
  1613. slot print_timing: id 27 | task 18388 |
  1614. prompt eval time = 237.58 ms / 207 tokens ( 1.15 ms per token, 871.29 tokens per second)
  1615. eval time = 20084.16 ms / 189 tokens ( 106.27 ms per token, 9.41 tokens per second)
  1616. total time = 20321.74 ms / 396 tokens
  1617. slot update_slots: id 5 | task 18404 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1618. slot update_slots: id 4 | task 18405 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1619. slot update_slots: id 35 | task 18476 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1620. slot update_slots: id 46 | task 18406 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1621. slot release: id 31 | task 18396 | stop processing: n_past = 142, truncated = 1
  1622. slot print_timing: id 31 | task 18396 |
  1623. prompt eval time = 65.92 ms / 207 tokens ( 0.32 ms per token, 3140.31 tokens per second)
  1624. eval time = 19701.24 ms / 190 tokens ( 103.69 ms per token, 9.64 tokens per second)
  1625. total time = 19767.16 ms / 397 tokens
  1626. slot release: id 15 | task 18394 | stop processing: n_past = 146, truncated = 1
  1627. slot print_timing: id 15 | task 18394 |
  1628. prompt eval time = 271.41 ms / 207 tokens ( 1.31 ms per token, 762.67 tokens per second)
  1629. eval time = 20220.92 ms / 194 tokens ( 104.23 ms per token, 9.59 tokens per second)
  1630. total time = 20492.33 ms / 401 tokens
  1631. slot release: id 47 | task 18381 | stop processing: n_past = 153, truncated = 1
  1632. slot print_timing: id 47 | task 18381 |
  1633. prompt eval time = 273.90 ms / 207 tokens ( 1.32 ms per token, 755.74 tokens per second)
  1634. eval time = 21396.34 ms / 201 tokens ( 106.45 ms per token, 9.39 tokens per second)
  1635. total time = 21670.24 ms / 408 tokens
  1636. slot update_slots: id 8 | task 18407 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1637. slot release: id 24 | task 18386 | stop processing: n_past = 155, truncated = 1
  1638. slot print_timing: id 24 | task 18386 |
  1639. prompt eval time = 99.02 ms / 207 tokens ( 0.48 ms per token, 2090.47 tokens per second)
  1640. eval time = 21166.37 ms / 203 tokens ( 104.27 ms per token, 9.59 tokens per second)
  1641. total time = 21265.39 ms / 410 tokens
  1642. slot update_slots: id 50 | task 18408 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1643. slot release: id 33 | task 18395 | stop processing: n_past = 153, truncated = 1
  1644. slot print_timing: id 33 | task 18395 |
  1645. prompt eval time = 273.11 ms / 207 tokens ( 1.32 ms per token, 757.94 tokens per second)
  1646. eval time = 20440.12 ms / 201 tokens ( 101.69 ms per token, 9.83 tokens per second)
  1647. total time = 20713.23 ms / 408 tokens
  1648. slot release: id 45 | task 18402 | stop processing: n_past = 143, truncated = 1
  1649. slot print_timing: id 45 | task 18402 |
  1650. prompt eval time = 100.71 ms / 207 tokens ( 0.49 ms per token, 2055.47 tokens per second)
  1651. eval time = 19371.28 ms / 191 tokens ( 101.42 ms per token, 9.86 tokens per second)
  1652. total time = 19471.99 ms / 398 tokens
  1653. slot update_slots: id 14 | task 18478 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1654. slot update_slots: id 17 | task 18409 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1655. slot release: id 6 | task 18384 | stop processing: n_past = 158, truncated = 1
  1656. slot print_timing: id 6 | task 18384 |
  1657. prompt eval time = 97.75 ms / 207 tokens ( 0.47 ms per token, 2117.56 tokens per second)
  1658. eval time = 21437.71 ms / 206 tokens ( 104.07 ms per token, 9.61 tokens per second)
  1659. total time = 21535.47 ms / 413 tokens
  1660. slot release: id 8 | task 18407 | stop processing: n_past = 132, truncated = 1
  1661. slot print_timing: id 8 | task 18407 |
  1662. prompt eval time = 242.06 ms / 207 tokens ( 1.17 ms per token, 855.15 tokens per second)
  1663. eval time = 17675.97 ms / 180 tokens ( 98.20 ms per token, 10.18 tokens per second)
  1664. total time = 17918.03 ms / 387 tokens
  1665. slot update_slots: id 40 | task 18410 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1666. slot release: id 50 | task 18408 | stop processing: n_past = 132, truncated = 1
  1667. slot print_timing: id 50 | task 18408 |
  1668. prompt eval time = 71.55 ms / 207 tokens ( 0.35 ms per token, 2893.12 tokens per second)
  1669. eval time = 17634.19 ms / 180 tokens ( 97.97 ms per token, 10.21 tokens per second)
  1670. total time = 17705.74 ms / 387 tokens
  1671. slot update_slots: id 3 | task 18411 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1672. slot release: id 0 | task 18387 | stop processing: n_past = 160, truncated = 1
  1673. slot print_timing: id 0 | task 18387 |
  1674. prompt eval time = 237.39 ms / 207 tokens ( 1.15 ms per token, 871.97 tokens per second)
  1675. eval time = 21465.11 ms / 208 tokens ( 103.20 ms per token, 9.69 tokens per second)
  1676. total time = 21702.50 ms / 415 tokens
  1677. slot release: id 25 | task 18400 | stop processing: n_past = 148, truncated = 1
  1678. slot print_timing: id 25 | task 18400 |
  1679. prompt eval time = 99.29 ms / 207 tokens ( 0.48 ms per token, 2084.72 tokens per second)
  1680. eval time = 19878.37 ms / 196 tokens ( 101.42 ms per token, 9.86 tokens per second)
  1681. total time = 19977.66 ms / 403 tokens
  1682. slot release: id 39 | task 18378 | stop processing: n_past = 167, truncated = 1
  1683. slot print_timing: id 39 | task 18378 |
  1684. prompt eval time = 71.45 ms / 207 tokens ( 0.35 ms per token, 2897.21 tokens per second)
  1685. eval time = 22721.10 ms / 215 tokens ( 105.68 ms per token, 9.46 tokens per second)
  1686. total time = 22792.55 ms / 422 tokens
  1687. slot update_slots: id 38 | task 17128 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1688. slot update_slots: id 42 | task 18389 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1689. slot update_slots: id 7 | task 18413 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1690. slot update_slots: id 22 | task 17150 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1691. slot release: id 5 | task 18404 | stop processing: n_past = 149, truncated = 1
  1692. slot print_timing: id 5 | task 18404 |
  1693. prompt eval time = 68.72 ms / 207 tokens ( 0.33 ms per token, 3012.40 tokens per second)
  1694. eval time = 19660.99 ms / 197 tokens ( 99.80 ms per token, 10.02 tokens per second)
  1695. total time = 19729.70 ms / 404 tokens
  1696. slot update_slots: id 21 | task 18414 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1697. slot release: id 4 | task 18405 | stop processing: n_past = 152, truncated = 1
  1698. slot print_timing: id 4 | task 18405 |
  1699. prompt eval time = 303.48 ms / 207 tokens ( 1.47 ms per token, 682.08 tokens per second)
  1700. eval time = 19689.79 ms / 200 tokens ( 98.45 ms per token, 10.16 tokens per second)
  1701. total time = 19993.27 ms / 407 tokens
  1702. slot release: id 10 | task 18401 | stop processing: n_past = 158, truncated = 1
  1703. slot print_timing: id 10 | task 18401 |
  1704. prompt eval time = 64.93 ms / 207 tokens ( 0.31 ms per token, 3188.00 tokens per second)
  1705. eval time = 20475.26 ms / 206 tokens ( 99.39 ms per token, 10.06 tokens per second)
  1706. total time = 20540.19 ms / 413 tokens
  1707. slot release: id 7 | task 18413 | stop processing: n_past = 137, truncated = 1
  1708. slot print_timing: id 7 | task 18413 |
  1709. prompt eval time = 260.85 ms / 207 tokens ( 1.26 ms per token, 793.57 tokens per second)
  1710. eval time = 17028.36 ms / 185 tokens ( 92.05 ms per token, 10.86 tokens per second)
  1711. total time = 17289.20 ms / 392 tokens
  1712. slot release: id 46 | task 18406 | stop processing: n_past = 155, truncated = 1
  1713. slot print_timing: id 46 | task 18406 |
  1714. prompt eval time = 72.82 ms / 207 tokens ( 0.35 ms per token, 2842.51 tokens per second)
  1715. eval time = 19721.20 ms / 203 tokens ( 97.15 ms per token, 10.29 tokens per second)
  1716. total time = 19794.02 ms / 410 tokens
  1717. slot update_slots: id 37 | task 18416 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1718. slot release: id 3 | task 18411 | stop processing: n_past = 148, truncated = 1
  1719. slot print_timing: id 3 | task 18411 |
  1720. prompt eval time = 239.54 ms / 207 tokens ( 1.16 ms per token, 864.15 tokens per second)
  1721. eval time = 18005.35 ms / 196 tokens ( 91.86 ms per token, 10.89 tokens per second)
  1722. total time = 18244.89 ms / 403 tokens
  1723. slot update_slots: id 43 | task 18418 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1724. slot update_slots: id 36 | task 18417 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1725. slot update_slots: id 53 | task 18422 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1726. slot release: id 42 | task 18389 | stop processing: n_past = 148, truncated = 1
  1727. slot print_timing: id 42 | task 18389 |
  1728. prompt eval time = 67.36 ms / 207 tokens ( 0.33 ms per token, 3073.27 tokens per second)
  1729. eval time = 17777.27 ms / 196 tokens ( 90.70 ms per token, 11.03 tokens per second)
  1730. total time = 17844.62 ms / 403 tokens
  1731. slot release: id 17 | task 18409 | stop processing: n_past = 162, truncated = 1
  1732. slot print_timing: id 17 | task 18409 |
  1733. prompt eval time = 285.31 ms / 207 tokens ( 1.38 ms per token, 725.53 tokens per second)
  1734. eval time = 18993.04 ms / 210 tokens ( 90.44 ms per token, 11.06 tokens per second)
  1735. total time = 19278.35 ms / 417 tokens
  1736. slot release: id 37 | task 18416 | stop processing: n_past = 145, truncated = 1
  1737. slot print_timing: id 37 | task 18416 |
  1738. prompt eval time = 66.82 ms / 207 tokens ( 0.32 ms per token, 3097.78 tokens per second)
  1739. eval time = 16902.05 ms / 193 tokens ( 87.58 ms per token, 11.42 tokens per second)
  1740. total time = 16968.87 ms / 400 tokens
  1741. slot update_slots: id 29 | task 18424 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1742. slot release: id 40 | task 18410 | stop processing: n_past = 164, truncated = 1
  1743. slot print_timing: id 40 | task 18410 |
  1744. prompt eval time = 243.07 ms / 207 tokens ( 1.17 ms per token, 851.59 tokens per second)
  1745. eval time = 18815.54 ms / 212 tokens ( 88.75 ms per token, 11.27 tokens per second)
  1746. total time = 19058.61 ms / 419 tokens
  1747. slot update_slots: id 60 | task 18430 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1748. slot release: id 21 | task 18414 | stop processing: n_past = 158, truncated = 1
  1749. slot print_timing: id 21 | task 18414 |
  1750. prompt eval time = 65.36 ms / 207 tokens ( 0.32 ms per token, 3166.88 tokens per second)
  1751. eval time = 17701.99 ms / 206 tokens ( 85.93 ms per token, 11.64 tokens per second)
  1752. total time = 17767.35 ms / 413 tokens
  1753. slot update_slots: id 13 | task 18431 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1754. slot update_slots: id 18 | task 18434 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1755. slot release: id 53 | task 18422 | stop processing: n_past = 150, truncated = 1
  1756. slot print_timing: id 53 | task 18422 |
  1757. prompt eval time = 266.26 ms / 207 tokens ( 1.29 ms per token, 777.45 tokens per second)
  1758. eval time = 16377.97 ms / 198 tokens ( 82.72 ms per token, 12.09 tokens per second)
  1759. total time = 16644.22 ms / 405 tokens
  1760. slot update_slots: id 44 | task 18432 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1761. slot update_slots: id 11 | task 18433 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1762. slot update_slots: id 52 | task 18444 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1763. slot release: id 29 | task 18424 | stop processing: n_past = 144, truncated = 1
  1764. slot print_timing: id 29 | task 18424 |
  1765. prompt eval time = 68.72 ms / 207 tokens ( 0.33 ms per token, 3012.31 tokens per second)
  1766. eval time = 15358.76 ms / 192 tokens ( 79.99 ms per token, 12.50 tokens per second)
  1767. total time = 15427.48 ms / 399 tokens
  1768. slot update_slots: id 34 | task 18445 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1769. slot update_slots: id 59 | task 18446 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1770. slot release: id 36 | task 18417 | stop processing: n_past = 160, truncated = 1
  1771. slot print_timing: id 36 | task 18417 |
  1772. prompt eval time = 265.01 ms / 207 tokens ( 1.28 ms per token, 781.11 tokens per second)
  1773. eval time = 17001.68 ms / 208 tokens ( 81.74 ms per token, 12.23 tokens per second)
  1774. total time = 17266.68 ms / 415 tokens
  1775. slot release: id 60 | task 18430 | stop processing: n_past = 149, truncated = 1
  1776. slot print_timing: id 60 | task 18430 |
  1777. prompt eval time = 249.04 ms / 207 tokens ( 1.20 ms per token, 831.21 tokens per second)
  1778. eval time = 15252.59 ms / 197 tokens ( 77.42 ms per token, 12.92 tokens per second)
  1779. total time = 15501.63 ms / 404 tokens
  1780. slot update_slots: id 30 | task 18447 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1781. slot release: id 43 | task 18418 | stop processing: n_past = 165, truncated = 1
  1782. slot print_timing: id 43 | task 18418 |
  1783. prompt eval time = 70.08 ms / 207 tokens ( 0.34 ms per token, 2953.81 tokens per second)
  1784. eval time = 17370.91 ms / 213 tokens ( 81.55 ms per token, 12.26 tokens per second)
  1785. total time = 17440.99 ms / 420 tokens
  1786. slot release: id 44 | task 18432 | stop processing: n_past = 142, truncated = 1
  1787. slot print_timing: id 44 | task 18432 |
  1788. prompt eval time = 70.02 ms / 207 tokens ( 0.34 ms per token, 2956.34 tokens per second)
  1789. eval time = 14485.95 ms / 190 tokens ( 76.24 ms per token, 13.12 tokens per second)
  1790. total time = 14555.97 ms / 397 tokens
  1791. slot update_slots: id 48 | task 18448 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1792. slot release: id 18 | task 18434 | stop processing: n_past = 152, truncated = 1
  1793. slot print_timing: id 18 | task 18434 |
  1794. prompt eval time = 94.44 ms / 207 tokens ( 0.46 ms per token, 2191.91 tokens per second)
  1795. eval time = 14830.41 ms / 200 tokens ( 74.15 ms per token, 13.49 tokens per second)
  1796. total time = 14924.85 ms / 407 tokens
  1797. slot update_slots: id 51 | task 18449 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1798. slot release: id 52 | task 18444 | stop processing: n_past = 145, truncated = 1
  1799. slot print_timing: id 52 | task 18444 |
  1800. prompt eval time = 70.81 ms / 207 tokens ( 0.34 ms per token, 2923.36 tokens per second)
  1801. eval time = 14295.37 ms / 193 tokens ( 74.07 ms per token, 13.50 tokens per second)
  1802. total time = 14366.18 ms / 400 tokens
  1803. slot update_slots: id 54 | task 18450 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1804. slot release: id 13 | task 18431 | stop processing: n_past = 157, truncated = 1
  1805. slot print_timing: id 13 | task 18431 |
  1806. prompt eval time = 94.05 ms / 207 tokens ( 0.45 ms per token, 2201.07 tokens per second)
  1807. eval time = 15157.72 ms / 205 tokens ( 73.94 ms per token, 13.52 tokens per second)
  1808. total time = 15251.76 ms / 412 tokens
  1809. slot release: id 59 | task 18446 | stop processing: n_past = 146, truncated = 1
  1810. slot print_timing: id 59 | task 18446 |
  1811. prompt eval time = 274.99 ms / 207 tokens ( 1.33 ms per token, 752.74 tokens per second)
  1812. eval time = 14071.08 ms / 194 tokens ( 72.53 ms per token, 13.79 tokens per second)
  1813. total time = 14346.07 ms / 401 tokens
  1814. slot release: id 11 | task 18433 | stop processing: n_past = 150, truncated = 1
  1815. slot print_timing: id 11 | task 18433 |
  1816. prompt eval time = 239.94 ms / 207 tokens ( 1.16 ms per token, 862.73 tokens per second)
  1817. eval time = 14656.67 ms / 198 tokens ( 74.02 ms per token, 13.51 tokens per second)
  1818. total time = 14896.60 ms / 405 tokens
  1819. slot release: id 34 | task 18445 | stop processing: n_past = 148, truncated = 1
  1820. slot print_timing: id 34 | task 18445 |
  1821. prompt eval time = 273.03 ms / 207 tokens ( 1.32 ms per token, 758.16 tokens per second)
  1822. eval time = 14118.00 ms / 196 tokens ( 72.03 ms per token, 13.88 tokens per second)
  1823. total time = 14391.03 ms / 403 tokens
  1824. slot update_slots: id 9 | task 18451 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1825. slot update_slots: id 41 | task 18452 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1826. slot release: id 51 | task 18449 | stop processing: n_past = 142, truncated = 1
  1827. slot print_timing: id 51 | task 18449 |
  1828. prompt eval time = 66.77 ms / 207 tokens ( 0.32 ms per token, 3100.43 tokens per second)
  1829. eval time = 13196.81 ms / 190 tokens ( 69.46 ms per token, 14.40 tokens per second)
  1830. total time = 13263.58 ms / 397 tokens
  1831. slot release: id 30 | task 18447 | stop processing: n_past = 150, truncated = 1
  1832. slot print_timing: id 30 | task 18447 |
  1833. prompt eval time = 65.08 ms / 207 tokens ( 0.31 ms per token, 3180.51 tokens per second)
  1834. eval time = 13884.45 ms / 198 tokens ( 70.12 ms per token, 14.26 tokens per second)
  1835. total time = 13949.53 ms / 405 tokens
  1836. slot update_slots: id 20 | task 18471 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1837. slot release: id 48 | task 18448 | stop processing: n_past = 155, truncated = 1
  1838. slot print_timing: id 48 | task 18448 |
  1839. prompt eval time = 66.36 ms / 207 tokens ( 0.32 ms per token, 3119.58 tokens per second)
  1840. eval time = 13915.74 ms / 203 tokens ( 68.55 ms per token, 14.59 tokens per second)
  1841. total time = 13982.10 ms / 410 tokens
  1842. slot release: id 20 | task 18471 | stop processing: n_past = 140, truncated = 1
  1843. slot print_timing: id 20 | task 18471 |
  1844. prompt eval time = 64.84 ms / 207 tokens ( 0.31 ms per token, 3192.57 tokens per second)
  1845. eval time = 12019.93 ms / 188 tokens ( 63.94 ms per token, 15.64 tokens per second)
  1846. total time = 12084.77 ms / 395 tokens
  1847. slot release: id 9 | task 18451 | stop processing: n_past = 152, truncated = 1
  1848. slot print_timing: id 9 | task 18451 |
  1849. prompt eval time = 67.95 ms / 207 tokens ( 0.33 ms per token, 3046.27 tokens per second)
  1850. eval time = 13274.74 ms / 200 tokens ( 66.37 ms per token, 15.07 tokens per second)
  1851. total time = 13342.69 ms / 407 tokens
  1852. slot release: id 54 | task 18450 | stop processing: n_past = 163, truncated = 1
  1853. slot print_timing: id 54 | task 18450 |
  1854. prompt eval time = 67.28 ms / 207 tokens ( 0.33 ms per token, 3076.92 tokens per second)
  1855. eval time = 13594.90 ms / 211 tokens ( 64.43 ms per token, 15.52 tokens per second)
  1856. total time = 13662.18 ms / 418 tokens
  1857. slot update_slots: id 1 | task 16657 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1858. slot update_slots: id 49 | task 18473 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1859. slot release: id 41 | task 18452 | stop processing: n_past = 162, truncated = 1
  1860. slot print_timing: id 41 | task 18452 |
  1861. prompt eval time = 70.42 ms / 207 tokens ( 0.34 ms per token, 2939.59 tokens per second)
  1862. eval time = 13007.56 ms / 210 tokens ( 61.94 ms per token, 16.14 tokens per second)
  1863. total time = 13077.97 ms / 417 tokens
  1864. slot release: id 1 | task 16657 | stop processing: n_past = 135, truncated = 1
  1865. slot print_timing: id 1 | task 16657 |
  1866. prompt eval time = 63.73 ms / 207 tokens ( 0.31 ms per token, 3248.23 tokens per second)
  1867. eval time = 10734.93 ms / 183 tokens ( 58.66 ms per token, 17.05 tokens per second)
  1868. total time = 10798.66 ms / 390 tokens
  1869. slot update_slots: id 35 | task 18476 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1870. slot release: id 49 | task 18473 | stop processing: n_past = 141, truncated = 1
  1871. slot print_timing: id 49 | task 18473 |
  1872. prompt eval time = 67.33 ms / 207 tokens ( 0.33 ms per token, 3074.18 tokens per second)
  1873. eval time = 10574.60 ms / 189 tokens ( 55.95 ms per token, 17.87 tokens per second)
  1874. total time = 10641.93 ms / 396 tokens
  1875. slot release: id 35 | task 18476 | stop processing: n_past = 138, truncated = 1
  1876. slot print_timing: id 35 | task 18476 |
  1877. prompt eval time = 66.50 ms / 207 tokens ( 0.32 ms per token, 3112.83 tokens per second)
  1878. eval time = 10172.91 ms / 186 tokens ( 54.69 ms per token, 18.28 tokens per second)
  1879. total time = 10239.40 ms / 393 tokens
  1880. slot update_slots: id 14 | task 18478 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1881. slot update_slots: id 38 | task 17128 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1882. slot update_slots: id 22 | task 17150 | slot context shift, n_keep = 0, n_left = 255, n_discard = 127
  1883. slot release: id 38 | task 17128 | stop processing: n_past = 140, truncated = 1
  1884. slot print_timing: id 38 | task 17128 |
  1885. prompt eval time = 244.46 ms / 207 tokens ( 1.18 ms per token, 846.75 tokens per second)
  1886. eval time = 8892.84 ms / 188 tokens ( 47.30 ms per token, 21.14 tokens per second)
  1887. total time = 9137.31 ms / 395 tokens
  1888. slot release: id 14 | task 18478 | stop processing: n_past = 155, truncated = 1
  1889. slot print_timing: id 14 | task 18478 |
  1890. prompt eval time = 267.84 ms / 207 tokens ( 1.29 ms per token, 772.86 tokens per second)
  1891. eval time = 9704.56 ms / 203 tokens ( 47.81 ms per token, 20.92 tokens per second)
  1892. total time = 9972.40 ms / 410 tokens
  1893. slot release: id 22 | task 17150 | stop processing: n_past = 158, truncated = 1
  1894. slot print_timing: id 22 | task 17150 |
  1895. prompt eval time = 64.90 ms / 207 tokens ( 0.31 ms per token, 3189.52 tokens per second)
  1896. eval time = 9116.47 ms / 206 tokens ( 44.25 ms per token, 22.60 tokens per second)
  1897. total time = 9181.37 ms / 413 tokens
  1898. srv update_slots: all slots are idle