#### Run this shell script (go_test_huge_model.sh) from the llama.cpp distribution directory
cmake -B build
cmake --build build --config Release -j 4
cmake --build build --config Release -t llama-server
cmake --build build --config Release -t llama-cli
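# (the Release binaries end up in ./build/bin/, which is where llama-cli is invoked below)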

# example from https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF

# pick a model: only the last uncommented MODEL= assignment below takes effect

# model that easily fits in RAM:
MODEL=/aimodels/cache/DeepSeek-R1-Distill-Qwen-14B-Q5_K_M.gguf

# model that doesn't easily fit into RAM:
MODEL=/u03/aimodels/cache/Llama-3.3-70B-Instruct-Q4_K_L.gguf

# model that is an order of magnitude larger than RAM:
MODEL=/u03/aimodels/deepseek/DeepSeek-V3-GGUF/DeepSeek-V3-BF16-256x20B-Q8_0-00001-of-00016.gguf


# clear Linux cache for test
echo dropping caches ...
echo 3 | sudo tee /proc/sys/vm/drop_caches
echo caches dropped
# sudo sh -c "/usr/bin/echo 3 > /proc/sys/vm/drop_caches"
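# (writing 3 to /proc/sys/vm/drop_caches frees the page cache plus dentries and
#  inodes, so the mmap'ed model weights must be re-read from disk: a cold-cache run)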

SEED=12

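# llama-cli options of note:
#   --no-warmup          skip the warm-up pass, so the first timings reflect a cold start
#   --cache-type-k q8_0  store the K side of the KV cache as 8-bit quants to save RAM
#   --ctx-size 2048      small context keeps the KV cache (and memory pressure) down
#   -no-cnv              plain prompt completion, no interactive conversation mode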
./build/bin/llama-cli \
--model $MODEL \
--no-warmup \
--cache-type-k q8_0 \
--threads 4 \
--seed $SEED \
--temp 0.7 \
--top-p 0.95 \
--ctx-size 2048 \
--prompt '<|User|>
Specifications: Using PyGame, open a window that is 4-inches square, with a black background, and create an equilateral triangle that is 2-inches per side, hollow, white lines. It is rotating clockwise at 6 degrees per second - to match that of a second-hand on a watch. Inside the equilateral triangle is a red ball, 1/8" in diameter. Unaffected by gravity, it is moving with an arbitrary speed, that is configurable in a variable. The ball has proper hitbox checking and can detect when it hits a wall. Hitting a wall should deflect/redirect the ball into the proper direction based on the angle of the wall that it hit and how it should deflect given the angle and speed of the wall. Be careful of bounds checking such that the ball does not get stuck outside a wall. Remember this is all in 2D space. Start with the ball located in the center of the triangle, initially moving in a random direction and with an initial speed of about one inch per second. To clarify, the triangle has three equal 2-inch sides, rotating at 6 degrees per second (not 6 degrees per frame). Be careful of pygame functions that may return zero at times (e.g. framerate) and avoid divide-by-zero errors.

Assumptions you may make:
Screen size is adequate with 154 pixels per inch resolution.
Frame rate is nominally 60 frames per second but may vary with system load.
Pygame is installed.

Window title requirement: Please title the pygame window "Triangle-Ball '$(basename $MODEL)-$SEED'".

Code block requirement: Emit a fully functioning python program within a single code block. Enclose your python program in a single codeblock as such:
\\n
```python
# your program here
```

Your code will be evaluated with the following rubric:
Does the program fail to execute due to a Python error? 0 points.
Does the program load a pygame window with a slowly rotating equilateral triangle and a red ball? +1 point.
Are the dimensions, ball speed, and triangle rotation exactly as specified? +1 point.
Does the ball interact with the rotating sides? +1 point.
Does the ball always correctly bounce inside the triangle and never get caught in a side or vertex? +1 point.
Does the program function correctly in extreme varying frame rate conditions down to 4 fps? +1 point.

Review all the specifications and requirements, then begin coding. If you would like to think in text before beginning your codeblock, that is perfectly acceptable and may help improve your answer. Only the code in your codeblock will be evaluated.

Thanks!
<|Assistant|>' \
-no-cnv 2>&1 | tee $(basename $MODEL)_hugemodel.out.$SEED.txt

# Extract the ```python code block from the captured output into a runnable .py file:
#   normalize fences to the start of the line, keep only the fenced python block,
#   drop the fence lines themselves, and comment out any stray fence remnants.
cat $(basename $MODEL)_hugemodel.out.$SEED.txt |
sed 's/.*```/```/;' |
sed -n '/^```python/,/^```/p' |
sed '1d; $d' |
sed 's/.*```.*/#&/' > $(basename $MODEL)_hugemodel.out.$SEED.py
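# Optional: sanity-check and launch the extracted program (assumes python3 and
# pygame are available on this host; left commented out so the timing run above
# is the only thing this script does):
# python3 -m py_compile $(basename $MODEL)_hugemodel.out.$SEED.py
# python3 $(basename $MODEL)_hugemodel.out.$SEED.py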

########### END OF TEST SCRIPT ###########

########### OUTPUT ###########

build: 4735 (73e2ed3c) with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: additional 15 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 46 key-value pairs and 1025 tensors from /u03/aimodels/deepseek/DeepSeek-V3-GGUF/DeepSeek-V3-BF16-256x20B-Q8_0-00001-of-00016.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = deepseek2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek V3 BF16
llama_model_loader: - kv 3: general.size_label str = 256x20B
llama_model_loader: - kv 4: deepseek2.block_count u32 = 61
llama_model_loader: - kv 5: deepseek2.context_length u32 = 163840
llama_model_loader: - kv 6: deepseek2.embedding_length u32 = 7168
llama_model_loader: - kv 7: deepseek2.feed_forward_length u32 = 18432
llama_model_loader: - kv 8: deepseek2.attention.head_count u32 = 128
llama_model_loader: - kv 9: deepseek2.attention.head_count_kv u32 = 128
llama_model_loader: - kv 10: deepseek2.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 12: deepseek2.expert_used_count u32 = 8
llama_model_loader: - kv 13: general.file_type u32 = 7
llama_model_loader: - kv 14: deepseek2.leading_dense_block_count u32 = 3
llama_model_loader: - kv 15: deepseek2.vocab_size u32 = 129280
llama_model_loader: - kv 16: deepseek2.attention.q_lora_rank u32 = 1536
llama_model_loader: - kv 17: deepseek2.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 18: deepseek2.attention.key_length u32 = 192
llama_model_loader: - kv 19: deepseek2.attention.value_length u32 = 128
llama_model_loader: - kv 20: deepseek2.expert_feed_forward_length u32 = 2048
llama_model_loader: - kv 21: deepseek2.expert_count u32 = 256
llama_model_loader: - kv 22: deepseek2.expert_shared_count u32 = 1
llama_model_loader: - kv 23: deepseek2.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 24: deepseek2.expert_weights_norm bool = true
llama_model_loader: - kv 25: deepseek2.expert_gating_func u32 = 2
llama_model_loader: - kv 26: deepseek2.rope.dimension_count u32 = 64
llama_model_loader: - kv 27: deepseek2.rope.scaling.type str = yarn
llama_model_loader: - kv 28: deepseek2.rope.scaling.factor f32 = 40.000000
llama_model_loader: - kv 29: deepseek2.rope.scaling.original_context_length u32 = 4096
llama_model_loader: - kv 30: deepseek2.rope.scaling.yarn_log_multiplier f32 = 0.100000
llama_model_loader: - kv 31: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 32: tokenizer.ggml.pre str = deepseek-v3
llama_model_loader: - kv 33: tokenizer.ggml.tokens arr[str,129280] = ["<|begin▁of▁sentence|>", "<�...
llama_model_loader: - kv 34: tokenizer.ggml.token_type arr[i32,129280] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 35: tokenizer.ggml.merges arr[str,127741] = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
llama_model_loader: - kv 36: tokenizer.ggml.bos_token_id u32 = 0
llama_model_loader: - kv 37: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 38: tokenizer.ggml.padding_token_id u32 = 1
llama_model_loader: - kv 39: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 40: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 41: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 42: general.quantization_version u32 = 2
llama_model_loader: - kv 43: split.no u16 = 0
llama_model_loader: - kv 44: split.count u16 = 16
llama_model_loader: - kv 45: split.tensors.count i32 = 1025
llama_model_loader: - type f32: 361 tensors
llama_model_loader: - type q8_0: 664 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 664.29 GiB (8.50 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 818
load: token to piece cache size = 0.8223 MB
print_info: arch = deepseek2
print_info: vocab_only = 0
print_info: n_ctx_train = 163840
print_info: n_embd = 7168
print_info: n_layer = 61
print_info: n_head = 128
print_info: n_head_kv = 128
print_info: n_rot = 64
print_info: n_swa = 0
print_info: n_embd_head_k = 192
print_info: n_embd_head_v = 128
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 24576
print_info: n_embd_v_gqa = 16384
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 18432
print_info: n_expert = 256
print_info: n_expert_used = 8
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = yarn
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 0.025
print_info: n_ctx_orig_yarn = 4096
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 671B
print_info: model params = 671.03 B
print_info: general.name = DeepSeek V3 BF16
print_info: n_layer_dense_lead = 3
print_info: n_lora_q = 1536
print_info: n_lora_kv = 512
print_info: n_ff_exp = 2048
print_info: n_expert_shared = 1
print_info: expert_weights_scale = 2.5
print_info: expert_weights_norm = 1
print_info: expert_gating_func = sigmoid
print_info: rope_yarn_log_mul = 0.1000
print_info: vocab type = BPE
print_info: n_vocab = 129280
print_info: n_merges = 127741
print_info: BOS token = 0 '<|begin▁of▁sentence|>'
print_info: EOS token = 1 '<|end▁of▁sentence|>'
print_info: EOT token = 1 '<|end▁of▁sentence|>'
print_info: PAD token = 1 '<|end▁of▁sentence|>'
print_info: LF token = 201 'Ċ'
print_info: FIM PRE token = 128801 '<|fim▁begin|>'
print_info: FIM SUF token = 128800 '<|fim▁hole|>'
print_info: FIM MID token = 128802 '<|fim▁end|>'
print_info: EOG token = 1 '<|end▁of▁sentence|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: CPU_Mapped model buffer size = 41757.90 MiB
load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
load_tensors: CPU_Mapped model buffer size = 39742.87 MiB
....................................................................................................
llama_init_from_model: n_seq_max = 1
llama_init_from_model: n_ctx = 2048
llama_init_from_model: n_ctx_per_seq = 2048
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 0
llama_init_from_model: freq_base = 10000.0
llama_init_from_model: freq_scale = 0.025
llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (163840) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 2048, offload = 1, type_k = 'q8_0', type_v = 'f16', n_layer = 61, can_shift = 0
llama_kv_cache_init: CPU KV buffer size = 7015.00 MiB
llama_init_from_model: KV self size = 7015.00 MiB, K (q8_0): 3111.00 MiB, V (f16): 3904.00 MiB
llama_init_from_model: CPU output buffer size = 0.49 MiB
llama_init_from_model: CPU compute buffer size = 670.01 MiB
llama_init_from_model: graph nodes = 5025
llama_init_from_model: graph splits = 1
common_init_from_params: KV cache shifting is not supported for this model, disabling KV cache shifting
common_init_from_params: setting dry_penalty_last_n to ctx_size = 2048
main: llama threadpool init, n_threads = 4

system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

sampler seed: 12
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 2048
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.700
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 2048, n_batch = 2048, n_predict = -1, n_keep = 1

Specifications: Using PyGame, open a window that is 4-inches square, with a black background, and create an equilateral triangle that is 2-inches per side, hollow, white lines. It is rotating clockwise at 6 degrees per second - to match that of a second-hand on a watch. Inside the equilateral triangle is a red ball, 1/8" in diameter. Unaffected by gravity, it is moving with an arbitrary speed, that is configurable in a variable. The ball has proper hitbox checking and can detect when it hits a wall. Hitting a wall should deflect/redirect the ball into the proper direction based on the angle of the wall that it hit and how it should deflect given the angle and speed of the wall. Be careful of bounds checking such that the ball does not get stuck outside a wall. Remember this is all in 2D space. Start with the ball located in the center of the triangle, initially moving in a random direction and with an initial speed of about one inch per second. To clarify, the triangle has three equal 2-inch sides, rotating at 6 degrees per second (not 6 degrees per frame). Be careful of pygame functions that may return zero at times (e.g. framerate) and avoid divide-by-zero errors.

Assumptions you may make:
Screen size is adequate with 154 pixels per inch resolution.
Frame rate is nominally 60 frames per second but may vary with system load.
Pygame is installed.

Window title requirement: Please title the pygame window "Triangle-Ball DeepSeek-V3-BF16-256x20B-Q8_0-00001-of-00016.gguf-12".

Code block requirement: Emit a fully functioning python program within a single code block. Enclose your python program in a single codeblock as such:
\n
```python
# your program here
```

Your code will be evaluated with the following rubric:
Does the program fail to execute due to a Python error? 0 points.
Does the program load a pygame window with a slowly rotating equilateral triangle and a red ball? +1 point.
Are the dimensions, ball speed, and triangle rotation exactly as specified? +1 point.
Does the ball interact with the rotating sides? +1 point.
Does the ball always correctly bounce inside the triangle and never get caught in a side or vertex? +1 point.
Does the program function correctly in extreme varying frame rate conditions down to 4 fps? +1 point.

Review all the specifications and requirements, then begin coding. If you would like to think in text before beginning your codeblock, that is perfectly acceptable and may help improve your answer. Only the code in your codeblock will be evaluated.

Thanks!
```python
import pygame
import math
import random

# Constants
PIXELS_PER_INCH = 154
WINDOW_SIZE = 4 * PIXELS_PER_INCH
TRIANGLE_SIDE = 2 * PIXELS_PER_INCH
BALL_DIAMETER = (1/8) * PIXELS_PER_INCH
BALL_RADIUS = BALL_DIAMETER / 2
TRIANGLE_ROTATION_SPEED = 6 # degrees per second
BALL_SPEED = 1 * PIXELS_PER_INCH # pixels per second
FRAME_RATE = 60

# Colors
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
RED = (255, 0, 0)

# Initialize Pygame
pygame.init()
screen = pygame.display.set_mode((WINDOW_SIZE, WINDOW_SIZE))
pygame.display.set_caption("Triangle-Ball DeepSeek-V3-BF16-256x20B-Q8_0-00001-of-00016.gguf-12")
clock = pygame.time.Clock()

# Triangle vertices
def get_triangle_vertices(angle):
    center = (WINDOW_SIZE // 2, WINDOW_SIZE // 2)
    vertices = []
    for i in range(3):
        theta = math.radians(angle + 120 * i)
        x = center[0] + (TRIANGLE_SIDE / math.sqrt(3)) * math.cos(theta)
        y = center[1] + (TRIANGLE_SIDE / math.sqrt(3)) * math.sin(theta)
        vertices.append((x, y))
    return vertices

# Ball properties
ball_pos = [WINDOW_SIZE // 2, WINDOW_SIZE // 2]
ball_velocity = [random.uniform(-1, 1), random.uniform(-1, 1)]
ball_velocity = [v / math.hypot(*ball_velocity) * BALL_SPEED for v in ball_velocity]

# Rotation angle
angle = 0

# Main loop
running = True
while running:
    dt = clock.tick(FRAME_RATE) / 1000 # Time since last frame in seconds

    # Handle events
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Update triangle rotation
    angle += TRIANGLE_ROTATION_SPEED * dt
    triangle_vertices = get_triangle_vertices(angle)

    # Update ball position
    ball_pos[0] += ball_velocity[0] * dt
    ball_pos[1] += ball_velocity[1] * dt

    # Ball-wall collision detection
    for i in range(3):
        p1 = triangle_vertices[i]
        p2 = triangle_vertices[(i + 1) % 3]

        # Calculate the normal vector of the wall
        wall_vector = (p2[0] - p1[0], p2[1] - p1[1])
        normal_vector = (-wall_vector[1], wall_vector[0])
        normal_length = math.hypot(*normal_vector)
        normal_vector = (normal_vector[0] / normal_length, normal_vector[1] / normal_length)

        # Calculate the distance from the ball to the wall
        ball_to_wall = (ball_pos[0] - p1[0], ball_pos[1] - p1[1])
        distance = abs(ball_to_wall[0] * normal_vector[0] + ball_to_wall[1] * normal_vector[1])

        if distance <= BALL_RADIUS:
            # Reflect the ball's velocity
            dot_product = ball_velocity[0] * normal_vector[0] + ball_velocity[1] * normal_vector[1]
            ball_velocity[0] -= 2 * dot_product * normal_vector[0]
            ball_velocity[1] -= 2 * dot_product * normal_vector[1]

            # Move the ball away from the wall to prevent sticking
            ball_pos[0] += normal_vector[0] * (BALL_RADIUS - distance)
            ball_pos[1] += normal_vector[1] * (BALL_RADIUS - distance)

    # Draw everything
    screen.fill(BLACK)
    pygame.draw.polygon(screen, WHITE, triangle_vertices, 1)
    pygame.draw.circle(screen, RED, (int(ball_pos[0]), int(ball_pos[1])), int(BALL_RADIUS))
    pygame.display.flip()

pygame.quit()
``` [end of text]

llama_perf_sampler_print: sampling time = 331.37 ms / 1585 runs ( 0.21 ms per token, 4783.22 tokens per second)
llama_perf_context_print: load time = 872359.94 ms
llama_perf_context_print: prompt eval time = 601031.08 ms / 570 tokens ( 1054.44 ms per token, 0.95 tokens per second)
llama_perf_context_print: eval time = 12235806.10 ms / 1014 runs (12066.87 ms per token, 0.08 tokens per second)
llama_perf_context_print: total time = 13109331.82 ms / 1584 tokens

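########### SUMMARY (derived from the timings above) ###########
# Model load / mmap of the 664 GiB file: ~14.5 minutes (872 s).
# Prompt processing: 570 tokens in ~10 minutes (0.95 tokens/s).
# Generation: 1014 tokens in ~3.4 hours (~12 s per token, 0.08 tokens/s).
# Total wall-clock time: ~3.6 hours.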