- #### run from the llama.cpp distribution directory; this shell script is go_test_huge_model.sh
- cmake -B build
- cmake --build build --config Release -j 4
- cmake --build build --config Release -t llama-server
- cmake --build build --config Release -t llama-cli
- # example from https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF
- # model that easily fits in RAM:
- MODEL=/aimodels/cache/DeepSeek-R1-Distill-Qwen-14B-Q5_K_M.gguf
- # model that doesn't easily fit into RAM:
- MODEL=/u03/aimodels/cache/Llama-3.3-70B-Instruct-Q4_K_L.gguf
- # model that is an order of magnitude larger than RAM:
- MODEL=/u03/aimodels/deepseek/DeepSeek-V3-GGUF/DeepSeek-V3-BF16-256x20B-Q8_0-00001-of-00016.gguf
- # clear Linux cache for test
- echo dropping caches ...
- echo 3 | sudo tee /proc/sys/vm/drop_caches
- echo caches dropped
- # sudo sh -c "/usr/bin/echo 3 > /proc/sys/vm/drop_caches"
- SEED=12
- ./build/bin/llama-cli \
- --model $MODEL \
- --no-warmup \
- --cache-type-k q8_0 \
- --threads 4 \
- --seed $SEED \
- --temp 0.7 \
- --top-p 0.95 \
- --ctx-size 2048 \
- --prompt '<|User|>
- Specifications: Using PyGame, open a window that is 4-inches square, with a black background, and create an equilateral triangle that is 2-inches per side, hollow, white lines. It is rotating clockwise at 6 degrees per second - to match that of a second-hand on a watch. Inside the equilateral triangle is a red ball, 1/8" in diameter. Unaffected by gravity, it is moving with an arbitrary speed, that is configurable in a variable. The ball has proper hitbox checking and can detect when it hits a wall. Hitting a wall should deflect/redirect the ball into the proper direction based on the angle of the wall that it hit and how it should deflect given the angle and speed of the wall. Be careful of bounds checking such that the ball does not get stuck outside a wall. Remember this is all in 2D space. Start with the ball located in the center of the triangle, initially moving in a random direction and with an initial speed of about one inch per second. To clarify, the triangle has three equal 2-inch sides, rotating at 6 degrees per second (not 6 degrees per frame). Be careful of pygame functions that may return zero at times (e.g. framerate) and avoid divide-by-zero errors.
- Assumptions you may make:
- Screen size is adequate with 154 pixels per inch resolution.
- Frame rate is nominally 60 frames per second but may vary with system load.
- Pygame is installed.
- Window title requirement: Please title the pygame window "Triangle-Ball '$(basename $MODEL)-$SEED'".
- Code block requirement: Emit a fully functioning python program within a single code block. Enclose your python program in a single codeblock as such:
- \\n
- ```python
- # your program here
- ```
- Your code will be evaluated with the following rubric:
- Does the program fail to execute due to a Python error? 0 points.
- Does the program load a pygame window with a slowly rotating equilateral triangle and a red ball? +1 point.
- Are the dimensions, ball speed, and triangle rotation exactly as specified? +1 point.
- Does the ball interact with the rotating sides? +1 point.
- Does the ball always correctly bounce inside the triangle and never get caught in a side or vertex? +1 point.
- Does the program function correctly in extreme varying frame rate conditions down to 4 fps? +1 point.
- Review all the specifications and requirements, then begin coding. If you would like to think in text before beginning your codeblock, that is perfectly acceptable and may help improve your answer. Only the code in your codeblock will be evaluated.
- Thanks!
- <|Assistant|>' \
- -no-cnv 2>&1 | tee $(basename $MODEL)_hugemodel.out.$SEED.txt
- cat $(basename $MODEL)_hugemodel.out.$SEED.txt |
- sed 's/.*```/```/;' |
- sed -n '/^```python/,/^```/p' |
- sed '1d; $d' |
- sed 's/.*```.*/#&/' > $(basename $MODEL)_hugemodel.out.$SEED.py
- ########### END OF TEST SCRIPT ###########
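The `sed` pipeline at the end of the script can be hard to read at a glance. The following is a minimal Python sketch of what it appears to do, step by step: trim everything before a fence on each line, keep every range that starts at a `python` fence and ends at the next fence, drop the first and last kept lines, and comment out any fence lines that remain. The function name `extract_python_blocks` is hypothetical and is not part of the original script. Note that the echoed prompt also contains a `python` fence, so its placeholder block is kept too and shows up as comments at the top of the resulting `.py` file.

```python
FENCE = "`" * 3  # three backticks, built this way to avoid nesting fences


def extract_python_blocks(log_text: str) -> str:
    """Approximate the four sed stages used by the test script."""
    kept = []
    inside = False
    for line in log_text.splitlines():
        # Stage 1: sed 's/.*FENCE/FENCE/' drops everything before the last fence.
        idx = line.rfind(FENCE)
        if idx != -1:
            line = line[idx:]
        # Stage 2: sed -n '/^FENCEpython/,/^FENCE/p' keeps fenced ranges.
        if not inside:
            if line.startswith(FENCE + "python"):
                inside = True
                kept.append(line)
        else:
            kept.append(line)
            if line.startswith(FENCE):
                inside = False
    # Stage 3: sed '1d; $d' removes the first and last kept lines.
    body = kept[1:-1]
    # Stage 4: sed 's/.*FENCE.*/#&/' comments out any remaining fence lines.
    return "\n".join("#" + ln if FENCE in ln else ln for ln in body)
```

Calling `extract_python_blocks(open(path).read())` on the `tee` output file should give roughly the same text as the generated `.py` file, assuming the log uses plain newline line endings.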
- ########### OUTPUT ###########
- build: 4735 (73e2ed3c) with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
- main: llama backend init
- main: load the model and apply lora adapter, if any
- llama_model_loader: additional 15 GGUFs metadata loaded.
- llama_model_loader: loaded meta data with 46 key-value pairs and 1025 tensors from /u03/aimodels/deepseek/DeepSeek-V3-GGUF/DeepSeek-V3-BF16-256x20B-Q8_0-00001-of-00016.gguf (version GGUF V3 (latest))
- llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
- llama_model_loader: - kv 0: general.architecture str = deepseek2
- llama_model_loader: - kv 1: general.type str = model
- llama_model_loader: - kv 2: general.name str = DeepSeek V3 BF16
- llama_model_loader: - kv 3: general.size_label str = 256x20B
- llama_model_loader: - kv 4: deepseek2.block_count u32 = 61
- llama_model_loader: - kv 5: deepseek2.context_length u32 = 163840
- llama_model_loader: - kv 6: deepseek2.embedding_length u32 = 7168
- llama_model_loader: - kv 7: deepseek2.feed_forward_length u32 = 18432
- llama_model_loader: - kv 8: deepseek2.attention.head_count u32 = 128
- llama_model_loader: - kv 9: deepseek2.attention.head_count_kv u32 = 128
- llama_model_loader: - kv 10: deepseek2.rope.freq_base f32 = 10000.000000
- llama_model_loader: - kv 11: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
- llama_model_loader: - kv 12: deepseek2.expert_used_count u32 = 8
- llama_model_loader: - kv 13: general.file_type u32 = 7
- llama_model_loader: - kv 14: deepseek2.leading_dense_block_count u32 = 3
- llama_model_loader: - kv 15: deepseek2.vocab_size u32 = 129280
- llama_model_loader: - kv 16: deepseek2.attention.q_lora_rank u32 = 1536
- llama_model_loader: - kv 17: deepseek2.attention.kv_lora_rank u32 = 512
- llama_model_loader: - kv 18: deepseek2.attention.key_length u32 = 192
- llama_model_loader: - kv 19: deepseek2.attention.value_length u32 = 128
- llama_model_loader: - kv 20: deepseek2.expert_feed_forward_length u32 = 2048
- llama_model_loader: - kv 21: deepseek2.expert_count u32 = 256
- llama_model_loader: - kv 22: deepseek2.expert_shared_count u32 = 1
- llama_model_loader: - kv 23: deepseek2.expert_weights_scale f32 = 2.500000
- llama_model_loader: - kv 24: deepseek2.expert_weights_norm bool = true
- llama_model_loader: - kv 25: deepseek2.expert_gating_func u32 = 2
- llama_model_loader: - kv 26: deepseek2.rope.dimension_count u32 = 64
- llama_model_loader: - kv 27: deepseek2.rope.scaling.type str = yarn
- llama_model_loader: - kv 28: deepseek2.rope.scaling.factor f32 = 40.000000
- llama_model_loader: - kv 29: deepseek2.rope.scaling.original_context_length u32 = 4096
- llama_model_loader: - kv 30: deepseek2.rope.scaling.yarn_log_multiplier f32 = 0.100000
- llama_model_loader: - kv 31: tokenizer.ggml.model str = gpt2
- llama_model_loader: - kv 32: tokenizer.ggml.pre str = deepseek-v3
- llama_model_loader: - kv 33: tokenizer.ggml.tokens arr[str,129280] = ["<|begin▁of▁sentence|>", "<�...
- llama_model_loader: - kv 34: tokenizer.ggml.token_type arr[i32,129280] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
- llama_model_loader: - kv 35: tokenizer.ggml.merges arr[str,127741] = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
- llama_model_loader: - kv 36: tokenizer.ggml.bos_token_id u32 = 0
- llama_model_loader: - kv 37: tokenizer.ggml.eos_token_id u32 = 1
- llama_model_loader: - kv 38: tokenizer.ggml.padding_token_id u32 = 1
- llama_model_loader: - kv 39: tokenizer.ggml.add_bos_token bool = true
- llama_model_loader: - kv 40: tokenizer.ggml.add_eos_token bool = false
- llama_model_loader: - kv 41: tokenizer.chat_template str = {% if not add_generation_prompt is de...
- llama_model_loader: - kv 42: general.quantization_version u32 = 2
- llama_model_loader: - kv 43: split.no u16 = 0
- llama_model_loader: - kv 44: split.count u16 = 16
- llama_model_loader: - kv 45: split.tensors.count i32 = 1025
- llama_model_loader: - type f32: 361 tensors
- llama_model_loader: - type q8_0: 664 tensors
- print_info: file format = GGUF V3 (latest)
- print_info: file type = Q8_0
- print_info: file size = 664.29 GiB (8.50 BPW)
- load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
- load: special tokens cache size = 818
- load: token to piece cache size = 0.8223 MB
- print_info: arch = deepseek2
- print_info: vocab_only = 0
- print_info: n_ctx_train = 163840
- print_info: n_embd = 7168
- print_info: n_layer = 61
- print_info: n_head = 128
- print_info: n_head_kv = 128
- print_info: n_rot = 64
- print_info: n_swa = 0
- print_info: n_embd_head_k = 192
- print_info: n_embd_head_v = 128
- print_info: n_gqa = 1
- print_info: n_embd_k_gqa = 24576
- print_info: n_embd_v_gqa = 16384
- print_info: f_norm_eps = 0.0e+00
- print_info: f_norm_rms_eps = 1.0e-06
- print_info: f_clamp_kqv = 0.0e+00
- print_info: f_max_alibi_bias = 0.0e+00
- print_info: f_logit_scale = 0.0e+00
- print_info: n_ff = 18432
- print_info: n_expert = 256
- print_info: n_expert_used = 8
- print_info: causal attn = 1
- print_info: pooling type = 0
- print_info: rope type = 0
- print_info: rope scaling = yarn
- print_info: freq_base_train = 10000.0
- print_info: freq_scale_train = 0.025
- print_info: n_ctx_orig_yarn = 4096
- print_info: rope_finetuned = unknown
- print_info: ssm_d_conv = 0
- print_info: ssm_d_inner = 0
- print_info: ssm_d_state = 0
- print_info: ssm_dt_rank = 0
- print_info: ssm_dt_b_c_rms = 0
- print_info: model type = 671B
- print_info: model params = 671.03 B
- print_info: general.name = DeepSeek V3 BF16
- print_info: n_layer_dense_lead = 3
- print_info: n_lora_q = 1536
- print_info: n_lora_kv = 512
- print_info: n_ff_exp = 2048
- print_info: n_expert_shared = 1
- print_info: expert_weights_scale = 2.5
- print_info: expert_weights_norm = 1
- print_info: expert_gating_func = sigmoid
- print_info: rope_yarn_log_mul = 0.1000
- print_info: vocab type = BPE
- print_info: n_vocab = 129280
- print_info: n_merges = 127741
- print_info: BOS token = 0 '<|begin▁of▁sentence|>'
- print_info: EOS token = 1 '<|end▁of▁sentence|>'
- print_info: EOT token = 1 '<|end▁of▁sentence|>'
- print_info: PAD token = 1 '<|end▁of▁sentence|>'
- print_info: LF token = 201 'Ċ'
- print_info: FIM PRE token = 128801 '<|fim▁begin|>'
- print_info: FIM SUF token = 128800 '<|fim▁hole|>'
- print_info: FIM MID token = 128802 '<|fim▁end|>'
- print_info: EOG token = 1 '<|end▁of▁sentence|>'
- print_info: max token length = 256
- load_tensors: loading model tensors, this can take a while... (mmap = true)
- load_tensors: CPU_Mapped model buffer size = 41757.90 MiB
- load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
- load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
- load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
- load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
- load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
- load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
- load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
- load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
- load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
- load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
- load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
- load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
- load_tensors: CPU_Mapped model buffer size = 42853.11 MiB
- load_tensors: CPU_Mapped model buffer size = 42611.83 MiB
- load_tensors: CPU_Mapped model buffer size = 39742.87 MiB
- ....................................................................................................
- llama_init_from_model: n_seq_max = 1
- llama_init_from_model: n_ctx = 2048
- llama_init_from_model: n_ctx_per_seq = 2048
- llama_init_from_model: n_batch = 2048
- llama_init_from_model: n_ubatch = 512
- llama_init_from_model: flash_attn = 0
- llama_init_from_model: freq_base = 10000.0
- llama_init_from_model: freq_scale = 0.025
- llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (163840) -- the full capacity of the model will not be utilized
- llama_kv_cache_init: kv_size = 2048, offload = 1, type_k = 'q8_0', type_v = 'f16', n_layer = 61, can_shift = 0
- llama_kv_cache_init: CPU KV buffer size = 7015.00 MiB
- llama_init_from_model: KV self size = 7015.00 MiB, K (q8_0): 3111.00 MiB, V (f16): 3904.00 MiB
- llama_init_from_model: CPU output buffer size = 0.49 MiB
- llama_init_from_model: CPU compute buffer size = 670.01 MiB
- llama_init_from_model: graph nodes = 5025
- llama_init_from_model: graph splits = 1
- common_init_from_params: KV cache shifting is not supported for this model, disabling KV cache shifting
- common_init_from_params: setting dry_penalty_last_n to ctx_size = 2048
- main: llama threadpool init, n_threads = 4
- system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
- sampler seed: 12
- sampler params:
- repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
- dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 2048
- top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.700
- mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
- sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
- generate: n_ctx = 2048, n_batch = 2048, n_predict = -1, n_keep = 1
- Specifications: Using PyGame, open a window that is 4-inches square, with a black background, and create an equilateral triangle that is 2-inches per side, hollow, white lines. It is rotating clockwise at 6 degrees per second - to match that of a second-hand on a watch. Inside the equilateral triangle is a red ball, 1/8" in diameter. Unaffected by gravity, it is moving with an arbitrary speed, that is configurable in a variable. The ball has proper hitbox checking and can detect when it hits a wall. Hitting a wall should deflect/redirect the ball into the proper direction based on the angle of the wall that it hit and how it should deflect given the angle and speed of the wall. Be careful of bounds checking such that the ball does not get stuck outside a wall. Remember this is all in 2D space. Start with the ball located in the center of the triangle, initially moving in a random direction and with an initial speed of about one inch per second. To clarify, the triangle has three equal 2-inch sides, rotating at 6 degrees per second (not 6 degrees per frame). Be careful of pygame functions that may return zero at times (e.g. framerate) and avoid divide-by-zero errors.
- Assumptions you may make:
- Screen size is adequate with 154 pixels per inch resolution.
- Frame rate is nominally 60 frames per second but may vary with system load.
- Pygame is installed.
- Window title requirement: Please title the pygame window "Triangle-Ball DeepSeek-V3-BF16-256x20B-Q8_0-00001-of-00016.gguf-12".
- Code block requirement: Emit a fully functioning python program within a single code block. Enclose your python program in a single codeblock as such:
- \n
- ```python
- # your program here
- ```
- Your code will be evaluated with the following rubric:
- Does the program fail to execute due to a Python error? 0 points.
- Does the program load a pygame window with a slowly rotating equilateral triangle and a red ball? +1 point.
- Are the dimensions, ball speed, and triangle rotation exactly as specified? +1 point.
- Does the ball interact with the rotating sides? +1 point.
- Does the ball always correctly bounce inside the triangle and never get caught in a side or vertex? +1 point.
- Does the program function correctly in extreme varying frame rate conditions down to 4 fps? +1 point.
- Review all the specifications and requirements, then begin coding. If you would like to think in text before beginning your codeblock, that is perfectly acceptable and may help improve your answer. Only the code in your codeblock will be evaluated.
- Thanks!
- ```python
- import pygame
- import math
- import random
- # Constants
- PIXELS_PER_INCH = 154
- WINDOW_SIZE = 4 * PIXELS_PER_INCH
- TRIANGLE_SIDE = 2 * PIXELS_PER_INCH
- BALL_DIAMETER = (1/8) * PIXELS_PER_INCH
- BALL_RADIUS = BALL_DIAMETER / 2
- TRIANGLE_ROTATION_SPEED = 6 # degrees per second
- BALL_SPEED = 1 * PIXELS_PER_INCH # pixels per second
- FRAME_RATE = 60
- # Colors
- BLACK = (0, 0, 0)
- WHITE = (255, 255, 255)
- RED = (255, 0, 0)
- # Initialize Pygame
- pygame.init()
- screen = pygame.display.set_mode((WINDOW_SIZE, WINDOW_SIZE))
- pygame.display.set_caption("Triangle-Ball DeepSeek-V3-BF16-256x20B-Q8_0-00001-of-00016.gguf-12")
- clock = pygame.time.Clock()
- # Triangle vertices
- def get_triangle_vertices(angle):
-     center = (WINDOW_SIZE // 2, WINDOW_SIZE // 2)
-     vertices = []
-     for i in range(3):
-         theta = math.radians(angle + 120 * i)
-         x = center[0] + (TRIANGLE_SIDE / math.sqrt(3)) * math.cos(theta)
-         y = center[1] + (TRIANGLE_SIDE / math.sqrt(3)) * math.sin(theta)
-         vertices.append((x, y))
-     return vertices
- # Ball properties
- ball_pos = [WINDOW_SIZE // 2, WINDOW_SIZE // 2]
- ball_velocity = [random.uniform(-1, 1), random.uniform(-1, 1)]
- ball_velocity = [v / math.hypot(*ball_velocity) * BALL_SPEED for v in ball_velocity]
- # Rotation angle
- angle = 0
- # Main loop
- running = True
- while running:
-     dt = clock.tick(FRAME_RATE) / 1000 # Time since last frame in seconds
-     # Handle events
-     for event in pygame.event.get():
-         if event.type == pygame.QUIT:
-             running = False
-     # Update triangle rotation
-     angle += TRIANGLE_ROTATION_SPEED * dt
-     triangle_vertices = get_triangle_vertices(angle)
-     # Update ball position
-     ball_pos[0] += ball_velocity[0] * dt
-     ball_pos[1] += ball_velocity[1] * dt
-     # Ball-wall collision detection
-     for i in range(3):
-         p1 = triangle_vertices[i]
-         p2 = triangle_vertices[(i + 1) % 3]
-         # Calculate the normal vector of the wall
-         wall_vector = (p2[0] - p1[0], p2[1] - p1[1])
-         normal_vector = (-wall_vector[1], wall_vector[0])
-         normal_length = math.hypot(*normal_vector)
-         normal_vector = (normal_vector[0] / normal_length, normal_vector[1] / normal_length)
-         # Calculate the distance from the ball to the wall
-         ball_to_wall = (ball_pos[0] - p1[0], ball_pos[1] - p1[1])
-         distance = abs(ball_to_wall[0] * normal_vector[0] + ball_to_wall[1] * normal_vector[1])
-         if distance <= BALL_RADIUS:
-             # Reflect the ball's velocity
-             dot_product = ball_velocity[0] * normal_vector[0] + ball_velocity[1] * normal_vector[1]
-             ball_velocity[0] -= 2 * dot_product * normal_vector[0]
-             ball_velocity[1] -= 2 * dot_product * normal_vector[1]
-             # Move the ball away from the wall to prevent sticking
-             ball_pos[0] += normal_vector[0] * (BALL_RADIUS - distance)
-             ball_pos[1] += normal_vector[1] * (BALL_RADIUS - distance)
-     # Draw everything
-     screen.fill(BLACK)
-     pygame.draw.polygon(screen, WHITE, triangle_vertices, 1)
-     pygame.draw.circle(screen, RED, (int(ball_pos[0]), int(ball_pos[1])), int(BALL_RADIUS))
-     pygame.display.flip()
- pygame.quit()
- ``` [end of text]
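The prompt asks that the ball never end up stuck outside a wall, including at low frame rates where a single step can carry it past an edge. One way to illustrate that requirement is a signed distance measured along a normal that is forced to point into the triangle. The sketch below is not part of the recorded model output. The helper name `bounce_off_wall` and its parameters are hypothetical, and it treats the wall as an infinite line that is stationary during the step, so it ignores the velocity of the rotating side.

```python
import math


def bounce_off_wall(pos, vel, p1, p2, inside_point, radius):
    """Keep a ball of `radius` on the inner side of the wall p1 -> p2.

    `inside_point` is any point known to be inside the shape (for example
    the triangle's center). Returns the possibly corrected (pos, vel).
    """
    wx, wy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(wx, wy)
    if length == 0:
        return pos, vel  # degenerate wall, nothing to do
    nx, ny = -wy / length, wx / length
    # Flip the normal if needed so it points toward the interior.
    if (inside_point[0] - p1[0]) * nx + (inside_point[1] - p1[1]) * ny < 0:
        nx, ny = -nx, -ny
    # Signed distance of the ball center from the wall; negative means outside.
    dist = (pos[0] - p1[0]) * nx + (pos[1] - p1[1]) * ny
    if dist >= radius:
        return pos, vel  # no contact
    # Push the center back so it sits exactly one radius inside the wall.
    push = radius - dist
    pos = (pos[0] + nx * push, pos[1] + ny * push)
    # Reflect only when the ball is still moving toward the wall.
    vn = vel[0] * nx + vel[1] * ny
    if vn < 0:
        vel = (vel[0] - 2 * vn * nx, vel[1] - 2 * vn * ny)
    return pos, vel
```

For example, with a wall from `(0, 0)` to `(10, 0)`, an interior point `(5, 5)`, and a ball of radius `0.5` that has overshot to `(5, -1)` with velocity `(1, -2)`, the helper returns position `(5.0, 0.5)` and velocity `(1.0, 2.0)`.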
- llama_perf_sampler_print: sampling time = 331.37 ms / 1585 runs ( 0.21 ms per token, 4783.22 tokens per second)
- llama_perf_context_print: load time = 872359.94 ms
- llama_perf_context_print: prompt eval time = 601031.08 ms / 570 tokens ( 1054.44 ms per token, 0.95 tokens per second)
- llama_perf_context_print: eval time = 12235806.10 ms / 1014 runs (12066.87 ms per token, 0.08 tokens per second)
- llama_perf_context_print: total time = 13109331.82 ms / 1584 tokens
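Several figures in the log above can be cross-checked with simple arithmetic. The sketch below recomputes the KV cache sizes from the logged `n_ctx`, `n_layer`, `n_embd_k_gqa` and `n_embd_v_gqa`, and converts the timing lines into tokens per second and hours. It assumes the q8_0 layout of 34 bytes per block of 32 values (32 int8 values plus one 16-bit scale), which is consistent with the 8.50 BPW reported for the model file. The helper names are hypothetical.

```python
MIB = 1024 * 1024


def kv_cache_mib(n_ctx, n_layer, n_embd_gqa, bytes_per_element):
    """Size in MiB of one side (K or V) of the KV cache."""
    return n_ctx * n_layer * n_embd_gqa * bytes_per_element / MIB


def tokens_per_second(elapsed_ms, n_tokens):
    """Throughput from an elapsed time in milliseconds."""
    return n_tokens / (elapsed_ms / 1000)


Q8_0_BYTES = 34 / 32  # assumed q8_0 block layout: 34 bytes per 32 values
F16_BYTES = 2

# Values copied from the llama_init_from_model / print_info lines.
k_mib = kv_cache_mib(2048, 61, 24576, Q8_0_BYTES)
v_mib = kv_cache_mib(2048, 61, 16384, F16_BYTES)

# Values copied from the llama_perf_context_print lines.
prompt_tps = tokens_per_second(601031.08, 570)
eval_tps = tokens_per_second(12235806.10, 1014)
total_hours = 13109331.82 / 3_600_000

print(f"K cache: {k_mib:.2f} MiB, V cache: {v_mib:.2f} MiB, sum: {k_mib + v_mib:.2f} MiB")
print(f"prompt: {prompt_tps:.2f} tok/s, eval: {eval_tps:.2f} tok/s")
print(f"total wall time: {total_hours:.2f} hours")
```

The computed K and V sizes come out to 3111.00 MiB and 3904.00 MiB, matching the logged 7015.00 MiB total, and the total run time works out to about 3.64 hours for 1014 generated tokens.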