- ##################### With Discrete GPU 6800M - gfx1031
- D:\Apps\llama\llama-master-a113689-bin-win-clblast-x64\main.exe -m D:\Apps\llama\nous-hermes-llama2-13b.ggmlv3.q4_1.bin -ngl 50 -i --threads 8 --interactive-first -r "### Human:" --temp 0.7 -c 2048 --top_k 40 --top_p 0.1 --repeat_last_n 0 --repeat_penalty 1.1764705882352942 --instruct
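Note on the flags: -ngl 50 asks llama.cpp to offload up to 50 layers to the GPU via CLBlast (the 13B model has 41 offloadable layers, so all of them land on the GPU, as the log confirms below); -c 2048 sets the context window and --threads 8 the CPU thread count; the sampling flags (--temp 0.7, --top_k 40, --top_p 0.1, --repeat_penalty ~1.176) are echoed back in the sampling: line further down; --instruct together with the "### Human:" reverse prompt runs the model in instruction mode.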
- main: build = 929 (a113689)
- main: seed = 1690903767
- ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
- ggml_opencl: selecting device: 'gfx1031'
- ggml_opencl: device FP16 support: true
- llama.cpp: loading model from D:\Apps\llama\nous-hermes-llama2-13b.ggmlv3.q4_1.bin
- llama_model_load_internal: format = ggjt v3 (latest)
- llama_model_load_internal: n_vocab = 32032
- llama_model_load_internal: n_ctx = 2048
- llama_model_load_internal: n_embd = 5120
- llama_model_load_internal: n_mult = 256
- llama_model_load_internal: n_head = 40
- llama_model_load_internal: n_head_kv = 40
- llama_model_load_internal: n_layer = 40
- llama_model_load_internal: n_rot = 128
- llama_model_load_internal: n_gqa = 1
- llama_model_load_internal: rnorm_eps = 5.0e-06
- llama_model_load_internal: n_ff = 13824
- llama_model_load_internal: freq_base = 10000.0
- llama_model_load_internal: freq_scale = 1
- llama_model_load_internal: ftype = 3 (mostly Q4_1)
- llama_model_load_internal: model size = 13B
- llama_model_load_internal: ggml ctx size = 0.11 MB
- llama_model_load_internal: using OpenCL for GPU acceleration
- llama_model_load_internal: mem required = 195.62 MB (+ 1600.00 MB per state)
- llama_model_load_internal: offloading 40 repeating layers to GPU
- llama_model_load_internal: offloading non-repeating layers to GPU
- llama_model_load_internal: offloaded 41/41 layers to GPU
- llama_model_load_internal: total VRAM used: 7565 MB
- llama_new_context_with_model: kv self size = 1600.00 MB
- llama_new_context_with_model: compute buffer total size = 191.35 MB
- system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
- main: interactive mode on.
- Reverse prompt: '### Human:'
- Reverse prompt: '### Instruction:
- '
- sampling: repeat_last_n = 0, repeat_penalty = 1.176471, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.100000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
- generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2
- == Running in interactive mode. ==
- - Press Ctrl+C to interject at any time.
- - Press Return to return control to LLaMa.
- - To return control without starting a new line, end your input with '/'.
- - If you want to submit another line, end your input with '\'.
- > can you write an R shiny app to generate a data-frame of 4 columns by 4 rows
- library(shiny)
- ui <- fluidPage(
-   dataTableOutput("my_table")
- )
- server <- function(input, output) {
-   output$my_table <- renderDataTable(data.frame(col1 = 1:4, col2 = 1:4, col3 = 1:4, col4 = 1:4))
- }
- shinyApp(ui = ui, server = server)
- >
- llama_print_timings: load time = 60188.90 ms
- llama_print_timings: sample time = 3.58 ms / 103 runs ( 0.03 ms per token, 28770.95 tokens per second)
- llama_print_timings: prompt eval time = 7133.18 ms / 43 tokens ( 165.89 ms per token, 6.03 tokens per second)
- llama_print_timings: eval time = 13003.63 ms / 102 runs ( 127.49 ms per token, 7.84 tokens per second)
- llama_print_timings: total time = 622870.10 ms
- ##################### END DISCRETE ######################
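Side note: the Shiny snippet the model produced above is runnable as-is. A minimal way to try it locally, assuming it is saved as app.R inside a directory of your choice (the directory name below is only an illustration):

library(shiny)
# Launch the app saved from the model's snippet; it serves the 4x4 data frame
# as an interactive table in the browser.
runApp("shiny-4x4")   # "shiny-4x4" is a hypothetical directory containing app.R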
- ##################### With Integrated GPU Cezanne architecture - gfx90c
- D:\Apps\llama\llama-master-a113689-bin-win-clblast-x64\main.exe -m D:\Apps\llama\nous-hermes-llama2-13b.ggmlv3.q4_1.bin -ngl 50 -i --threads 8 --interactive-first -r "### Human:" --temp 0.7 -c 2048 --top_k 40 --top_p 0.1 --repeat_last_n 0 --repeat_penalty 1.1764705882352942 --instruct
- main: build = 929 (a113689)
- main: seed = 1690905475
- ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
- ggml_opencl: selecting device: 'gfx90c'
- ggml_opencl: device FP16 support: true
- llama.cpp: loading model from D:\Apps\llama\gpt4all\nous-hermes-llama2-13b.ggmlv3.q4_1.bin
- llama_model_load_internal: format = ggjt v3 (latest)
- llama_model_load_internal: n_vocab = 32032
- llama_model_load_internal: n_ctx = 2048
- llama_model_load_internal: n_embd = 5120
- llama_model_load_internal: n_mult = 256
- llama_model_load_internal: n_head = 40
- llama_model_load_internal: n_head_kv = 40
- llama_model_load_internal: n_layer = 40
- llama_model_load_internal: n_rot = 128
- llama_model_load_internal: n_gqa = 1
- llama_model_load_internal: rnorm_eps = 5.0e-06
- llama_model_load_internal: n_ff = 13824
- llama_model_load_internal: freq_base = 10000.0
- llama_model_load_internal: freq_scale = 1
- llama_model_load_internal: ftype = 3 (mostly Q4_1)
- llama_model_load_internal: model size = 13B
- llama_model_load_internal: ggml ctx size = 0.11 MB
- llama_model_load_internal: using OpenCL for GPU acceleration
- llama_model_load_internal: mem required = 195.62 MB (+ 1600.00 MB per state)
- llama_model_load_internal: offloading 40 repeating layers to GPU
- llama_model_load_internal: offloading non-repeating layers to GPU
- llama_model_load_internal: offloaded 41/41 layers to GPU
- llama_model_load_internal: total VRAM used: 7565 MB
- llama_new_context_with_model: kv self size = 1600.00 MB
- llama_new_context_with_model: compute buffer total size = 191.35 MB
- system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
- main: interactive mode on.
- Reverse prompt: '### Human:'
- Reverse prompt: '### Instruction:
- '
- sampling: repeat_last_n = 0, repeat_penalty = 1.176471, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.100000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
- generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2
- == Running in interactive mode. ==
- - Press Ctrl+C to interject at any time.
- - Press Return to return control to LLaMa.
- - To return control without starting a new line, end your input with '/'.
- - If you want to submit another line, end your input with '\'.
- > can you write an R shiny app to generate a data-frame of 4 columns by 4 rows
- library(shiny)
- ui <- fluidPage(
-   dataTableOutput("my_table")
- )
- server <- function(input, output) {
-   output$my_table <- renderDataTable(data.frame(col1 = 1:4, col2 = 1:4, col3 = 1:4, col4 = 1:4))
- }
- shinyApp(ui = ui, server = server)
- >
- llama_print_timings: load time = 26205.90 ms
- llama_print_timings: sample time = 6.34 ms / 103 runs ( 0.06 ms per token, 16235.81 tokens per second)
- llama_print_timings: prompt eval time = 29234.08 ms / 43 tokens ( 679.86 ms per token, 1.47 tokens per second)
- llama_print_timings: eval time = 118847.32 ms / 102 runs ( 1165.17 ms per token, 0.86 tokens per second)
- llama_print_timings: total time = 159929.10 ms
- ################### END INTEGRATED ##################
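For a quick side-by-side of the two runs, the throughput printed by llama_print_timings can be tabulated; a small R sketch using only the numbers above:

# Throughput reported above for the discrete RX 6800M (gfx1031) and the
# Cezanne integrated GPU (gfx90c), in tokens per second.
timings <- data.frame(
  gpu        = c("RX 6800M (gfx1031)", "Cezanne iGPU (gfx90c)"),
  prompt_tps = c(6.03, 1.47),   # prompt eval
  eval_tps   = c(7.84, 0.86)    # generation
)
timings$eval_speedup <- round(timings$eval_tps / timings$eval_tps[2], 1)
print(timings)

In short, the discrete GPU generates roughly 9x faster (7.84 vs 0.86 tokens/s) and evaluates the prompt about 4x faster, although its model load time in these runs is longer (about 60 s vs 26 s).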