fatboy93

LLM test on Discrete GPU 6800M vs Integrated

Aug 1st, 2023

##################### With Discrete GPU 6800M - gfx1031
D:\Apps\llama\llama-master-a113689-bin-win-clblast-x64\main.exe -m D:\Apps\llama\nous-hermes-llama2-13b.ggmlv3.q4_1.bin -ngl 50 -i --threads 8 --interactive-first -r "### Human:" --temp 0.7 -c 2048 --top_k 40 --top_p 0.1 --repeat_last_n 0 --repeat_penalty 1.1764705882352942 --instruct
main: build = 929 (a113689)
main: seed = 1690903767
ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx1031'
ggml_opencl: device FP16 support: true
llama.cpp: loading model from D:\Apps\llama\nous-hermes-llama2-13b.ggmlv3.q4_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32032
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_head_kv = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.11 MB
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required = 195.62 MB (+ 1600.00 MB per state)
llama_model_load_internal: offloading 40 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloaded 41/41 layers to GPU
llama_model_load_internal: total VRAM used: 7565 MB
llama_new_context_with_model: kv self size = 1600.00 MB
llama_new_context_with_model: compute buffer total size = 191.35 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Human:'
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 0, repeat_penalty = 1.176471, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.100000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2


== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.


> can you write an R shiny app to generate a data-frame of 4 columns by 4 rows
library(shiny)

ui <- fluidPage(
dataTableOutput("my_table")
)

server <- function(input, output) {

output$my_table <- renderDataTable(data.frame(col1 = 1:4, col2 = 1:4, col3 = 1:4, col4 = 1:4))

}

shinyApp(ui = ui, server = server)

>

llama_print_timings: load time = 60188.90 ms
llama_print_timings: sample time = 3.58 ms / 103 runs ( 0.03 ms per token, 28770.95 tokens per second)
llama_print_timings: prompt eval time = 7133.18 ms / 43 tokens ( 165.89 ms per token, 6.03 tokens per second)
llama_print_timings: eval time = 13003.63 ms / 102 runs ( 127.49 ms per token, 7.84 tokens per second)
llama_print_timings: total time = 622870.10 ms

##################### END DISCRETE ######################
##################### With Integrated GPU Cezanne architecture - gfx90c

D:\Apps\llama\llama-master-a113689-bin-win-clblast-x64\main.exe -m D:\Apps\llama\nous-hermes-llama2-13b.ggmlv3.q4_1.bin -ngl 50 -i --threads 8 --interactive-first -r "### Human:" --temp 0.7 -c 2048 --top_k 40 --top_p 0.1 --repeat_last_n 0 --repeat_penalty 1.1764705882352942 --instruct
main: build = 929 (a113689)
main: seed = 1690905475
ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx90c'
ggml_opencl: device FP16 support: true
llama.cpp: loading model from D:\Apps\llama\gpt4all\nous-hermes-llama2-13b.ggmlv3.q4_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32032
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_head_kv = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.11 MB
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required = 195.62 MB (+ 1600.00 MB per state)
llama_model_load_internal: offloading 40 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloaded 41/41 layers to GPU
llama_model_load_internal: total VRAM used: 7565 MB
llama_new_context_with_model: kv self size = 1600.00 MB
llama_new_context_with_model: compute buffer total size = 191.35 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Human:'
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 0, repeat_penalty = 1.176471, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.100000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2


== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.


> can you write an R shiny app to generate a data-frame of 4 columns by 4 rows
library(shiny)

ui <- fluidPage(
dataTableOutput("my_table")
)

server <- function(input, output) {

output$my_table <- renderDataTable(data.frame(col1 = 1:4, col2 = 1:4, col3 = 1:4, col4 = 1:4))

}

shinyApp(ui = ui, server = server)

>

llama_print_timings: load time = 26205.90 ms
llama_print_timings: sample time = 6.34 ms / 103 runs ( 0.06 ms per token, 16235.81 tokens per second)
llama_print_timings: prompt eval time = 29234.08 ms / 43 tokens ( 679.86 ms per token, 1.47 tokens per second)
llama_print_timings: eval time = 118847.32 ms / 102 runs ( 1165.17 ms per token, 0.86 tokens per second)
llama_print_timings: total time = 159929.10 ms

################### END INTEGRATED ##################
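
##################### QUICK COMPARISON ##################
For reference, the per-token figures reported by llama_print_timings above work out to roughly a 4x prompt-processing and 9x generation advantage for the discrete 6800M (gfx1031) over the integrated Cezanne iGPU (gfx90c). A minimal R sketch of that arithmetic (R to match the generated Shiny example; the only inputs are the ms-per-token values copied from the two logs):

# ms per token, taken from the llama_print_timings lines of each run
discrete   <- c(prompt = 165.89, eval = 127.49)   # 6800M (gfx1031)
integrated <- c(prompt = 679.86, eval = 1165.17)  # Cezanne iGPU (gfx90c)

# speedup of discrete over integrated = integrated time / discrete time
speedup <- integrated / discrete
print(round(speedup, 1))   # prompt ~4.1x, eval ~9.1x

The same ratio falls out of the tokens-per-second columns (7.84 vs 0.86 tok/s for generation). Note the discrete run's total time (~623 s) mostly reflects how long the interactive session sat open, not compute time, so the per-token numbers are the meaningful comparison.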