CudaMiner release April 17th 2013 - alpha release
-------------------------------------------------

This is a CUDA-accelerated mining application for Litecoin only.
The most computationally heavy parts of the scrypt algorithm (the
Salsa20/8 iterations) are run on the GPU.

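For readers who have not met the Salsa20/8 core, here is a minimal Python
sketch of the function that the GPU repeats many times per hash. It follows
the public word-level description of the Salsa20 core reduced to 8 rounds.
It is not taken from the CudaMiner sources, and the names used here
(rotl32, quarter_round, salsa20_8) are purely illustrative.

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit unsigned words


def rotl32(value, count):
    """Rotate a 32-bit word left by `count` bits."""
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def quarter_round(x, a, b, c, d):
    """Apply one Salsa20 quarter-round in place to words a, b, c, d of x."""
    x[b] ^= rotl32(x[a] + x[d], 7)
    x[c] ^= rotl32(x[b] + x[a], 9)
    x[d] ^= rotl32(x[c] + x[b], 13)
    x[a] ^= rotl32(x[d] + x[c], 18)


# Word indices touched by the column round and by the row round.
COLUMNS = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11))
ROWS = ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))


def salsa20_8(words):
    """Salsa20/8 core: 16 input words in, 16 output words out."""
    if len(words) != 16:
        raise ValueError("expected exactly 16 32-bit words")
    x = [w & MASK32 for w in words]
    for _ in range(4):  # 4 double rounds = 8 rounds
        for indices in COLUMNS:
            quarter_round(x, *indices)
        for indices in ROWS:
            quarter_round(x, *indices)
    # Feed-forward: add the original input to the mixed state.
    return [(x[i] + (words[i] & MASK32)) & MASK32 for i in range(16)]
```

The work is nothing but 32-bit additions, XORs and rotations, which is
why it maps well onto GPU threads.
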
You should see a notable speed-up compared to OpenCL-based miners.
Some numbers from my testing:

GTX 260:    44  kHash/sec  (OpenCL: 20)
GTX 640:    39  kHash/sec
GTX 460:   101  kHash/sec
GTX 560Ti: 140  kHash/sec
GTX 660Ti: 156  kHash/sec  (OpenCL: 60-70)

Your nVidia cards will now suck a little less for mining! This tool
will automatically use all nVidia GPUs found in your system, but the
number of devices used can be limited with the "-t" option, or
devices can be selected individually with the "-d" option.

This code is based on the pooler cpuminer 2.2.3 release and inherits
its command line interface and options.

Additional command line options are:

--no-autotune    disables the built-in autotuning feature for
                 maximizing CUDA kernel efficiency and uses some
                 heuristic guesswork, which might not be optimal.

--devices        [-d] gives a list of CUDA device IDs to operate on.
                 Device IDs start counting from 0!

--launch-config  [-l] specify the kernel launch configuration per device.
                 This replaces autotune or heuristic selection.

--interactive    [-i] list of flags (0 or 1) to enable interactive
                 desktop performance on individual cards. Use this
                 to remove lag at the cost of some hashing performance.
                 Do not use large launch configs for devices that shall
                 run in interactive mode - it's best to use autotune!

--texture-cache  [-C] list of flags (0, 1 or 2) to enable use of the
                 texture cache for reading from the scrypt scratchpad.
                 1 uses a 1D cache, whereas 2 uses a 2D texture layout.
                 This is very experimental and may hurt performance
                 on some cards.

--single-memory  [-m] list of flags (0 or 1) to make the devices
                 allocate their scrypt scratchpad in a single,
                 consecutive memory block. On Windows Vista, 7 and 8
                 this may lead to a smaller memory size being used.


>>> Example command line options, advanced use <<<

cudaminer.exe -d 0,1,2 -i 1,0,0 -l auto,S27x3,28x4 -C 0,2,1
-o http://ltc.kattare.com:9332 -O myworker.1:mypass

I tell cudaminer to use devices 0, 1 and 2. Because I have the monitor
attached to device 0, I set that device to run in interactive mode so
it is fully responsive for desktop use while mining.

Device 0 performs autotune for interactive mode because I explicitly
set it to auto. Device 1 will use kernel launch configuration S27x3 and
device 2 uses 28x4.

I set the texture cache to 2D for device 1, to 1D for device 2, and
leave it off for device 0.

The given -o/-O settings mine on Burnside's pool, on which I happen to have
an account.


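The per-device lists in the example line up by position: the first entry
of each list belongs to the first device named with -d, and so on. The
Python sketch below pairs them up into one table per device. It only
illustrates how to read the command line and is not how cudaminer parses
its options. The function name and the rule that every list must be as
long as the device list are assumptions made for this sketch.

```python
TEXTURE_MODES = {0: "off", 1: "1D", 2: "2D"}  # meaning of the -C flags


def pair_device_options(devices, interactive, launch, texture):
    """Pair comma-separated -d/-i/-l/-C values by position (illustrative)."""
    device_ids = [int(item) for item in devices.split(",")]
    columns = {
        "interactive": interactive.split(","),
        "launch": launch.split(","),
        "texture": texture.split(","),
    }
    # Assumption of this sketch: one entry per selected device.
    for name, values in columns.items():
        if len(values) != len(device_ids):
            raise ValueError(f"{name}: expected {len(device_ids)} entries")
    table = {}
    for position, device_id in enumerate(device_ids):
        table[device_id] = {
            "interactive": columns["interactive"][position] == "1",
            "launch": columns["launch"][position],
            "texture_cache": TEXTURE_MODES[int(columns["texture"][position])],
        }
    return table


if __name__ == "__main__":
    settings = pair_device_options("0,1,2", "1,0,0", "auto,S27x3,28x4", "0,2,1")
    for device_id, options in settings.items():
        print(device_id, options)
```

Run on the values from the example, this shows device 0 as interactive
with autotune and no texture cache, device 1 with S27x3 and the 2D
layout, and device 2 with 28x4 and the 1D cache.
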
>>> Additional Notes <<<

The HMAC SHA-256 parts of scrypt are still executed on the CPU, and so
any Bitcoin mining will NOT be GPU accelerated. This tool is for LTC.

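To make the CPU part concrete: scrypt wraps its memory-hard mixing stage
in two PBKDF2-HMAC-SHA256 calls with a single iteration each. The sketch
below shows those two calls in Python for the Litecoin parameters
(N=1024, r=1, p=1, with the 80-byte block header used as both password
and salt). The function names are made up for this illustration, and how
cudaminer organizes this work internally is not shown here.

```python
import hashlib


def scrypt_cpu_prologue(header: bytes) -> bytes:
    """First HMAC SHA-256 stage: expand the header into a 128-byte block.

    The 128 bytes are the input to the Salsa20/8 based mixing stage.
    """
    return hashlib.pbkdf2_hmac("sha256", header, header, 1, 128)


def scrypt_cpu_epilogue(header: bytes, mixed_block: bytes) -> bytes:
    """Final HMAC SHA-256 stage: compress the mixed block to a 32-byte hash."""
    return hashlib.pbkdf2_hmac("sha256", header, mixed_block, 1, 32)


if __name__ == "__main__":
    header = bytes(80)  # dummy all-zero header, for illustration only
    block = scrypt_cpu_prologue(header)
    # The memory-hard mixing of `block` would happen here. It is skipped
    # in this sketch, so the value below is NOT a valid scrypt hash.
    digest = scrypt_cpu_epilogue(header, block)
    print(len(block), len(digest))
```
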
This does not support the Stratum protocol. To do Stratum mining
you have to run a local proxy.

This code should be fine on nVidia GPUs ranging from compute
capability 1.1 up to compute capability 3.5. The GeForce Titan has
received experimental and untested support.

To see what autotuning does, enable the debug (-D) switch.
You will get a table of kHash/s for a variety of launch configurations.
You may only want to do this when running on a single GPU, otherwise
the autotuning output of multiple cards will mix.


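The idea behind autotuning can be sketched in a few lines of Python: try
a set of launch configurations, measure each one, and keep the fastest.
This is a simplified illustration and not the actual tuning loop in
cudaminer. The measure_khash callback is hypothetical and stands in for
a real benchmark run, and the numbers in the demo are invented
placeholders, not measurements.

```python
def candidate_configs(block_counts, warp_counts, prefix=""):
    """Enumerate launch configuration strings such as '28x4' or 'S27x3'."""
    return [
        f"{prefix}{blocks}x{warps}"
        for warps in warp_counts
        for blocks in block_counts
    ]


def autotune(candidates, measure_khash):
    """Benchmark every candidate and return (best_config, results).

    `measure_khash` is a hypothetical callback that returns the measured
    kHash/s for one launch configuration string.
    """
    results = [(config, measure_khash(config)) for config in candidates]
    if not results:
        raise ValueError("no candidate launch configurations")
    best_config, _ = max(results, key=lambda item: item[1])
    return best_config, results


if __name__ == "__main__":
    # Invented placeholder values, used only to exercise the code.
    fake_khash = {"14x2": 1.0, "28x2": 2.0, "14x4": 3.0, "28x4": 4.0}
    best, table = autotune(candidate_configs([14, 28], [2, 4]), fake_khash.get)
    for config, khash in table:
        print(f"{config:>8}  {khash:6.1f}")
    print("best:", best)
```
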
>>> RELEASE HISTORY <<<

- the April 17th release fixes the texture cache feature (yay!) but
  even Kepler cards currently see no real benefit from it (boo!).

  Ctrl-C will now also interrupt the autotuning loop, and pressing
  Ctrl-C a second time will always result in a hard exit.

  The Titan kernel was refactored into a write-to-scratchpad phase and
  a read-from-scratchpad phase using const __restrict__ pointers,
  which makes the Titan automatically use the 48 KB texture cache in each
  SMX during the read phase. No need to use the -C flag with Titan.

  CPU utilization seems lower than in previous releases, especially in
  interactive mode. In fact I barely see cudaminer.exe consuming CPU
  resources at all ;)

- the April 14th release lowers the CPU use dramatically. I also fixed the
  Windows-specific driver crash on Ctrl-C. You still should not
  click the close button on the DOS box, as this does not leave the
  program enough time to shut down cleanly.

- the April 13th release turns the broken texture cache feature OFF by
  default, as it now also seems detrimental to performance. So what remains
  of yesterday's update is just the interactive mode and the restored
  GeForce Titan support.

  I also added a validation of GPU results by the CPU.

- the April 12th update boosts Kepler performance by 15-20% by enabling
  the texture cache on these devices to do its scrypt scratchpad lookups.
  You can also override the use of the texture cache from the command line.

  I also added an interactive mode for cards that drive monitors, so you
  can be almost lag-free when using the desktop. It costs some performance
  though. In interactive mode autotuning, smaller kernel launch configs
  are selected. Try not to override this with huge launch configs, or the
  effect of interactive mode would be negated.

  Put Titan support back to its original state. I suspect that a CUDA
  compiler bug made the kernel crash when I applied the same optimizations
  that work so nicely on Compute 1.0 through 3.0 devices.

- the April 10th update speeds up the CUDA kernels SIGNIFICANTLY by using
  larger memory transactions (yay!!!)

- the April 9th update fixes an autotune problem and adds Linux autotools
  support.

- the April 8th release adds CUDA kernel optimizations that may get up to
  20% more kHash out of newer cards (Fermi generation and later...).

  It also adds UNTESTED GeForce Titan support.

  I also use Microsoft's parallel patterns library to split up the CPU
  HMAC SHA256 workload over several CPU cores. This was a limiting factor
  for some GPUs before.

- the April 6th release adds an auto-tuning feature that determines the
  best kernel launch configuration per GPU. It takes up to a few minutes
  while the GPU's memory and host CPU may be pegged a bit. You can disable
  this tuning with the --no-autotune switch.

- April 4th initial release.


>>> About CUDA Kernels <<<

CUDA kernels do the computation. Which one we select and in which
configuration it is run greatly affects performance. CUDA kernel
launch configurations are given as a character string, e.g. S27x3

                       prefix blocks x warps

Currently there is just one prefix, which is "S". Later releases may
see the introduction of more kernel variants using other letters.

Examples:

     S27x3 is a launch configuration that works well on GTX 260
      28x4 is a launch configuration that works on GeForce GTX 460
     290x2 is a launch configuration that works on GeForce GTX 660Ti

You should wait through autotune to see what kernel is found best for
your current hardware configuration.

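The launch configuration format described above (optional prefix, then
blocks x warps) is easy to take apart mechanically. The Python sketch
below is an illustrative parser, not code from cudaminer. The names
LaunchConfig and parse_launch_config are made up, reading the second
number as warps per block is an assumption based on the format line
above, and "auto" is accepted because the example command line uses it
with -l.

```python
import re
from typing import NamedTuple, Optional

WARP_SIZE = 32  # threads per warp on nVidia GPUs


class LaunchConfig(NamedTuple):
    prefix: str   # "" for the default kernel, "S" for the special kernel
    blocks: int
    warps: int

    @property
    def threads_per_block(self) -> int:
        # Assumes `warps` counts warps per block (see the note above).
        return self.warps * WARP_SIZE


_PATTERN = re.compile(r"([A-Za-z]?)(\d+)x(\d+)")


def parse_launch_config(text: str) -> Optional[LaunchConfig]:
    """Parse strings such as 'S27x3' or '28x4'; return None for 'auto'."""
    if text == "auto":
        return None
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a launch configuration: {text!r}")
    prefix, blocks, warps = match.groups()
    return LaunchConfig(prefix, int(blocks), int(warps))


if __name__ == "__main__":
    for text in ("S27x3", "28x4", "290x2", "auto"):
        print(text, "->", parse_launch_config(text))
```
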
The choice between Non-Titan and Titan CUDA kernels is automatically
made based on your device's compute capability. Titans cost around
a thousand dollars, so you probably don't have one.


Prefix  | Non-Titan          | Titan
-------------------------------------------------------
 <none> | low shared memory  | default kernel
        | optimized kernel   | with funnel shifter
        |                    |
   S    | special kernel     | spinlock kernel
        | for older GPUs     | with funnel shifter


>>> TODO <<<

Usability Improvements:
- add reasonable error checking for CUDA API calls
- fix Linux (and Windows?) 64-bit compilation
- add Stratum support
- add failover support

Further Optimization:
- consider use of some inline assembly in CUDA
- investigate benefits of a LOOKUP_GAP implementation
- feature parity on the Titan kernels (optimization, texture cache)


***************************************************************
If you find this tool useful and would like to support its
     continued development, then consider a donation in LTC.

  The donation address is LKS1WDKGED647msBQfLBHV3Ls8sveGncnm
***************************************************************

Source code is included to satisfy GNU GPL V2 requirements.


With kind regards,

   Christian Buchner ( Christian.Buchner@gmail.com )