[2023-04-25 19:42:15,874] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-25 19:42:15,901] [INFO] [runner.py:550:main] cmd = /local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=29500 --module --enable_each_rank_log=None training.trainer --input-model EleutherAI/pythia-6.9b --deepspeed /Workspace/Repos/opyate@gmail.com/dolly/config/ds_z3_bf16_config.json --epochs 2 --local-output-dir /local_disk0/dolly_training/dolly__2023-04-25T19:42:07 --dbfs-output-dir /dbfs/dolly_training/dolly__2023-04-25T19:42:07 --per-device-train-batch-size 6 --per-device-eval-batch-size 6 --logging-steps 10 --save-steps 200 --save-total-limit 20 --eval-steps 50 --warmup-steps 50 --test-size 200 --lr 5e-6
[2023-04-25 19:42:19,290] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2023-04-25 19:42:19,290] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=4, node_rank=0
[2023-04-25 19:42:19,290] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2023-04-25 19:42:19,290] [INFO] [launch.py:162:main] dist_world_size=4
[2023-04-25 19:42:19,290] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
2023-04-25 19:42:21.666669: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-25 19:42:21.667881: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-25 19:42:21.669724: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-25 19:42:21.672882: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-25 19:42:29 INFO [__main__] Loading tokenizer for EleutherAI/pythia-6.9b
2023-04-25 19:42:29 INFO [__main__] Loading tokenizer for EleutherAI/pythia-6.9b
2023-04-25 19:42:29 INFO [__main__] Loading tokenizer for EleutherAI/pythia-6.9b
2023-04-25 19:42:29 INFO [__main__] Loading tokenizer for EleutherAI/pythia-6.9b
Downloading (…)okenizer_config.json: 100%|██████| 396/396 [00:00<00:00, 109kB/s]
Downloading (…)/main/tokenizer.json: 100%|█| 2.11M/2.11M [00:00<00:00, 29.3MB/s]
Downloading (…)cial_tokens_map.json: 100%|███| 99.0/99.0 [00:00<00:00, 28.1kB/s]
2023-04-25 19:42:30 INFO [__main__] Loading model for EleutherAI/pythia-6.9b
2023-04-25 19:42:30 INFO [__main__] Loading model for EleutherAI/pythia-6.9b
2023-04-25 19:42:30 INFO [__main__] Loading model for EleutherAI/pythia-6.9b
2023-04-25 19:42:30 INFO [__main__] Loading model for EleutherAI/pythia-6.9b
Downloading (…)lve/main/config.json: 100%|██████| 571/571 [00:00<00:00, 339kB/s]
Downloading (…)model.bin.index.json: 100%|█| 42.0k/42.0k [00:00<00:00, 25.8MB/s]
Downloading shards: 0%| | 0/2 [00:00<?, ?it/s]
Downloading (…)00001-of-00002.bin";: 0%| | 0.00/9.91G [00:00<?, ?B/s]
Downloading (…)00001-of-00002.bin";: 25%|▏| 2.47G/9.91G [00:51<02:23, 51.9MB/s]
Downloading (…)00001-of-00002

*** WARNING: max output size exceeded, skipping output. ***

trainer.py", line 1731, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer_Stage3(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
    self._setup_for_real_optimizer()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
    self.initialize_optimizer_states()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
    self._optimizer_step(i)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
    self.optimizer.step()
  File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
    state['exp_avg'] = torch.zeros_like(p.data)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 2; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 326, in <module>
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 326, in <module>
2023-04-25 19:49:31 ERROR [__main__] main failed
Traceback (most recent call last):
  File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 326, in <module>
    main()
  File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
    train(**kwargs)
  File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
    trainer.train()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer_Stage3(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
    self._setup_for_real_optimizer()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
    self.initialize_optimizer_states()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
    self._optimizer_step(i)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
    self.optimizer.step()
  File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
    state['exp_avg'] = torch.zeros_like(p.data)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 0; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  349. Traceback (most recent call last):
  350. File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
  351. return _run_code(code, main_globals, None,
  352. File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
  353. exec(code, run_globals)
  354. File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 326, in <module>
  355. main()
  356. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
  357. return self.main(*args, **kwargs)
  358. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
  359. main()
  360. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
  361. main()
  362. rv = self.invoke(ctx)
  363. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
  364. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
  365. return self.main(*args, **kwargs)
  366. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
  367. return self.main(*args, **kwargs)
  368. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
  369. rv = self.invoke(ctx)
  370. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
  371. return ctx.invoke(self.callback, **ctx.params)
  372. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
  373. rv = self.invoke(ctx)
  374. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
  375. return __callback(*args, **kwargs)
  376. File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
  377. return ctx.invoke(self.callback, **ctx.params)
  378. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
  379. return __callback(*args, **kwargs)
  380. File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
  381. return ctx.invoke(self.callback, **ctx.params)
  382. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
  383. return __callback(*args, **kwargs)
  384. File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
  385. train(**kwargs)
  386. File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
  387. main()
  388. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
  389. train(**kwargs)
  390. File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
  391. return self.main(*args, **kwargs)
  392. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
  393. rv = self.invoke(ctx)
  394. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
  395. return ctx.invoke(self.callback, **ctx.params)
  396. File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
  397. return __callback(*args, **kwargs)
  398. File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
  399. train(**kwargs)
  400. File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
  401. train(**kwargs)
  402. File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
  403. trainer.train()
  404. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
  405. trainer.train()
  406. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
  407. trainer.train()
  408. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
  409. return inner_training_loop(
  410. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
  411. return inner_training_loop(
  412. return inner_training_loop(
  413. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
  414. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
  415. deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  416. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
  417. deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  418. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
  419. deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  420. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
  421. deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  422. engine = DeepSpeedEngine(args=args,
  423. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
  424. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
  425. deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  426. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
  427. self._configure_optimizer(optimizer, model_parameters)
  428. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
  429. engine = DeepSpeedEngine(args=args,
  430. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
  431. deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  432. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
  433. self._configure_optimizer(optimizer, model_parameters)
  434. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
  435. engine = DeepSpeedEngine(args=args,
  436. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
  437. self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  438. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
  439. self._configure_optimizer(optimizer, model_parameters)
  440. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
  441. self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  442. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
  443. self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  444. optimizer = DeepSpeedZeroOptimizer_Stage3(
  445. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
  446. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
  447. self._setup_for_real_optimizer()
  448. optimizer = DeepSpeedZeroOptimizer_Stage3(
  449. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
  450. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
  451. self.initialize_optimizer_states()
  452. self._setup_for_real_optimizer()
  453. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
  454. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
  455. optimizer = DeepSpeedZeroOptimizer_Stage3(
  456. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
  457. self.initialize_optimizer_states()
  458. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
  459. self._optimizer_step(i)
  460. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
  461. self._setup_for_real_optimizer()
  462. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
  463. self._optimizer_step(i)
  464. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
  465. self.initialize_optimizer_states()
  466. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
  467. self.optimizer.step()
  468. File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
  469. out = func(*args, **kwargs)
  470. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
  471. self.optimizer.step()
  472. File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
  473. state['exp_avg'] = torch.zeros_like(p.data)
  474. self._optimizer_step(i)
  475. torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 0; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  476. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
  477. out = func(*args, **kwargs)
  478. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
  479. state['exp_avg'] = torch.zeros_like(p.data)
  480. torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 1; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  481. self.optimizer.step()
  482. File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
  483. out = func(*args, **kwargs)
  484. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
  485. trainer.train()
  486. state['exp_avg'] = torch.zeros_like(p.data)
  487. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
  488. torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 3; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  489. return inner_training_loop(
  490. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
  491. deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  492. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
  493. deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  494. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
  495. engine = DeepSpeedEngine(args=args,
  496. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
  497. self._configure_optimizer(optimizer, model_parameters)
  498. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
  499. self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  500. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
  501. optimizer = DeepSpeedZeroOptimizer_Stage3(
  502. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
  503. self._setup_for_real_optimizer()
  504. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
  505. self.initialize_optimizer_states()
  506. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
  507. self._optimizer_step(i)
  508. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
  509. self.optimizer.step()
  510. File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
  511. out = func(*args, **kwargs)
  512. File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
  513. state['exp_avg'] = torch.zeros_like(p.data)
  514. torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 2; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  515. [2023-04-25 19:49:33,762] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 2328
  516. [2023-04-25 19:49:34,377] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 2329
  517. [2023-04-25 19:49:34,379] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 2330
  518. [2023-04-25 19:49:34,712] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 2331
  519. [2023-04-25 19:49:34,712] [ERROR] [launch.py:324:sigkill_handler] ['/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/bin/python', '-u', '-m', 'training.trainer', '--local_rank=3', '--input-model', 'EleutherAI/pythia-6.9b', '--deepspeed', '/Workspace/Repos/opyate@gmail.com/dolly/config/ds_z3_bf16_config.json', '--epochs', '2', '--local-output-dir', '/local_disk0/dolly_training/dolly__2023-04-25T19:42:07', '--dbfs-output-dir', '/dbfs/dolly_training/dolly__2023-04-25T19:42:07', '--per-device-train-batch-size', '6', '--per-device-eval-batch-size', '6', '--logging-steps', '10', '--save-steps', '200', '--save-total-limit', '20', '--eval-steps', '50', '--warmup-steps', '50', '--test-size', '200', '--lr', '5e-6'] exits with return code = 1
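
Note on the failure above: every rank runs out of memory at the same point, while fused_adam.py allocates the Adam `exp_avg` buffer on a GPU with 22.20 GiB total capacity. A rough estimate suggests this may be a capacity problem rather than fragmentation. The sketch below is a back-of-envelope calculation, not a measurement. It assumes about 6.9e9 parameters (inferred from the model name EleutherAI/pythia-6.9b), the usual 16 bytes of model state per parameter for mixed-precision Adam, and perfectly even ZeRO stage 3 partitioning across the 4 ranks with nothing offloaded to CPU. The contents of ds_z3_bf16_config.json are not shown in this log, so the real layout may differ.

```python
# Back-of-envelope ZeRO-3 model-state estimate. All inputs are assumptions,
# not values read from the DeepSpeed config (which this log does not show).
GIB = 1024 ** 3

# Bytes of model state per parameter under mixed-precision Adam.
BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,        # assumption: gradients kept in bf16
    "fp32 master weights": 4,
    "fp32 exp_avg": 4,          # the buffer being allocated when the OOM hit
    "fp32 exp_avg_sq": 4,
}


def per_gpu_state_gib(n_params: float, world_size: int) -> float:
    """Model-state memory per GPU if everything is sharded evenly."""
    total_bytes = n_params * sum(BYTES_PER_PARAM.values())
    return total_bytes / world_size / GIB


N_PARAMS = 6.9e9            # assumption, from the model name
WORLD_SIZE = 4              # dist_world_size=4 in the launcher output
GPU_CAPACITY_GIB = 22.20    # "total capacity" in the OOM message

needed = per_gpu_state_gib(N_PARAMS, WORLD_SIZE)
print(f"estimated model state per GPU: {needed:.1f} GiB")
print(f"reported capacity per GPU:     {GPU_CAPACITY_GIB:.2f} GiB")
print("fits" if needed < GPU_CAPACITY_GIB else "does not fit, before activations")
```

Under these assumptions the model state alone comes to roughly 25.7 GiB per GPU, which already exceeds the reported 22.20 GiB before any activations or temporary buffers are counted. If that holds for the real config, changing max_split_size_mb is unlikely to help on its own. Options that actually lower the per-GPU footprint include more GPUs, a smaller model, or DeepSpeed's optimizer offload to CPU.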