- [2023-04-25 19:42:15,874] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
- [2023-04-25 19:42:15,901] [INFO] [runner.py:550:main] cmd = /local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=29500 --module --enable_each_rank_log=None training.trainer --input-model EleutherAI/pythia-6.9b --deepspeed /Workspace/Repos/opyate@gmail.com/dolly/config/ds_z3_bf16_config.json --epochs 2 --local-output-dir /local_disk0/dolly_training/dolly__2023-04-25T19:42:07 --dbfs-output-dir /dbfs/dolly_training/dolly__2023-04-25T19:42:07 --per-device-train-batch-size 6 --per-device-eval-batch-size 6 --logging-steps 10 --save-steps 200 --save-total-limit 20 --eval-steps 50 --warmup-steps 50 --test-size 200 --lr 5e-6
- [2023-04-25 19:42:19,290] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
- [2023-04-25 19:42:19,290] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=4, node_rank=0
- [2023-04-25 19:42:19,290] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
- [2023-04-25 19:42:19,290] [INFO] [launch.py:162:main] dist_world_size=4
- [2023-04-25 19:42:19,290] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
- 2023-04-25 19:42:21.666669: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
- To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
- 2023-04-25 19:42:21.667881: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
- To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
- 2023-04-25 19:42:21.669724: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
- To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
- 2023-04-25 19:42:21.672882: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
- To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
- 2023-04-25 19:42:29 INFO [__main__] Loading tokenizer for EleutherAI/pythia-6.9b
- 2023-04-25 19:42:29 INFO [__main__] Loading tokenizer for EleutherAI/pythia-6.9b
- 2023-04-25 19:42:29 INFO [__main__] Loading tokenizer for EleutherAI/pythia-6.9b
- 2023-04-25 19:42:29 INFO [__main__] Loading tokenizer for EleutherAI/pythia-6.9b
- Downloading (…)okenizer_config.json: 100%|██████| 396/396 [00:00<00:00, 109kB/s]
- Downloading (…)/main/tokenizer.json: 100%|█| 2.11M/2.11M [00:00<00:00, 29.3MB/s]
- Downloading (…)cial_tokens_map.json: 100%|███| 99.0/99.0 [00:00<00:00, 28.1kB/s]
- 2023-04-25 19:42:30 INFO [__main__] Loading model for EleutherAI/pythia-6.9b
- 2023-04-25 19:42:30 INFO [__main__] Loading model for EleutherAI/pythia-6.9b
- 2023-04-25 19:42:30 INFO [__main__] Loading model for EleutherAI/pythia-6.9b
- 2023-04-25 19:42:30 INFO [__main__] Loading model for EleutherAI/pythia-6.9b
- Downloading (…)lve/main/config.json: 100%|██████| 571/571 [00:00<00:00, 339kB/s]
- Downloading (…)model.bin.index.json: 100%|█| 42.0k/42.0k [00:00<00:00, 25.8MB/s]
- Downloading shards: 0%| | 0/2 [00:00<?, ?it/s]
- Downloading (…)00001-of-00002.bin";: 0%| | 0.00/9.91G [00:00<?, ?B/s]
- Downloading (…)00001-of-00002.bin";: 0%| | 10.5M/9.91G [00:00<09:52, 16.7MB/s]
- Downloading (…)00001-of-00002.bin";: 0%| | 21.0M/9.91G [00:00<05:59, 27.5MB/s]
- Downloading (…)00001-of-00002.bin";: 0%| | 31.5M/9.91G [00:00<04:18, 38.2MB/s]
- Downloading (…)00001-of-00002.bin";: 0%| | 41.9M/9.91G [00:01<03:54, 42.0MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 52.4M/9.91G [00:01<03:39, 45.0MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 62.9M/9.91G [00:01<03:13, 51.0MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 73.4M/9.91G [00:01<03:13, 50.9MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 83.9M/9.91G [00:01<03:14, 50.5MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 94.4M/9.91G [00:02<02:56, 55.7MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 105M/9.91G [00:02<03:13, 50.6MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 115M/9.91G [00:02<03:22, 48.3MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 126M/9.91G [00:02<03:20, 48.7MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 136M/9.91G [00:03<03:52, 42.1MB/s]
- Downloading (…)00001-of-00002.bin";: 1%| | 147M/9.91G [00:03<03:41, 44.1MB/s]
- Downloading (…)00001-of-00002.bin";: 2%| | 157M/9.91G [00:03<03:30, 46.4MB/s]
- Downloading (…)00001-of-00002.bin";: 2%| | 168M/9.91G [00:03<03:11, 50.9MB/s]
- Downloading (…)00001-of-00002.bin";: 2%| | 178M/9.91G [00:03<03:12, 50.6MB/s]
- Downloading (…)00001-of-00002.bin";: 2%| | 189M/9.91G [00:04<02:56, 55.1MB/s]
- Downloading (…)00001-of-00002.bin";: 2%| | 199M/9.91G [00:04<03:00, 53.8MB/s]
- Downloading (…)00001-of-00002.bin";: 2%| | 210M/9.91G [00:04<03:04, 52.6MB/s]
- Downloading (…)00001-of-00002.bin";: 2%| | 220M/9.91G [00:04<02:49, 57.1MB/s]
- Downloading (…)00001-of-00002.bin";: 2%| | 231M/9.91G [00:04<02:57, 54.7MB/s]
- Downloading (…)00001-of-00002.bin";: 2%| | 241M/9.91G [00:05<03:00, 53.6MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 252M/9.91G [00:05<02:50, 56.8MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 262M/9.91G [00:05<02:53, 55.7MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 273M/9.91G [00:05<02:42, 59.2MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 283M/9.91G [00:05<02:52, 55.9MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 294M/9.91G [00:05<02:57, 54.2MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 304M/9.91G [00:06<02:44, 58.3MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 315M/9.91G [00:06<02:52, 55.5MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 325M/9.91G [00:06<02:41, 59.3MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 336M/9.91G [00:06<02:52, 55.4MB/s]
- Downloading (…)00001-of-00002.bin";: 3%| | 346M/9.91G [00:06<02:56, 54.3MB/s]
- Downloading (…)00001-of-00002.bin";: 4%| | 357M/9.91G [00:07<02:44, 58.2MB/s]
- Downloading (…)00001-of-00002.bin";: 4%| | 367M/9.91G [00:07<03:11, 49.7MB/s]
- Downloading (…)00001-of-00002.bin";: 4%| | 377M/9.91G [00:07<03:19, 47.7MB/s]
- Downloading (…)00001-of-00002.bin";: 4%| | 388M/9.91G [00:07<03:01, 52.5MB/s]
- Downloading (…)00001-of-00002.bin";: 4%| | 398M/9.91G [00:07<03:12, 49.5MB/s]
- Downloading (…)00001-of-00002.bin";: 4%| | 409M/9.91G [00:08<03:08, 50.5MB/s]
- Downloading (…)00001-of-00002.bin";: 4%| | 419M/9.91G [00:08<03:09, 50.0MB/s]
- Downloading (…)00001-of-00002.bin";: 4%| | 430M/9.91G [00:08<02:54, 54.2MB/s]
- Downloading (…)00001-of-00002.bin";: 4%| | 440M/9.91G [00:08<02:59, 52.8MB/s]
- Downloading (…)00001-of-00002.bin";: 5%| | 451M/9.91G [00:08<02:45, 57.3MB/s]
- Downloading (…)00001-of-00002.bin";: 5%| | 461M/9.91G [00:09<02:52, 54.9MB/s]
- Downloading (…)00001-of-00002.bin";: 5%| | 472M/9.91G [00:09<02:55, 53.8MB/s]
- Downloading (…)00001-of-00002.bin";: 5%| | 482M/9.91G [00:09<03:00, 52.3MB/s]
- Downloading (…)00001-of-00002.bin";: 5%| | 493M/9.91G [00:09<03:01, 51.9MB/s]
- Downloading (…)00001-of-00002.bin";: 5%| | 503M/9.91G [00:09<02:48, 55.7MB/s]
- Downloading (…)00001-of-00002.bin";: 5%| | 514M/9.91G [00:10<02:53, 54.2MB/s]
- Downloading (…)00001-of-00002.bin";: 5%| | 524M/9.91G [00:10<03:15, 48.0MB/s]
- Downloading (…)00001-of-00002.bin";: 5%| | 535M/9.91G [00:10<03:12, 48.8MB/s]
- Downloading (…)00001-of-00002.bin";: 6%| | 545M/9.91G [00:10<02:54, 53.6MB/s]
- Downloading (…)00001-of-00002.bin";: 6%| | 556M/9.91G [00:10<03:07, 50.0MB/s]
- Downloading (…)00001-of-00002.bin";: 6%| | 566M/9.91G [00:11<03:49, 40.8MB/s]
- Downloading (…)00001-of-00002.bin";: 6%| | 577M/9.91G [00:11<03:30, 44.3MB/s]
- Downloading (…)00001-of-00002.bin";: 6%| | 587M/9.91G [00:11<03:48, 40.7MB/s]
- Downloading (…)00001-of-00002.bin";: 6%| | 598M/9.91G [00:12<03:44, 41.4MB/s]
- Downloading (…)00001-of-00002.bin";: 6%| | 608M/9.91G [00:12<03:40, 42.2MB/s]
- Downloading (…)00001-of-00002.bin";: 6%| | 619M/9.91G [00:12<03:30, 44.1MB/s]
- Downloading (…)00001-of-00002.bin";: 6%|▏ | 629M/9.91G [00:12<03:29, 44.2MB/s]
- Downloading (…)00001-of-00002.bin";: 6%|▏ | 640M/9.91G [00:12<03:29, 44.3MB/s]
- Downloading (…)00001-of-00002.bin";: 7%|▏ | 650M/9.91G [00:13<03:21, 45.9MB/s]
- Downloading (…)00001-of-00002.bin";: 7%|▏ | 661M/9.91G [00:13<03:25, 45.0MB/s]
- Downloading (…)00001-of-00002.bin";: 7%|▏ | 671M/9.91G [00:13<03:20, 46.1MB/s]
- Downloading (…)00001-of-00002.bin";: 7%|▏ | 682M/9.91G [00:13<03:16, 47.1MB/s]
- Downloading (…)00001-of-00002.bin";: 7%|▏ | 692M/9.91G [00:14<03:15, 47.1MB/s]
- Downloading (…)00001-of-00002.bin";: 7%|▏ | 703M/9.91G [00:14<03:08, 48.8MB/s]
- Downloading (…)00001-of-00002.bin";: 7%|▏ | 713M/9.91G [00:14<03:07, 49.1MB/s]
- Downloading (…)00001-of-00002.bin";: 7%|▏ | 724M/9.91G [00:14<03:04, 49.7MB/s]
- Downloading (…)00001-of-00002.bin";: 7%|▏ | 734M/9.91G [00:14<03:05, 49.5MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 744M/9.91G [00:15<03:03, 49.9MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 755M/9.91G [00:15<03:03, 49.8MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 765M/9.91G [00:15<03:30, 43.5MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 776M/9.91G [00:15<03:22, 45.1MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 786M/9.91G [00:16<03:48, 39.9MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 797M/9.91G [00:16<03:35, 42.2MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 807M/9.91G [00:16<03:09, 47.9MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 818M/9.91G [00:16<03:10, 47.8MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 828M/9.91G [00:16<03:06, 48.6MB/s]
- Downloading (…)00001-of-00002.bin";: 8%|▏ | 839M/9.91G [00:17<03:05, 48.9MB/s]
- Downloading (…)00001-of-00002.bin";: 9%|▏ | 849M/9.91G [00:17<02:50, 53.0MB/s]
- Downloading (…)00001-of-00002.bin";: 9%|▏ | 860M/9.91G [00:17<02:51, 52.8MB/s]
- Downloading (…)00001-of-00002.bin";: 9%|▏ | 870M/9.91G [00:17<02:52, 52.3MB/s]
- Downloading (…)00001-of-00002.bin";: 9%|▏ | 881M/9.91G [00:17<02:41, 56.0MB/s]
- Downloading (…)00001-of-00002.bin";: 9%|▏ | 891M/9.91G [00:18<03:40, 41.0MB/s]
- Downloading (…)00001-of-00002.bin";: 9%|▏ | 902M/9.91G [00:18<03:15, 46.2MB/s]
- Downloading (…)00001-of-00002.bin";: 9%|▏ | 912M/9.91G [00:18<03:05, 48.4MB/s]
- Downloading (…)00001-of-00002.bin";: 9%|▏ | 923M/9.91G [00:19<03:54, 38.4MB/s]
- Downloading (…)00001-of-00002.bin";: 9%|▏ | 933M/9.91G [00:19<03:36, 41.5MB/s]
- Downloading (…)00001-of-00002.bin";: 10%|▏ | 944M/9.91G [00:19<03:24, 43.9MB/s]
- Downloading (…)00001-of-00002.bin";: 10%|▏ | 954M/9.91G [00:19<03:02, 49.2MB/s]
- Downloading (…)00001-of-00002.bin";: 10%|▏ | 965M/9.91G [00:19<03:00, 49.6MB/s]
- Downloading (…)00001-of-00002.bin";: 10%|▏ | 975M/9.91G [00:20<03:01, 49.2MB/s]
- Downloading (…)00001-of-00002.bin";: 10%|▏ | 986M/9.91G [00:20<02:43, 54.7MB/s]
- Downloading (…)00001-of-00002.bin";: 10%|▏ | 996M/9.91G [00:20<02:47, 53.4MB/s]
- Downloading (…)00001-of-00002.bin";: 10%| | 1.01G/9.91G [00:20<02:50, 52.4MB/s]
- Downloading (…)00001-of-00002.bin";: 10%| | 1.02G/9.91G [00:20<02:51, 51.8MB/s]
- Downloading (…)00001-of-00002.bin";: 10%| | 1.03G/9.91G [00:21<02:38, 56.0MB/s]
- Downloading (…)00001-of-00002.bin";: 10%| | 1.04G/9.91G [00:21<02:44, 54.0MB/s]
- Downloading (…)00001-of-00002.bin";: 11%| | 1.05G/9.91G [00:21<02:48, 52.6MB/s]
- Downloading (…)00001-of-00002.bin";: 11%| | 1.06G/9.91G [00:21<02:37, 56.2MB/s]
- Downloading (…)00001-of-00002.bin";: 11%| | 1.07G/9.91G [00:21<02:41, 54.8MB/s]
- Downloading (…)00001-of-00002.bin";: 11%| | 1.08G/9.91G [00:21<02:44, 53.8MB/s]
- Downloading (…)00001-of-00002.bin";: 11%| | 1.09G/9.91G [00:22<03:05, 47.4MB/s]
- Downloading (…)00001-of-00002.bin";: 11%| | 1.10G/9.91G [00:22<03:25, 43.0MB/s]
- Downloading (…)00001-of-00002.bin";: 11%| | 1.11G/9.91G [00:22<03:02, 48.3MB/s]
- Downloading (…)00001-of-00002.bin";: 11%| | 1.12G/9.91G [00:22<02:59, 48.9MB/s]
- Downloading (…)00001-of-00002.bin";: 11%| | 1.13G/9.91G [00:23<04:14, 34.5MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.14G/9.91G [00:23<03:39, 40.0MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.15G/9.91G [00:23<03:19, 44.0MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.16G/9.91G [00:23<03:02, 47.8MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.17G/9.91G [00:24<03:01, 48.2MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.18G/9.91G [00:24<02:44, 53.2MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.20G/9.91G [00:24<02:47, 52.0MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.21G/9.91G [00:24<02:48, 51.8MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.22G/9.91G [00:24<02:36, 55.6MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.23G/9.91G [00:25<02:39, 54.3MB/s]
- Downloading (…)00001-of-00002.bin";: 12%| | 1.24G/9.91G [00:25<02:44, 52.7MB/s]
- Downloading (…)00001-of-00002.bin";: 13%|▏| 1.25G/9.91G [00:25<02:45, 52.3MB/s]
- Downloading (…)00001-of-00002.bin";: 13%|▏| 1.26G/9.91G [00:25<03:21, 43.0MB/s]
- Downloading (…)00001-of-00002.bin";: 13%|▏| 1.27G/9.91G [00:26<02:58, 48.3MB/s]
- Downloading (…)00001-of-00002.bin";: 13%|▏| 1.28G/9.91G [00:26<02:55, 49.2MB/s]
- Downloading (…)00001-of-00002.bin";: 13%|▏| 1.29G/9.91G [00:26<02:55, 49.0MB/s]
- Downloading (…)00001-of-00002.bin";: 13%|▏| 1.30G/9.91G [00:26<03:47, 37.9MB/s]
- Downloading (…)00001-of-00002.bin";: 13%|▏| 1.31G/9.91G [00:27<03:14, 44.3MB/s]
- Downloading (…)00001-of-00002.bin";: 13%|▏| 1.32G/9.91G [00:27<03:08, 45.5MB/s]
- Downloading (…)00001-of-00002.bin";: 13%|▏| 1.33G/9.91G [00:27<03:01, 47.2MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.34G/9.91G [00:27<02:51, 49.9MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.35G/9.91G [00:27<02:51, 50.0MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.36G/9.91G [00:28<02:50, 50.1MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.37G/9.91G [00:28<02:50, 50.1MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.38G/9.91G [00:28<02:36, 54.4MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.39G/9.91G [00:28<03:13, 44.1MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.41G/9.91G [00:28<03:03, 46.2MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.42G/9.91G [00:29<03:01, 46.8MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.43G/9.91G [00:29<03:09, 44.7MB/s]
- Downloading (…)00001-of-00002.bin";: 14%|▏| 1.44G/9.91G [00:29<02:58, 47.5MB/s]
- Downloading (…)00001-of-00002.bin";: 15%|▏| 1.45G/9.91G [00:29<02:45, 51.0MB/s]
- Downloading (…)00001-of-00002.bin";: 15%|▏| 1.46G/9.91G [00:29<02:45, 51.1MB/s]
- Downloading (…)00001-of-00002.bin";: 15%|▏| 1.47G/9.91G [00:30<02:46, 50.7MB/s]
- Downloading (…)00001-of-00002.bin";: 15%|▏| 1.48G/9.91G [00:30<02:32, 55.5MB/s]
- Downloading (…)00001-of-00002.bin";: 15%|▏| 1.49G/9.91G [00:30<02:37, 53.6MB/s]
- Downloading (…)00001-of-00002.bin";: 15%|▏| 1.50G/9.91G [00:30<02:39, 52.7MB/s]
- Downloading (…)00001-of-00002.bin";: 15%|▏| 1.51G/9.91G [00:30<02:42, 51.7MB/s]
- Downloading (…)00001-of-00002.bin";: 15%|▏| 1.52G/9.91G [00:31<02:29, 56.1MB/s]
- Downloading (…)00001-of-00002.bin";: 15%|▏| 1.53G/9.91G [00:31<02:35, 54.0MB/s]
- Downloading (…)00001-of-00002.bin";: 16%|▏| 1.54G/9.91G [00:31<02:25, 57.7MB/s]
- Downloading (…)00001-of-00002.bin";: 16%|▏| 1.55G/9.91G [00:31<02:31, 55.2MB/s]
- Downloading (…)00001-of-00002.bin";: 16%|▏| 1.56G/9.91G [00:31<02:34, 54.1MB/s]
- Downloading (…)00001-of-00002.bin";: 16%|▏| 1.57G/9.91G [00:32<02:23, 58.1MB/s]
- Downloading (…)00001-of-00002.bin";: 16%|▏| 1.58G/9.91G [00:32<02:29, 55.6MB/s]
- Downloading (…)00001-of-00002.bin";: 16%|▏| 1.59G/9.91G [00:32<02:35, 53.6MB/s]
- Downloading (…)00001-of-00002.bin";: 16%|▏| 1.60G/9.91G [00:32<02:23, 58.0MB/s]
- Downloading (…)00001-of-00002.bin";: 16%|▏| 1.61G/9.91G [00:32<02:29, 55.6MB/s]
- Downloading (…)00001-of-00002.bin";: 16%|▏| 1.63G/9.91G [00:33<02:34, 53.5MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.64G/9.91G [00:33<02:38, 52.2MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.65G/9.91G [00:33<02:26, 56.3MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.66G/9.91G [00:33<02:31, 54.3MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.67G/9.91G [00:33<02:35, 52.9MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.68G/9.91G [00:33<02:24, 56.9MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.69G/9.91G [00:34<02:30, 54.7MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.70G/9.91G [00:34<02:34, 53.2MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.71G/9.91G [00:34<02:32, 53.8MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.72G/9.91G [00:34<02:26, 56.0MB/s]
- Downloading (…)00001-of-00002.bin";: 17%|▏| 1.73G/9.91G [00:34<02:34, 53.1MB/s]
- Downloading (…)00001-of-00002.bin";: 18%|▏| 1.74G/9.91G [00:35<02:37, 52.0MB/s]
- Downloading (…)00001-of-00002.bin";: 18%|▏| 1.75G/9.91G [00:35<02:24, 56.3MB/s]
- Downloading (…)00001-of-00002.bin";: 18%|▏| 1.76G/9.91G [00:35<02:56, 46.3MB/s]
- Downloading (…)00001-of-00002.bin";: 18%|▏| 1.77G/9.91G [00:35<02:50, 47.9MB/s]
- Downloading (…)00001-of-00002.bin";: 18%|▏| 1.78G/9.91G [00:36<02:47, 48.5MB/s]
- Downloading (…)00001-of-00002.bin";: 18%|▏| 1.79G/9.91G [00:36<02:35, 52.2MB/s]
- Downloading (…)00001-of-00002.bin";: 18%|▏| 1.80G/9.91G [00:36<02:33, 53.0MB/s]
- Downloading (…)00001-of-00002.bin";: 18%|▏| 1.81G/9.91G [00:37<04:56, 27.3MB/s]
- Downloading (…)00001-of-00002.bin";: 18%|▏| 1.82G/9.91G [00:37<04:02, 33.4MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.84G/9.91G [00:37<03:41, 36.5MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.85G/9.91G [00:37<03:24, 39.5MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.86G/9.91G [00:38<03:09, 42.4MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.87G/9.91G [00:38<03:24, 39.4MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.88G/9.91G [00:38<03:09, 42.3MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.89G/9.91G [00:38<03:01, 44.3MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.90G/9.91G [00:38<02:40, 49.8MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.91G/9.91G [00:39<02:40, 49.9MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.92G/9.91G [00:39<02:26, 54.6MB/s]
- Downloading (…)00001-of-00002.bin";: 19%|▏| 1.93G/9.91G [00:39<02:29, 53.4MB/s]
- Downloading (…)00001-of-00002.bin";: 20%|▏| 1.94G/9.91G [00:39<02:31, 52.8MB/s]
- Downloading (…)00001-of-00002.bin";: 20%|▏| 1.95G/9.91G [00:39<02:32, 52.1MB/s]
- Downloading (…)00001-of-00002.bin";: 20%|▏| 1.96G/9.91G [00:40<02:21, 56.3MB/s]
- Downloading (…)00001-of-00002.bin";: 20%|▏| 1.97G/9.91G [00:40<03:15, 40.7MB/s]
- Downloading (…)00001-of-00002.bin";: 20%|▏| 1.98G/9.91G [00:40<03:01, 43.8MB/s]
- Downloading (…)00001-of-00002.bin";: 20%|▏| 1.99G/9.91G [00:40<03:15, 40.5MB/s]
- Downloading (…)00001-of-00002.bin";: 20%|▏| 2.00G/9.91G [00:41<03:03, 43.0MB/s]
- Downloading (…)00001-of-00002.bin";: 20%|▏| 2.01G/9.91G [00:41<02:55, 45.0MB/s]
- Downloading (…)00001-of-00002.bin";: 20%|▏| 2.02G/9.91G [00:41<03:13, 40.8MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.03G/9.91G [00:41<02:50, 46.3MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.04G/9.91G [00:42<03:14, 40.5MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.06G/9.91G [00:42<03:03, 42.8MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.07G/9.91G [00:42<02:42, 48.2MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.08G/9.91G [00:42<03:08, 41.5MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.09G/9.91G [00:43<02:49, 46.1MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.10G/9.91G [00:43<02:45, 47.3MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.11G/9.91G [00:43<02:28, 52.5MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.12G/9.91G [00:43<02:32, 51.0MB/s]
- Downloading (…)00001-of-00002.bin";: 21%|▏| 2.13G/9.91G [00:43<02:36, 49.8MB/s]
- Downloading (…)00001-of-00002.bin";: 22%|▏| 2.14G/9.91G [00:44<02:35, 49.9MB/s]
- Downloading (…)00001-of-00002.bin";: 22%|▏| 2.15G/9.91G [00:44<02:34, 50.3MB/s]
- Downloading (…)00001-of-00002.bin";: 22%|▏| 2.16G/9.91G [00:44<02:26, 52.9MB/s]
- Downloading (…)00001-of-00002.bin";: 22%|▏| 2.17G/9.91G [00:44<02:23, 54.1MB/s]
- Downloading (…)00001-of-00002.bin";: 22%|▏| 2.18G/9.91G [00:45<03:39, 35.3MB/s]
- Downloading (…)00001-of-00002.bin";: 22%|▏| 2.19G/9.91G [00:45<03:06, 41.4MB/s]
- Downloading (…)00001-of-00002.bin";: 22%|▏| 2.20G/9.91G [00:45<03:52, 33.2MB/s]
- Downloading (…)00001-of-00002.bin";: 22%|▏| 2.21G/9.91G [00:45<03:16, 39.2MB/s]
- Downloading (…)00001-of-00002.bin";: 22%|▏| 2.22G/9.91G [00:46<03:02, 42.2MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.23G/9.91G [00:46<03:05, 41.5MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.24G/9.91G [00:46<02:51, 44.6MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.25G/9.91G [00:46<02:45, 46.3MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.26G/9.91G [00:46<02:33, 49.8MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.28G/9.91G [00:47<02:27, 51.9MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.29G/9.91G [00:47<02:33, 49.6MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.30G/9.91G [00:47<02:19, 54.8MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.31G/9.91G [00:47<02:45, 46.0MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.32G/9.91G [00:48<02:41, 47.1MB/s]
- Downloading (…)00001-of-00002.bin";: 23%|▏| 2.33G/9.91G [00:48<02:38, 48.0MB/s]
- Downloading (…)00001-of-00002.bin";: 24%|▏| 2.34G/9.91G [00:48<02:24, 52.4MB/s]
- Downloading (…)00001-of-00002.bin";: 24%|▏| 2.35G/9.91G [00:48<02:24, 52.2MB/s]
- Downloading (…)00001-of-00002.bin";: 24%|▏| 2.36G/9.91G [00:48<03:04, 41.0MB/s]
- Downloading (…)00001-of-00002.bin";: 24%|▏| 2.37G/9.91G [00:49<02:54, 43.2MB/s]
- Downloading (…)00001-of-00002.bin";: 24%|▏| 2.38G/9.91G [00:49<02:33, 48.9MB/s]
- Downloading (…)00001-of-00002.bin";: 24%|▏| 2.39G/9.91G [00:49<02:32, 49.2MB/s]
- Downloading (…)00001-of-00002.bin";: 24%|▏| 2.40G/9.91G [00:49<02:31, 49.6MB/s]
- Downloading (…)00001-of-00002.bin";: 24%|▏| 2.41G/9.91G [00:49<02:31, 49.5MB/s]
- Downloading (…)00001-of-00002.bin";: 24%|▏| 2.42G/9.91G [00:50<02:18, 54.0MB/s]
- Downloading (…)00001-of-00002.bin";: 25%|▏| 2.43G/9.91G [00:50<02:21, 52.8MB/s]
- Downloading (…)00001-of-00002.bin";: 25%|▏| 2.44G/9.91G [00:50<02:28, 50.3MB/s]
- Downloading (…)00001-of-00002.bin";: 25%|▏| 2.45G/9.91G [00:50<02:28, 50.3MB/s]
- Downloading (…)00001-of-00002.bin";: 25%|▏| 2.46G/9.91G [00:50<02:28, 50.3MB/s]
- Downloading (…)00001-of-00002.bin";: 25%|▏| 2.47G/9.91G [00:51<02:23, 51.9MB/s]
- Downloading (…)00001-of-00002
- *** WARNING: max output size exceeded, skipping output. ***
- trainer.py", line 1731, in _inner_training_loop
- deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
- deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
- engine = DeepSpeedEngine(args=args,
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
- self._configure_optimizer(optimizer, model_parameters)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
- self.optimizer = self._configure_zero_optimizer(basic_optimizer)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
- optimizer = DeepSpeedZeroOptimizer_Stage3(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
- self._setup_for_real_optimizer()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
- self.initialize_optimizer_states()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
- self._optimizer_step(i)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
- self.optimizer.step()
- File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
- out = func(*args, **kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
- state['exp_avg'] = torch.zeros_like(p.data)
- torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 2; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
- Traceback (most recent call last):
- File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
- return _run_code(code, main_globals, None,
- File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
- exec(code, run_globals)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 326, in <module>
- return _run_code(code, main_globals, None,
- File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
- exec(code, run_globals)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 326, in <module>
- 2023-04-25 19:49:31 ERROR [__main__] main failed
- Traceback (most recent call last):
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 326, in <module>
- main()
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
- return self.main(*args, **kwargs)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
- rv = self.invoke(ctx)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
- return ctx.invoke(self.callback, **ctx.params)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
- return __callback(*args, **kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
- train(**kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
- trainer.train()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
- return inner_training_loop(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
- deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
- deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
- engine = DeepSpeedEngine(args=args,
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
- self._configure_optimizer(optimizer, model_parameters)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
- self.optimizer = self._configure_zero_optimizer(basic_optimizer)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
- optimizer = DeepSpeedZeroOptimizer_Stage3(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
- self._setup_for_real_optimizer()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
- self.initialize_optimizer_states()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
- self._optimizer_step(i)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
- self.optimizer.step()
- File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
- out = func(*args, **kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
- state['exp_avg'] = torch.zeros_like(p.data)
- torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 0; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
- Traceback (most recent call last):
- File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
- return _run_code(code, main_globals, None,
- File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
- exec(code, run_globals)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 326, in <module>
- main()
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
- return self.main(*args, **kwargs)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
- main()
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
- main()
- rv = self.invoke(ctx)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
- return self.main(*args, **kwargs)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
- return self.main(*args, **kwargs)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
- rv = self.invoke(ctx)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
- return ctx.invoke(self.callback, **ctx.params)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
- rv = self.invoke(ctx)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
- return __callback(*args, **kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
- return ctx.invoke(self.callback, **ctx.params)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
- return __callback(*args, **kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
- return ctx.invoke(self.callback, **ctx.params)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
- return __callback(*args, **kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
- train(**kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
- main()
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
- train(**kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
- return self.main(*args, **kwargs)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
- rv = self.invoke(ctx)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
- return ctx.invoke(self.callback, **ctx.params)
- File "/databricks/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
- return __callback(*args, **kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 318, in main
- train(**kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
- train(**kwargs)
- File "/Workspace/Repos/opyate@gmail.com/dolly/training/trainer.py", line 274, in train
- trainer.train()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
- trainer.train()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
- trainer.train()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
- return inner_training_loop(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
- return inner_training_loop(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
- return inner_training_loop(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
- deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
- deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
- deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
- engine = DeepSpeedEngine(args=args,
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
- deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
- deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
- self._configure_optimizer(optimizer, model_parameters)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
- engine = DeepSpeedEngine(args=args,
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
- deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
- self._configure_optimizer(optimizer, model_parameters)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
- engine = DeepSpeedEngine(args=args,
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
- self.optimizer = self._configure_zero_optimizer(basic_optimizer)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
- self._configure_optimizer(optimizer, model_parameters)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
- self.optimizer = self._configure_zero_optimizer(basic_optimizer)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
- optimizer = DeepSpeedZeroOptimizer_Stage3(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
- self._setup_for_real_optimizer()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
- self.initialize_optimizer_states()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
- self.optimizer = self._configure_zero_optimizer(basic_optimizer)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
- optimizer = DeepSpeedZeroOptimizer_Stage3(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
- self._setup_for_real_optimizer()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
- optimizer = DeepSpeedZeroOptimizer_Stage3(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
- self.initialize_optimizer_states()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
- self._optimizer_step(i)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
- self._setup_for_real_optimizer()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
- self._optimizer_step(i)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
- self.initialize_optimizer_states()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
- self.optimizer.step()
- File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
- out = func(*args, **kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
- self.optimizer.step()
- File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
- state['exp_avg'] = torch.zeros_like(p.data)
- torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 0; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
- self._optimizer_step(i)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
- out = func(*args, **kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
- state['exp_avg'] = torch.zeros_like(p.data)
- torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 1; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
- self.optimizer.step()
- File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
- out = func(*args, **kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
- state['exp_avg'] = torch.zeros_like(p.data)
- torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 3; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
- trainer.train()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
- return inner_training_loop(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/trainer.py", line 1731, in _inner_training_loop
- deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
- deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
- engine = DeepSpeedEngine(args=args,
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
- self._configure_optimizer(optimizer, model_parameters)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
- self.optimizer = self._configure_zero_optimizer(basic_optimizer)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
- optimizer = DeepSpeedZeroOptimizer_Stage3(
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 312, in __init__
- self._setup_for_real_optimizer()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 371, in _setup_for_real_optimizer
- self.initialize_optimizer_states()
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 938, in initialize_optimizer_states
- self._optimizer_step(i)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 858, in _optimizer_step
- self.optimizer.step()
- File "/databricks/python/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
- out = func(*args, **kwargs)
- File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 135, in step
- state['exp_avg'] = torch.zeros_like(p.data)
- torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.63 GiB (GPU 2; 22.20 GiB total capacity; 20.97 GiB already allocated; 282.12 MiB free; 20.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
- [2023-04-25 19:49:33,762] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 2328
- [2023-04-25 19:49:34,377] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 2329
- [2023-04-25 19:49:34,379] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 2330
- [2023-04-25 19:49:34,712] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 2331
- [2023-04-25 19:49:34,712] [ERROR] [launch.py:324:sigkill_handler] ['/local_disk0/.ephemeral_nfs/envs/pythonEnv-8a27c8d2-ec1e-4f11-9ce2-b56cf13d0017/bin/python', '-u', '-m', 'training.trainer', '--local_rank=3', '--input-model', 'EleutherAI/pythia-6.9b', '--deepspeed', '/Workspace/Repos/opyate@gmail.com/dolly/config/ds_z3_bf16_config.json', '--epochs', '2', '--local-output-dir', '/local_disk0/dolly_training/dolly__2023-04-25T19:42:07', '--dbfs-output-dir', '/dbfs/dolly_training/dolly__2023-04-25T19:42:07', '--per-device-train-batch-size', '6', '--per-device-eval-batch-size', '6', '--logging-steps', '10', '--save-steps', '200', '--save-total-limit', '20', '--eval-steps', '50', '--warmup-steps', '50', '--test-size', '200', '--lr', '5e-6'] exits with return code = 1