Untitled

a guest
May 2nd, 2023
  1. [2023-05-02 10:25:26,472] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
  2. [2023-05-02 10:25:26,499] [INFO] [runner.py:550:main] cmd = /local_disk0/.ephemeral_nfs/envs/pythonEnv-e1f29474-0171-420e-8b03-5f4b03ffc5cd/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=29500 --module --enable_each_rank_log=None training.trainer --input-model Databricks/dolly-v2-7b --deepspeed /Workspace/Repos/opyate@gmail.com/dolly/config/ds_z3_bf16_config.json --epochs 2 --local-output-dir /local_disk0/dolly_training/dolly_mydataset-dolly15kFormat-noJSONSchema__2023-05-02T10:25:17 --dbfs-output-dir /dbfs/dolly_training/dolly_mydataset-dolly15kFormat-noJSONSchema__2023-05-02T10:25:17 --per-device-train-batch-size 3 --per-device-eval-batch-size 3 --logging-steps 10 --save-steps 200 --save-total-limit 20 --eval-steps 50 --warmup-steps 50 --test-size 10 --lr 5e-06
  3. [2023-05-02 10:25:29,906] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
  4. [2023-05-02 10:25:29,906] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=4, node_rank=0
  5. [2023-05-02 10:25:29,906] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
  6. [2023-05-02 10:25:29,906] [INFO] [launch.py:162:main] dist_world_size=4
  7. [2023-05-02 10:25:29,906] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
  8. 2023-05-02 10:25:32.315847: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
  9. To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
  10. 2023-05-02 10:25:32.339317: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
  11. To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
  12. 2023-05-02 10:25:32.367056: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
  13. To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
  14. 2023-05-02 10:25:32.378277: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
  15. To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
  16. 2023-05-02 10:25:40 INFO [__main__] Loading tokenizer for Databricks/dolly-v2-7b
  17. 2023-05-02 10:25:40 INFO [__main__] Loading tokenizer for Databricks/dolly-v2-7b
  18. 2023-05-02 10:25:40 INFO [__main__] Loading tokenizer for Databricks/dolly-v2-7b
  19. 2023-05-02 10:25:40 INFO [__main__] Loading tokenizer for Databricks/dolly-v2-7b
  20. 2023-05-02 10:25:41 INFO [__main__] Loading model for Databricks/dolly-v2-7b
  21. 2023-05-02 10:25:41 INFO [__main__] Loading model for Databricks/dolly-v2-7b
  22. 2023-05-02 10:25:41 INFO [__main__] Loading model for Databricks/dolly-v2-7b
  23. 2023-05-02 10:25:41 INFO [__main__] Loading model for Databricks/dolly-v2-7b
  24. 2023-05-02 10:29:30 INFO [__main__] Found max lenth: 2048
  25. 2023-05-02 10:29:30 INFO [__main__] Checking if dataset is specific via env var DATASET_NAME
  26. 2023-05-02 10:29:30 INFO [__main__] Yes: opyate/mydataset-dolly15kFormat-noJSONSchema
  27. 2023-05-02 10:29:31 WARNING [datasets.builder] Found cached dataset parquet (/root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
  28. 100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 97.22it/s]
  29. 2023-05-02 10:29:31 INFO [__main__] Found 1032 rows
  30. 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-373a08c6145e0816.arrow
  31. 2023-05-02 10:29:31 INFO [__main__] Preprocessing dataset
  32. 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-c25c16c83a622b7b.arrow
  33. 2023-05-02 10:29:31 INFO [__main__] Processed dataset has 1032 rows
  34. 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-462aabd08dd4ec96.arrow
  35. 2023-05-02 10:29:31 INFO [__main__] Processed dataset has 1030 rows after filtering for truncated records
  36. 2023-05-02 10:29:31 INFO [__main__] Shuffling dataset
  37. 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-2ef5f0cbbbb9df9d.arrow
  38. 2023-05-02 10:29:31 INFO [__main__] Done preprocessing
  39. 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached split indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-8ed636b6f0cf0253.arrow and /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-5b275d686d6d0049.arrow
  40. 2023-05-02 10:29:31 INFO [__main__] Train data size: 1020
  41. 2023-05-02 10:29:31 INFO [__main__] Test data size: 10
  42. [2023-05-02 10:29:31,603] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
  43. 2023-05-02 10:29:31 INFO [__main__] Found max lenth: 2048
  44. 2023-05-02 10:29:31 INFO [__main__] Checking if dataset is specific via env var DATASET_NAME
  45. 2023-05-02 10:29:31 INFO [__main__] Yes: opyate/mydataset-dolly15kFormat-noJSONSchema
  46. 2023-05-02 10:29:31 INFO [__main__] Found max lenth: 2048
  47. 2023-05-02 10:29:31 INFO [__main__] Checking if dataset is specific via env var DATASET_NAME
  48. 2023-05-02 10:29:31 INFO [__main__] Yes: opyate/mydataset-dolly15kFormat-noJSONSchema
  49. 2023-05-02 10:29:31 INFO [__main__] Found max lenth: 2048
  50. 2023-05-02 10:29:31 INFO [__main__] Checking if dataset is specific via env var DATASET_NAME
  51. 2023-05-02 10:29:31 INFO [__main__] Yes: opyate/mydataset-dolly15kFormat-noJSONSchema
  52. 2023-05-02 10:29:32 WARNING [datasets.builder] Found cached dataset parquet (/root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
  53. 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 538.01it/s]
  54. 2023-05-02 10:29:32 INFO [__main__] Found 1032 rows
  55. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-373a08c6145e0816.arrow
  56. 2023-05-02 10:29:32 INFO [__main__] Preprocessing dataset
  57. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-c25c16c83a622b7b.arrow
  58. 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1032 rows
  59. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-462aabd08dd4ec96.arrow
  60. 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1030 rows after filtering for truncated records
  61. 2023-05-02 10:29:32 INFO [__main__] Shuffling dataset
  62. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-2ef5f0cbbbb9df9d.arrow
  63. 2023-05-02 10:29:32 INFO [__main__] Done preprocessing
  64. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached split indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-8ed636b6f0cf0253.arrow and /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-5b275d686d6d0049.arrow
  65. 2023-05-02 10:29:32 INFO [__main__] Train data size: 1020
  66. 2023-05-02 10:29:32 INFO [__main__] Test data size: 10
  67. 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:1 to store for rank: 1
  68. 2023-05-02 10:29:32 WARNING [datasets.builder] Found cached dataset parquet (/root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
  69. 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 534.24it/s]
  70. 2023-05-02 10:29:32 INFO [__main__] Found 1032 rows
  71. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-373a08c6145e0816.arrow
  72. 2023-05-02 10:29:32 INFO [__main__] Preprocessing dataset
  73. 2023-05-02 10:29:32 WARNING [datasets.builder] Found cached dataset parquet (/root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
  74. 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 541.97it/s]
  75. 2023-05-02 10:29:32 INFO [__main__] Found 1032 rows
  76. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-373a08c6145e0816.arrow
  77. 2023-05-02 10:29:32 INFO [__main__] Preprocessing dataset
  78. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-c25c16c83a622b7b.arrow
  79. 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1032 rows
  80. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-462aabd08dd4ec96.arrow
  81. 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1030 rows after filtering for truncated records
  82. 2023-05-02 10:29:32 INFO [__main__] Shuffling dataset
  83. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-2ef5f0cbbbb9df9d.arrow
  84. 2023-05-02 10:29:32 INFO [__main__] Done preprocessing
  85. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached split indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-8ed636b6f0cf0253.arrow and /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-5b275d686d6d0049.arrow
  86. 2023-05-02 10:29:32 INFO [__main__] Train data size: 1020
  87. 2023-05-02 10:29:32 INFO [__main__] Test data size: 10
  88. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-c25c16c83a622b7b.arrow
  89. 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1032 rows
  90. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-462aabd08dd4ec96.arrow
  91. 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1030 rows after filtering for truncated records
  92. 2023-05-02 10:29:32 INFO [__main__] Shuffling dataset
  93. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-2ef5f0cbbbb9df9d.arrow
  94. 2023-05-02 10:29:32 INFO [__main__] Done preprocessing
  95. 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached split indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-8ed636b6f0cf0253.arrow and /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-5b275d686d6d0049.arrow
  96. 2023-05-02 10:29:32 INFO [__main__] Train data size: 1020
  97. 2023-05-02 10:29:32 INFO [__main__] Test data size: 10
  98. 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:1 to store for rank: 2
  99. 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:1 to store for rank: 3
  100. 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:1 to store for rank: 0
  101. 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
  102. 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
  103. 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
  104. 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
  105. 2023-05-02 10:29:33 INFO [__main__] Instantiating Trainer
  106. 2023-05-02 10:29:33 INFO [__main__] Instantiating Trainer
  107. 2023-05-02 10:29:33 INFO [__main__] Instantiating Trainer
  108. 2023-05-02 10:29:33 INFO [__main__] Instantiating Trainer
  109. 2023-05-02 10:29:33 INFO [__main__] Training
  110. 2023-05-02 10:29:33 INFO [__main__] Training
  111. 2023-05-02 10:29:33 INFO [__main__] Training
  112. 2023-05-02 10:29:33 INFO [__main__] Training
  113. 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:2 to store for rank: 3
  114. 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:2 to store for rank: 2
  115. 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:2 to store for rank: 0
  116. 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
  117. 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:2 to store for rank: 1
  118. 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
  119. 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
  120. 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
  121. Installed CUDA version 11.3 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
  122. Installed CUDA version 11.3 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
  123. Installed CUDA version 11.3 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
  124. Installed CUDA version 11.3 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
  125. Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
  126. Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
  127. Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
  128. Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
  129. Detected CUDA files, patching ldflags
  130. Emitting ninja build file /root/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
  131. Building extension module cpu_adam...
  132. Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  133. ninja: no work to do.
  134. Loading extension module cpu_adam...
  135. Time to load cpu_adam op: 2.78975510597229 seconds
  136. Loading extension module cpu_adam...
  137. Time to load cpu_adam op: 2.8339414596557617 seconds
  138. Loading extension module cpu_adam...
  139. Time to load cpu_adam op: 2.8468267917633057 seconds
  140. Loading extension module cpu_adam...
  141. Time to load cpu_adam op: 2.8517279624938965 seconds
  142. Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
  143. Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
  144. Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
  145. Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
  146. Emitting ninja build file /root/.cache/torch_extensions/py39_cu117/utils/build.ninja...
  147. Building extension module utils...
  148. Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  149. ninja: no work to do.
  150. Loading extension module utils...
  151. Time to load utils op: 0.4267737865447998 seconds
  152. Loading extension module utils...
  153. Time to load utils op: 0.30211973190307617 seconds
  154. Loading extension module utils...
  155. Loading extension module utils...
  156. Time to load utils op: 0.5031204223632812 seconds
  157. Time to load utils op: 0.5025167465209961 seconds
  158. Parameter Offload: Total persistent parameters: 1712128 in 258 params
  159. [2023-05-02 10:31:47,581] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6014
  160. [2023-05-02 10:31:50,883] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6015
  161. [2023-05-02 10:31:54,180] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6016
  162. [2023-05-02 10:31:57,557] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6017
  163. [2023-05-02 10:31:57,558] [ERROR] [launch.py:324:sigkill_handler] ['/local_disk0/.ephemeral_nfs/envs/pythonEnv-e1f29474-0171-420e-8b03-5f4b03ffc5cd/bin/python', '-u', '-m', 'training.trainer', '--local_rank=3', '--input-model', 'Databricks/dolly-v2-7b', '--deepspeed', '/Workspace/Repos/opyate@gmail.com/dolly/config/ds_z3_bf16_config.json', '--epochs', '2', '--local-output-dir', '/local_disk0/dolly_training/dolly_mydataset-dolly15kFormat-noJSONSchema__2023-05-02T10:25:17', '--dbfs-output-dir', '/dbfs/dolly_training/dolly_mydataset-dolly15kFormat-noJSONSchema__2023-05-02T10:25:17', '--per-device-train-batch-size', '3', '--per-device-eval-batch-size', '3', '--logging-steps', '10', '--save-steps', '200', '--save-total-limit', '20', '--eval-steps', '50', '--warmup-steps', '50', '--test-size', '10', '--lr', '5e-06'] exits with return code = -9
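
Two values in the log above are easier to read once decoded: the `--world_info` argument on the launcher command line and the final `return code = -9`. The sketch below is not part of the original run; the helper names `decode_world_info` and `describe_return_code` are hypothetical and only illustrate the decoding. It assumes the `--world_info` string is URL-safe base64 over JSON, which is consistent with the `WORLD INFO DICT` line printed by the launcher, and it relies on the Python `subprocess` convention that a negative return code `-N` means the child was terminated by signal `N`.

```python
import base64
import json
import signal


def decode_world_info(encoded: str) -> dict:
    """Decode a base64-encoded JSON world_info string (hypothetical helper)."""
    return json.loads(base64.urlsafe_b64decode(encoded))


def describe_return_code(code: int) -> str:
    """Explain a subprocess return code; negative values mean 'killed by signal'."""
    if code < 0:
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with status {code}"


# Value copied from the launcher command line in the log.
world_info = decode_world_info("eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119")
print(world_info)

# Value copied from the final ERROR line in the log.
print(describe_return_code(-9))
```

On Linux, signal 9 is SIGKILL, so the workers were killed from outside rather than exiting through a Python exception, which matches the absence of a traceback. One common cause is the kernel out-of-memory killer running out of host RAM, and the `cpu_adam` extension in the log suggests optimizer state is being offloaded to the CPU. The log itself does not confirm this, so checking the kernel log (for example with `dmesg`) would be the next step.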