- [2023-05-02 10:25:26,472] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
- [2023-05-02 10:25:26,499] [INFO] [runner.py:550:main] cmd = /local_disk0/.ephemeral_nfs/envs/pythonEnv-e1f29474-0171-420e-8b03-5f4b03ffc5cd/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=29500 --module --enable_each_rank_log=None training.trainer --input-model Databricks/dolly-v2-7b --deepspeed /Workspace/Repos/opyate@gmail.com/dolly/config/ds_z3_bf16_config.json --epochs 2 --local-output-dir /local_disk0/dolly_training/dolly_mydataset-dolly15kFormat-noJSONSchema__2023-05-02T10:25:17 --dbfs-output-dir /dbfs/dolly_training/dolly_mydataset-dolly15kFormat-noJSONSchema__2023-05-02T10:25:17 --per-device-train-batch-size 3 --per-device-eval-batch-size 3 --logging-steps 10 --save-steps 200 --save-total-limit 20 --eval-steps 50 --warmup-steps 50 --test-size 10 --lr 5e-06
- [2023-05-02 10:25:29,906] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
- [2023-05-02 10:25:29,906] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=4, node_rank=0
- [2023-05-02 10:25:29,906] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
- [2023-05-02 10:25:29,906] [INFO] [launch.py:162:main] dist_world_size=4
- [2023-05-02 10:25:29,906] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
- 2023-05-02 10:25:32.315847: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
- To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
- 2023-05-02 10:25:32.339317: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
- To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
- 2023-05-02 10:25:32.367056: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
- To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
- 2023-05-02 10:25:32.378277: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
- To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
- 2023-05-02 10:25:40 INFO [__main__] Loading tokenizer for Databricks/dolly-v2-7b
- 2023-05-02 10:25:40 INFO [__main__] Loading tokenizer for Databricks/dolly-v2-7b
- 2023-05-02 10:25:40 INFO [__main__] Loading tokenizer for Databricks/dolly-v2-7b
- 2023-05-02 10:25:40 INFO [__main__] Loading tokenizer for Databricks/dolly-v2-7b
- 2023-05-02 10:25:41 INFO [__main__] Loading model for Databricks/dolly-v2-7b
- 2023-05-02 10:25:41 INFO [__main__] Loading model for Databricks/dolly-v2-7b
- 2023-05-02 10:25:41 INFO [__main__] Loading model for Databricks/dolly-v2-7b
- 2023-05-02 10:25:41 INFO [__main__] Loading model for Databricks/dolly-v2-7b
- 2023-05-02 10:29:30 INFO [__main__] Found max lenth: 2048
- 2023-05-02 10:29:30 INFO [__main__] Checking if dataset is specific via env var DATASET_NAME
- 2023-05-02 10:29:30 INFO [__main__] Yes: opyate/mydataset-dolly15kFormat-noJSONSchema
- 2023-05-02 10:29:31 WARNING [datasets.builder] Found cached dataset parquet (/root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
- 100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 97.22it/s]
- 2023-05-02 10:29:31 INFO [__main__] Found 1032 rows
- 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-373a08c6145e0816.arrow
- 2023-05-02 10:29:31 INFO [__main__] Preprocessing dataset
- 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-c25c16c83a622b7b.arrow
- 2023-05-02 10:29:31 INFO [__main__] Processed dataset has 1032 rows
- 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-462aabd08dd4ec96.arrow
- 2023-05-02 10:29:31 INFO [__main__] Processed dataset has 1030 rows after filtering for truncated records
- 2023-05-02 10:29:31 INFO [__main__] Shuffling dataset
- 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-2ef5f0cbbbb9df9d.arrow
- 2023-05-02 10:29:31 INFO [__main__] Done preprocessing
- 2023-05-02 10:29:31 WARNING [datasets.arrow_dataset] Loading cached split indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-8ed636b6f0cf0253.arrow and /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-5b275d686d6d0049.arrow
- 2023-05-02 10:29:31 INFO [__main__] Train data size: 1020
- 2023-05-02 10:29:31 INFO [__main__] Test data size: 10
- [2023-05-02 10:29:31,603] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
- 2023-05-02 10:29:31 INFO [__main__] Found max lenth: 2048
- 2023-05-02 10:29:31 INFO [__main__] Checking if dataset is specific via env var DATASET_NAME
- 2023-05-02 10:29:31 INFO [__main__] Yes: opyate/mydataset-dolly15kFormat-noJSONSchema
- 2023-05-02 10:29:31 INFO [__main__] Found max lenth: 2048
- 2023-05-02 10:29:31 INFO [__main__] Checking if dataset is specific via env var DATASET_NAME
- 2023-05-02 10:29:31 INFO [__main__] Yes: opyate/mydataset-dolly15kFormat-noJSONSchema
- 2023-05-02 10:29:31 INFO [__main__] Found max lenth: 2048
- 2023-05-02 10:29:31 INFO [__main__] Checking if dataset is specific via env var DATASET_NAME
- 2023-05-02 10:29:31 INFO [__main__] Yes: opyate/mydataset-dolly15kFormat-noJSONSchema
- 2023-05-02 10:29:32 WARNING [datasets.builder] Found cached dataset parquet (/root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
- 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 538.01it/s]
- 2023-05-02 10:29:32 INFO [__main__] Found 1032 rows
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-373a08c6145e0816.arrow
- 2023-05-02 10:29:32 INFO [__main__] Preprocessing dataset
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-c25c16c83a622b7b.arrow
- 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1032 rows
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-462aabd08dd4ec96.arrow
- 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1030 rows after filtering for truncated records
- 2023-05-02 10:29:32 INFO [__main__] Shuffling dataset
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-2ef5f0cbbbb9df9d.arrow
- 2023-05-02 10:29:32 INFO [__main__] Done preprocessing
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached split indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-8ed636b6f0cf0253.arrow and /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-5b275d686d6d0049.arrow
- 2023-05-02 10:29:32 INFO [__main__] Train data size: 1020
- 2023-05-02 10:29:32 INFO [__main__] Test data size: 10
- 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:1 to store for rank: 1
- 2023-05-02 10:29:32 WARNING [datasets.builder] Found cached dataset parquet (/root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
- 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 534.24it/s]
- 2023-05-02 10:29:32 INFO [__main__] Found 1032 rows
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-373a08c6145e0816.arrow
- 2023-05-02 10:29:32 INFO [__main__] Preprocessing dataset
- 2023-05-02 10:29:32 WARNING [datasets.builder] Found cached dataset parquet (/root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
- 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 541.97it/s]
- 2023-05-02 10:29:32 INFO [__main__] Found 1032 rows
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-373a08c6145e0816.arrow
- 2023-05-02 10:29:32 INFO [__main__] Preprocessing dataset
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-c25c16c83a622b7b.arrow
- 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1032 rows
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-462aabd08dd4ec96.arrow
- 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1030 rows after filtering for truncated records
- 2023-05-02 10:29:32 INFO [__main__] Shuffling dataset
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-2ef5f0cbbbb9df9d.arrow
- 2023-05-02 10:29:32 INFO [__main__] Done preprocessing
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached split indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-8ed636b6f0cf0253.arrow and /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-5b275d686d6d0049.arrow
- 2023-05-02 10:29:32 INFO [__main__] Train data size: 1020
- 2023-05-02 10:29:32 INFO [__main__] Test data size: 10
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-c25c16c83a622b7b.arrow
- 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1032 rows
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached processed dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-462aabd08dd4ec96.arrow
- 2023-05-02 10:29:32 INFO [__main__] Processed dataset has 1030 rows after filtering for truncated records
- 2023-05-02 10:29:32 INFO [__main__] Shuffling dataset
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-2ef5f0cbbbb9df9d.arrow
- 2023-05-02 10:29:32 INFO [__main__] Done preprocessing
- 2023-05-02 10:29:32 WARNING [datasets.arrow_dataset] Loading cached split indices for dataset at /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-8ed636b6f0cf0253.arrow and /root/.cache/huggingface/datasets/opyate___parquet/opyate--mydataset-dolly15kFormat-noJSONSchema-66b5aa1793cd296a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-5b275d686d6d0049.arrow
- 2023-05-02 10:29:32 INFO [__main__] Train data size: 1020
- 2023-05-02 10:29:32 INFO [__main__] Test data size: 10
- 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:1 to store for rank: 2
- 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:1 to store for rank: 3
- 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:1 to store for rank: 0
- 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
- 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
- 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
- 2023-05-02 10:29:32 INFO [torch.distributed.distributed_c10d] Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
- 2023-05-02 10:29:33 INFO [__main__] Instantiating Trainer
- 2023-05-02 10:29:33 INFO [__main__] Instantiating Trainer
- 2023-05-02 10:29:33 INFO [__main__] Instantiating Trainer
- 2023-05-02 10:29:33 INFO [__main__] Instantiating Trainer
- 2023-05-02 10:29:33 INFO [__main__] Training
- 2023-05-02 10:29:33 INFO [__main__] Training
- 2023-05-02 10:29:33 INFO [__main__] Training
- 2023-05-02 10:29:33 INFO [__main__] Training
- 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:2 to store for rank: 3
- 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:2 to store for rank: 2
- 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:2 to store for rank: 0
- 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
- 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Added key: store_based_barrier_key:2 to store for rank: 1
- 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
- 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
- 2023-05-02 10:29:41 INFO [torch.distributed.distributed_c10d] Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
- Installed CUDA version 11.3 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
- Installed CUDA version 11.3 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
- Installed CUDA version 11.3 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
- Installed CUDA version 11.3 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
- Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
- Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
- Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
- Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
- Detected CUDA files, patching ldflags
- Emitting ninja build file /root/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
- Building extension module cpu_adam...
- Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
- ninja: no work to do.
- Loading extension module cpu_adam...
- Time to load cpu_adam op: 2.78975510597229 seconds
- Loading extension module cpu_adam...
- Time to load cpu_adam op: 2.8339414596557617 seconds
- Loading extension module cpu_adam...
- Time to load cpu_adam op: 2.8468267917633057 seconds
- Loading extension module cpu_adam...
- Time to load cpu_adam op: 2.8517279624938965 seconds
- Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
- Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
- Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
- Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
- Emitting ninja build file /root/.cache/torch_extensions/py39_cu117/utils/build.ninja...
- Building extension module utils...
- Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
- ninja: no work to do.
- Loading extension module utils...
- Time to load utils op: 0.4267737865447998 seconds
- Loading extension module utils...
- Time to load utils op: 0.30211973190307617 seconds
- Loading extension module utils...
- Loading extension module utils...
- Time to load utils op: 0.5031204223632812 seconds
- Time to load utils op: 0.5025167465209961 seconds
- Parameter Offload: Total persistent parameters: 1712128 in 258 params
- [2023-05-02 10:31:47,581] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6014
- [2023-05-02 10:31:50,883] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6015
- [2023-05-02 10:31:54,180] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6016
- [2023-05-02 10:31:57,557] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6017
- [2023-05-02 10:31:57,558] [ERROR] [launch.py:324:sigkill_handler] ['/local_disk0/.ephemeral_nfs/envs/pythonEnv-e1f29474-0171-420e-8b03-5f4b03ffc5cd/bin/python', '-u', '-m', 'training.trainer', '--local_rank=3', '--input-model', 'Databricks/dolly-v2-7b', '--deepspeed', '/Workspace/Repos/opyate@gmail.com/dolly/config/ds_z3_bf16_config.json', '--epochs', '2', '--local-output-dir', '/local_disk0/dolly_training/dolly_mydataset-dolly15kFormat-noJSONSchema__2023-05-02T10:25:17', '--dbfs-output-dir', '/dbfs/dolly_training/dolly_mydataset-dolly15kFormat-noJSONSchema__2023-05-02T10:25:17', '--per-device-train-batch-size', '3', '--per-device-eval-batch-size', '3', '--logging-steps', '10', '--save-steps', '200', '--save-total-limit', '20', '--eval-steps', '50', '--warmup-steps', '50', '--test-size', '10', '--lr', '5e-06'] exits with return code = -9