stage3nvme
; parser.add_argument("--strategy", default=DeepSpeedStrategy(
;                                                           stage=3,
;                                                           offload_optimizer=True,
;                                                           offload_parameters=True,
;                                                           params_buffer_size=150_000_000,
;                                                           logging_level="INFO",
;                                                           remote_device="nvme",
;                                                           offload_optimizer_device="nvme",
;                                                           offload_params_device="nvme",
;                                                           nvme_path="/home/neil/tmp/deepspeed_offloading",
;                                                       ))
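
For reference, a minimal sketch of how the commented-out strategy above would be wired into a PyTorch Lightning Trainer. The strategy arguments are copied from the log; the Trainer arguments and the fit call are illustrative assumptions, not taken from this run.

# Sketch (assumption): NVMe-offload strategy from the comment above, passed to a Trainer.
import pytorch_lightning as pl
from pytorch_lightning.strategies import DeepSpeedStrategy

strategy = DeepSpeedStrategy(
    stage=3,
    offload_optimizer=True,
    offload_parameters=True,
    params_buffer_size=150_000_000,
    logging_level="INFO",
    remote_device="nvme",
    offload_optimizer_device="nvme",
    offload_params_device="nvme",
    nvme_path="/home/neil/tmp/deepspeed_offloading",
)

trainer = pl.Trainer(
    accelerator="gpu",   # assumption: single-GPU setup, matching the log below
    devices=1,
    strategy=strategy,
)
# trainer.fit(model, datamodule=dm)  # model/datamodule are defined elsewhere in the script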
Global seed set to 8653745
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/neil/.pyvenv/ml/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:131: UserWarning: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.
  rank_zero_warn("You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.")
/home/neil/.pyvenv/ml/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:412: LightningDeprecationWarning: `LightningDataModule.on_save_checkpoint` was deprecated in v1.6 and will be removed in v1.8. Use `state_dict` instead.
  rank_zero_deprecation(
/home/neil/.pyvenv/ml/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:417: LightningDeprecationWarning: `LightningDataModule.on_load_checkpoint` was deprecated in v1.6 and will be removed in v1.8. Use `load_state_dict` instead.
  rank_zero_deprecation(
Global seed set to 8653745
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[2022-07-10 11:07:50,532] [INFO] [distributed.py:48:init_distributed] Initializing torch distributed with backend: nccl
[2022-07-10 11:07:50,533] [WARNING] [deepspeed.py:647:_auto_select_batch_size] Tried to infer the batch size for internal deepspeed logging from the `train_dataloader()`. To ensure DeepSpeed logging remains correct, please manually pass the plugin with the batch size, `Trainer(strategy=DeepSpeedStrategy(logging_batch_size_per_gpu=batch_size))`.
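
The warning above can be silenced by telling the strategy the per-GPU batch size explicitly, as the message itself suggests. A minimal sketch; the batch size value is an illustrative placeholder for whatever the train_dataloader actually uses:

# Sketch of the fix suggested by the warning above; in this run the argument would be
# added to the NVMe-offload DeepSpeedStrategy shown at the top of the log.
from pytorch_lightning.strategies import DeepSpeedStrategy

strategy = DeepSpeedStrategy(
    stage=3,
    logging_batch_size_per_gpu=4,  # placeholder: use the real per-GPU batch size
    # ... remaining NVMe-offload arguments as in the configuration above ...
)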
Reusing dataset wikitext (/home/neil/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1726.29it/s]
Parameter 'function'=<function Dataset.map.<locals>.decorate.<locals>.decorated at 0x7f7ac46d0a60> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Loading cached processed dataset at /home/neil/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126/cache-8d4c9428789cfa50.arrow
Loading cached processed dataset at /home/neil/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126/cache-1a6d3236afea204a.arrow
Loading cached processed dataset at /home/neil/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126/cache-ff14772e12a6fc92.arrow
Loading cached processed dataset at /home/neil/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126/cache-3efdba240770b126.arrow
Loading cached processed dataset at /home/neil/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126/cache-090d3d24d784f74e.arrow
Loading cached processed dataset at /home/neil/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126/cache-e0650e6b0992b455.arrow
Estimated memory needed for params, optim states and gradients for a:
HW: Setup with 1 node, 1 GPU per node.
SW: Model with 2651M total params, 128M largest layer params.
  per CPU  |  per GPU |   Options
   66.67GB |   0.48GB | offload_param=cpu , offload_optimizer=cpu , zero_init=1
   66.67GB |   0.48GB | offload_param=cpu , offload_optimizer=cpu , zero_init=0
   59.26GB |   5.42GB | offload_param=none, offload_optimizer=cpu , zero_init=1
   59.26GB |   5.42GB | offload_param=none, offload_optimizer=cpu , zero_init=0
    0.72GB |  44.93GB | offload_param=none, offload_optimizer=none, zero_init=1
   14.82GB |  44.93GB | offload_param=none, offload_optimizer=none, zero_init=0
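
The table above is produced by DeepSpeed's ZeRO-3 memory estimator. A sketch of how to generate the same kind of estimate offline, under the assumption that the estimator lives at its documented location in this DeepSpeed version; the model used here is an illustrative stand-in, not the 2651M-parameter model from this run:

# Sketch: reproduce DeepSpeed's ZeRO-3 memory estimate outside of training.
from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

model = AutoModelForCausalLM.from_pretrained("gpt2-xl")  # placeholder model
estimate_zero3_model_states_mem_needs_all_live(
    model,
    num_gpus_per_node=1,  # matches "1 node, 1 GPU per node" above
    num_nodes=1,
)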
[2022-07-10 11:07:52,809] [INFO] [utils.py:828:see_memory_usage] after setup
[2022-07-10 11:07:52,810] [INFO] [utils.py:829:see_memory_usage] MA 0.0 GB         Max_MA 0.0 GB         CA 0.0 GB         Max_CA 0 GB
[2022-07-10 11:07:52,810] [INFO] [utils.py:837:see_memory_usage] CPU Virtual Memory:  used = 16.1 GB, percent = 25.7%
[2022-07-10 11:07:57,261] [INFO] [utils.py:30:print_object] AsyncPartitionedParameterSwapper:
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   aio_handle ................... <class 'async_io.aio_handle'>
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   aligned_bytes ................ 1024
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   aligned_elements_per_buffer .. 150000128
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   available_buffer_ids ......... [0, 1, 2, 3, 4]
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   available_numel .............. 0
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   available_params ............. set()
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   dtype ........................ torch.float32
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   elements_per_buffer .......... 150000000
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   id_to_path ................... {}
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   inflight_numel ............... 0
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   inflight_params .............. []
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   inflight_swap_in_buffers ..... []
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   invalid_buffer ............... 1.0
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   min_aio_bytes ................ 1048576
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   numel_alignment .............. 256
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   param_buffer_count ........... 5
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   param_id_to_buffer_id ........ {}
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   param_id_to_numel ............ {}
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   param_id_to_swap_buffer ...... {}
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   partitioned_swap_buffer ...... None
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   partitioned_swap_pool ........ None
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   pending_reads ................ 0
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   pending_writes ............... 0
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   reserved_buffer_ids .......... []
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   swap_config .................. {'device': 'nvme', 'nvme_path': '/home/neil/tmp/deepspeed_offloading', 'buffer_count': 5, 'buffer_size': 150000000, 'max_in_cpu': 1000000000, 'pin_memory': False}
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   swap_element_size ............ 4
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   swap_folder .................. /home/neil/tmp/deepspeed_offloading/zero_stage_3/float32params/rank0
[2022-07-10 11:07:57,261] [INFO] [utils.py:34:print_object]   swap_out_params .............. []
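
The swap_config and aio_config values printed above correspond to the NVMe-offload and AIO sections of the underlying DeepSpeed configuration. A sketch of an equivalent hand-written config dict, with values copied from the log and the structure assumed from DeepSpeed's documented ZeRO-3 schema (normally Lightning generates this from the DeepSpeedStrategy arguments):

# Sketch: assumed raw DeepSpeed config mirroring the swapper's swap_config / aio_config.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/home/neil/tmp/deepspeed_offloading",
            "buffer_count": 5,
            "buffer_size": 150_000_000,
            "max_in_cpu": 1_000_000_000,
            "pin_memory": False,
        },
        "offload_optimizer": {
            "device": "nvme",
            "nvme_path": "/home/neil/tmp/deepspeed_offloading",
        },
    },
    "aio": {
        "block_size": 1048576,
        "queue_depth": 8,
        "thread_count": 1,
        "single_submit": False,
        "overlap_events": True,
    },
}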
[2022-07-10 11:07:57,266] [INFO] [partition_parameters.py:463:__exit__] finished initializing model with 0.00B parameters
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Using /home/neil/.cache/torch_extensions/py38_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/neil/.cache/torch_extensions/py38_cu116/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.0707850456237793 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.001000, betas=(0.900000, 0.999000), weight_decay=0.000500, adam_w=1
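
The optimizer built here is DeepSpeed's CPU Adam. A minimal sketch of constructing it directly with the hyperparameters printed in the line above; in this run it is created automatically by the strategy because offload_optimizer is enabled, and the tiny model is only a placeholder:

# Sketch: direct construction of DeepSpeedCPUAdam with the logged hyperparameters.
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

model = torch.nn.Linear(10, 10)  # illustrative placeholder model
optimizer = DeepSpeedCPUAdam(
    model.parameters(),
    lr=1e-3,             # alpha=0.001000
    betas=(0.9, 0.999),
    weight_decay=5e-4,   # weight_decay=0.000500
    adamw_mode=True,     # adam_w=1
)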
[2022-07-10 11:08:01,074] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.5, git-hash=unknown, git-branch=unknown
[2022-07-10 11:08:02,033] [INFO] [engine.py:278:__init__] DeepSpeed Flops Profiler Enabled: False
[2022-07-10 11:08:02,033] [INFO] [engine.py:1086:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2022-07-10 11:08:02,033] [INFO] [engine.py:1092:_configure_optimizer] Using client Optimizer as basic optimizer
[2022-07-10 11:08:02,054] [INFO] [engine.py:1108:_configure_optimizer] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
[2022-07-10 11:08:02,054] [INFO] [utils.py:52:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2022-07-10 11:08:02,054] [INFO] [logging.py:69:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer
[2022-07-10 11:08:02,054] [INFO] [engine.py:1410:_configure_zero_optimizer] Initializing ZeRO Stage 3
[2022-07-10 11:08:02,056] [INFO] [stage3.py:275:__init__] Reduce bucket size 200000000
[2022-07-10 11:08:02,056] [INFO] [stage3.py:276:__init__] Prefetch bucket size 50000000
Using /home/neil/.cache/torch_extensions/py38_cu116 as PyTorch extensions root...
Emitting ninja build file /home/neil/.cache/torch_extensions/py38_cu116/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.2862880229949951 seconds
[2022-07-10 11:08:05,541] [INFO] [utils.py:30:print_object] AsyncPartitionedParameterSwapper:
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   aio_handle ................... <class 'async_io.aio_handle'>
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   aligned_bytes ................ 1024
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   aligned_elements_per_buffer .. 150000128
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   available_buffer_ids ......... [0, 1, 2, 3, 4]
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   available_numel .............. 0
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   available_params ............. set()
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   dtype ........................ torch.float32
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   elements_per_buffer .......... 150000000
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   id_to_path ................... {}
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   inflight_numel ............... 0
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   inflight_params .............. []
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   inflight_swap_in_buffers ..... []
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   invalid_buffer ............... 1.0
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   min_aio_bytes ................ 1048576
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   numel_alignment .............. 256
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   param_buffer_count ........... 5
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   param_id_to_buffer_id ........ {}
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   param_id_to_numel ............ {}
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   param_id_to_swap_buffer ...... {}
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   partitioned_swap_buffer ...... None
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   partitioned_swap_pool ........ None
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   pending_reads ................ 0
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   pending_writes ............... 0
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   reserved_buffer_ids .......... []
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   swap_config .................. {'device': 'nvme', 'nvme_path': '/home/neil/tmp/deepspeed_offloading', 'buffer_count': 5, 'buffer_size': 150000000, 'max_in_cpu': 1000000000, 'pin_memory': False}
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   swap_element_size ............ 4
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   swap_folder .................. /home/neil/tmp/deepspeed_offloading/zero_stage_3/float32params/rank0
[2022-07-10 11:08:05,541] [INFO] [utils.py:34:print_object]   swap_out_params .............. []
[2022-07-10 11:08:13,713] [INFO] [stage3.py:713:_configure_tensor_swapping] Tensor Swapping: Adding optimizer tensors
Killed