kjetilk

Dreambooth+Lora run 1

Dec 24th, 2023
  1. accelerate launch train_dreambooth.py --pretrained_model_name_or_path=$MODEL_NAME --instance_data_dir=$INSTANCE_DIR --class_data_dir=$CLASS_DIR --output_dir=$OUTPUT_DIR --train_text_encoder --with_prior_preservation --prior_loss_weight=1.0 --num_dataloader_workers=1 --instance_prompt="a photo of lyra dog" --class_prompt="a photo of dog" --resolution=512 --train_batch_size=1 --lr_scheduler="constant" --lr_warmup_steps=0 --num_class_images=200 --use_lora --lora_r 16 --lora_alpha 27 --lora_text_encoder_r 16 --lora_text_encoder_alpha 17 --learning_rate=1e-4 --gradient_accumulation_steps=1 --gradient_checkpointing --max_train_steps=800
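Note on the LoRA flags in the command above: a minimal sketch of the effective scaling they imply, assuming the conventional LoRA rule that the low-rank update is multiplied by `lora_alpha / r`. The log itself does not print this value, and the names `lora_scaling`, `unet_scale` and `text_scale` are illustrative only.

```python
# Values copied from the command line: --lora_r 16 --lora_alpha 27
# and --lora_text_encoder_r 16 --lora_text_encoder_alpha 17.
unet_r, unet_alpha = 16, 27
text_r, text_alpha = 16, 17


def lora_scaling(alpha: float, r: int) -> float:
    """Multiplier applied to the low-rank update before it is added
    to the frozen base layer's output (assumed rule: alpha / r)."""
    return alpha / r


unet_scale = lora_scaling(unet_alpha, unet_r)
text_scale = lora_scaling(text_alpha, text_r)

print(f"UNet LoRA scaling:         {unet_scale}")
print(f"Text encoder LoRA scaling: {text_scale}")
```

Under that assumption the UNet adapters are weighted noticeably more strongly than the text encoder adapters, even though both use rank 16.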
  2. /mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML
  3. warnings.warn("Can't initialize NVML")
  4. /mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML
  5. warnings.warn("Can't initialize NVML")
  6. /mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/lib/python3.11/site-packages/accelerate/accelerator.py:384: UserWarning: `log_with=tensorboard` was passed but no supported trackers are currently installed.
  7. warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
  8. 12/22/2023 02:48:01 - INFO - __main__ - Distributed environment: DistributedType.NO
  9. Num processes: 1
  10. Process index: 0
  11. Local process index: 0
  12. Device: cpu
  13.  
  14. Mixed precision type: no
  15.  
  16. diffusion_pytorch_model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████| 3.44G/3.44G [10:36<00:00, 5.40MB/s]
  17. Fetching 14 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [10:36<00:00, 45.49s/it]
  18. {'image_encoder', 'requires_safety_checker'} was not found in config. Values will be initialized to default values.
  19. Loading pipeline components...: 0%| | 0/6 [00:00<?, ?it/s]{'dual_cross_attention', 'time_cond_proj_dim', 'class_embed_type', 'time_embedding_type', 'addition_embed_type_num_heads', 'attention_type', 'dropout', 'mid_block_type', 'reverse_transformer_layers_per_block', 'timestep_post_act', 'mid_block_only_cross_attention', 'conv_out_kernel', 'encoder_hid_dim', 'addition_time_embed_dim', 'class_embeddings_concat', 'cross_attention_norm', 'resnet_out_scale_factor', 'encoder_hid_dim_type', 'addition_embed_type', 'time_embedding_dim', 'num_attention_heads', 'only_cross_attention', 'resnet_skip_time_act', 'num_class_embeds', 'use_linear_projection', 'projection_class_embeddings_input_dim', 'conv_in_kernel', 'transformer_layers_per_block', 'resnet_time_scale_shift', 'upcast_attention', 'time_embedding_act_fn'} was not found in config. Values will be initialized to default values.
  20. Loaded unet as UNet2DConditionModel from `unet` subfolder of CompVis/stable-diffusion-v1-4.
  21. Loading pipeline components...: 17%|███████████████ | 1/6 [00:00<00:01, 4.98it/s]{'force_upcast', 'norm_num_groups'} was not found in config. Values will be initialized to default values.
  22. Loaded vae as AutoencoderKL from `vae` subfolder of CompVis/stable-diffusion-v1-4.
  23. Loading pipeline components...: 33%|██████████████████████████████ | 2/6 [00:01<00:03, 1.19it/s]Loaded feature_extractor as CLIPImageProcessor from `feature_extractor` subfolder of CompVis/stable-diffusion-v1-4.
  24. Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of CompVis/stable-diffusion-v1-4.
  25. Loading pipeline components...: 67%|████████████████████████████████████████████████████████████ | 4/6 [00:01<00:00, 2.79it/s]Loaded text_encoder as CLIPTextModel from `text_encoder` subfolder of CompVis/stable-diffusion-v1-4.
  26. Loading pipeline components...: 83%|███████████████████████████████████████████████████████████████████████████ | 5/6 [00:02<00:00, 1.76it/s]{'prediction_type', 'timestep_spacing'} was not found in config. Values will be initialized to default values.
  27. Loaded scheduler as PNDMScheduler from `scheduler` subfolder of CompVis/stable-diffusion-v1-4.
  28. Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.24it/s]
  29. You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
  30. 12/22/2023 02:58:42 - INFO - __main__ - Number of class images to sample: 200.
  31. Generating class images: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 50/50 [51:10:16<00:00, 3684.34s/it]
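Note on the progress bar above: the arithmetic behind the reported rate, as a small sketch. All inputs are read off the log and the command (200 class images, 50 iterations, 51:10:16 elapsed); the variable names are illustrative. The run reports `Device: cpu`, which is probably why each iteration takes about an hour, though the log does not say so explicitly.

```python
num_class_images = 200   # --num_class_images from the command
num_batches = 50         # "50/50" in the progress bar
elapsed = (51, 10, 16)   # hours, minutes, seconds shown by the bar

hours, minutes, seconds = elapsed
total_seconds = hours * 3600 + minutes * 60 + seconds

images_per_batch = num_class_images // num_batches
seconds_per_batch = total_seconds / num_batches
seconds_per_image = total_seconds / num_class_images

print(f"total: {total_seconds} s")
print(f"images per iteration: {images_per_batch}")
print(f"seconds per iteration: {seconds_per_batch:.2f}")
print(f"seconds per image: {seconds_per_image:.2f}")
```

The computed seconds per iteration agrees with the bar's 3684.34 s/it to within rounding of the displayed elapsed time, and works out to roughly 15 minutes per generated class image.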
  32. You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
  33. {'force_upcast', 'norm_num_groups'} was not found in config. Values will be initialized to default values.
  34. {'dual_cross_attention', 'time_cond_proj_dim', 'class_embed_type', 'time_embedding_type', 'addition_embed_type_num_heads', 'attention_type', 'dropout', 'mid_block_type', 'reverse_transformer_layers_per_block', 'timestep_post_act', 'mid_block_only_cross_attention', 'conv_out_kernel', 'encoder_hid_dim', 'addition_time_embed_dim', 'class_embeddings_concat', 'cross_attention_norm', 'resnet_out_scale_factor', 'encoder_hid_dim_type', 'addition_embed_type', 'time_embedding_dim', 'num_attention_heads', 'only_cross_attention', 'resnet_skip_time_act', 'num_class_embeds', 'use_linear_projection', 'projection_class_embeddings_input_dim', 'conv_in_kernel', 'transformer_layers_per_block', 'resnet_time_scale_shift', 'upcast_attention', 'time_embedding_act_fn'} was not found in config. Values will be initialized to default values.
  35. trainable params: 1,594,368 || all params: 861,115,332 || trainable%: 0.18515150535027286
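Note on the trainable parameter count above: a sketch that reproduces it from the layer shapes in the module dump that follows. It assumes rank-16 adapters on `to_q` and `to_v` of both `attn1` and `attn2` in every `BasicTransformerBlock` (as the dump shows), one such block per `Transformer2DModel`, and one mid-block transformer of width 1280. The mid block is not visible in the portion of the dump shown here; it is an assumption needed for the totals to match. Function and variable names are illustrative.

```python
R = 16             # --lora_r from the command
CONTEXT_DIM = 768  # in_features of the attn2 to_v base layer in the dump


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """Weights in one adapter: lora_A (in -> r) plus lora_B (r -> out), no bias."""
    return in_features * r + r * out_features


def block_params(d: int) -> int:
    """Adapter weights in one BasicTransformerBlock of width d."""
    self_attn = 2 * lora_params(d, d)       # attn1: to_q and to_v
    cross_q = lora_params(d, d)             # attn2: to_q
    cross_v = lora_params(CONTEXT_DIM, d)   # attn2: to_v reads the text context
    return self_attn + cross_q + cross_v


# Transformer blocks per width: down blocks + up blocks (+ assumed mid block).
blocks_per_width = {320: 2 + 3, 640: 2 + 3, 1280: 2 + 3 + 1}

trainable = sum(n * block_params(d) for d, n in blocks_per_width.items())
all_params = 861_115_332  # "all params" as printed in the log
trainable_pct = 100 * trainable / all_params

print(f"trainable params: {trainable:,}")
print(f"trainable%: {trainable_pct}")
```

The total comes to 1,594,368, matching the logged figure, which suggests this line counts only the UNet adapters and not any adapters on the text encoder.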
  36. PeftModel(
  37. (base_model): LoraModel(
  38. (model): UNet2DConditionModel(
  39. (conv_in): Conv2d(4, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  40. (time_proj): Timesteps()
  41. (time_embedding): TimestepEmbedding(
  42. (linear_1): Linear(in_features=320, out_features=1280, bias=True)
  43. (act): SiLU()
  44. (linear_2): Linear(in_features=1280, out_features=1280, bias=True)
  45. )
  46. (down_blocks): ModuleList(
  47. (0): CrossAttnDownBlock2D(
  48. (attentions): ModuleList(
  49. (0-1): 2 x Transformer2DModel(
  50. (norm): GroupNorm(32, 320, eps=1e-06, affine=True)
  51. (proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
  52. (transformer_blocks): ModuleList(
  53. (0): BasicTransformerBlock(
  54. (norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
  55. (attn1): Attention(
  56. (to_q): lora.Linear(
  57. (base_layer): Linear(in_features=320, out_features=320, bias=False)
  58. (lora_dropout): ModuleDict(
  59. (default): Identity()
  60. )
  61. (lora_A): ModuleDict(
  62. (default): Linear(in_features=320, out_features=16, bias=False)
  63. )
  64. (lora_B): ModuleDict(
  65. (default): Linear(in_features=16, out_features=320, bias=False)
  66. )
  67. (lora_embedding_A): ParameterDict()
  68. (lora_embedding_B): ParameterDict()
  69. )
  70. (to_k): Linear(in_features=320, out_features=320, bias=False)
  71. (to_v): lora.Linear(
  72. (base_layer): Linear(in_features=320, out_features=320, bias=False)
  73. (lora_dropout): ModuleDict(
  74. (default): Identity()
  75. )
  76. (lora_A): ModuleDict(
  77. (default): Linear(in_features=320, out_features=16, bias=False)
  78. )
  79. (lora_B): ModuleDict(
  80. (default): Linear(in_features=16, out_features=320, bias=False)
  81. )
  82. (lora_embedding_A): ParameterDict()
  83. (lora_embedding_B): ParameterDict()
  84. )
  85. (to_out): ModuleList(
  86. (0): Linear(in_features=320, out_features=320, bias=True)
  87. (1): Dropout(p=0.0, inplace=False)
  88. )
  89. )
  90. (norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
  91. (attn2): Attention(
  92. (to_q): lora.Linear(
  93. (base_layer): Linear(in_features=320, out_features=320, bias=False)
  94. (lora_dropout): ModuleDict(
  95. (default): Identity()
  96. )
  97. (lora_A): ModuleDict(
  98. (default): Linear(in_features=320, out_features=16, bias=False)
  99. )
  100. (lora_B): ModuleDict(
  101. (default): Linear(in_features=16, out_features=320, bias=False)
  102. )
  103. (lora_embedding_A): ParameterDict()
  104. (lora_embedding_B): ParameterDict()
  105. )
  106. (to_k): Linear(in_features=768, out_features=320, bias=False)
  107. (to_v): lora.Linear(
  108. (base_layer): Linear(in_features=768, out_features=320, bias=False)
  109. (lora_dropout): ModuleDict(
  110. (default): Identity()
  111. )
  112. (lora_A): ModuleDict(
  113. (default): Linear(in_features=768, out_features=16, bias=False)
  114. )
  115. (lora_B): ModuleDict(
  116. (default): Linear(in_features=16, out_features=320, bias=False)
  117. )
  118. (lora_embedding_A): ParameterDict()
  119. (lora_embedding_B): ParameterDict()
  120. )
  121. (to_out): ModuleList(
  122. (0): Linear(in_features=320, out_features=320, bias=True)
  123. (1): Dropout(p=0.0, inplace=False)
  124. )
  125. )
  126. (norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
  127. (ff): FeedForward(
  128. (net): ModuleList(
  129. (0): GEGLU(
  130. (proj): Linear(in_features=320, out_features=2560, bias=True)
  131. )
  132. (1): Dropout(p=0.0, inplace=False)
  133. (2): Linear(in_features=1280, out_features=320, bias=True)
  134. )
  135. )
  136. )
  137. )
  138. (proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
  139. )
  140. )
  141. (resnets): ModuleList(
  142. (0-1): 2 x ResnetBlock2D(
  143. (norm1): GroupNorm(32, 320, eps=1e-05, affine=True)
  144. (conv1): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  145. (time_emb_proj): Linear(in_features=1280, out_features=320, bias=True)
  146. (norm2): GroupNorm(32, 320, eps=1e-05, affine=True)
  147. (dropout): Dropout(p=0.0, inplace=False)
  148. (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  149. (nonlinearity): SiLU()
  150. )
  151. )
  152. (downsamplers): ModuleList(
  153. (0): Downsample2D(
  154. (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  155. )
  156. )
  157. )
  158. (1): CrossAttnDownBlock2D(
  159. (attentions): ModuleList(
  160. (0-1): 2 x Transformer2DModel(
  161. (norm): GroupNorm(32, 640, eps=1e-06, affine=True)
  162. (proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
  163. (transformer_blocks): ModuleList(
  164. (0): BasicTransformerBlock(
  165. (norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
  166. (attn1): Attention(
  167. (to_q): lora.Linear(
  168. (base_layer): Linear(in_features=640, out_features=640, bias=False)
  169. (lora_dropout): ModuleDict(
  170. (default): Identity()
  171. )
  172. (lora_A): ModuleDict(
  173. (default): Linear(in_features=640, out_features=16, bias=False)
  174. )
  175. (lora_B): ModuleDict(
  176. (default): Linear(in_features=16, out_features=640, bias=False)
  177. )
  178. (lora_embedding_A): ParameterDict()
  179. (lora_embedding_B): ParameterDict()
  180. )
  181. (to_k): Linear(in_features=640, out_features=640, bias=False)
  182. (to_v): lora.Linear(
  183. (base_layer): Linear(in_features=640, out_features=640, bias=False)
  184. (lora_dropout): ModuleDict(
  185. (default): Identity()
  186. )
  187. (lora_A): ModuleDict(
  188. (default): Linear(in_features=640, out_features=16, bias=False)
  189. )
  190. (lora_B): ModuleDict(
  191. (default): Linear(in_features=16, out_features=640, bias=False)
  192. )
  193. (lora_embedding_A): ParameterDict()
  194. (lora_embedding_B): ParameterDict()
  195. )
  196. (to_out): ModuleList(
  197. (0): Linear(in_features=640, out_features=640, bias=True)
  198. (1): Dropout(p=0.0, inplace=False)
  199. )
  200. )
  201. (norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
  202. (attn2): Attention(
  203. (to_q): lora.Linear(
  204. (base_layer): Linear(in_features=640, out_features=640, bias=False)
  205. (lora_dropout): ModuleDict(
  206. (default): Identity()
  207. )
  208. (lora_A): ModuleDict(
  209. (default): Linear(in_features=640, out_features=16, bias=False)
  210. )
  211. (lora_B): ModuleDict(
  212. (default): Linear(in_features=16, out_features=640, bias=False)
  213. )
  214. (lora_embedding_A): ParameterDict()
  215. (lora_embedding_B): ParameterDict()
  216. )
  217. (to_k): Linear(in_features=768, out_features=640, bias=False)
  218. (to_v): lora.Linear(
  219. (base_layer): Linear(in_features=768, out_features=640, bias=False)
  220. (lora_dropout): ModuleDict(
  221. (default): Identity()
  222. )
  223. (lora_A): ModuleDict(
  224. (default): Linear(in_features=768, out_features=16, bias=False)
  225. )
  226. (lora_B): ModuleDict(
  227. (default): Linear(in_features=16, out_features=640, bias=False)
  228. )
  229. (lora_embedding_A): ParameterDict()
  230. (lora_embedding_B): ParameterDict()
  231. )
  232. (to_out): ModuleList(
  233. (0): Linear(in_features=640, out_features=640, bias=True)
  234. (1): Dropout(p=0.0, inplace=False)
  235. )
  236. )
  237. (norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
  238. (ff): FeedForward(
  239. (net): ModuleList(
  240. (0): GEGLU(
  241. (proj): Linear(in_features=640, out_features=5120, bias=True)
  242. )
  243. (1): Dropout(p=0.0, inplace=False)
  244. (2): Linear(in_features=2560, out_features=640, bias=True)
  245. )
  246. )
  247. )
  248. )
  249. (proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
  250. )
  251. )
  252. (resnets): ModuleList(
  253. (0): ResnetBlock2D(
  254. (norm1): GroupNorm(32, 320, eps=1e-05, affine=True)
  255. (conv1): Conv2d(320, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  256. (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
  257. (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
  258. (dropout): Dropout(p=0.0, inplace=False)
  259. (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  260. (nonlinearity): SiLU()
  261. (conv_shortcut): Conv2d(320, 640, kernel_size=(1, 1), stride=(1, 1))
  262. )
  263. (1): ResnetBlock2D(
  264. (norm1): GroupNorm(32, 640, eps=1e-05, affine=True)
  265. (conv1): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  266. (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
  267. (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
  268. (dropout): Dropout(p=0.0, inplace=False)
  269. (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  270. (nonlinearity): SiLU()
  271. )
  272. )
  273. (downsamplers): ModuleList(
  274. (0): Downsample2D(
  275. (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  276. )
  277. )
  278. )
  279. (2): CrossAttnDownBlock2D(
  280. (attentions): ModuleList(
  281. (0-1): 2 x Transformer2DModel(
  282. (norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
  283. (proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
  284. (transformer_blocks): ModuleList(
  285. (0): BasicTransformerBlock(
  286. (norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  287. (attn1): Attention(
  288. (to_q): lora.Linear(
  289. (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
  290. (lora_dropout): ModuleDict(
  291. (default): Identity()
  292. )
  293. (lora_A): ModuleDict(
  294. (default): Linear(in_features=1280, out_features=16, bias=False)
  295. )
  296. (lora_B): ModuleDict(
  297. (default): Linear(in_features=16, out_features=1280, bias=False)
  298. )
  299. (lora_embedding_A): ParameterDict()
  300. (lora_embedding_B): ParameterDict()
  301. )
  302. (to_k): Linear(in_features=1280, out_features=1280, bias=False)
  303. (to_v): lora.Linear(
  304. (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
  305. (lora_dropout): ModuleDict(
  306. (default): Identity()
  307. )
  308. (lora_A): ModuleDict(
  309. (default): Linear(in_features=1280, out_features=16, bias=False)
  310. )
  311. (lora_B): ModuleDict(
  312. (default): Linear(in_features=16, out_features=1280, bias=False)
  313. )
  314. (lora_embedding_A): ParameterDict()
  315. (lora_embedding_B): ParameterDict()
  316. )
  317. (to_out): ModuleList(
  318. (0): Linear(in_features=1280, out_features=1280, bias=True)
  319. (1): Dropout(p=0.0, inplace=False)
  320. )
  321. )
  322. (norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  323. (attn2): Attention(
  324. (to_q): lora.Linear(
  325. (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
  326. (lora_dropout): ModuleDict(
  327. (default): Identity()
  328. )
  329. (lora_A): ModuleDict(
  330. (default): Linear(in_features=1280, out_features=16, bias=False)
  331. )
  332. (lora_B): ModuleDict(
  333. (default): Linear(in_features=16, out_features=1280, bias=False)
  334. )
  335. (lora_embedding_A): ParameterDict()
  336. (lora_embedding_B): ParameterDict()
  337. )
  338. (to_k): Linear(in_features=768, out_features=1280, bias=False)
  339. (to_v): lora.Linear(
  340. (base_layer): Linear(in_features=768, out_features=1280, bias=False)
  341. (lora_dropout): ModuleDict(
  342. (default): Identity()
  343. )
  344. (lora_A): ModuleDict(
  345. (default): Linear(in_features=768, out_features=16, bias=False)
  346. )
  347. (lora_B): ModuleDict(
  348. (default): Linear(in_features=16, out_features=1280, bias=False)
  349. )
  350. (lora_embedding_A): ParameterDict()
  351. (lora_embedding_B): ParameterDict()
  352. )
  353. (to_out): ModuleList(
  354. (0): Linear(in_features=1280, out_features=1280, bias=True)
  355. (1): Dropout(p=0.0, inplace=False)
  356. )
  357. )
  358. (norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  359. (ff): FeedForward(
  360. (net): ModuleList(
  361. (0): GEGLU(
  362. (proj): Linear(in_features=1280, out_features=10240, bias=True)
  363. )
  364. (1): Dropout(p=0.0, inplace=False)
  365. (2): Linear(in_features=5120, out_features=1280, bias=True)
  366. )
  367. )
  368. )
  369. )
  370. (proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
  371. )
  372. )
  373. (resnets): ModuleList(
  374. (0): ResnetBlock2D(
  375. (norm1): GroupNorm(32, 640, eps=1e-05, affine=True)
  376. (conv1): Conv2d(640, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  377. (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
  378. (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
  379. (dropout): Dropout(p=0.0, inplace=False)
  380. (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  381. (nonlinearity): SiLU()
  382. (conv_shortcut): Conv2d(640, 1280, kernel_size=(1, 1), stride=(1, 1))
  383. )
  384. (1): ResnetBlock2D(
  385. (norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)
  386. (conv1): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  387. (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
  388. (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
  389. (dropout): Dropout(p=0.0, inplace=False)
  390. (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  391. (nonlinearity): SiLU()
  392. )
  393. )
  394. (downsamplers): ModuleList(
  395. (0): Downsample2D(
  396. (conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  397. )
  398. )
  399. )
  400. (3): DownBlock2D(
  401. (resnets): ModuleList(
  402. (0-1): 2 x ResnetBlock2D(
  403. (norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)
  404. (conv1): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  405. (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
  406. (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
  407. (dropout): Dropout(p=0.0, inplace=False)
  408. (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  409. (nonlinearity): SiLU()
  410. )
  411. )
  412. )
  413. )
  414. (up_blocks): ModuleList(
  415. (0): UpBlock2D(
  416. (resnets): ModuleList(
  417. (0-2): 3 x ResnetBlock2D(
  418. (norm1): GroupNorm(32, 2560, eps=1e-05, affine=True)
  419. (conv1): Conv2d(2560, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  420. (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
  421. (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
  422. (dropout): Dropout(p=0.0, inplace=False)
  423. (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  424. (nonlinearity): SiLU()
  425. (conv_shortcut): Conv2d(2560, 1280, kernel_size=(1, 1), stride=(1, 1))
  426. )
  427. )
  428. (upsamplers): ModuleList(
  429. (0): Upsample2D(
  430. (conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  431. )
  432. )
  433. )
  434. (1): CrossAttnUpBlock2D(
  435. (attentions): ModuleList(
  436. (0-2): 3 x Transformer2DModel(
  437. (norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
  438. (proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
  439. (transformer_blocks): ModuleList(
  440. (0): BasicTransformerBlock(
  441. (norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  442. (attn1): Attention(
  443. (to_q): lora.Linear(
  444. (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
  445. (lora_dropout): ModuleDict(
  446. (default): Identity()
  447. )
  448. (lora_A): ModuleDict(
  449. (default): Linear(in_features=1280, out_features=16, bias=False)
  450. )
  451. (lora_B): ModuleDict(
  452. (default): Linear(in_features=16, out_features=1280, bias=False)
  453. )
  454. (lora_embedding_A): ParameterDict()
  455. (lora_embedding_B): ParameterDict()
  456. )
  457. (to_k): Linear(in_features=1280, out_features=1280, bias=False)
  458. (to_v): lora.Linear(
  459. (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
  460. (lora_dropout): ModuleDict(
  461. (default): Identity()
  462. )
  463. (lora_A): ModuleDict(
  464. (default): Linear(in_features=1280, out_features=16, bias=False)
  465. )
  466. (lora_B): ModuleDict(
  467. (default): Linear(in_features=16, out_features=1280, bias=False)
  468. )
  469. (lora_embedding_A): ParameterDict()
  470. (lora_embedding_B): ParameterDict()
  471. )
  472. (to_out): ModuleList(
  473. (0): Linear(in_features=1280, out_features=1280, bias=True)
  474. (1): Dropout(p=0.0, inplace=False)
  475. )
  476. )
  477. (norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  478. (attn2): Attention(
  479. (to_q): lora.Linear(
  480. (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
  481. (lora_dropout): ModuleDict(
  482. (default): Identity()
  483. )
  484. (lora_A): ModuleDict(
  485. (default): Linear(in_features=1280, out_features=16, bias=False)
  486. )
  487. (lora_B): ModuleDict(
  488. (default): Linear(in_features=16, out_features=1280, bias=False)
  489. )
  490. (lora_embedding_A): ParameterDict()
  491. (lora_embedding_B): ParameterDict()
  492. )
  493. (to_k): Linear(in_features=768, out_features=1280, bias=False)
  494. (to_v): lora.Linear(
  495. (base_layer): Linear(in_features=768, out_features=1280, bias=False)
  496. (lora_dropout): ModuleDict(
  497. (default): Identity()
  498. )
  499. (lora_A): ModuleDict(
  500. (default): Linear(in_features=768, out_features=16, bias=False)
  501. )
  502. (lora_B): ModuleDict(
  503. (default): Linear(in_features=16, out_features=1280, bias=False)
  504. )
  505. (lora_embedding_A): ParameterDict()
  506. (lora_embedding_B): ParameterDict()
  507. )
  508. (to_out): ModuleList(
  509. (0): Linear(in_features=1280, out_features=1280, bias=True)
  510. (1): Dropout(p=0.0, inplace=False)
  511. )
  512. )
  513. (norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  514. (ff): FeedForward(
  515. (net): ModuleList(
  516. (0): GEGLU(
  517. (proj): Linear(in_features=1280, out_features=10240, bias=True)
  518. )
  519. (1): Dropout(p=0.0, inplace=False)
  520. (2): Linear(in_features=5120, out_features=1280, bias=True)
  521. )
  522. )
  523. )
  524. )
  525. (proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
  526. )
  527. )
  528. (resnets): ModuleList(
  529. (0-1): 2 x ResnetBlock2D(
  530. (norm1): GroupNorm(32, 2560, eps=1e-05, affine=True)
  531. (conv1): Conv2d(2560, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  532. (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
  533. (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
  534. (dropout): Dropout(p=0.0, inplace=False)
  535. (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  536. (nonlinearity): SiLU()
  537. (conv_shortcut): Conv2d(2560, 1280, kernel_size=(1, 1), stride=(1, 1))
  538. )
  539. (2): ResnetBlock2D(
  540. (norm1): GroupNorm(32, 1920, eps=1e-05, affine=True)
  541. (conv1): Conv2d(1920, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  542. (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
  543. (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
  544. (dropout): Dropout(p=0.0, inplace=False)
  545. (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  546. (nonlinearity): SiLU()
  547. (conv_shortcut): Conv2d(1920, 1280, kernel_size=(1, 1), stride=(1, 1))
  548. )
  549. )
  550. (upsamplers): ModuleList(
  551. (0): Upsample2D(
  552. (conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  553. )
  554. )
  555. )
  556. (2): CrossAttnUpBlock2D(
  557. (attentions): ModuleList(
  558. (0-2): 3 x Transformer2DModel(
  559. (norm): GroupNorm(32, 640, eps=1e-06, affine=True)
  560. (proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
  561. (transformer_blocks): ModuleList(
  562. (0): BasicTransformerBlock(
  563. (norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
  564. (attn1): Attention(
  565. (to_q): lora.Linear(
  566. (base_layer): Linear(in_features=640, out_features=640, bias=False)
  567. (lora_dropout): ModuleDict(
  568. (default): Identity()
  569. )
  570. (lora_A): ModuleDict(
  571. (default): Linear(in_features=640, out_features=16, bias=False)
  572. )
  573. (lora_B): ModuleDict(
  574. (default): Linear(in_features=16, out_features=640, bias=False)
  575. )
  576. (lora_embedding_A): ParameterDict()
  577. (lora_embedding_B): ParameterDict()
  578. )
  579. (to_k): Linear(in_features=640, out_features=640, bias=False)
  580. (to_v): lora.Linear(
  581. (base_layer): Linear(in_features=640, out_features=640, bias=False)
  582. (lora_dropout): ModuleDict(
  583. (default): Identity()
  584. )
  585. (lora_A): ModuleDict(
  586. (default): Linear(in_features=640, out_features=16, bias=False)
  587. )
  588. (lora_B): ModuleDict(
  589. (default): Linear(in_features=16, out_features=640, bias=False)
  590. )
  591. (lora_embedding_A): ParameterDict()
  592. (lora_embedding_B): ParameterDict()
  593. )
  594. (to_out): ModuleList(
  595. (0): Linear(in_features=640, out_features=640, bias=True)
  596. (1): Dropout(p=0.0, inplace=False)
  597. )
  598. )
  599. (norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
  600. (attn2): Attention(
  601. (to_q): lora.Linear(
  602. (base_layer): Linear(in_features=640, out_features=640, bias=False)
  603. (lora_dropout): ModuleDict(
  604. (default): Identity()
  605. )
  606. (lora_A): ModuleDict(
  607. (default): Linear(in_features=640, out_features=16, bias=False)
  608. )
  609. (lora_B): ModuleDict(
  610. (default): Linear(in_features=16, out_features=640, bias=False)
  611. )
  612. (lora_embedding_A): ParameterDict()
  613. (lora_embedding_B): ParameterDict()
  614. )
  615. (to_k): Linear(in_features=768, out_features=640, bias=False)
  616. (to_v): lora.Linear(
  617. (base_layer): Linear(in_features=768, out_features=640, bias=False)
  618. (lora_dropout): ModuleDict(
  619. (default): Identity()
  620. )
  621. (lora_A): ModuleDict(
  622. (default): Linear(in_features=768, out_features=16, bias=False)
  623. )
  624. (lora_B): ModuleDict(
  625. (default): Linear(in_features=16, out_features=640, bias=False)
  626. )
  627. (lora_embedding_A): ParameterDict()
  628. (lora_embedding_B): ParameterDict()
  629. )
  630. (to_out): ModuleList(
  631. (0): Linear(in_features=640, out_features=640, bias=True)
  632. (1): Dropout(p=0.0, inplace=False)
  633. )
  634. )
  635. (norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
  636. (ff): FeedForward(
  637. (net): ModuleList(
  638. (0): GEGLU(
  639. (proj): Linear(in_features=640, out_features=5120, bias=True)
  640. )
  641. (1): Dropout(p=0.0, inplace=False)
  642. (2): Linear(in_features=2560, out_features=640, bias=True)
  643. )
  644. )
  645. )
  646. )
  647. (proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
  648. )
  649. )
  650. (resnets): ModuleList(
  651. (0): ResnetBlock2D(
  652. (norm1): GroupNorm(32, 1920, eps=1e-05, affine=True)
  653. (conv1): Conv2d(1920, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  654. (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
  655. (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
  656. (dropout): Dropout(p=0.0, inplace=False)
  657. (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  658. (nonlinearity): SiLU()
  659. (conv_shortcut): Conv2d(1920, 640, kernel_size=(1, 1), stride=(1, 1))
  660. )
  661. (1): ResnetBlock2D(
  662. (norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)
  663. (conv1): Conv2d(1280, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  664. (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
  665. (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
  666. (dropout): Dropout(p=0.0, inplace=False)
  667. (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  668. (nonlinearity): SiLU()
  669. (conv_shortcut): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1))
  670. )
  671. (2): ResnetBlock2D(
  672. (norm1): GroupNorm(32, 960, eps=1e-05, affine=True)
  673. (conv1): Conv2d(960, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  674. (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
  675. (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
  676. (dropout): Dropout(p=0.0, inplace=False)
  677. (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  678. (nonlinearity): SiLU()
  679. (conv_shortcut): Conv2d(960, 640, kernel_size=(1, 1), stride=(1, 1))
  680. )
  681. )
  682. (upsamplers): ModuleList(
  683. (0): Upsample2D(
  684. (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  685. )
  686. )
  687. )
  688. (3): CrossAttnUpBlock2D(
  689. (attentions): ModuleList(
  690. (0-2): 3 x Transformer2DModel(
  691. (norm): GroupNorm(32, 320, eps=1e-06, affine=True)
  692. (proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
  693. (transformer_blocks): ModuleList(
  694. (0): BasicTransformerBlock(
  695. (norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
  696. (attn1): Attention(
  697. (to_q): lora.Linear(
  698. (base_layer): Linear(in_features=320, out_features=320, bias=False)
  699. (lora_dropout): ModuleDict(
  700. (default): Identity()
  701. )
  702. (lora_A): ModuleDict(
  703. (default): Linear(in_features=320, out_features=16, bias=False)
  704. )
  705. (lora_B): ModuleDict(
  706. (default): Linear(in_features=16, out_features=320, bias=False)
  707. )
  708. (lora_embedding_A): ParameterDict()
  709. (lora_embedding_B): ParameterDict()
  710. )
  711. (to_k): Linear(in_features=320, out_features=320, bias=False)
  712. (to_v): lora.Linear(
  713. (base_layer): Linear(in_features=320, out_features=320, bias=False)
  714. (lora_dropout): ModuleDict(
  715. (default): Identity()
  716. )
  717. (lora_A): ModuleDict(
  718. (default): Linear(in_features=320, out_features=16, bias=False)
  719. )
  720. (lora_B): ModuleDict(
  721. (default): Linear(in_features=16, out_features=320, bias=False)
  722. )
  723. (lora_embedding_A): ParameterDict()
  724. (lora_embedding_B): ParameterDict()
  725. )
  726. (to_out): ModuleList(
  727. (0): Linear(in_features=320, out_features=320, bias=True)
  728. (1): Dropout(p=0.0, inplace=False)
  729. )
  730. )
  731. (norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
  732. (attn2): Attention(
  733. (to_q): lora.Linear(
  734. (base_layer): Linear(in_features=320, out_features=320, bias=False)
  735. (lora_dropout): ModuleDict(
  736. (default): Identity()
  737. )
  738. (lora_A): ModuleDict(
  739. (default): Linear(in_features=320, out_features=16, bias=False)
  740. )
  741. (lora_B): ModuleDict(
  742. (default): Linear(in_features=16, out_features=320, bias=False)
  743. )
  744. (lora_embedding_A): ParameterDict()
  745. (lora_embedding_B): ParameterDict()
  746. )
  747. (to_k): Linear(in_features=768, out_features=320, bias=False)
  748. (to_v): lora.Linear(
  749. (base_layer): Linear(in_features=768, out_features=320, bias=False)
  750. (lora_dropout): ModuleDict(
  751. (default): Identity()
  752. )
  753. (lora_A): ModuleDict(
  754. (default): Linear(in_features=768, out_features=16, bias=False)
  755. )
  756. (lora_B): ModuleDict(
  757. (default): Linear(in_features=16, out_features=320, bias=False)
  758. )
  759. (lora_embedding_A): ParameterDict()
  760. (lora_embedding_B): ParameterDict()
  761. )
  762. (to_out): ModuleList(
  763. (0): Linear(in_features=320, out_features=320, bias=True)
  764. (1): Dropout(p=0.0, inplace=False)
  765. )
  766. )
  767. (norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
  768. (ff): FeedForward(
  769. (net): ModuleList(
  770. (0): GEGLU(
  771. (proj): Linear(in_features=320, out_features=2560, bias=True)
  772. )
  773. (1): Dropout(p=0.0, inplace=False)
  774. (2): Linear(in_features=1280, out_features=320, bias=True)
  775. )
  776. )
  777. )
  778. )
  779. (proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
  780. )
  781. )
  782. (resnets): ModuleList(
  783. (0): ResnetBlock2D(
  784. (norm1): GroupNorm(32, 960, eps=1e-05, affine=True)
  785. (conv1): Conv2d(960, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  786. (time_emb_proj): Linear(in_features=1280, out_features=320, bias=True)
  787. (norm2): GroupNorm(32, 320, eps=1e-05, affine=True)
  788. (dropout): Dropout(p=0.0, inplace=False)
  789. (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  790. (nonlinearity): SiLU()
  791. (conv_shortcut): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1))
  792. )
  793. (1-2): 2 x ResnetBlock2D(
  794. (norm1): GroupNorm(32, 640, eps=1e-05, affine=True)
  795. (conv1): Conv2d(640, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  796. (time_emb_proj): Linear(in_features=1280, out_features=320, bias=True)
  797. (norm2): GroupNorm(32, 320, eps=1e-05, affine=True)
  798. (dropout): Dropout(p=0.0, inplace=False)
  799. (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  800. (nonlinearity): SiLU()
  801. (conv_shortcut): Conv2d(640, 320, kernel_size=(1, 1), stride=(1, 1))
  802. )
  803. )
  804. )
  805. )
  806. (mid_block): UNetMidBlock2DCrossAttn(
  807. (attentions): ModuleList(
  808. (0): Transformer2DModel(
  809. (norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
  810. (proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
  811. (transformer_blocks): ModuleList(
  812. (0): BasicTransformerBlock(
  813. (norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  814. (attn1): Attention(
  815. (to_q): lora.Linear(
  816. (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
  817. (lora_dropout): ModuleDict(
  818. (default): Identity()
  819. )
  820. (lora_A): ModuleDict(
  821. (default): Linear(in_features=1280, out_features=16, bias=False)
  822. )
  823. (lora_B): ModuleDict(
  824. (default): Linear(in_features=16, out_features=1280, bias=False)
  825. )
  826. (lora_embedding_A): ParameterDict()
  827. (lora_embedding_B): ParameterDict()
  828. )
  829. (to_k): Linear(in_features=1280, out_features=1280, bias=False)
  830. (to_v): lora.Linear(
  831. (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
  832. (lora_dropout): ModuleDict(
  833. (default): Identity()
  834. )
  835. (lora_A): ModuleDict(
  836. (default): Linear(in_features=1280, out_features=16, bias=False)
  837. )
  838. (lora_B): ModuleDict(
  839. (default): Linear(in_features=16, out_features=1280, bias=False)
  840. )
  841. (lora_embedding_A): ParameterDict()
  842. (lora_embedding_B): ParameterDict()
  843. )
  844. (to_out): ModuleList(
  845. (0): Linear(in_features=1280, out_features=1280, bias=True)
  846. (1): Dropout(p=0.0, inplace=False)
  847. )
  848. )
  849. (norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  850. (attn2): Attention(
  851. (to_q): lora.Linear(
  852. (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
  853. (lora_dropout): ModuleDict(
  854. (default): Identity()
  855. )
  856. (lora_A): ModuleDict(
  857. (default): Linear(in_features=1280, out_features=16, bias=False)
  858. )
  859. (lora_B): ModuleDict(
  860. (default): Linear(in_features=16, out_features=1280, bias=False)
  861. )
  862. (lora_embedding_A): ParameterDict()
  863. (lora_embedding_B): ParameterDict()
  864. )
  865. (to_k): Linear(in_features=768, out_features=1280, bias=False)
  866. (to_v): lora.Linear(
  867. (base_layer): Linear(in_features=768, out_features=1280, bias=False)
  868. (lora_dropout): ModuleDict(
  869. (default): Identity()
  870. )
  871. (lora_A): ModuleDict(
  872. (default): Linear(in_features=768, out_features=16, bias=False)
  873. )
  874. (lora_B): ModuleDict(
  875. (default): Linear(in_features=16, out_features=1280, bias=False)
  876. )
  877. (lora_embedding_A): ParameterDict()
  878. (lora_embedding_B): ParameterDict()
  879. )
  880. (to_out): ModuleList(
  881. (0): Linear(in_features=1280, out_features=1280, bias=True)
  882. (1): Dropout(p=0.0, inplace=False)
  883. )
  884. )
  885. (norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  886. (ff): FeedForward(
  887. (net): ModuleList(
  888. (0): GEGLU(
  889. (proj): Linear(in_features=1280, out_features=10240, bias=True)
  890. )
  891. (1): Dropout(p=0.0, inplace=False)
  892. (2): Linear(in_features=5120, out_features=1280, bias=True)
  893. )
  894. )
  895. )
  896. )
  897. (proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
  898. )
  899. )
  900. (resnets): ModuleList(
  901. (0-1): 2 x ResnetBlock2D(
  902. (norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)
  903. (conv1): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  904. (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
  905. (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
  906. (dropout): Dropout(p=0.0, inplace=False)
  907. (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  908. (nonlinearity): SiLU()
  909. )
  910. )
  911. )
  912. (conv_norm_out): GroupNorm(32, 320, eps=1e-05, affine=True)
  913. (conv_act): SiLU()
  914. (conv_out): Conv2d(320, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  915. )
  916. )
  917. )
  918. trainable params: 589,824 || all params: 123,650,304 || trainable%: 0.4770097451600281
  919. PeftModel(
  920. (base_model): LoraModel(
  921. (model): CLIPTextModel(
  922. (text_model): CLIPTextTransformer(
  923. (embeddings): CLIPTextEmbeddings(
  924. (token_embedding): Embedding(49408, 768)
  925. (position_embedding): Embedding(77, 768)
  926. )
  927. (encoder): CLIPEncoder(
  928. (layers): ModuleList(
  929. (0-11): 12 x CLIPEncoderLayer(
  930. (self_attn): CLIPAttention(
  931. (k_proj): Linear(in_features=768, out_features=768, bias=True)
  932. (v_proj): lora.Linear(
  933. (base_layer): Linear(in_features=768, out_features=768, bias=True)
  934. (lora_dropout): ModuleDict(
  935. (default): Identity()
  936. )
  937. (lora_A): ModuleDict(
  938. (default): Linear(in_features=768, out_features=16, bias=False)
  939. )
  940. (lora_B): ModuleDict(
  941. (default): Linear(in_features=16, out_features=768, bias=False)
  942. )
  943. (lora_embedding_A): ParameterDict()
  944. (lora_embedding_B): ParameterDict()
  945. )
  946. (q_proj): lora.Linear(
  947. (base_layer): Linear(in_features=768, out_features=768, bias=True)
  948. (lora_dropout): ModuleDict(
  949. (default): Identity()
  950. )
  951. (lora_A): ModuleDict(
  952. (default): Linear(in_features=768, out_features=16, bias=False)
  953. )
  954. (lora_B): ModuleDict(
  955. (default): Linear(in_features=16, out_features=768, bias=False)
  956. )
  957. (lora_embedding_A): ParameterDict()
  958. (lora_embedding_B): ParameterDict()
  959. )
  960. (out_proj): Linear(in_features=768, out_features=768, bias=True)
  961. )
  962. (layer_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  963. (mlp): CLIPMLP(
  964. (activation_fn): QuickGELUActivation()
  965. (fc1): Linear(in_features=768, out_features=3072, bias=True)
  966. (fc2): Linear(in_features=3072, out_features=768, bias=True)
  967. )
  968. (layer_norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  969. )
  970. )
  971. )
  972. (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  973. )
  974. )
  975. )
  976. )
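Note: the `trainable params: 589,824` figure reported for the text encoder can be checked against the printout above. This is a minimal sketch, assuming the only trainable weights are the `lora_A` and `lora_B` matrices on `q_proj` and `v_proj` in each of the 12 `CLIPEncoderLayer` blocks, as the printout shows. The helper name `lora_param_count` is made up for illustration.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is (r x in), B is (out x r), no bias."""
    return r * in_features + out_features * r


# Values read off the printed CLIPTextModel structure.
num_layers = 12          # (0-11): 12 x CLIPEncoderLayer
adapted_per_layer = 2    # q_proj and v_proj carry lora.Linear; k_proj and out_proj do not
hidden = 768
rank = 16                # --lora_text_encoder_r 16

trainable = num_layers * adapted_per_layer * lora_param_count(hidden, hidden, rank)

# Total reported by the log line "all params: 123,650,304".
all_params = 123_650_304
trainable_pct = 100 * trainable / all_params

print(f"trainable params: {trainable:,} || trainable%: {trainable_pct}")
```

Each adapter adds 16 * 768 + 768 * 16 = 24,576 weights, and 24 adapters give 589,824, which matches the logged count.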
  977. 12/24/2023 06:09:07 - INFO - __main__ - ***** Running training *****
  978. 12/24/2023 06:09:07 - INFO - __main__ - Num examples = 200
  979. 12/24/2023 06:09:07 - INFO - __main__ - Num batches each epoch = 200
  980. 12/24/2023 06:09:07 - INFO - __main__ - Num Epochs = 4
  981. 12/24/2023 06:09:07 - INFO - __main__ - Instantaneous batch size per device = 1
  982. 12/24/2023 06:09:07 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
  983. 12/24/2023 06:09:07 - INFO - __main__ - Gradient Accumulation steps = 1
  984. 12/24/2023 06:09:07 - INFO - __main__ - Total optimization steps = 800
  985. Steps: 0%| | 0/800 [00:00<?, ?it/s]/mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/lib/python3.11/site-packages/torch/cuda/memory.py:329: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
  986. warnings.warn(
  987. Traceback (most recent call last):
  988. File "/home/kjetil/dev/andres/ai-stuff/peft/examples/lora_dreambooth/train_dreambooth.py", line 1104, in <module>
  989. main(args)
  990. File "/home/kjetil/dev/andres/ai-stuff/peft/examples/lora_dreambooth/train_dreambooth.py", line 908, in main
  991. with TorchTracemalloc() if not args.no_tracemalloc else nullcontext() as tracemalloc:
  992. File "/home/kjetil/dev/andres/ai-stuff/peft/examples/lora_dreambooth/train_dreambooth.py", line 416, in __enter__
  993. torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero
  994. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  995. File "/mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/lib/python3.11/site-packages/torch/cuda/memory.py", line 334, in reset_max_memory_allocated
  996. return reset_peak_memory_stats(device=device)
  997. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  998. File "/mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/lib/python3.11/site-packages/torch/cuda/memory.py", line 307, in reset_peak_memory_stats
  999. return torch._C._cuda_resetPeakMemoryStats(device)
  1000. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1001. RuntimeError: invalid argument to reset_peak_memory_stats
  1002. Steps: 0%| | 0/800 [00:10<?, ?it/s]
  1003. Traceback (most recent call last):
  1004. File "/mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/bin/accelerate", line 8, in <module>
  1005. sys.exit(main())
  1006. ^^^^^^
  1007. File "/mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
  1008. args.func(args)
  1009. File "/mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
  1010. simple_launcher(args)
  1011. File "/mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/lib/python3.11/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
  1012. raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
  1013. subprocess.CalledProcessError: Command '['/mnt/ssd1/home/kjetil/dev/lora_dreambooth/venv/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=/home/kjetil/scratch/trening/', '--class_data_dir=/mnt/ssd1/home/kjetil/experiments/take1/class_dir', '--output_dir=/mnt/ssd1/home/kjetil/experiments/take1/output', '--train_text_encoder', '--with_prior_preservation', '--prior_loss_weight=1.0', '--num_dataloader_workers=1', '--instance_prompt=a photo of lyra dog', '--class_prompt=a photo of dog', '--resolution=512', '--train_batch_size=1', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--use_lora', '--lora_r', '16', '--lora_alpha', '27', '--lora_text_encoder_r', '16', '--lora_text_encoder_alpha', '17', '--learning_rate=1e-4', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--max_train_steps=800']' returned non-zero exit status 1.
  1014.  
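Note: the first traceback above ends in `RuntimeError: invalid argument to reset_peak_memory_stats`, raised from `TorchTracemalloc.__enter__`, while the launch banner at the top of this log reports `Device: cpu` and `Can't initialize NVML`. One plausible reading is that the memory tracker calls CUDA-only functions on a machine where CUDA is not usable. Below is a minimal sketch of a guarded tracker under that assumption. `cuda_usable` and `peak_memory_tracker` are hypothetical names, not part of the PEFT example script.

```python
import contextlib

try:
    import torch
except ImportError:  # keeps the sketch importable where PyTorch is absent
    torch = None


def cuda_usable() -> bool:
    """True only if PyTorch is installed and reports a working CUDA device."""
    return torch is not None and torch.cuda.is_available()


@contextlib.contextmanager
def peak_memory_tracker():
    """Record peak CUDA memory for the wrapped block; do nothing on CPU."""
    stats = {"peak_bytes": None}
    on_gpu = cuda_usable()
    if on_gpu:
        # Only touch CUDA memory statistics when a device is actually present.
        torch.cuda.reset_peak_memory_stats()
    try:
        yield stats
    finally:
        if on_gpu:
            stats["peak_bytes"] = torch.cuda.max_memory_allocated()


with peak_memory_tracker() as stats:
    pass  # a training step would go here

print("peak bytes:", stats["peak_bytes"])
```

The condition `if not args.no_tracemalloc` in the traceback also suggests the script has a switch for skipping the tracker altogether, though this log does not show the exact flag spelling.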