training writeup

The metadata can be pulled from the LoRA itself, but here's a writeup of the settings anyway.

DATA:
~100 images of school uniform x5
~40 images of frilled bikini x5
~40 images of nsfw x4
~20 images of random outfits x8
~15 images that I wanted deprioritized x1
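
A rough sketch in Python of the effective per-epoch image count, assuming the xN numbers above are kohya-style folder repeats (folders named like "5_school uniform"); the counts are approximate, so treat the result as ballpark:

# (approximate image count, repeats) per concept, using the numbers above
dataset = {
    "school uniform": (100, 5),
    "frilled bikini": (40, 5),
    "nsfw": (40, 4),
    "random outfits": (20, 8),
    "deprioritized": (15, 1),
}

effective_images_per_epoch = sum(count * repeats for count, repeats in dataset.values())
print(effective_images_per_epoch)  # ~1035 with these rough counts, same ballpark as the ~950/epoch below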


TRAINING COUNTS:
batch size = 3
this ran for 7 epochs
each epoch was around ~950 images trained
epochs 6 and 7 were beginning to show signs of overcooking, so I left them out
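
The step math written out, for reference (assumption on my part: "steps" in this writeup count images seen, not optimizer steps):

images_per_epoch = 950
batch_size = 3
epochs_kept = 5  # epochs 6 and 7 dropped for overcooking

optimizer_steps_per_epoch = images_per_epoch // batch_size       # ~316 at batch size 3
images_seen_across_kept_epochs = images_per_epoch * epochs_kept  # ~4750
print(optimizer_steps_per_epoch, images_seen_across_kept_epochs)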

Images were all tagged with
ajitani hifumi, <outfit>
using shuffle tags + keep_tokens 2.
I saw a distinct quality increase after doing this.
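
A minimal Python sketch of what I understand shuffle tags + keep_tokens 2 to do at train time (my approximation, not the trainer's actual code; the tags after the outfit are made-up examples):

import random

def shuffled_caption(caption, keep_tokens=2):
    tags = [t.strip() for t in caption.split(",")]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]  # first 2 tags stay put
    random.shuffle(rest)                                  # the rest get reshuffled every time
    return ", ".join(fixed + rest)

print(shuffled_caption("ajitani hifumi, school uniform, smile, outdoors"))
# always starts with "ajitani hifumi, school uniform", trailing tags in random order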


SCHEDULER: cosine_with_restarts
I used cosine_with_restarts because I have no idea how to pick a scheduler
and BA anon was already using it. I haven't experimented with this yet.
WARMUP RATIO: again, copied BA anon. Haven't experimented.
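
To visualize what this scheduler does, a rough Python approximation of cosine with restarts plus warmup (not the trainer's actual implementation; the warmup ratio and restart count here are placeholders, not my real values):

import math

def lr_at(step, total_steps, base_lr, warmup_ratio=0.05, restarts=3):
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * step / warmup_steps                        # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cycle = (progress * restarts) % 1.0                             # decay resets `restarts` times
    return base_lr * 0.5 * (1 + math.cos(math.pi * cycle))

for s in (0, 100, 500, 1000, 1500, 1999):
    print(s, round(lr_at(s, 2000, 2e-4), 6))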


LEARNING RATE:
LR + unet LR: 2e-4
text encoder LR: 1e-4
After a lot of trial and error, I settled on a base LR of 1e-4
multiplied by two-thirds of the batch size (2/3 * 3 = 2, so 1e-4 -> 2e-4).
I read/heard somewhere that the text encoder LR should be half of the other LR.
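
The rule, written out in Python:

base_lr = 1e-4
batch_size = 3

unet_lr = base_lr * (2 / 3) * batch_size  # 1e-4 * 2 = 2e-4
text_lr = unet_lr / 2                     # 1e-4, per the "half the unet LR" rule of thumb
print(unet_lr, text_lr)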


BAD LEARNING RATE:
LR + unet LR: 3e-4
text encoder LR: 1.5e-4
This resulted in random stuff popping up where it shouldn't:
drinks appearing at school / in her hands
straps appearing on her clothes
?? objects just showing up

DIM/ALPHA:
I tried dim=128 many, many times, and dim=64 a couple of times.
dim=64 produced great results, but the quality was lacking compared to the dim=128 runs.
I don't know if that's because my dataset is too large/varied/multi-concept or because
I need to tune for it better, but I figured I'll leave further exploration of that for later.
I would definitely recommend anons give it a try with smaller-data loras; it cooks super
fast with amazing results.

I still don't know exactly what alpha does, since the technical details go over my head, but it does
seem to scale training speed relative to dim: the lower the alpha compared to dim, the slower things cook (see the sketch after this list).
128/128 - the way it was always done in the past; I found that this overcooks too easily
and makes it harder (for me) to pin down quality.
128/64 - this felt like it gave significantly more breathing room compared to 128/128.
128/32 - my early trials showed alpha=32 producing great results, but it took a lot more time to train,
so I stopped experimenting with it out of impatience. Another anon prefers 128/32 and gets their best results there.
128/1 - memes. This being a default seems wrong. At least one technical anon mentioned it doesn't make sense to use 128/1 and that
you should be lowering dim at that point instead.
Even training at 3x the LRs for 10 epochs (10,000+ steps), it was still undercooked,
and with all that it started showing weird flaws as well.
If there's a way to get value out of this, I'm not the one figuring it out.
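
The sketch, in Python: as far as I understand it, the LoRA update gets multiplied by alpha/dim, so lowering alpha shrinks every weight update, which is why low alpha cooks slower (my understanding, not the actual network code):

for dim, alpha in [(128, 128), (128, 64), (128, 32), (128, 1), (64, 64)]:
    scale = alpha / dim  # factor applied to the LoRA update, as I understand it
    print(f"dim={dim:3d} alpha={alpha:3d} -> update scale {scale:.4f}")
# 128/128 -> 1.00, 128/64 -> 0.50, 128/32 -> 0.25, 128/1 -> 0.0078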


MIXED PRECISION: BP, SAVE PRECISION: FP
When I use FP for mixed precision I run out of VRAM, so it's not really a choice unless I want
to reduce my batch size. I prioritized experimenting with things that don't slow me down
significantly.


FLIP AUGMENT: OFF
I tried the flip augment flag many times. It has a significant impact:
it boosts the speed at which the data converges significantly and sometimes
gets better quality. I think it might make it too easy to overcook, though.
I haven't tested it since further data cleanups + reduced LRs, so it might still have value,
but I was often erring towards overcooking, so I turned it off in the most recent bakes.


COLOR AUG: OFF
I tried experimenting with this. It does things. It changes something.
But I can't tell you what exactly it did or whether it was beneficial enough to be worth using.
Stopped experimenting with it after one trial since it didn't amaze.


RESOLUTION: 512,512
I did one trial of 768,768 and it APPEARS to have brought some nice improvements,
but it required dropping my batch size to 1 and took 3+ hours to train.
It would take further experimenting to find the right combination of settings to get the
right mileage out of this, and since it takes me aeons to train at it, I dropped back to 512.
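
For what it's worth, the slowdown is roughly what the pixel count alone predicts (back-of-envelope, not a benchmark):

print((768 * 768) / (512 * 512))  # 2.25x the pixels per image vs 512x512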