Active learning for SFW/Q/NSFW classification on Danbooru201

a guest
Jul 8th, 2018
Prepare the Danbooru2017 dataset for use with the fast.ai NN library for training an s/q/e (safe/questionable/explicit) classifier:

~~~{.Bash}
## Extract the post IDs for each rating from the metadata, one ID per line:
cat metadata/*.json | jq '[.id, .rating]' -c | grep -F '"s"' | sed -e 's/\,"s"\]//' | tr -d '["' > ./sfw-ids.txt
cat metadata/*.json | jq '[.id, .rating]' -c | grep -F '"q"' | sed -e 's/\,"q"\]//' | tr -d '["' > ./questionable-ids.txt
cat metadata/*.json | jq '[.id, .rating]' -c | grep -F '"e"' | sed -e 's/\,"e"\]//' | tr -d '["' > ./nsfw-ids.txt

## The class directories must exist before anything is linked or moved into them:
mkdir -p data/{train,valid}/{sfw,questionable,nsfw}

## Symlink each 512px image into its class directory; images live in buckets named by (ID mod 1000), zero-padded to 4 digits:
symSFW() { BUCKET=$(printf "%04d" $(( $1 % 1000 )) ); if [[ -e ./512px-all/$BUCKET/"$1".jpg ]]; then ln -s /media/gwern/Data/danbooru2017/512px-all/"$BUCKET"/"$1".jpg ./data/train/sfw/"$1".jpg; fi; }
export -f symSFW
cat sfw-ids.txt | head -600000 | nice parallel --progress symSFW &

symQ() { BUCKET=$(printf "%04d" $(( $1 % 1000 )) ); if [[ -e ./512px-all/$BUCKET/"$1".jpg ]]; then ln -s /media/gwern/Data/danbooru2017/512px-all/"$BUCKET"/"$1".jpg ./data/train/questionable/"$1".jpg; fi; }
export -f symQ
cat questionable-ids.txt | head -600000 | nice parallel --progress symQ &

symNSFW() { BUCKET=$(printf "%04d" $(( $1 % 1000 )) ); if [[ -e ./512px-all/$BUCKET/"$1".jpg ]]; then ln -s /media/gwern/Data/danbooru2017/512px-all/"$BUCKET"/"$1".jpg ./data/train/nsfw/"$1".jpg; fi; }
export -f symNSFW
cat nsfw-ids.txt | head -600000 | nice parallel --progress symNSFW &

## Let all three background jobs finish before splitting off the validation set:
wait

cd data
# 10k validation images total:
mv $(find train/sfw/ -type l | shuf | head -3334) valid/sfw/
mv $(find train/questionable/ -type l | shuf | head -3333) valid/questionable/
mv $(find train/nsfw/ -type l | shuf | head -3333) valid/nsfw/
cd ../

find data/train/ -type l | wc --lines
# 1271820
find data/train/sfw/ -type l | wc --lines
# 588879
find data/train/questionable/ -type l | wc --lines
# 434770
find data/train/nsfw/ -type l | wc --lines
# 248171
# R> round(digits=2, c(588879, 434770, 248171) / 1271820)
# [1] 0.46 0.34 0.20
~~~
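
The bucket arithmetic in the `sym*` shell functions is easy to get wrong, so here is a minimal Python sketch of the same lookup, assuming the layout used above (`512px-all/<ID mod 1000, zero-padded to 4 digits>/<ID>.jpg`). The helper names `bucket_for`, `thumbnail_path`, and `link_into_class` are hypothetical, introduced only for illustration; this is not part of the original pipeline.

```python
from pathlib import Path


def bucket_for(post_id: int) -> str:
    """Bucket directory name: the ID modulo 1000, zero-padded to 4 digits."""
    return f"{post_id % 1000:04d}"


def thumbnail_path(root: Path, post_id: int) -> Path:
    """Expected location of the 512px JPEG for a post under `root`."""
    return root / "512px-all" / bucket_for(post_id) / f"{post_id}.jpg"


def link_into_class(root: Path, post_id: int, class_dir: Path) -> bool:
    """Symlink the image into `class_dir` if it exists; return True if it was found."""
    src = thumbnail_path(root, post_id)
    if not src.exists():
        return False  # same behaviour as the shell version: silently skip missing images
    class_dir.mkdir(parents=True, exist_ok=True)
    dest = class_dir / src.name
    if not (dest.is_symlink() or dest.exists()):
        dest.symlink_to(src.resolve())
    return True


if __name__ == "__main__":
    print(thumbnail_path(Path("/media/gwern/Data/danbooru2017"), 1452109))
```

For post 1452109 the bucket is `0109`, since 1452109 mod 1000 = 109.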

Train a deep NN with cyclic learning rates to 85% accuracy, and examine the validation set for mislabeled images (specifically, SFW images mislabeled as the default, Questionable):

~~~{.Python}
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

## Specify: DenseNet-121 (`dn121`) on 512x512 image data, from Danbooru2017, minibatch=26 (just fits in 2x1080ti), all fast.ai data augmentations:
PATH = "/media/gwern/Data/danbooru2017/data/"
sz = 512
bs = 13*2
arch = dn121
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_top_down+transforms_side_on, max_zoom=1.1)

data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs)
learn = ConvLearner.pretrained(arch, data, precompute=False)
learn.models.model = torch.nn.DataParallel(learn.models.model)

## Train:
learn.fit(0.0229, 1)
learn.unfreeze()
lr = np.array([2e-4,2e-3,2.29e-2])

learn.fit(lr, 5, cycle_len=1, cycle_mult=2)
## (The filename says 'densenet101', but the architecture trained above is `dn121`.)
learn.save("2018-07-07-densenet101-512-sfwdanbooru85percent")

## Generate predictions on validation set for active learning:
## to compute on the training/test set instead: log_preds,y = learn.TTA(is_test=True)
log_preds,y = learn.TTA()
## Average the class probabilities over the test-time augmentations, then take the most likely class:
probs = np.mean(np.exp(log_preds), axis=0)
preds = np.argmax(probs, axis=1)

## Prediction quality:
accuracy_np(probs, y)
# 0.852

from sklearn.metrics import confusion_matrix
confusion_matrix(y, preds)
# rows = true label, columns = predicted label
#        NSFW     Q     S
# NSFW   2552,  686,   95
# Q       230, 2544,  559
# S        20,  243, 3071

## Merge filenames/labels/predictions; columns: 0=filename, 1=label, 2=prediction, 3-5=mean log-probability of NSFW/Q/S.
## (column_stack converts everything to strings, hence the string comparisons below.)
results = np.column_stack((data.val_ds.fnames, data.val_y, preds, np.mean(log_preds, axis=0)))
## Keep just the Questionable images (label 1) the NN thinks are Safe (prediction 2):
m = np.stack([row for row in results if (row[1] == '1' and row[2] == '2')])
## Order mistakes by confidence from highest confidence to lowest.
## The original indexed column 7, which does not exist in this 6-column array; column 5 (the SFW log-probability) is assumed to be the intended key.
m = m[m[:,5].astype(float).argsort()[::-1]]
## Write out for interactive evaluation:
np.savetxt('/media/gwern/Data/danbooru2017/data/q2s.txt', m[:,0], delimiter='', fmt="%s")
~~~
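
The filter-and-sort step can be checked on toy data without a trained model. The sketch below is an illustration under assumptions: the arrays are made-up stand-ins for `data.val_ds.fnames`, `data.val_y`, and the averaged probabilities, and `labelled_as_but_predicted` is a hypothetical helper, not a fast.ai function. It keeps labels and probabilities as numeric arrays instead of stacking them with the filenames, which avoids comparing and sorting numbers as strings.

```python
import numpy as np

# Hypothetical toy data; class indices follow the order used above: 0=NSFW, 1=Questionable, 2=SFW.
fnames = np.array([
    "valid/questionable/1.jpg",
    "valid/questionable/2.jpg",
    "valid/sfw/3.jpg",
    "valid/questionable/4.jpg",
])
labels = np.array([1, 1, 2, 1])
probs = np.array([
    [0.10, 0.30, 0.60],  # labelled Q, predicted S with moderate confidence
    [0.05, 0.15, 0.80],  # labelled Q, predicted S with high confidence
    [0.01, 0.09, 0.90],  # labelled S, predicted S (correct, not of interest here)
    [0.20, 0.70, 0.10],  # labelled Q, predicted Q (correct, not of interest here)
])
preds = probs.argmax(axis=1)


def labelled_as_but_predicted(fnames, labels, preds, probs, labelled, predicted):
    """Filenames with the given label and a different prediction, most confident first."""
    mask = (labels == labelled) & (preds == predicted)
    # Negate so that argsort, which sorts ascending, yields descending confidence.
    order = np.argsort(-probs[mask, predicted])
    return fnames[mask][order]


q2s = labelled_as_but_predicted(fnames, labels, preds, probs, labelled=1, predicted=2)
print("\n".join(q2s))
```

The most confident disagreements come first, so the images most likely to be genuinely mislabeled are reviewed before the borderline ones.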

Visually examine and fix Q/S mistakes:

~~~{.Bash}
cd /media/gwern/Data/danbooru2017/data/

## Set a post's rating on Danbooru ($1 = image path, $2 = new rating: s/q/e), then print the rating now on record:
danbooruEditStatus() {
    local USERNAME="gwern-bot"
    local API_KEY="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    # The post ID is the filename without its directory or '.jpg' extension:
    local ID=$(basename --suffix=".jpg" "$1")
    curl -u "$USERNAME:$API_KEY" -X PUT "https://danbooru.donmai.us/posts/$ID.json" -d "post[rating]=$2"
    echo "Result:"
    curl --get "https://danbooru.donmai.us/posts/$ID.json" --data "login=$USERNAME&api_key=$API_KEY" | jq '.rating'
}
export -f danbooruEditStatus

## Example use:
# $ danbooruEditStatus valid/questionable/1452109.jpg s
# {"tag_string":"2girls alternate_costume black_legwear blue_eyes blue_legwear brown_hair carrying cherry_blossoms cloud day hair_ornament hairclip holding loafers long_hair long_sleeves lyrical_nanoha mahou_shoujo_lyrical_nanoha mahou_shoujo_lyrical_nanoha_a's miniskirt multiple_girls open_mouth pantyhose princess_carry print_legwear red_eyes reinforce shoes short_hair shorts silver_hair single_hair_intake skirt sky smile socks standing sweater takana turtleneck x_hair_ornament yagami_hayate","is_banned":false,"id":1452109,"rating":"s","parent_id":null,"source":"http://25.media.tumblr.com/tumblr_m7ijbpKjKy1r9yjhso1_1280.jpg","image_width":800,"image_height":614,"file_size":102206,"file_ext":"jpg","pixiv_id":null,"uploader_id":23799,"keeper_data":{"uid":23799},"tag_count":42,"tag_count_general":36,"tag_count_character":2,"tag_count_copyright":3,"tag_count_artist":1,"tag_count_meta":0,"pool_string":"","created_at":"2013-06-29T08:58:15.443-04:00","score":1,"md5":"cc735d40ee521b74acaec6721e7e1549","last_comment_bumped_at":null,"is_note_locked":false,"fav_count":1,"last_noted_at":null,"is_rating_locked":false,"has_children":false,"approver_id":287254,"is_status_locked":false,"up_score":1,"down_score":0,"is_pending":false,"is_flagged":false,"is_deleted":false,"updated_at":"2018-07-08T17:06:53.192-04:00","last_commented_at":null,"has_active_children":false,"bit_flags":0,"uploader_name":"BrokenEagle98","has_large":false,"has_visible_children":false,"children_ids":null,"is_favorited":false,"tag_string_general":"2girls alternate_costume black_legwear blue_eyes blue_legwear brown_hair carrying cherry_blossoms cloud day hair_ornament hairclip holding loafers long_hair long_sleeves miniskirt multiple_girls open_mouth pantyhose princess_carry print_legwear red_eyes shoes short_hair shorts silver_hair single_hair_intake skirt sky smile socks standing sweater turtleneck x_hair_ornament","tag_string_character":"reinforce yagami_hayate","tag_string_copyright":"lyrical_nanoha mahou_shoujo_lyrical_nanoha mahou_shoujo_lyrical_nanoha_a's","tag_string_artist":"takana","tag_string_meta":"","file_url":"https://raikou2.donmai.us/cc/73/cc735d40ee521b74acaec6721e7e1549.jpg","large_file_url":"https://raikou2.donmai.us/cc/73/cc735d40ee521b74acaec6721e7e1549.jpg","preview_file_url":"https://raikou2.donmai.us/preview/cc/73/cc735d40ee521b74acaec6721e7e1549.jpg"}
#   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
# 100  2370    0  2370    0     0   8061      0 --:--:-- --:--:-- --:--:--  8061

## Scroll through the images in order; press '1' to move a questionable image to SFW & set to SFW on http://danbooru.donmai.us as well; '2' to move a Q to NSFW etc.
feh --filelist q2s.txt \
    --action1 "mv %f ./valid/sfw/ && echo %f >> q2s-moved.txt && danbooruEditStatus %f s &" \
    --action2 "mv %f ./valid/nsfw/ && echo %f >> q2e-moved.txt && danbooruEditStatus %f e &"
~~~
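
For readers who would rather script the rating update than shell out to `curl`, the following is a rough Python sketch of how the same PUT request could be assembled with the standard library. It only mirrors the URL, form field, and basic-auth scheme visible in the `curl` call above. The function name `rating_update_request` is hypothetical, and whether the API still accepts this form is an assumption that is not verified here. The request is built but deliberately not sent.

```python
import base64
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request


def rating_update_request(image_path: str, rating: str, username: str, api_key: str) -> Request:
    """Build (but do not send) a PUT request that sets a post's rating.

    The post ID is taken from the image filename, as in the shell function.
    """
    if rating not in ("s", "q", "e"):
        raise ValueError(f"rating must be one of s/q/e, got {rating!r}")
    post_id = Path(image_path).stem  # 'valid/questionable/1452109.jpg' -> '1452109'
    url = f"https://danbooru.donmai.us/posts/{post_id}.json"
    body = urlencode({"post[rating]": rating}).encode("ascii")
    token = base64.b64encode(f"{username}:{api_key}".encode("utf-8")).decode("ascii")
    return Request(url, data=body, method="PUT", headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Placeholder credentials, as in the shell version.
    req = rating_update_request("valid/questionable/1452109.jpg", "s", "gwern-bot", "XXXX")
    print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`. Keeping construction separate from sending makes it possible to inspect exactly what would be changed before touching the live site.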