Bloo

a guest Dec 1st, 2015 2,360 Never
  1. 4ChanArchives.Cu.CC is worth mentioning for several reasons:
  2. [*] He is the only one currently hosting the 4Archive DB
  3. [*] He is using a self-made system for displaying that DB, without any CMS or framework
  4. [*] He is using a heavily modified system based on 4Archive's Lumen CMS for his automatic/on-demand archiving
  5. [*] He has been resolving any DMCA/takedown e-mails as well as replying swiftly to any sort of e-mail
  6. [*] He is running advertisements on his site
  7. [*] He has reuploaded more than 632k images from imageshack to imgur via the Mashape API
  8. [*] He is privately working on bringing up a new and better on-demand archiving system
  9. [*] He is privately working on bringing up a 1:1 version of archive.moe, just as it was on the day it died.
  10. [*] He plans to someday convert the 4Archive DB into the Fuuka DB structure
  11. [*] He is online 24/7 and often active on irc.rizon.net in several channels (nickname: Bloo_SemiAFK)
  12.  
  13. As of 1 November 2015, images on free imageshack accounts are no longer hot-linkable (they cannot be displayed on any site other than imageshack's own), and as of 1 January 2016 all images hosted on free accounts will be deleted. The 4ChanArchives.cu.cc admin (known as Bloo_SemiAFK on the Rizon IRC network) took on the tough challenge of getting every imageshack link in the current 4Archive DB reuploaded.
  14. At first he attempted to contact 4Archive via email, but the admin never replied.
  15. He then bought a $25 imageshack subscription, only to be told that he was uploading too much and that, no matter how much he paid, he would not be allowed to upload that many images.
  16. He then turned to mashape and imgur, at first paying 25 USD/month for a limit of 25k images/day, but he quickly realized that this would take too long and bought a better plan for 100 USD with a limit of 100k images/day.
  17. Imgur does not allow direct uploading of imageshack images, for whatever reason, so he had to download every single image and upload it to imgur via mashape. It took him about 25 days, and in the end, out of 665,015 images (~303.04 GB), he managed to save 632,279 images (~288.18 GB). The remaining 32,736 images (~14.86 GB) were either deleted from the imageshack servers or corrupted and could not be archived. He also provided the DB dump with the new links.[provide the link to the zip i gave you after you upload it to archive.com or w/e]
  18. Thus, [s]all[/s] most 4Archive images are now on imgur.
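The figures above can be cross-checked with a few lines of arithmetic. The following Python sketch uses only the numbers quoted in the text. The variable and function names are illustrative, and the day counts are lower bounds that assume the daily upload quota is fully used every day (an assumption, not something the text states).

```python
import math

# Figures quoted in the text above.
TOTAL_IMAGES = 665_015   # imageshack images referenced in the 4Archive DB
SAVED_IMAGES = 632_279   # images successfully reuploaded to imgur
TOTAL_GB = 303.04
SAVED_GB = 288.18

lost_images = TOTAL_IMAGES - SAVED_IMAGES
lost_gb = round(TOTAL_GB - SAVED_GB, 2)
saved_share = SAVED_IMAGES / TOTAL_IMAGES


def min_days(images: int, daily_limit: int) -> int:
    """Fewest whole days needed to push `images` uploads through a daily quota."""
    return math.ceil(images / daily_limit)


print(f"lost: {lost_images} images, {lost_gb} GB")
print(f"saved share: {saved_share:.2%}")
print("minimum days at 25k/day:", min_days(TOTAL_IMAGES, 25_000))
print("minimum days at 100k/day:", min_days(TOTAL_IMAGES, 100_000))
```

The quota-only bounds ignore the time spent downloading each image from imageshack first and the days spent on the smaller plan, so they are not expected to match the reported duration.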
  19. Speaking of imgur, whenever someone sends them a DMCA notice, they remove the image. This leaves the DB with a lot of imgur link entries which point to nothing. In the following months he is going to check every single link in the 4Archive DB and, if the image has been deleted, replace the link with a 404.png. Empty posts (having a 404.png and no text body) are deleted, as they no longer have any value and only take up DB space.
  20. The DB table structure looks like this: http://i.imgur.com/CqZAEmO.png
  21.  
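The link cleanup described above could look roughly like the following Python sketch. It is not the administrator's code: the `Post` fields are hypothetical column names, and how a removed imgur image is detected is left to a caller-supplied `is_alive` function, since the text does not say how the check is done.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

PLACEHOLDER = "404.png"  # placeholder image name taken from the text


@dataclass
class Post:
    post_id: int
    image_url: str  # hypothetical column name
    body: str       # hypothetical column name


def clean_posts(posts: Iterable[Post], is_alive: Callable[[str], bool]) -> List[Post]:
    """Replace dead image links with the placeholder and drop posts left empty."""
    kept = []
    for post in posts:
        # Only probe real links; posts already pointing at the placeholder are skipped.
        if post.image_url != PLACEHOLDER and not is_alive(post.image_url):
            post.image_url = PLACEHOLDER
        # A post with no image and no text carries no information, so drop it.
        if post.image_url == PLACEHOLDER and not post.body.strip():
            continue
        kept.append(post)
    return kept
```

Passing the liveness check in as a function keeps the cleanup logic testable without network access; a real run would plug in an HTTP request against each stored link.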
  22. As for the advertisements on the 4chanarchives.cu.cc site, there are 3 main opinions:
  23. [*] It's ok, at least this way he will maintain and secure a long-lasting future for the current and upcoming archives, and he has a financial motivation to keep improving current systems like 4Archive and, in the future, Fuuka & Asagi, or to create a new custom CMS (he is an experienced PHP & MySQL, HTML & CSS and Laravel developer and knows what he's doing).
  24. [*] It's not ok, archivers should pay out of their own pocket, regardless of whether the admin is a Europoor from the Balkans. If he wants to maintain a site like that, he should pay for everything on his own without getting anything in return!
  25. [*] I don't care, as long as the archive is running, it doesn't matter to me whether it has ads or not, I use uBlock/AdBlockPlus.
  26.  
  27. The administrator is working with different media advertisers to display NSFW ads only on NSFW boards and SFW ads on the SFW boards.
  28.  
  29. It's worth noting that the administrator of 4ChanArchives.cu.cc is one of the very few who reply in under 24 hours to any DMCA notice or friendly take-down request (personal images and information, etc.), thus keeping the website healthy.
  30.  
  31. Furthermore, since about the middle of October the administrator has been running 4Archive's CMS locally. It is based on Lumen and has a lot of bugs, some of which perhaps caused the death of 4zip.org. He was using that system for on-demand archiving in a closed beta, running only on his dev machine and not publicly. The threads will be added to the 4chanarchives.cu.cc DB [soon]
  32. Around 7 Nov. he created a script (threads.php) which pulls all board names from the 4chan API (boards.json) and then accesses each board's archive.json to pull all threads and send them to his local installation.
  33. As of 18 Nov., after about 10 days of active automatic archiving, he has archived about 4 million posts, 1 million images (500 GB) and 35k threads.
  34.  
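The enumeration step described above (boards.json first, then each board's archive.json) can be sketched in Python as follows. This is an illustration, not a port of threads.php: the `https://a.4cdn.org` host and the function names are assumptions, and the `fetch` parameter exists only so the logic can be exercised without network access.

```python
import json
from typing import Callable, Iterable, Iterator, List, Tuple
from urllib.request import urlopen

API_ROOT = "https://a.4cdn.org"  # assumed host of the read-only 4chan JSON API


def fetch_json(url: str):
    """Download and decode one JSON document."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def list_boards(fetch: Callable = fetch_json) -> List[str]:
    """Return the short names of all boards listed in boards.json."""
    data = fetch(f"{API_ROOT}/boards.json")
    return [entry["board"] for entry in data["boards"]]


def archived_threads(
    boards: Iterable[str], fetch: Callable = fetch_json
) -> Iterator[Tuple[str, int]]:
    """Yield (board, thread_id) for every thread listed in a board's archive.json."""
    for board in boards:
        try:
            thread_ids = fetch(f"{API_ROOT}/{board}/archive.json")
        except OSError:
            # Boards without an archive.json answer with an HTTP error; skip them.
            continue
        for thread_id in thread_ids:
            yield board, thread_id
```

A real run would call `archived_threads(list_boards())` and should pause between requests, since the public API asks clients to keep their request rate low.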
  35. A block diagram of the archiving process is shown here:
  36. https://i.imgur.com/WXcf4OY.png
  37.  
  38. A screenshot of the actual process in motion:
  39. http://i.imgur.com/5YWxrMT.png (as you can see, it barely stresses the HDD with 5 simultaneous scripts running)
  40.  
  41. Each thread is checked against the DB. If it already exists, the script moves on to the next one, going through all threads of a board and then through every single board.
  42. The source code for threads.php can be found here:
  43. http://pastebin.com/ppBfJTu7
  44. Updated version:
  45. http://pastebin.com/X9UVR6fc
  46.  
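The "skip what is already archived" check described above boils down to a membership test per thread. The Python sketch below uses an in-memory set as a stand-in for the DB lookup; the function name and the `archive_thread` callback are hypothetical and are not taken from the linked source code.

```python
from typing import Callable, Iterable, Set, Tuple

ThreadKey = Tuple[str, int]  # (board name, thread ID)


def archive_new_threads(
    candidates: Iterable[ThreadKey],
    known: Set[ThreadKey],
    archive_thread: Callable[[str, int], None],
) -> int:
    """Archive every candidate thread not yet in `known`; return how many were added."""
    added = 0
    for board, thread_id in candidates:
        key = (board, thread_id)
        if key in known:
            continue  # already archived: move on to the next thread
        archive_thread(board, thread_id)
        known.add(key)  # remember it so a repeated listing is skipped as well
        added += 1
    return added
```

Keying on the board as well as the thread ID is a design choice made here because thread numbers are only unique within a board.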
  47. If the thread does not exist in the DB, its ID is written through 4Archive's API and then the entire thread is archived via the 4chan API (threads.json).
  48. The images in the thread are sent one by one as links to imgur via mashape's API. If that fails, the file is downloaded locally and uploaded again. If the upload succeeds, the file is deleted from the local HDD; if it fails, the file is kept and another script (upload.php) attempts to upload it.
  49. Both threads.php and upload.php run 24/7, with a 30-minute pause whenever they finish a run.
  50.  
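The image handling described above (send the link first, fall back to download and re-upload, keep failures on disk for a second script) can be sketched in Python like this. All names are hypothetical, and the three callables stand in for the mashape/imgur calls, whose exact requests the text does not describe; each upload callable is assumed to return the new link or `None` on failure.

```python
from pathlib import Path
from typing import Callable, Optional


def store_image(
    url: str,
    upload_by_url: Callable[[str], Optional[str]],
    download: Callable[[str, Path], Path],
    upload_file: Callable[[Path], Optional[str]],
    spool_dir: Path,
) -> Optional[str]:
    """Return the new hosted link, or None if the file was left on disk for a retry."""
    link = upload_by_url(url)         # first attempt: hand the image host the source link
    if link is not None:
        return link
    local = download(url, spool_dir)  # fallback: fetch a local copy ...
    link = upload_file(local)         # ... and upload the file itself
    if link is not None:
        local.unlink()                # success: the local copy is no longer needed
    return link                       # on failure the file stays in spool_dir


def retry_spool(spool_dir: Path, upload_file: Callable[[Path], Optional[str]]) -> int:
    """Second pass (the role upload.php plays): retry every file still on disk."""
    uploaded = 0
    for path in sorted(spool_dir.iterdir()):
        if path.is_file() and upload_file(path) is not None:
            path.unlink()
            uploaded += 1
    return uploaded
```

Using the spool directory itself as the retry queue means no extra bookkeeping is needed: whatever is still on disk has not been uploaded yet.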
  51. Currently the administrator aims to archive all NSFW boards (you can see them in the source code) as well as all other boards, excluding /b/ and /trash/, since they don't have an archive.json, and /f/, since flash files can't be uploaded to imgur.
  52.  
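The board selection described above can be expressed as a small filter over the boards.json document. The Python sketch below assumes that boards.json marks boards with an archive through an `is_archived` field set to 1; that field name is an assumption about the API and does not come from the text, and the exclusion set only encodes the /f/ rule stated above.

```python
from typing import Dict, List

EXCLUDED = {"f"}  # flash files cannot be uploaded to imgur, per the text


def boards_to_archive(boards_json: Dict) -> List[str]:
    """Keep boards that expose an archive and are not explicitly excluded."""
    return [
        entry["board"]
        for entry in boards_json["boards"]
        # `is_archived` is assumed to be present (and 1) only on archived boards.
        if entry.get("is_archived") == 1 and entry["board"] not in EXCLUDED
    ]
```

Under that assumption, boards such as /b/ and /trash/ drop out on their own because they have no archive, so only /f/ needs to be listed by hand.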
  53. You can read more information on the ArchiveTeam's website:
  54. http://archiveteam.org/index.php?title=4chan#4chanarchives.cu.cc