Hiro_Protagonist

Whitepaper: Git-based stacked journalling filesystem

Sep 26th, 2012
  1. Title: gmfs, an experimental stacked journalling filesystem built on Git
  2. Author: James Stallings II aka Hiro Protagonist (Freenode IRC)
  3. Abstract: Technical Survey, Implementation Instructions and Materials
  4. Summary:
  5. A brief discussion of the nature of Journalled Filesystems; a documented exposition of the prototypical
  6. implementation; a manifest of requirements for proper operation
  7.  
  8. Date: Wed Sep 26, 2012
  9.  
  10. This document and anything in it that is deemed worthy and valuable by or to anyone is made freely available to such
  11. persons, out of the sheer benevolence of my right intent, without encumbrance or other hindrance, insofar as they
  12. do not usurp credit for the work, nor prevent other persons from the enjoyment of its equally unencumbered employment.
  13.  
  14. --
  15.  
  16. Introduction
  17.  
  18. Having been involved over the years with a couple of different complex open source projects, I've come to be rather familiar
  19. with source code management systems, and have learned why they're useful and how they empower users and insulate them from
  20. various potential problems that arise in the course of managing large, dynamic projects.
  21.  
  22. More modern revision control systems generalize over the software development model, becoming more like content-management
  23. systems; enter Git.
  24.  
  25. Git was first conceived as an open source tool for managing the source code of the Linux kernel, perhaps one of the largest,
  26. most complex, and dare I say most important source code bases in existence. Its design requirements remain rigorous, its
  27. feature set progressively broad in scope, and being designed largely for and by the Linux kernel development community,
  28. it had to meet or exceed all requirements before being deemed acceptable.
  29.  
  30. I first encountered git when a project I have a long-standing involvement with (the OpenSimulator project) switched to it
  31. as the source-control tool of choice a couple of years ago. I've since struggled with it some, cursed at it some, learned
  32. a lot, and come to appreciate it for the amazingly useful tool it is.
  33.  
  34. The particular application of it as a backend for a journalled filesystem occurred to me recently while reading the Git
  35. documentation, as I was preparing to teach it. The documentation mentions, almost as a footnote, that git is perfectly
  36. suitable for managing all of a project's assets, not just its source code; and it occurred to me at once that I should
  37. attempt initializing a git repository over a smallish filesystem and seeing how it went.
  38.  
  39. Presented here is what evolved from that experiment.
  40.  
  41. --
  42.  
  43. Journalling Filesystems: Disaster recovery management strategy?
  44.  
  45. I guess this depends a lot on whether you take the developer's view or the manager's view; in any case, it all boils
  46. down to change management, whether or not such change is anticipated.
  47.  
  48. A journalling filesystem, in the sense used here (a versioning filesystem, not a crash-recovery journal as in ext3), allows
  49. one to travel in time, after a fashion, within the filesystem. You can treat it like any other filesystem and, at any time,
  50. roll it back to a previous point in time and see the filesystem as it was then. The benefits are manifold and relatively obvious.
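
A minimal sketch of that time travel using nothing but stock git commands. It runs in a throwaway temporary directory, and the file name, commit messages and demo identity are all made up for illustration:

```shell
#!/bin/bash
# Throwaway demonstration; nothing here touches a real drive.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
# Identity for the demo commits only (hypothetical values).
git config user.name "gmfs demo"
git config user.email "gmfs@example.invalid"

# Two snapshots of the same file.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "snapshot 1"
first=$(git rev-parse HEAD)

echo "second draft" > notes.txt
git commit -q -a -m "snapshot 2"

# Look at the file as it was at the first snapshot, without changing anything.
old_view=$(git show "$first:notes.txt")

# Roll the working copy of that one file back to the first snapshot.
git checkout -q "$first" -- notes.txt
restored=$(cat notes.txt)

echo "view of old snapshot: $old_view"
echo "working copy now:     $restored"
```

When the snapshots are made on a timer, the commit to go back to can also be picked by time rather than by hash, for example with git log --before="<some date>".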
  51.  
  52. --
  53.  
  54. Backup Strategy or Version Control?
  55.  
  56. Both, at least in this instance. Git can do some pretty useful things with branches, and these things are equally
  57. useful when the whole filesystem is under git management. So not only can git be used for speculative work, it can also
  58. be used as a point-in-time backup control system with some pretty unique capabilities: backups (re)generated on demand,
  59. ready *remote replication and synchronization (file load balancing potential here?)
  60.  
  61. * note that this refers to the entire filesystem repository, which is not just a current copy of the filesystem but also
  62. its entire history and that of every file it contains
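
As a rough illustration of the replication idea, the sketch below clones a repository with its full history and then pushes a later snapshot across. Everything lives in a temporary directory; "primary" and "replica.git" are hypothetical names standing in for the working filesystem and a copy on another drive or host:

```shell
#!/bin/bash
demo=$(mktemp -d)
git init -q "$demo/primary"
cd "$demo/primary" || exit 1
# Identity for the demo commits only (hypothetical values).
git config user.name "gmfs demo"
git config user.email "gmfs@example.invalid"

echo "hello" > file.txt
git add file.txt
git commit -q -m "snapshot 1"

# A bare clone carries the whole history, not just the current files.
git clone -q --bare "$demo/primary" "$demo/replica.git"

# Later snapshots are pushed across to keep the replica in step.
echo "hello again" > file.txt
git commit -q -a -m "snapshot 2"
branch=$(git rev-parse --abbrev-ref HEAD)
git push -q "$demo/replica.git" "$branch"

# Both sides should now agree on the latest snapshot and on the history length.
primary_head=$(git rev-parse HEAD)
replica_head=$(git --git-dir="$demo/replica.git" rev-parse "$branch")
replica_count=$(git --git-dir="$demo/replica.git" rev-list --count "$branch")
echo "primary: $primary_head"
echo "replica: $replica_head ($replica_count snapshots)"
```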
  63.  
  64. The possibilities are intriguing to say the least.
  65.  
  66. --
  67.  
  68. So, enough with the blather, here's what I did:
  69.  
  70. Parts List:
  71.  
  72. 1. Two 32 GB USB 2.0 flash drives
  73. 2. My P4D desktop box running Ubuntu Linux 12.04 LTS
  74.  
  75.  
  76. Software components employed:
  77.  
  78. 1. Git SCM/RCS
  79. 2. cron clock-event software scheduler
  80. 3. bash shell scripting language
  81. 4. automounter, to automagically mount my flash drives in userland via fstab
  82. 5. your preferred text editor for working with the files
  83.  
  84.  
  85. The custom files:
  86.  
  87. 1. fstab
  88. 2. crontab
  89. 3. roll.sh
  90.  
  91. --
  92.  
  93. Setting it all up
  94.  
  95. The first thing that must be accomplished is getting the underlying volumes and filesystems viable and operable - in my
  96. case, this means getting the volumes (two usb flash drives) and their filesystems (vfat) mounted in a consistent
  97. location and in a consistent fashion. As they are removable media and managed by my user, I need them mounted in my
  98. user's file space, and they need to automount when inserted in the ports. Here's the relevant portion of my fstab:
  99.  
  100. #/dev/sdc1 USB Thumbdrive
  101. UUID=A974-15B6 /home/twitch/flashdrv0 vfat rw,user,noauto,nofail
  102.  
  103. #/dev/sdd1 USB Thumbdrive
  104. UUID=40E5-93BB /home/twitch/flashdrv1 vfat rw,user,noauto,nofail
  105.  
  106. Note that the drives are mounted by UUID, which, unlike the /dev/sdX names, stays the same across reboots and ports and so distinguishes between otherwise identical drives.
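
For anyone reproducing this with their own drives, a small sketch of how such entries could be generated. The fstab_line helper is hypothetical; the UUID and mount point are the ones from the first entry above, and on a real system the UUID would be read from the device with something like blkid -s UUID -o value /dev/sdc1:

```shell
#!/bin/bash
# Hypothetical helper: print an fstab entry in the style used above
# for a given filesystem UUID and mount point.
fstab_line() {
    local uuid=$1 mountpoint=$2
    printf 'UUID=%s %s vfat rw,user,noauto,nofail\n' "$uuid" "$mountpoint"
}

line=$(fstab_line "A974-15B6" "/home/twitch/flashdrv0")
echo "$line"
```

With the user option in place, an ordinary user can mount the drive by naming just the mount point, e.g. mount /home/twitch/flashdrv0.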
  107.  
  108. The next thing is the crontab. Cron is a unix program that runs software based on what time it is. I won't go any further
  109. into it than that. The short story is, we need to do certain processing over the filesystem periodically with git to make
  110. the journalling magic happen, and that processing is encapsulated within the roll.sh shell script; that script is run by
  111. cron (every 15 mins, all day long, every day, every week, every month in my case).
  112.  
  113. The cron entries for my installation are as follows:
  114.  
  115. 0,15,30,45 * * * * /bin/bash /home/twitch/flashdrv1/shbin/roll.sh
  116. 0,15,30,45 * * * * /bin/bash /home/twitch/flashdrv0/shbin/roll.sh
  117.  
  118. Now to the meat of it -- the roll.sh script. Sounds like major mojo, but it really isn't; it's just some basic
  119. automation of git, which does all of the heavy lifting.
  120.  
  121. Here's roll.sh:
  122.  
#!/bin/bash

# Repository root and log file (hard-coded, matching the crontab entry).
REPO=/home/twitch/flashdrv0
LOG="$REPO/logs/gmfs.log"

# Walk the directories given as arguments and stage their contents.
# Paths are passed as quoted arguments rather than parsed from ls output,
# so names containing spaces survive intact.
function Recurse
{
    local f
    for f in "$@"
    do
        if [[ -d "${f}" ]]; then
            echo "/usr/bin/git add ${f}/*" >>"$LOG"
            # stderr goes to the log too, so failures are not lost under cron
            /usr/bin/git add -- "${f}/*" >>"$LOG" 2>&1
            Recurse "${f}"/*
        fi
    done
}

# process cwd as a git-managed filesystem
#
# this is experimental and is just what it sounds like
#
echo "=====================================================================================================" >>"$LOG"
cd "$REPO" || exit 1
echo "$(date) - staging changes" >>"$LOG"
Recurse *
# stage new and modified files from the root down, then removals
/usr/bin/git add . >>"$LOG" 2>&1
/usr/bin/git add -u . >>"$LOG" 2>&1
echo "$(date) - the following changes were staged for commit:" >>"$LOG"
/usr/bin/git status >>"$LOG" 2>&1
echo "$(date) - making commit" >>"$LOG"
/usr/bin/git commit -a -m "$(date)" >>"$LOG" 2>&1
echo "$(date) - commit completed" >>"$LOG"
  157.  
  158.  
  159. Note the hard-coded paths in both the crontab and the roll.sh shell script. This is an area that could potentially
  160. benefit from some configuration points. Note also that the script fully logs all its activities.
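
One possible shape for such configuration points, offered only as a sketch: the repository path becomes an argument, with the log location derived from it unless given. The roll_repo function and its parameters are hypothetical names, not part of the original roll.sh, and the demonstration runs against a throwaway repository:

```shell
#!/bin/bash
# Hypothetical, parameterised variant of the commit cycle:
#   $1 = repository root, $2 = log file (defaults to logs/gmfs.log inside it)
roll_repo() {
    local repo=$1
    local log=${2:-$repo/logs/gmfs.log}
    mkdir -p "$(dirname "$log")"
    cd "$repo" || return 1
    {
        echo "$(date) - staging changes"
        git add .
        git add -u .
        echo "$(date) - making commit"
        git commit -a -m "$(date)"
        echo "$(date) - commit completed"
    } >>"$log" 2>&1
}

# Demonstration against a temporary repository.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
# Identity for the demo commit only (hypothetical values).
git config user.name "gmfs demo"
git config user.email "gmfs@example.invalid"
echo "data" > a.txt

roll_repo "$demo"
commits=$(git rev-list --count HEAD)
echo "commits after one roll: $commits"
```

A single copy of the script could then serve both drives, with each cron entry passing its own mount point as the argument.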
  161.  
  162. The intention of the git commands employed by the script is as follows:
  163.  
  164. - (recursively) add any untracked directories and files within them to the repository
  165. - add the root of the filesystem to the repository (this might well be redundant)
  166. - update the repository with any files or folders that have been removed
  167. - make log entries summarizing repository staging conducted thus far (not git but relevant)
  168. - commit the staged changes to the repository, updating the repository to the state of the working directory
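
The first three steps above can also be collapsed into a single git add -A, which stages additions, modifications and removals in one pass. A small self-contained check of that claim, run in a temporary directory with made-up file names:

```shell
#!/bin/bash
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
# Identity for the demo commits only (hypothetical values).
git config user.name "gmfs demo"
git config user.email "gmfs@example.invalid"

mkdir -p docs
echo "one" > docs/keep.txt
echo "two" > docs/gone.txt
git add -A
git commit -q -m "snapshot 1"

# Change the tree in all three ways: modify, delete, add (in a new directory).
echo "changed" > docs/keep.txt
rm docs/gone.txt
mkdir -p new/deep
echo "three" > new/deep/added.txt

# One command stages the lot.
git add -A
staged=$(git diff --cached --name-status --no-renames | sort)
echo "$staged"

git commit -q -m "snapshot 2"
# Nothing should be left over once the commit is made.
leftover=$(git status --porcelain)
```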
  169.  
  170. --
  171.  
  172. Idiosyncrasies
  173.  
  174. The log file is written so often that it is modified again immediately after the staged changes have been committed to the
  175. repository; so it always shows as modified and ready for staging. This and some other similar circumstances provide for
  176. some 'interesting' issues when working with branches. Among these is that any uncommitted changes will have to be resolved
  177. in the new branch before one can return to the 'master' branch; so it is probably best practice to manually commit any
  178. staged changes before working with a new branch.
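
One way the ever-modified log might be sidestepped, sketched here as an assumption rather than something the experiment has tried, is to tell git to leave the logs directory out of the repository altogether via .git/info/exclude. The trade-off is that the log itself is then no longer versioned. The demonstration uses a temporary directory and made-up file contents:

```shell
#!/bin/bash
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
# Identity for the demo commit only (hypothetical values).
git config user.name "gmfs demo"
git config user.email "gmfs@example.invalid"

mkdir -p logs
echo "log line" > logs/gmfs.log
echo "data" > file.txt

# Per-repository ignore list; unlike .gitignore it is not itself a tracked file.
mkdir -p .git/info
echo "logs/" >> .git/info/exclude

git add -A
git commit -q -m "snapshot 1"

# The log keeps growing after the commit...
echo "another log line" >> logs/gmfs.log

# ...but the working tree still reads as clean.
dirty=$(git status --porcelain)
tracked=$(git ls-files)
echo "tracked files: $tracked"
```

In a repository where the log has already been committed, it would first have to be dropped from the index with git rm --cached before the exclusion takes effect.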
  179.  
  180. --
  181.  
  182. Conclusions
  183.  
  184. This experiment is still very much in progress, and the jury is still out on whether it's a viable journalling filesystem
  185. solution or just a technically interesting curiosity. I'm publishing about it in the interest of sharing my experiment,
  186. and encouraging others to join in, not in the interest of presenting or announcing a finished product with deliverables.
  187. In short, use at your own risk.