- Title: gmfs, an experimental stacked journalling filesystem built on Git
- Author: James Stallings II aka Hiro Protagonist (Freenode IRC)
- Abstract: Technical Survey, Implementation Instructions and Materials
- Summary:
- A brief discussion of the nature of Journalled Filesystems; a documented exposition of the prototypical
- implementation; a manifest of requirements for proper operation
- Date: Wed Sep 26, 2012
- This document and anything in it that is deemed worthy and valuable by or to anyone is made freely available to such
- persons, out of the sheer benevolence of my right intent, without encumbrance or other hindrance, insofar as they
- do not usurp credit for the work, nor prevent other persons from the enjoyment of its equally unencumbered employment.
- --
- Introduction
- Having been involved over the years with a couple of complex open source projects, I've come to be rather familiar
- with source code management systems, and have learned why they're useful and how they empower users and insulate them from
- various potential problems that arise in the course of managing large, dynamic projects.
- More modern revision control systems generalize over the software development model, becoming more like content-management
- systems; enter Git.
- Git was first conceived as an open source tool for managing the source code of the Linux kernel, perhaps one of the largest,
- most complex, and dare I say most important source code bases in existence. Its design requirements remain rigorous, its
- feature-set progressively broad in scope, and being designed largely for and by the Linux kernel development community,
- it had to meet or exceed all requirements before being deemed acceptable.
- I first encountered git when a project I have a long-standing involvement with (the OpenSimulator project) switched to it
- as the source-control tool of choice a couple of years ago. I've since struggled with it some, cursed at it some, learned
- a lot, and come to appreciate it for the amazingly useful tool it is.
- The particular application of it as a backend for a journalled filesystem occurred to me recently while reading Git
- documentation, as I was preparing to teach it. The documentation mentions, almost as a footnote, that git is perfectly
- suitable for managing all of a project's assets, not just its source code; and it occurred to me at once that I should
- attempt initializing a smallish filesystem and seeing how it went.
- Presented here is what evolved from that experiment.
- --
- Journalling Filesystems: Disaster recovery management strategy?
- I guess this depends a lot on whether you take the developer's view or the manager's view; in any case, it all boils
- down to change management, whether or not such change is anticipated.
- A journalling filesystem allows one to travel in time, after a fashion, within the filesystem. You can treat it like
- any other filesystem, and at any time, roll it back to a previous point in time, and see the filesystem as it was at
- that time. The benefits of this capability are manifold and relatively obvious.
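- As a rough illustration of that time travel, here is a minimal sketch using plain git commands against a scratch
- directory. The directory, file name and commit messages are made up for the demonstration and are not part of my setup.

```shell
#!/bin/bash
# Sketch: view a git-managed directory as it was at an earlier commit.
# The scratch directory and file names below are illustrative only.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "gmfs demo"
git config user.email "gmfs@example.invalid"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "snapshot 1"
first=$(git rev-parse HEAD)

echo "second draft" > notes.txt
git commit -q -a -m "snapshot 2"

# List the available points in time, newest first.
git log --format='%h %cd %s'

# Read a single file as it was at the first snapshot, leaving the working tree alone.
old_contents=$(git show "$first:notes.txt")
echo "then: $old_contents / now: $(cat notes.txt)"

# Or roll the whole working tree back to that snapshot (this detaches HEAD).
git checkout -q "$first"
rolled_back=$(cat notes.txt)
echo "after rollback: $rolled_back"
```

- Checking out the branch again returns the working tree to the present.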
- --
- Backup Strategy or Version Control?
- Both, at least in this instance. Git can do some pretty useful things with branches, and these things are equally
- useful when the whole filesystem is under git management. So not only can git be used for speculative work, it can also
- be used as a point-in-time backup control system with some pretty unique capabilities: backups (re)generated on demand,
- and ready remote replication* and synchronization (file load balancing potential here?)
- * note that this refers to the entire filesystem repository, which is not just a current copy of the filesystem but also
- its entire history and that of every file it contains
- The possibilities are intriguing to say the least.
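- To make the replication idea concrete, here is a minimal sketch that mirrors a repository, history and all, into a
- second location and then synchronizes a later change. Both directories are scratch stand-ins for two volumes; I have
- not wired anything like this into the setup described in this document.

```shell
#!/bin/bash
# Sketch: replicate a git-managed directory, history included, to a second location.
# Both directories are scratch stand-ins for two mounted volumes.
set -e
primary=$(mktemp -d)
replica_parent=$(mktemp -d)
replica="$replica_parent/replica.git"

cd "$primary"
git init -q .
git config user.name "gmfs demo"
git config user.email "gmfs@example.invalid"
echo "v1" > data.txt
git add data.txt
git commit -q -m "snapshot 1"

# A mirror clone copies every branch and the full history, not just the current files.
git clone -q --mirror "$primary" "$replica"

# Later changes are synchronized by pushing to the mirror.
echo "v2" > data.txt
git commit -q -a -m "snapshot 2"
git push -q --mirror "$replica"

primary_head=$(git rev-parse HEAD)
replica_head=$(git --git-dir="$replica" rev-parse HEAD)
replica_commits=$(git --git-dir="$replica" rev-list --count HEAD)
echo "primary=$primary_head replica=$replica_head commits=$replica_commits"
```

- The mirror is a bare repository, so it holds the history but no checked-out files; a normal clone of it yields a
- working copy.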
- --
- So, enough with the blather, here's what I did:
- Parts List
- 1. Two 32 GB USB 2.0 flash drives
- 2. My P4D desktop box running Ubuntu Linux 12.04 LTS
- Software components employed:
- 1. Git SCM/RCS
- 2. cron clock-event software scheduler
- 3. bash shell scripting language
- 4. automounter, to automagically mount my flash drives in userland with fstab
- 5. your preferred text editor for working with the files
- The custom files:
- 1. fstab
- 2. crontab
- 3. roll.sh
- --
- Setting it all up
- The first thing that must be accomplished is getting the underlying volumes and filesystems viable and operable - in my
- case, this means getting the volumes (two usb flash drives) and their filesystems (vfat) mounted in a consistent
- location and in a consistent fashion. As they are removable media and managed by my user, I need them mounted in my
- user's file space, and they need to automount when inserted in the ports. Here's the relevant portion of my fstab:
- #/dev/sdc1 USB Thumbdrive
- UUID=A974-15B6 /home/twitch/flashdrv0 vfat rw,user,noauto,nofail
- #/dev/sdd1 USB Thumbdrive
- UUID=40E5-93BB /home/twitch/flashdrv1 vfat rw,user,noauto,nofail
- Note that the drives are mounted by UUID, which distinguishes between otherwise identical drives no matter which
- device node (/dev/sdc1, /dev/sdd1) each happens to be assigned.
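- Once a volume is mounted, it needs a git repository at its root before any commits can be made to it. That step is not
- shown elsewhere in this document, so here is a minimal sketch of it. The helper name init_volume and the identity
- values are hypothetical; the shbin and logs directories match the paths used by the crontab and roll.sh, and a scratch
- directory stands in for ~/flashdrv0.

```shell
#!/bin/bash
# Sketch: turn a mounted volume into a git-managed filesystem.
# init_volume is a hypothetical helper; the identity values are placeholders.
init_volume() {
    local root="$1"
    mkdir -p "$root/shbin" "$root/logs" || return 1
    cd "$root" || return 1
    git init -q . || return 1
    git config user.name "gmfs"
    git config user.email "gmfs@example.invalid"
    # An initial commit gives later snapshots something to build on.
    touch logs/gmfs.log
    git add .
    git commit -q -m "initial import"
}

# Demonstration against a scratch directory standing in for ~/flashdrv0.
volume=$(mktemp -d)
init_volume "$volume"
tracked=$(git ls-files)
echo "$tracked"
```

- Git does not track empty directories, so shbin only appears in the repository once roll.sh has been copied into it.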
- The next thing is the crontab. Cron is a unix program that runs software based on what time it is. I won't go any further
- into it than that. The short story is, we need to do certain processing over the filesystem periodically with git to make
- the journalling magic happen, and that processing is encapsulated within the roll.sh shell script; that script is run by
- cron (every 15 mins, all day long, every day, every week, every month in my case).
- The cron entries for my installation are as follows:
- 0,15,30,45 * * * * /bin/bash /home/twitch/flashdrv1/shbin/roll.sh
- 0,15,30,45 * * * * /bin/bash /home/twitch/flashdrv0/shbin/roll.sh
- Now to the meat of it -- the roll.sh script. Sounds like major mojo, but it really isn't; it's just some basic
- automation of git, which does all of the heavy lifting.
- Here's roll.sh:
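- #!/bin/bash
- # stage the contents of every directory named in the arguments, then descend into it
- function Recurse
- {
-     local f
-     for f in "$@"
-     do
-         if [[ -d "${f}" ]]; then
-             echo "/usr/bin/git add ${f}/*" >>~/flashdrv0/logs/gmfs.log
-             /usr/bin/git add "${f}/*" >>~/flashdrv0/logs/gmfs.log
-             # descend in a subshell so a failed cd cannot leave us in the wrong directory;
-             # globbing instead of parsing ls output keeps names with spaces intact
-             ( cd "${f}" && Recurse * )
-         fi
-     done
- }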
- # process cwd as a git-managed filesystem
- #
- # this is experimental and is just what it sounds like
- #
- echo "=====================================================================================================" >>~/flashdrv0/logs/gmfs.log
- # stop here if the volume root cannot be entered
- cd /home/twitch/flashdrv0/ || exit 1
- echo "$(date) - staging changes" >>~/flashdrv0/logs/gmfs.log
- Recurse *
- /usr/bin/git add ~/flashdrv0/. >>~/flashdrv0/logs/gmfs.log
- /usr/bin/git add -u ~/flashdrv0/. >>~/flashdrv0/logs/gmfs.log
- echo "$(date) - the following changes were staged for commit:" >>~/flashdrv0/logs/gmfs.log
- /usr/bin/git status >>~/flashdrv0/logs/gmfs.log
- echo "$(date) - making commit" >>~/flashdrv0/logs/gmfs.log
- /usr/bin/git commit -a -m "$(date)" >>~/flashdrv0/logs/gmfs.log
- echo "$(date) - commit completed" >>~/flashdrv0/logs/gmfs.log
- Note the hard-coded paths in both the crontab and the roll.sh shell script. This is an area that could potentially
- benefit from some configuration points. Note also that the script fully logs all its activities.
- The intention of the git commands employed by the script is as follows:
- - (recursively) add any untracked directories and files within them to the repository
- - add the root of the filesystem to the repository (this might well be redundant)
- - update the repository's index with any tracked files that have been modified or removed
- - make log entries summarizing repository staging conducted thus far (not git but relevant)
- - commit the staged changes to the repository, updating the repository to the state of the working directory
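- As one possible take on the configuration points mentioned above, here is a minimal sketch that passes the volume
- root in as an argument instead of hard-coding it. The function name roll_volume is hypothetical, and it swaps the
- separate add steps for a single "git add -A", which stages new, modified and removed files in one pass; I have not
- run this variant on the flash drives.

```shell
#!/bin/bash
# Sketch: roll.sh with the volume root passed as an argument instead of hard-coded.
# roll_volume is a hypothetical name; the log location mirrors the original layout.
roll_volume() {
    local root="$1"
    local log="$root/logs/gmfs.log"
    cd "$root" || return 1
    mkdir -p "$root/logs"
    {
        echo "$(date) - staging changes"
        # -A stages new, modified and removed files in one pass.
        git add -A .
        echo "$(date) - the following changes were staged for commit:"
        git status --short
        echo "$(date) - making commit"
        git commit -q -m "$(date)"
        echo "$(date) - commit completed"
    } >>"$log" 2>&1
}

# Demonstration against a scratch directory standing in for a mounted volume.
volume=$(mktemp -d)
cd "$volume"
git init -q .
git config user.name "gmfs demo"
git config user.email "gmfs@example.invalid"
echo "hello" > a.txt
roll_volume "$volume"
echo "world" > b.txt
rm a.txt
roll_volume "$volume"
commits=$(git rev-list --count HEAD)
files=$(git ls-files)
echo "commits: $commits"
echo "$files"
```

- With something like this in place, each cron entry could name its own volume as the argument, and one copy of the
- script would serve both drives.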
- --
- Idiosyncrasies
- Logging is sufficiently vigorous that the log file is always modified again immediately after the staged changes have been
- committed to the repository, so it always shows as modified and ready for staging. This and some other similar circumstances
- provide for some 'interesting' issues when working with branches. Among these is that any uncommitted changes will have to be
- resolved in the new branch before one can return to the 'master' branch, so it is probably best practice to manually commit
- any staged changes before working with a new branch.
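- One way around the perpetually modified log, which I have not adopted here, would be to keep the log out of version
- control altogether. The following is a minimal sketch of that idea in a scratch repository; the branch name
- "experiment" is made up, and the cost is that the log itself is no longer journalled.

```shell
#!/bin/bash
# Sketch: keep the log out of version control so the working tree stays clean.
set -e
volume=$(mktemp -d)
cd "$volume"
git init -q .
git config user.name "gmfs demo"
git config user.email "gmfs@example.invalid"
mkdir logs
echo "logs/" > .gitignore
echo "payload" > data.txt
git add -A .
git commit -q -m "snapshot 1" >>logs/gmfs.log
echo "$(date) - commit completed" >>logs/gmfs.log

# With logs/ ignored, writing to the log no longer dirties the tree.
dirty=$(git status --porcelain)
echo "uncommitted changes: '${dirty}'"

# A clean tree means branches can be switched without anything to resolve first.
start=$(git rev-parse --abbrev-ref HEAD)
git checkout -q -b experiment
git checkout -q "$start"
back_on=$(git rev-parse --abbrev-ref HEAD)
echo "back on: $back_on"
```

- In a repository where the log is already tracked, the ignore rule only takes effect after the file has been removed
- from the index with "git rm --cached".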
- --
- Conclusions
- This experiment is still very much in progress, and the jury is still out on whether it's a viable journalling filesystem
- solution or just a technically interesting curiosity. I'm publishing about it in the interest of sharing my experiment,
- and encouraging others to join in, not in the interest of presenting or announcing a finished product with deliverables.
- In short, use at your own risk.