- Backup Buddy Design Document
- ============================
- James Stanley 2011
- Requirements:
- -------------
- - Not possible for backup server to read any filenames or data
- - Not waste bandwidth
- - Possible to detect data corruption
- Server:
- -------
- Files will be stored on the server in a content-addressable filesystem. The
- server will maintain a system of files named by the SHA1 hash of their
- contents.
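A minimal sketch of such a store in Python, assuming filenames are the lowercase hex form of the SHA1 digest and that the store is a single flat directory. The helper names `store_blob` and `load_blob` are hypothetical and only illustrate the idea that a file's name is the hash of its contents, so corruption is detectable on every read.

```python
import hashlib
from pathlib import Path


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 digest of data as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def store_blob(root: Path, data: bytes) -> str:
    """Write data to root/<sha1>; the filename is the hash of the contents."""
    name = sha1_hex(data)
    path = root / name
    if not path.exists():  # identical content is only ever stored once
        path.write_bytes(data)
    return name


def load_blob(root: Path, name: str) -> bytes:
    """Read root/<name> and check that its contents still hash to its name."""
    data = (root / name).read_bytes()
    if sha1_hex(data) != name:
        raise ValueError(f"blob {name} is corrupted")
    return data
```

Because the name is derived from the data, storing the same content twice is a no-op, which is what lets the server skip data it already has.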
- When the client wishes to perform a new backup, it connects to the server and
- presents a list of SHA1 hashes. The server responds with the subset of this list
- for which it has no valid file on disk: hashes it does not have at all, plus any
- whose file has become corrupted (detected by re-hashing the contents and checking
- that the result matches the filename).
- The client then sends the data for these files.
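The server's side of this exchange can be sketched as below, assuming the same layout as a flat directory of hex-named files. `hashes_needed` is a hypothetical name; it returns exactly the hashes that are missing or whose file no longer hashes to its name.

```python
import hashlib
from pathlib import Path
from typing import List


def hashes_needed(root: Path, offered: List[str]) -> List[str]:
    """Return the subset of offered hashes the client must upload.

    A hash is needed if root/<hash> is missing, or if the file exists but its
    contents no longer hash to its name (i.e. it has become corrupted).
    """
    needed = []
    for name in dict.fromkeys(offered):  # drop duplicates, keep order
        path = root / name
        if not path.is_file():
            needed.append(name)
        elif hashlib.sha1(path.read_bytes()).hexdigest() != name:
            needed.append(name)
    return needed
```

Re-hashing every file on each backup costs disk I/O but no bandwidth, which matches the requirement of not wasting bandwidth.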
- After the server has stored the new files, it deletes any files that were not
- in the list of hashes sent by the client.
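The clean-up step amounts to a mark-and-sweep where the client's list is the set of live hashes. A minimal sketch, again assuming a flat directory; `collect_garbage` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Set


def collect_garbage(root: Path, keep: Set[str]) -> List[str]:
    """Delete every file under root whose name is not in keep.

    Intended to run only after all requested uploads have been stored, so the
    previous backup stays intact if the upload is interrupted.
    Returns the deleted names in sorted order.
    """
    deleted = []
    for path in root.iterdir():
        if path.is_file() and path.name not in keep:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```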
- The server system presented above lets the client store arbitrary data on the
- server, and lets the server detect when any of that data has become corrupted.
- Client:
- -------
- The client starts with an empty temporary directory somewhere on the local
- filesystem. Starting at the backup root, it creates symmetrically-encrypted
- versions of each file and directory. Anything larger than 4K (perhaps this limit
- should be configurable) is split into separate blocks, and every block is
- exactly 4K.
- To allow the client to know the list of files when the backup is restored, an
- index file is created. This file contains name-hash pairs (mapping a file name
- to a SHA1 hash). Each directory is represented by an index file.
- Special files such as character/block devices and symlinks can be given their
- own block types.
- Encoding:
- ---------
- The file is split into blocks of size 4K minus sizeof(file_block_header).
- Starting with the last block, each block is encrypted and then hashed. The
- encrypted block is stored in the temporary directory on the local filesystem,
- named by the SHA1 hash of its contents. We then move on to the previous block
- of the file and encrypt it (making sure its "next" SHA1 hash is set to the hash
- of the block just stored), store it in the temporary directory, and so on.
- Working backwards is necessary because each block's header must contain the
- hash of the encrypted block that follows it.
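The backwards chaining can be sketched in Python as follows. Several details are assumptions not fixed by the design: the header is packed big-endian, `total_length` holds the length of the whole file, the last block's "next" hash is 20 zero bytes, the final chunk is zero-padded so every block is 4096 bytes, and a dict stands in for the temporary directory. `encode_file` is a hypothetical name, and `encrypt` is a placeholder parameter (identity by default) where a real symmetric cipher would go; the identity default provides no confidentiality.

```python
import hashlib
import struct
from typing import Callable, Dict

BLOCK_SIZE = 4096                     # the "4K" block size
HEADER = struct.Struct(">I20s")       # assumed: big-endian uint32 + 20-byte SHA1
DATA_SIZE = BLOCK_SIZE - HEADER.size  # 4096 - 24 = 4072 data bytes per block
NO_NEXT = bytes(20)                   # assumed "next" value for the last block


def encode_file(data: bytes, store: Dict[bytes, bytes],
                encrypt: Callable[[bytes], bytes] = lambda b: b) -> bytes:
    """Chain-encode data into store and return the SHA1 of the first block.

    Blocks are built from last to first, because each header must contain the
    hash of the (already encrypted) block that follows it.
    """
    chunks = [data[i:i + DATA_SIZE]
              for i in range(0, len(data), DATA_SIZE)] or [b""]
    next_hash = NO_NEXT
    for chunk in reversed(chunks):
        block = HEADER.pack(len(data), next_hash) + chunk.ljust(DATA_SIZE, b"\0")
        encrypted = encrypt(block)
        next_hash = hashlib.sha1(encrypted).digest()
        store[next_hash] = encrypted  # stands in for the temporary directory
    return next_hash
```

Only the hash of the first block needs to be recorded elsewhere (in an index entry); every later block is reachable through the chain.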
- File block format:
- ------------------
- [uint32 total_length][uchar160 sha1-hash of next block][data]
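Restoring a file means walking this format in the other direction. A sketch under the same assumptions as before: big-endian header, `total_length` meaning the whole file's length, an all-zero "next" hash ending the chain, and a dict keyed by raw SHA1 digests in place of the server's store. `decode_file` is a hypothetical name and `decrypt` a placeholder for the real cipher.

```python
import hashlib
import struct
from typing import Callable, Dict

HEADER = struct.Struct(">I20s")  # assumed: big-endian total_length + next SHA1
NO_NEXT = bytes(20)              # assumed end-of-chain marker


def decode_file(first: bytes, store: Dict[bytes, bytes],
                decrypt: Callable[[bytes], bytes] = lambda b: b) -> bytes:
    """Follow the "next" hashes from the first block and reassemble the file."""
    parts = []
    current = first
    total = 0
    while current != NO_NEXT:
        encrypted = store[current]
        # Each stored block must still hash to the name it was stored under.
        if hashlib.sha1(encrypted).digest() != current:
            raise ValueError("block does not match its hash")
        block = decrypt(encrypted)
        total, current = HEADER.unpack_from(block)
        parts.append(block[HEADER.size:])
    # total_length tells us how much of the padded final block is real data.
    return b"".join(parts)[:total]
```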
- Index block format:
- -------------------
- [uint32 num_entries][uchar160 sha1-hash of next block][DIRS]
- DIR format:
- -----------
- [uint16 name_length][ucharN name][uchar160 sha1-hash of first block]
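The index block and its DIR entries can be serialised with Python's `struct` module. This sketch assumes big-endian integers and UTF-8 filenames, and leaves out padding to 4K and splitting a long index across several chained blocks. `pack_index` and `unpack_index` are hypothetical names.

```python
import struct
from typing import List, Tuple

INDEX_HEADER = struct.Struct(">I20s")  # assumed: big-endian num_entries + next SHA1
NO_NEXT = bytes(20)                    # assumed end-of-chain marker


def pack_index(entries: List[Tuple[str, bytes]],
               next_hash: bytes = NO_NEXT) -> bytes:
    """Serialise (name, first-block SHA1) pairs into one index block."""
    out = [INDEX_HEADER.pack(len(entries), next_hash)]
    for name, first_hash in entries:
        raw = name.encode("utf-8")  # assumed filename encoding
        if len(first_hash) != 20:
            raise ValueError("expected a 20-byte SHA1 digest")
        out.append(struct.pack(">H", len(raw)) + raw + first_hash)
    return b"".join(out)


def unpack_index(block: bytes) -> Tuple[List[Tuple[str, bytes]], bytes]:
    """Parse an index block back into its entries and its "next" hash."""
    count, next_hash = INDEX_HEADER.unpack_from(block)
    offset = INDEX_HEADER.size
    entries = []
    for _ in range(count):
        (name_len,) = struct.unpack_from(">H", block, offset)
        offset += 2
        name = block[offset:offset + name_len].decode("utf-8")
        offset += name_len
        entries.append((name, block[offset:offset + 20]))
        offset += 20
    return entries, next_hash
```

Each entry costs 22 bytes plus the name, so a name longer than 65535 bytes cannot be represented; `struct.pack` raises `struct.error` in that case.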