Not a Big Fan of Data Loss
In this post I will summarize my personal backup approach, which might give you some fresh ideas about backups, especially if you are a Linux user.
For almost half a year I have been using a single computer as a terminal server for the thin clients at home. I like this setup not only because it makes the most of the server's resources and processing power, but also because it lets me centralize maintenance and, most importantly, my data.
Redundancy: Winning the Lottery?
The computer acting as the terminal server came with two hard drives of identical size (160 GB), but it had no on-board RAID (Redundant Array of Independent Disks) option, and there was no way to add a RAID controller without voiding the warranty. For these reasons I decided to use software RAID.
The point of RAID was to add redundancy, and since there were only two disks it had to be RAID-1, also known as mirroring. Simply put, RAID-1 writes the same data to both disks, which means that even if one of the disks fails, the data is still available on the second drive.
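The mirroring idea can be illustrated with a toy model. This is a minimal sketch, not how the Linux software RAID driver is implemented: each "disk" is just a Python dict, and the `Mirror` class and its methods are hypothetical names chosen for illustration.

```python
class Mirror:
    """Toy model of RAID-1: every block is written to both disks."""

    def __init__(self):
        # Each "disk" is a dict mapping block number -> bytes; None means failed.
        self.disks = [{}, {}]

    def write(self, block, data):
        # Mirroring: the same data goes to every disk that is still alive.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def fail(self, index):
        # Simulate a drive dying: its contents become unreachable.
        self.disks[index] = None

    def read(self, block):
        # Any surviving disk holds a full copy, so read from the first one.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("both drives have failed: data lost")
```

After `m = Mirror()`, `m.write(0, b"photos")` and `m.fail(0)`, a call to `m.read(0)` still returns the data from the second disk; only after `m.fail(1)` does the read raise.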
In purely mathematical terms, and assuming the two drives fail independently of each other, the probability of both drives failing at the same time equals the probability of drive A failing multiplied by the probability of drive B failing:
P(A fails and B fails) = P(A fails) * P(B fails)
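The product rule above is a special case of a more general identity. Writing A and B for the events "drive A fails" and "drive B fails", the joint probability always factors through a conditional probability, and it collapses to a plain product only when the failures are independent. The 3% failure rate per drive below is a hypothetical figure chosen for illustration, not a measured one:

```latex
\begin{aligned}
P(A \cap B) &= P(A)\,P(B \mid A) && \text{(always true)} \\
P(A \cap B) &= P(A)\,P(B)        && \text{(only if } P(B \mid A) = P(B)\text{)} \\
            &= 0.03 \times 0.03 = 0.0009 && \text{(hypothetical 3\% per drive)}
\end{aligned}
```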
In theory this means that the probability of data loss due to hardware failure would decrease dramatically. The truth, though, is somewhat different: since both disks operate in the same environment, they are equally vulnerable to things like high operating temperatures or power surges.
By that logic, the only case where the reduced probability of failure would be noticeable is when one of the drives fails due to a manufacturing defect. But since the drives are of identical size and model and came with an OEM computer, chances are they were built on the same production line and thus under the same circumstances, so even their defects may be shared.
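To see why shared conditions matter, here is a minimal common-cause model, assuming (hypothetically) that each drive can fail on its own with probability `p_independent`, and that a shared event such as a power surge, occurring with probability `p_shared`, takes out both drives at once. The function names and parameters are illustrative only:

```python
def single_failure(p_independent, p_shared):
    """P(one given drive fails): the shared event, or its own independent failure."""
    return p_shared + (1 - p_shared) * p_independent


def joint_failure(p_independent, p_shared):
    """P(both drives fail) when a shared event can kill both drives at once."""
    # Either the shared event happens (both die), or it does not and
    # both drives must fail independently of each other.
    return p_shared + (1 - p_shared) * p_independent ** 2


def naive_estimate(p_independent, p_shared):
    """The product rule, which wrongly treats the two failures as independent."""
    return single_failure(p_independent, p_shared) ** 2
```

With the hypothetical values `p_independent=0.03` and `p_shared=0.01`, each drive fails with probability 0.0397, so the naive product estimate is about 0.0016, while the model gives about 0.0109 for losing both drives, roughly seven times higher. With `p_shared=0` the two agree, which is exactly the independence assumption behind the formula.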
Picking a Reliable Format
The tar utility archives a file structure as a serial stream, allowing an entire directory tree to be represented as a single file, which makes it well suited for backing up data.
In computing, tar (derived from tape archive) is both a file format (in the form of a type of archive bitstream) and the name of the program used to handle such files. The format was standardized by POSIX.1-1988 and later POSIX.1-2001. Initially developed as a raw format used for tape backup and other sequential-access devices, it is now commonly used to collect many files into one larger file, for distribution or archiving, while preserving file system information such as user and group permissions, dates, and directory structures.
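Python's standard `tarfile` module reads and writes this format, which makes it easy to see the "directory tree in, single file out" behavior. This is a minimal sketch; the directory and file names (`documents`, `letters/note.txt`) are hypothetical:

```python
import os
import tarfile
import tempfile


def archive_directory(src_dir, archive_path):
    """Serialize a whole directory tree into a single, uncompressed tar file."""
    # Mode "w" writes a plain tar stream: no compression, only the serialized tree.
    with tarfile.open(archive_path, "w") as tar:
        # arcname stores paths relative to the directory's own name.
        tar.add(src_dir, arcname=os.path.basename(src_dir))


def list_archive(archive_path):
    """Return the names of all members stored in the archive."""
    with tarfile.open(archive_path, "r") as tar:
        return tar.getnames()


def demo():
    """Archive a tiny throwaway tree and return the stored member names."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "documents")
        os.makedirs(os.path.join(src, "letters"))
        with open(os.path.join(src, "letters", "note.txt"), "w") as f:
            f.write("hello\n")
        archive = os.path.join(tmp, "documents.tar")
        archive_directory(src, archive)
        return list_archive(archive)


if __name__ == "__main__":
    print(demo())
```

Each member in the archive carries its own header with the path, permissions, owner and modification time, which is how tar preserves the file system information mentioned above.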
Standardized formats are always worth considering when deciding on a format for your backup routines. The fact that all *NIX operating systems can handle tar archives makes the format an ideal choice for backups. The program itself is designed according to the UNIX philosophy: it does one thing, and it does it well. Neither the format nor the program implements compression itself; instead, separate tools such as bzip2 and gzip (GNU zip) are commonly used to compress the serialized output.
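The separation between archiving and compressing can be mirrored in a short sketch: first build a plain tar stream, then hand it to a separate compressor, much like `tar cf - dir | gzip`. The helper names `tar_bytes` and `compress` are hypothetical; `tarfile`, `gzip` and `bz2` are Python standard library modules:

```python
import bz2
import gzip
import io
import tarfile


def tar_bytes(files):
    """Build an uncompressed tar archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compress(raw, method):
    """Compress an already-serialized tar stream with a separate algorithm."""
    if method == "gzip":
        return gzip.compress(raw)
    if method == "bzip2":
        return bz2.compress(raw)
    raise ValueError(f"unknown method: {method}")
```

Because the compressor never needs to understand tar, the result is simply a `.tar.gz` or `.tar.bz2`: `tarfile.open(fileobj=io.BytesIO(packed), mode="r:gz")` can read the gzip output straight back.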
Compression Comparison
When deciding which compression algorithm to use, there are two variables worth paying attention to: how fast the algorithm runs, and how small it makes the output (the compression ratio).
Obviously there is a trade-off involved: a fast algorithm that does not require much computation may be limited in how far it can compress the data. Heavy compression, on the other hand, usually requires a lot of computation in order to maximize the ratio between the size of the uncompressed and the compressed data. Typical comparisons between the two common tools reveal that gzip is the faster of the two, while bzip2 usually produces smaller archives at the cost of more CPU time.
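Both variables are easy to measure on your own data. The sketch below times Python's `gzip.compress` and `bz2.compress` at their default levels and reports the ratio; the log-like sample input is hypothetical, and real numbers depend heavily on what you back up:

```python
import bz2
import gzip
import time


def measure(data, compressor):
    """Return (compression ratio, seconds taken) for one compressor."""
    start = time.perf_counter()
    packed = compressor(data)
    elapsed = time.perf_counter() - start
    # A ratio above 1 means the output is smaller than the input.
    return len(data) / len(packed), elapsed


def compare(data):
    """Compare gzip and bzip2 on the same input at Python's default levels."""
    return {
        "gzip": measure(data, gzip.compress),
        "bzip2": measure(data, bz2.compress),
    }


if __name__ == "__main__":
    # Hypothetical sample input: repetitive text, similar to log files.
    sample = b"".join(b"line %d of a log file\n" % i for i in range(200_000))
    for name, (ratio, seconds) in compare(sample).items():
        print(f"{name:6s} ratio {ratio:5.2f}  time {seconds:.3f}s")
```

Running it on a copy of real backup data gives a far better basis for the choice than any general rule of thumb.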
The Simple Backup Suite
Personally, I use a dedicated tool called sbackup, which was developed during the Google Summer of Code 2005 for the Ubuntu GNU/Linux distribution.
Besides the fact that it comes with a full-fledged graphical user interface for configuration and restoration, it has some other features I find useful in a backup tool:
How It Works
Typically, a backup made by the sbackup utility leaves you with a folder named something like:
2007-04-01_23.00.04.283538.prescott.inc
if it is an incremental backup, or
2007-04-01_23.00.04.283538.prescott.ful
if it is a full backup.
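The folder names encode the date, the time down to the microsecond, the host name and the backup type. The pattern below is inferred only from the two example names above, not from sbackup's documentation, and the helpers `parse_backup_name` and `latest_full` are hypothetical; the sketch shows how such names could be parsed, for instance to find the newest full backup:

```python
import re
from datetime import datetime

# Assumed layout, inferred from the examples:
# <YYYY-MM-DD>_<HH.MM.SS>.<microseconds>.<hostname>.<inc|ful>
_NAME = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}\.\d{2}\.\d{2}\.\d{1,6})"
    r"\.(?P<host>.+)\.(?P<kind>inc|ful)$"
)


def parse_backup_name(name):
    """Split a backup folder name into (timestamp, hostname, is_full)."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a backup folder name: {name!r}")
    taken = datetime.strptime(match["stamp"], "%Y-%m-%d_%H.%M.%S.%f")
    return taken, match["host"], match["kind"] == "ful"


def latest_full(names):
    """Return the newest full backup among the given folder names, or None."""
    full = [n for n in names if _NAME.match(n) and parse_backup_name(n)[2]]
    return max(full, key=lambda n: parse_backup_name(n)[0], default=None)
```

For the first example name, `parse_backup_name` yields a timestamp of 2007-04-01 23:00:04.283538, the host `prescott`, and `False` for the full-backup flag.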
The content of each folder is straightforward:
base excludes files.tgz flist fprops packages ver
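Since the backed-up data itself appears to live in `files.tgz`, a folder can be inspected with standard tools. This sketch assumes, based only on the `.tgz` extension, that `files.tgz` is a gzip-compressed tar archive; the helper names are hypothetical, and the expected entries are simply the ones in the listing above, so a given backup folder may contain a different set:

```python
import os
import tarfile

# Entries from the example listing above; real folders may differ.
EXPECTED_ENTRIES = {"base", "excludes", "files.tgz", "flist", "fprops", "packages", "ver"}


def missing_entries(backup_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that are absent from a backup folder."""
    return set(expected) - set(os.listdir(backup_dir))


def list_backed_up_files(backup_dir):
    """List the paths stored in a backup folder's files.tgz.

    Assumes files.tgz is a gzip-compressed tar archive, as the .tgz
    extension suggests.
    """
    with tarfile.open(os.path.join(backup_dir, "files.tgz"), "r:gz") as tar:
        return tar.getnames()
```

Because the archive is an ordinary tar file, a backup remains readable even on a machine where sbackup itself is not installed.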
Putting It All Together
Ever since I started using a terminal server, I have been motivated to keep a regular and stable backup schedule, since the effort is well worth it when all the data is in one place. As previously mentioned, I use the sbackup utility to make the actual backups. Below is a more detailed view of what my backup schedule looks like: