For some years I’ve been backing up my various Linux-based servers, websites, etc. using a custom script which makes incremental tar-based backups of key directory hierarchies, dumps some MySQL databases, and then copies the lot to a remote machine using scp or rsync. We run this each night using cron. It’s worked well, but it’s becoming rather spaghetti-like, since we run some version of it on several machines, each copying stuff to several other machines. And the process of pruning old backups to keep disk usage under control, at both the sources and the destinations, is somewhat haphazard.
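For anyone curious, the kind of script I mean looks roughly like this. It's a minimal sketch, not my actual script: the paths are hypothetical, it runs against a throwaway directory so it can be tried safely, and the database dump and remote copy are shown as comments since they depend on your setup.

```shell
#!/bin/sh
# Sketch of a nightly backup script of the sort described above:
# incremental tar archives, then ship the results elsewhere.
# Runs against a throwaway directory; in real use SRC would be
# /etc, /var/www and friends, invoked from root's crontab.
set -e

WORK=$(mktemp -d)
SRC="$WORK/src"          # stands in for the real directories
DEST="$WORK/backups"     # local staging area
SNAR="$DEST/state.snar"  # GNU tar's incremental state file
mkdir -p "$SRC" "$DEST"

echo "config v1" > "$SRC/app.conf"

# First run: a full backup. tar records file metadata in $SNAR.
tar --listed-incremental="$SNAR" -czf "$DEST/backup-0.tar.gz" -C "$WORK" src

# Something changes before the next nightly run...
echo "config v2" > "$SRC/app.conf"
echo "new file"  > "$SRC/notes.txt"

# Second run: only files changed since the last run are archived.
tar --listed-incremental="$SNAR" -czf "$DEST/backup-1.tar.gz" -C "$WORK" src

tar -tzf "$DEST/backup-1.tar.gz"

# The real script would also dump databases and copy everything off-box:
#   mysqldump --all-databases | gzip > "$DEST/mysql-$(date +%F).sql.gz"
#   rsync -a "$DEST/" backup@backup.example.com:backups/
```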
So I’ve been looking at various other backup systems that might do a more manageable job. The big systems in the Unix world are the venerable Amanda and the more recent but highly respected Bacula. I may do something based around Bacula in due course, but for now I needed something quick. Here’s a rundown of some useful backup scripts. They all make use of rsync, or the rsync algorithm, in some way, but do more than just copy from A to B.
- rdiff-backup: You can think of this as an rsync which keeps some history. The destination ends up with a copy of the source, but also has a subdirectory containing reverse diffs so you can get back to earlier versions. This is rather nice, I think, and it can pull the backups from or push them to a remote machine, though it does need to be installed at both ends. It’s mostly Python and relies on librsync. The standard Ubuntu rdiff-backup package isn’t very recent, so I built and installed it by hand.
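To make that concrete, a typical invocation looks something like the sketch below. Paths and the remote host are hypothetical, and the script skips itself if rdiff-backup isn’t installed.

```shell
#!/bin/sh
# Hypothetical rdiff-backup usage: mirror a directory, keeping reverse diffs.
set -e
command -v rdiff-backup >/dev/null 2>&1 || { echo "rdiff-backup not installed; skipping"; exit 0; }

WORK=$(mktemp -d)
mkdir -p "$WORK/src"
echo "v1" > "$WORK/src/file.txt"

# Back up locally; the destination gains an rdiff-backup-data/
# subdirectory holding the reverse diffs and metadata.
rdiff-backup "$WORK/src" "$WORK/mirror"

# Pulling from (or pushing to) a remote machine works the same way,
# provided rdiff-backup is installed at both ends:
#   rdiff-backup user@host::/var/www /var/backups/www

# And you can restore a file as it was at some earlier time, e.g. 3 days ago:
#   rdiff-backup -r 3D /var/backups/www/index.html /tmp/index-then.html
ls "$WORK/mirror"
```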
- duplicity: This looks good and is being actively maintained. It’s a bit like rdiff-backup but focuses on encryption, and uses incremental tar-based backups. For me, the downside was that it’s push-only – you run it on one machine to send backups to another – and I was keener on pulling from several machines under centralised control. Update: I later discovered that pushing can have some real advantages. One is that it can often be easier to manage the permissions of the backup user on the machine where the data lives; it might be a cron job run as root, for example. Another is that you may not always be able to install software or cron jobs on the machine where you want to store the backups. Also, duplicity has some interesting backends for things like Amazon S3. I’m using duplicity more now than when I first wrote this.
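A quick sketch of the push model, with hypothetical paths and bucket name. Real runs would normally be GnuPG-encrypted; `--no-encryption` just keeps the demo self-contained, and the script skips itself if duplicity isn’t installed.

```shell
#!/bin/sh
# Hypothetical duplicity usage: push encrypted, incremental tar volumes
# from this machine to a backup target.
set -e
command -v duplicity >/dev/null 2>&1 || { echo "duplicity not installed; skipping"; exit 0; }

WORK=$(mktemp -d)
mkdir -p "$WORK/src"
echo "v1" > "$WORK/src/file.txt"

# Push a backup to a local file:// target; the first run is a full
# backup, subsequent runs produce incremental volumes.
duplicity --no-encryption "$WORK/src" "file://$WORK/store"

# The same push model works against remote backends, e.g. a
# (hypothetical) S3 bucket:
#   duplicity /var/www s3://my-bucket/www-backups

# Restoring reverses the arguments:
#   duplicity --no-encryption "file://$WORK/store" "$WORK/restored"
ls "$WORK/store"
```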
- rsnapshot: In the short term, I think this is the one that will suit me best. You can create categories like ‘hourly’, ‘daily’ and ‘monthly’, and specify how many of each you’d like kept. It creates a complete copy of the source tree for each snapshot, but files that haven’t changed are simply hard links to the previous copy, so it’s pretty efficient on space. And a single configuration file can perform lots of remote and local backups. I suppose the downside is that the hard-link-based architecture limits the range of filesystems on which you can store your backups, but if you’re firmly in the Unix world this seems to work rather well.
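A hypothetical fragment of `/etc/rsnapshot.conf` gives the flavour. The paths, host and retention counts are made up, and note that rsnapshot insists on tab-separated fields:

```
# Keep 6 hourly, 7 daily and 4 weekly snapshots
# (fields below must be separated by tabs, not spaces):
retain	hourly	6
retain	daily	7
retain	weekly	4

snapshot_root	/var/backups/snapshots/

# Local and remote (over ssh) sources, all driven from this one file:
backup	/etc/	localhost/
backup	root@web.example.com:/var/www/	webserver/
```

Cron then runs `rsnapshot hourly`, `rsnapshot daily` and so on at the appropriate intervals; rsnapshot rotates the snapshot directories, hard-linking unchanged files, and prunes the oldest snapshot in each category once the retain count is reached.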
Just in case anyone else is looking…
Update: Emanuel Carnevale reminded me about:
- Unison is a bit like rsync but does bi-directional synchronisation – it can cope with changes being made at either end. I hadn’t really thought of it as a backup tool, but – perhaps because two-way synchronisation can sometimes do unexpected things – it does have the ability to keep backups of any files it replaces. One more option if you need it…!
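A sketch of that backup behaviour, using throwaway directories (the paths are hypothetical, and the script skips itself if unison isn’t installed):

```shell
#!/bin/sh
# Hypothetical unison usage: two-way sync between two directory "roots",
# keeping a backup copy of any file it would overwrite.
set -e
command -v unison >/dev/null 2>&1 || { echo "unison not installed; skipping"; exit 0; }

WORK=$(mktemp -d)
export UNISON="$WORK/.unison"   # keep unison's archive files self-contained
mkdir -p "$WORK/a" "$WORK/b"
echo "v1" > "$WORK/a/file.txt"

# -batch: no interactive prompts.
# -backup 'Name *': back up every file unison replaces, into -backupdir.
unison "$WORK/a" "$WORK/b" -batch -backup 'Name *' -backupdir "$WORK/old"

ls "$WORK/b"
```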