Incremental backups of a web host to a desktop Mac
In the time since I set up kstruct.com I’ve had a series of interim backup procedures in place, but given that it’s going on almost a year now, I’ve spent a bit of time recently pulling it all into a single reliable system. There’s nothing here that you couldn’t look up elsewhere on the web, or significantly improve for that matter, but hopefully it will serve as a good starting point for other people setting up a similar system.
The goal here is to end up with a roughly daily backup of everything in my web hosting account on my Mac desktop machine, keeping about a week’s worth of history and without using too much disk space.
Cron backups of the database
The first thing I set up, way back when, was an automated dump of the databases for this site and the International Baccalaureate notes wiki. This happens daily, is triggered via cron on the hosting server, and keeps the last seven days’ worth. This history means that, even if one of the databases managed to corrupt itself somehow, I’d have a number of fairly recent copies to roll back to (unless, of course, I didn’t notice for over a week; hopefully that isn’t going to happen).
First I created a backup directory in my home directory. This is really just to have somewhere neat to keep all these database backups, but the cron job will fail if it doesn’t exist first.
The cron job itself simply uses the mysqldump command to export each database, pipes it through gzip to compress it, and writes it out to the backup directory. I’m not mad keen on the idea of having passwords stored in the crontab file, but in the end they’re going to have to be stored somewhere, and I don’t really see a better choice. I’m using --skip-opt here to avoid locking the database while dumping. This means different tables in the backup could be captured in mutually inconsistent states, but it’s unlikely to cause any real problems (given that MySQL’s default MyISAM tables don’t support transactions anyway, strict consistency can’t be too critical to the apps which use them). If you want things like extended inserts, you could probably leave --opt on and just switch off locking with --skip-lock-tables.
After all the databases are dumped, we change to the right directory and run a find command to clear out any database backups which haven’t been modified in the last seven days. Come to think of it, we could probably change directory at the beginning and clean up the dump commands (but I don’t really want to have to test that change at the moment).
# The entire command below must sit on a single line of the crontab
12 15 * * * /usr/local/bin/mysqldump --skip-opt --user=mattsheppard --password=XXXXXX mattsheppard_ib | gzip > /users/home/mattsheppard/backups/mattsheppard_ib_`date "+\%Y\%m\%d"`.gz; /usr/local/bin/mysqldump --skip-opt --user=mattsheppard --password=XXXXXX mattsheppard_wordpress | gzip > /users/home/mattsheppard/backups/mattsheppard_wordpress_`date "+\%Y\%m\%d"`.gz; cd /users/home/mattsheppard/backups && /usr/bin/find . -name '*.gz' -mtime +7 -delete
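Along the lines of the clean-up mentioned above (changing directory once and simplifying the dump commands), here is an untested sketch of the same job as a standalone script. The names dump_all, BACKUP_DIR, MYSQLDUMP and DATABASES are hypothetical, and it assumes the MySQL credentials live in a ~/.my.cnf option file (a [client] section with user= and password=, readable only by you), which mysqldump reads automatically, instead of on the command line.

```shell
#!/bin/bash
# Hypothetical script version of the cron entry: cd once, loop over the
# databases, then prune. Credentials are ASSUMED to come from ~/.my.cnf.
BACKUP_DIR=/users/home/mattsheppard/backups
MYSQLDUMP=/usr/local/bin/mysqldump
DATABASES="mattsheppard_ib mattsheppard_wordpress"

dump_all() {
    # The directory must exist before anything is written into it
    mkdir -p "$BACKUP_DIR" || return 1
    cd "$BACKUP_DIR" || return 1
    local stamp db
    stamp=$(date +%Y%m%d)   # no \% escaping needed outside a crontab
    for db in $DATABASES; do
        # Same dump options as the cron entry, compressed on the way out
        "$MYSQLDUMP" --skip-opt "$db" | gzip > "${db}_${stamp}.gz"
    done
    # Prune old dumps: -mtime +7 matches files 8 or more whole days old
    find . -name '*.gz' -mtime +7 -delete
}
```

Saved with a final dump_all line as, say, a hypothetical /users/home/mattsheppard/bin/dump_databases.sh, the crontab entry would shrink to 12 15 * * * /bin/bash /users/home/mattsheppard/bin/dump_databases.sh, with no password in the crontab at all.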
Anyway, after all that, I’ve got a directory in my home directory named backups, with files for each database going back for the last week. That’s great for recovering from database corruption, but it’s not going to help a great deal if a disk fails and both the database and files are suddenly gone.
Incremental rsync
To ensure that I’m covered if my host’s disks fail (and their backups aren’t any good), I’ve got a script to copy down my entire home directory and store it on my Mac. Once again, I’ve got things set up to store about a week’s worth of history, since it doesn’t cost much extra in terms of disk space and gives me a chance to fix things if problems aren’t discovered straight away.
To do this, the script uses rsync, which copies only changed files (saving some time and bandwidth) and makes keeping the last week’s worth of history pretty simple.
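#!/bin/bash
# Bail out rather than running rm -rf somewhere unexpected
cd /Users/matt/shell/kstruct_backup/ || exit 1
# Abbreviated day of the week (Sun, Mon ... Sat), so backups rotate weekly
BACKUP_DATE=$(date +"%a")
# Delete the week-old backup we're about to replace
rm -rf "$PWD/home-$BACKUP_DATE"
# Copy the remote home directory, hard-linking files unchanged since 'latest'
rsync -az -e ssh --link-dest="$PWD/latest/" mattsheppard@kstruct.com: \
    "$PWD/home-$BACKUP_DATE"
# Fix up the 'latest' link (-f replaces it, -n stops ln following the old
# link into the directory it points at; also fine on the very first run)
ln -sfn "$PWD/home-$BACKUP_DATE" "$PWD/latest"
date
du -shc "$PWD"/home-*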
The goal is to have the kstruct_backup directory filled with home-Sun, home-Mon … home-Sat directories representing the last week’s worth of backups. First I change into the base directory, find out the current day of the week, and delete the oldest backup (which we’re about to replace). Rsync is then run with the link-dest option pointing at the link named latest. This copies down everything which has changed since the last backup, and makes hard links for any files which haven’t changed. The hard links take up virtually no extra space, so storing a week’s worth of backups doesn’t use much more space than storing one.
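After the rsync is complete, the script fixes up the latest link to point at our new backup, and prints out a summary of the directory sizes. Because du counts each hard-linked file only once per run, only the first directory listed shows its full size; the others show roughly what is unique to them.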
Launchd scheduling
The only thing left to do is get this script to run once a day on my Mac. As of Mac OS X 10.4 (Tiger) the proper way to do this is with launchd rather than cron. I can’t honestly say I’m a hundred percent sold on launchd, but it does have a number of apparent advantages, in particular the way it claims to handle events which should occur while the computer is asleep. I know there were a number of problems in the early 10.4 releases, but I’m led to believe they’ve all been solved now.
Anyway, each launchd job is represented by an XML file in the traditional Apple plist format. ‘man launchd.plist’ has a rundown of all the options available, but setting up a single job to run at a specific time each day is very similar to setting up a cron job.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.kstruct.Backup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/matt/shell/kstruct_backup/backup.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Minute</key>
        <integer>30</integer>
        <key>Hour</key>
        <integer>21</integer>
    </dict>
    <key>StandardOutPath</key>
    <string>/Users/matt/shell/kstruct_backup/log.txt</string>
    <key>StandardErrorPath</key>
    <string>/Users/matt/shell/kstruct_backup/log.txt</string>
</dict>
</plist>
Basically, this says to run the script at /Users/matt/shell/kstruct_backup/backup.sh every day at 9:30 pm and append its output to /Users/matt/shell/kstruct_backup/log.txt. Because launchd runs the script directly, it needs to be executable (chmod +x backup.sh). To have the job loaded into launchd whenever you log in, save it as a plist file in your ~/Library/LaunchAgents/ directory; it can also be loaded and started without logging out and back in again using the launchctl command.
launchctl load ~/Library/LaunchAgents/com.kstruct.Backup.plist
launchctl start com.kstruct.Backup
I should note that launchd has been a little odd in terms of when it runs jobs. I’m led to believe this may be related to time while the computer is asleep not being counted (which is not what launchd’s man page claims). For the time being it seems to work reasonably reliably, but if I can’t make sense of what’s actually happening here I may eventually switch to using cron (which is still supported, though no longer recommended).
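If launchd were swapped for cron, a roughly equivalent crontab entry might look like this. It is a sketch assuming the same script and log paths and an executable backup.sh; >> appends to the log and 2>&1 captures error messages as well.

```shell
# m  h  dom mon dow  command
30 21 * * * /Users/matt/shell/kstruct_backup/backup.sh >> /Users/matt/shell/kstruct_backup/log.txt 2>&1
```

The trade-off is the one launchd.plist’s man page describes: cron simply skips a run that falls while the machine is asleep, whereas launchd’s StartCalendarInterval is meant to run it on the next wake.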
Conclusion
I’ve done a few restore checks myself since setting all this up, and everything seems to be working as expected. Hopefully this will help out anyone who is trying to set up something similar, but if anyone sees obvious places for improvement, comments are more than welcome.
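Between full restore tests, a cheap sanity check is to confirm that every database dump still decompresses. This sketch uses a hypothetical check_dumps helper; since the rsync job copies the whole home directory, it could be pointed at the Mac’s copy of the dumps, e.g. /Users/matt/shell/kstruct_backup/latest/backups.

```shell
#!/bin/bash
# Hypothetical helper: test every gzip'd dump in a directory and report any
# that fail. Returns 1 if at least one file is damaged, 0 otherwise.
check_dumps() {
    local dir=$1 bad=0 f
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # gzip -t checks the compressed data and CRC without writing output
        if ! gzip -t "$f" 2>/dev/null; then
            echo "damaged: $f"
            bad=1
        fi
    done
    return "$bad"
}
```

A passing gzip -t only shows the file isn’t truncated or corrupted; it says nothing about whether the SQL inside will load, so an occasional real restore is still worth doing.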