Space Efficient Mysqldump Backups Using Incremental Patches

Update 2015-08-18: Boy do I feel silly! It turns out there’s a much simpler and much more robust way of doing what I’ve done with the scripts below. It turns out that, using any revision control system (eg. cvs, git, svn) that stores revisions as deltas (and most if not all do), all you need to do is copy anything into a revision control repository and commit it. Tada! The rcs takes care of the incremental part for you by its use of revision deltas (ie. patches). As a big fan of git I was hoping there was a way for it to fill this role. I had mistakenly thought that git stores whole files without diffs/deltas for every revision. This is true until git garbage collects as I found out with my Stack Overflow question: Can git use patch/diff based storage? There’s some great reading there, check it out. Simply garbage collect after adding and committing in git and you automatically get space efficient incremental backups with the bonus of the robustness and reliability of git (or whatever rcs you choose). Bonus: You can delta anything you can store in an rcs repository meaning files, binary or text, archives, images, etc. You still get the space savings! So, quite literally, my database backup is now something like this: (1) mysql dump, (2) git add dump, (3) git commit dump, (4) git gc. Simple, powerful, elegant, beautiful. As it should be!

Space Efficient Mysqldump Backups Using Incremental Patches

I’m now using Duplicity for super convenient one-liner style incremental backup commands in a simple shell script (seriously, it’s like three commands long) but what I’m missing is incremental space-savings on my database dump. Right now my mysqldump produces about a 40MB file, about 10MB compressed. It’s irked me for some time that there’s no simple way to do intra-file incremental backups. I’ve also wanted to do intra-day, not just daily, backups. Duplicity’s incremental backups allow for that but full database backups add up quickly. Well, I finally went ahead and wrote a shell script to do it and a recover script that can recover to any date in the series of backups – just like duplicity. The key was interdiff for incremental patches. Here’s how I did it…

Caution! The following code should NOT be considered ready for critical systems. I’m using it on my personal server in parallel with a separate full database backup mechanism. The script should work fine but I wouldn’t trust it as my only backup just yet.

The first file is backup-db.sh, I hope the code comments help explain what’s going on:

Now, here’s the recovery script:

There it is! I hope it helps someone out there who has wished for the same thing.

*For the record, the scripts were written and tested on CentOS 6 using Bash shell.

Leave a Reply