Space Efficient Mysqldump Backups Using Incremental Patches

Update 2015-08-18: Boy do I feel silly! There’s a much simpler and much more robust way of doing what I’ve done with the scripts below. With any revision control system (e.g. cvs, git, svn) that stores revisions as deltas (and most if not all do), all you need to do is copy anything into the repository and commit it. Tada! The rcs takes care of the incremental part for you through its revision deltas (i.e. patches).

As a big fan of git I was hoping it could fill this role. I had mistakenly thought that git stores whole files, without diffs/deltas, for every revision. That’s true only until git garbage collects, as I found out with my Stack Overflow question: Can git use patch/diff based storage? There’s some great reading there, check it out. Simply garbage collect after adding and committing in git and you automatically get space-efficient incremental backups, with the bonus of git’s robustness and reliability (or whatever rcs you choose). Bonus: you can delta anything you can store in an rcs repository, meaning files, binary or text, archives, images, etc. You still get the space savings!

So, quite literally, my database backup is now something like this: (1) mysql dump, (2) git add dump, (3) git commit dump, (4) git gc. Simple, powerful, elegant, beautiful. As it should be!
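
In shell terms, and purely as a sketch (the database name, paths and repository location are placeholders, not my actual setup), that four-step routine looks something like this:

    #!/bin/sh
    # Sketch of the dump-then-commit backup described above; names are illustrative.
    cd /var/backups/db || exit 1                       # an existing git repository (git init run once)
    mysqldump --single-transaction mydb > mydb.sql     # 1. mysql dump
    git add mydb.sql                                   # 2. git add dump
    git commit -m "db backup $(date +%F_%H%M)"         # 3. git commit dump
    git gc                                             # 4. git gc, so git packs the revisions as deltas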

I’m now using Duplicity for super convenient one-liner-style incremental backup commands in a simple shell script (seriously, it’s like three commands long), but what I’m missing is incremental space savings on my database dump. Right now my mysqldump produces a roughly 40MB file, about 10MB compressed. It’s irked me for some time that there’s no simple way to do intra-file incremental backups. I’ve also wanted to do intra-day, not just daily, backups. Duplicity’s incremental backups allow for that, but full database backups add up quickly. Well, I finally went ahead and wrote a shell script to do it, plus a recovery script that can restore to any date in the series of backups – just like duplicity. The key was interdiff for incremental patches. Here’s how I did it…
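
The details are in the full post, but the core of the idea, sketched very roughly and with made-up file names and paths, goes something like this:

    #!/bin/sh
    # Rough sketch of the interdiff approach, not the original script; paths are illustrative.
    # baseline.sql  - full dump taken once
    # latest.patch  - cumulative diff from the baseline up to the previous dump
    cd /var/backups/mysql || exit 1
    mysqldump mydb > current.sql                       # today's full dump
    diff -u baseline.sql current.sql > current.patch   # cumulative patch against the baseline
    if [ -f latest.patch ]; then
        # interdiff (from patchutils) turns two cumulative patches into one incremental patch
        interdiff latest.patch current.patch > "incr-$(date +%F_%H%M).patch"
    else
        cp current.patch "incr-$(date +%F_%H%M).patch" # first run: the cumulative patch is the increment
    fi
    mv current.patch latest.patch
    rm current.sql

Recovery is then a matter of starting from baseline.sql and applying the incremental patches in order, up to whichever date you want.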

Continue reading “Space Efficient Mysqldump Backups Using Incremental Patches”

Duplicity – Pleasure and Simplicity in Your Backups

Update 2016-02-13: I’m now making use of Duplicity’s “--full-if-older-than” backup option and the “remove-all-but-n-full” cleanup mode to keep a moving window of 3 months of backups. This way your backups don’t keep growing and growing. I’ve updated the examples below.
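
As a quick illustration of those two pieces (the source and target paths here are placeholders, not the ones from my actual script):

    # Force a fresh full backup chain once a month, incrementals the rest of the time
    duplicity --full-if-older-than 1M /home/me/data file:///mnt/backups/data
    # Keep only the last 3 full chains (roughly a 3 month window) and delete the rest
    duplicity remove-all-but-n-full 3 --force file:///mnt/backups/data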

If you’re like me, you’ve spent a long, long time, in many different configurations, trying to come up with a simple yet flexible, easy-to-use yet space-efficient backup solution. Well, I think I’ve finally jumped ship from shell scripts and tarballs to duplicity: bandwidth-efficient backup using the rsync algorithm. Although duplicity talks up encryption, and it’s great and makes it easy, it’s not required and I don’t use it. Here are some tips and tricks to get you started…
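
To give you a taste of how short it can be, here’s a sketch rather than my exact script (paths are placeholders):

    # Unencrypted backup; duplicity does a full backup the first time and incrementals after that
    duplicity --no-encryption /home/me/data file:///mnt/backups/data
    # Optional: compare the backup against the source
    duplicity verify --no-encryption file:///mnt/backups/data /home/me/data
    # Optional: list the backup chains and sets
    duplicity collection-status file:///mnt/backups/data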

Continue reading “Duplicity – Pleasure and Simplicity in Your Backups”

Where’s the command-line web? (turn back that revolution a few degrees)

As all *nix enthusiasts, programmers, sysadmins and hobbyists have come to know, our command-line utilities are our best and most constant allies. Any job that comes along we make easier with pipes, redirection, shell scripts and regular expressions. But sometimes you don’t have access to your favourite tools. Sometimes you just can’t get a really nifty tool on your own. So where’s the command-line web?

Where’s that ping command I can wget and see if my server’s available from another network? Where’s the traceroute I can send arguments and switches to, just as I do on the command-line, simply via the query string? Where’s the html encoding application I can pipe data into via http post and receive encoded data back, safe for embedding in an html document?
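
Something like this is all I’m asking for (the endpoints here are entirely made up, just to illustrate the idea):

    # Hypothetical endpoints, purely to show the shape of the thing
    curl "https://tools.example.com/ping?host=myserver.example.org&count=4"
    curl "https://tools.example.com/traceroute?host=myserver.example.org&max-hops=15"
    echo "a < b && c > d" | curl --data-binary @- "https://tools.example.com/htmlencode"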

There have been some great apps out there that would benefit from this treatment. One I remember is a DNS check. It runs all kinds of checks against your zones to make sure they’re valid, correct and optimized. Another great app is one that checks for an open relay on an smtp server. For various reasons, these tools all seem to disappear after a while. The most famous example is that dns site that went to a pay model to use their tools. That sucked.

Wouldn’t it be great to have all these tools out there, free, and accessible via a command-line-like interface? I think it’d be brilliant. It’d be a new paradigm in the web’s evolution. For years we’ve talked about soap and web services and xml data transfer formats, but that will never be the end-game because we’ve already discovered the most convenient, most pragmatic and most efficient way to communicate across disparate platforms: text – usually straight-up, or with a little agreed-upon formatting (like csv). These simple techniques allow for an infinite range of possibilities.

Think about why all the unix-like platforms are so powerful, so flexible, and so poised to take on new challenges. It lies in the philosophies of treating almost everything as a file, allowing one app to do one thing and do it well, and the piping and redirection of files between these apps.

We’ve all been clamoring for interoperability between offline apps and online apps for decades. We want our office xml formats, our plugin binary APIs opened, our web services soap enabled, our data from disparate apps sync’d with other disparate apps, etc., etc.

The answer is not some strict, spec’d to the nines, validated, schema’d xml, xslt, dtd monstrosity.

The answer has been with us all along. Common-sense, pragmatic, simplified, open protocols and text-based data formats.

Read up on The Unix Philosophy: A Brief Introduction.

The problem is the Unix Philosophy was never extended to how users interact over the http protocol. Certainly CGI mimics the command-line, with the query string as arguments and post data arriving on STDIN, but somehow we missed the boat completely. We made a mess of taking post data and accepting url parameters. We then made a monstrosity by coming up with all manner of pretty structuring and formatting of the traditional key=value data in things like xml.
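
To make that concrete, here’s a toy CGI sketch of my own (hypothetical, not any existing service) where the query string plays the part of command-line switches and the post body arrives on stdin:

    #!/bin/sh
    # Hypothetical CGI script: QUERY_STRING stands in for command-line arguments,
    # the POST body arrives on stdin, and plain text goes back out.
    echo "Content-Type: text/plain"
    echo ""
    case "$QUERY_STRING" in
      *mode=encode*)   # html-encode whatever was piped in via http post
        sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' ;;
      *)               # default: echo the post body straight back
        cat ;;
    esac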

Just imagine if all the utilities you enjoy on linux were available via simple apps on every platform, all calling out to webified command-line utilities on the web with the same arguments and input you would use right at the console.