Monthly Archives: January 2011

Moving duplicity (and Deja-Dup) backups

In my last blog post I said that I had moved my backups from an external disk to Rackspace Cloud Files and promised I’d explain how.

Ok, so why bother? I had about 100 GB of data that was being backed up. I didn’t want to upload 99% of that, have my wifi go bonkers, and then have to start over (because Duplicity apparently isn’t very good at resuming). So, instead I wanted to make the initial backup to an external drive (the backup wouldn’t fit on my laptop’s hard drive) and defer copying it to Rackspace as time and connectivity permitted.

That was simple enough.

Once the first, full backup was made, I wanted incremental backups to go directly to Cloud Files, so I needed to get Deja-Dup to realise that there was already a backup on there.

This was the trickier bit.

When you ask Duplicity to interact with a particular backup location, it calculates a hash of the URI of it and looks that up in its cache to see if it knows about it already. If you’ve made a backup with deja-dup, you can go and look in $HOME/.cache/deja-dup. This is what I had:

soren@lenny:~$ ls -l $HOME/.cache/deja-dup/
drwxr-xr-x 2 soren soren 4096 2011-01-14 18:09 4e33cf513fa4772471272dbd07fca5be

You see a directory named after the hash of the uri of the backup location I used, namely “file:///media/backup” (the MD5 sum of which is 4e33cf513fa4772471272dbd07fca5be).

Inside this directory, we find:

soren@lenny:~$ ls -l /home/soren/.cache/deja-dup/4e33cf513fa4772471272dbd07fca5be/
-rw------- 1 soren soren 750938885 Jan 14 15:47 duplicity-full-signatures.20110113T170937Z.sigtar.gz
-rw------- 1 soren soren    653487 Jan 14 15:47 duplicity-full.20110113T170937Z.manifest

It contains a manifest and a signature file. These files in there have no record of the backup location. That information exists only in the name of the directory. Essentially, all I needed to do was to rename the directory to match the Cloud Files location. Being a bit cautious, I decided to copy it instead. The URI for a container on Cloud Files looks like “cf+http://containername”. Knowing this, it was as simple as:

soren@lenny:~$ echo -n 'cf+http://lenny' | md5sum
2f66137249874ed1fdc952e9349912d4 -
soren@lenny:~$ cd $HOME/.cache/deja-dup
soren@lenny:~/.cache/deja-dup$ cp -r 4e33cf513fa4772471272dbd07fca5be 2f66137249874ed1fdc952e9349912d4

The -n option to echo is essential. Without it, I’d have been calculating the MD5 sum of the URI with a trailing newline.

Before I ran deja-dup again, I made sure the two files above were copied to Cloud Files. If I hadn’t, the first time duplicity would talk to Cloud Files, it would realise that these files don’t exist on the expected backup location, hence the local cache of them must be invalid, so it would delete them. This happened to me the first time, so making a copy rather than just renaming the directory turned out to be a good idea.

All that was left to do now was to change my backup location in Deja-Dup. This should be simple enough, so I won’t go into detail about that.

The best part about this, I think, is that wasn’t until 5-6 days later, that my upload of the initial full backup finished. However, in the mean time, I was able to do incremental backups just fine, because all it needs to do that is the signature files from the previous runs.

Oh, and to actually upload the files, I used the “st” tool from Swift. Something like this:

soren@lenny:~$ cd /media/backup
soren@lenny:/media/backup$ st -A -U soren -K 6e6f742061206368616e636521212121 upload lenny *

It only took me 20 years..

tl;dr: I now have daily backups of my laptop, powered by Rackspace Cloud Files (powered by Openstack), Deja-Dup, and Duplicity.

I’ve been using computers for a long time. If memory serves, I got my first PC when I was 9, so that’s 20 years ago now. At various times, I’ve set up some sort of backup system, but I always ended up

  • annoyed that I couldn’t acutally *use* the biggest drive I had, because it was reserved for backups,
  • annoyed because I had to go and connect the drive and do something active to get backups running, because having the disk always plugged into my system might mean the backup got toasted along with my active data when disaster struck,
  • and annoyed at a bunch of other things.

Cloud storage solves the hardest part of this. With Rackspace Cloud Files, I have access to an infinite[1] amount of storage. I can just keep pushing data, Rackspace keep them safe, and I pay for exactly how much space I’m using. Awesome.

All I need is something that can actually make backups for me and upload them to Cloud Files. I’ve known about Duplicity for a long time, and I also knew that it’s been able to talk to Cloud Files for a while, but I never got into the habit of running it at regular intervals, and running it from cron was annoying, because maybe I didn’t have my laptop on when it wanted to run, and if I wasn’t logged in, by homedir would be encrypted anyway, etc. etc. Lots of chances for failure.

Enter Deja-Dup! Deja-dup is a project spearheaded by my awesome, former colleague at Canonical, Mike Terry. It uses Duplicity on the backend, and gives me a nice, really simple frontend to get it set up. It has its own timing mechanism that runs in my GNOME desktop session. This means it only runs when my laptop is on and I’m logged in. Every once in a while, it checks how long it’s been since my last backup. If it’s more than a day, an icon pops up in the notification area that offers to run a backup. I’ve only been using this for a day, so it’s only asked me once. I’m not sure if it starts on its own if I give it long enough.

A couple of caveats:

  • Deja-dup needs a very fresh version of libnotify, which means you need to either be running Ubuntu Natty, use backported libraries, or patch Deja-dup to work with the version of libnotify in Maverick. I opted for the latter approach.
  • I have a lot of data. Around 100GB worth. Some of it is VM’s, some of it is code, some of it is various media files. Duplicity doesn’t support resuming a backup if it breaks halfway, and I “only” have 8 Mbit/s upstream bandwidth.. That meant I had to stay connected to the Internet for 28 hours straight (in a perfect world) and not have anything unexpected happen along the way. I wasn’t really interested in that, so I made my initial backup to an external drive and I’m now copying the contents of that to Rackspace at my own pace. I can stop and resume at will. The tricky part here was to get Deja-Dup to understand that the backup it thinks is on an external drive really is on Cloud Files. I’ll save that for a separate post.

[1]: Maybe not actually infinite, but infinite enough.