New GPG key – Please help :)

(Thanks to Colin Watson for the template for this post)

I've finally gotten around to setting up a new, strong (4096 bit) RSA-
based GPG-key, and will be transitioning away from my old 1024 bit DSA
key. The old key will continue to be valid for some time, but I prefer
all future correspondence to use the new one. I would also like to
ensure that this new key is well-integrated into the web of trust. This
message is signed by both keys to certify the transition.

The old DSA key was:

pub   1024D/E8BDA4E3 2002-02-22
      Key fingerprint = 196A 89ED 78F3 9047 2A36  F327 A278 DF5E E8BD A4E3

The new RSA key is:

pub   4096R/9EAAF9C5 2011-06-15
      Key fingerprint = E6BC C692 3553 A464 8514  28D1 EE67 E7D3 9EAA F9C5

To fetch my new key from a public key server, you can run:

  gpg --keyserver subkeys.pgp.net --recv-keys 9EAAF9C5

If you already know my old key, you can now verify that the new key is
signed by the old one:

  gpg --check-sigs 9EAAF9C5

If you don't already know my old key, or if you're extra-paranoid, you
can check the fingerprint against the one given above:

  gpg --fingerprint 9EAAF9C5

If you have previously signed my old DSA key, and if you're satisfied
that you've got the correct new RSA key, then I'd appreciate it if you
would sign my new key as well:

  caff 9EAAF9C5

The caff program is in the signing-party package in Debian and its
derivatives, including Ubuntu. Please be careful to generate signatures
that don't rely on the weakening SHA-1 hash algorithm, which requires
some careful configuration even if you've already configured gpg
correctly. See http://www.gag.com/bdale/blog/posts/Strong_Keys.html for
the gory details.

Soren Hansen

Another week of Openstack stabilisation

I got good feedback on last week’s post about the stuff I’d achieved in Openstack, so I figured I’d do the same this week.

We left the hero of our tale (that would be me (it’s my blog, I can entitle myself however I please)) last Friday somewhat bleary-eyed, hacking on a mountall patch that would more gracefully handle SIGPIPE caused by Plymouth going the way of the SIGSEGV. I got the ever awesome Scott James Remnant to review it and he (rightfully) told me to fix it in Plymouth instead. My suggested patch was much more of a workaround than a fix, but I wasn’t really in the mood to deal with Plymouth. Somehow, I had just gotten it into my head that fixing it in Plymouth would be extremely complicated. That probably had to do with the fact that I’d forgotten about MSG_NOSIGNAL for a little bit, and I imagined fixing this problem without MSG_NOSIGNAL would probably mean rewriting a bunch of I/O routines, which I certainly didn’t have the brain power for at the time. Nevertheless, a few attempts later, I got it worked out. I sent it upstream, but it seems to be stuck in the moderation queue for now.
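For readers who have not met MSG_NOSIGNAL: it is a flag to send() that asks the kernel to report a vanished peer as an error (EPIPE) instead of raising SIGPIPE. The sketch below only shows the shape of such a call in Python on Linux. Plymouth and mountall are written in C and the actual patch is not reproduced here; the function name and the message are made up for illustration.

```python
import socket


def send_without_sigpipe(sock, data):
    """Send data, returning False instead of being signalled if the peer is gone.

    Hypothetical helper. Note that CPython already ignores SIGPIPE by default,
    so in Python the flag mostly documents intent; in a C program it is what
    keeps the process from being killed.
    """
    try:
        sock.send(data, socket.MSG_NOSIGNAL)
        return True
    except (BrokenPipeError, ConnectionResetError):
        return False


# A connected pair of sockets stands in for the mountall <-> Plymouth link.
client, server = socket.socketpair()
server.close()  # the peer "crashes"
delivered = send_without_sigpipe(client, b"status update\n")
client.close()
```

The caller can then decide what to do when `delivered` is false, for example carry on without a splash screen.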

I spent almost a day and a half wondering why some of our unit tests were failing “randomly”. It only happened every once in a while, and every time I tried running it under e.g. strace, it worked. It had “race condition” written all over it. After a lot of swearing, rude gestures and attempts to nail down the race condition, I finally noticed that it only failed if a randomly generated security group name in the test case sorted earlier than “default”, which it would do about 20% of the time. We had recently fixed DescribeSecurityGroups to return an ordered result set, which broke an assumption in this test case. Extremely annoying. My initial proposed fix was a mere 10 characters. It ended up slightly larger, but the resulting code was easier on the eyes.
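The failure mode is easy to reproduce in miniature. The sketch below is not the real Nova test (which is not shown here); the function names are hypothetical and the "API" is just a sorted list. It shows a position-based assertion that breaks whenever the random name sorts before "default", and an order-independent one that does not.

```python
def describe_security_groups(names):
    """Stand-in for an API call that now returns groups in sorted order."""
    return sorted(names)


def fragile_check(groups, new_name):
    # Assumes "default" always comes first: the hidden assumption.
    return groups[0] == "default" and groups[1] == new_name


def robust_check(groups, new_name):
    # Compares membership only, so ordering no longer matters.
    return set(groups) == {"default", new_name}


early = describe_security_groups(["default", "abcde"])  # sorts before "default"
late = describe_security_groups(["default", "qwert"])   # sorts after "default"
```

With `early`, the fragile check fails while the robust one passes; with `late`, both pass, which is why the test only failed some of the time.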

Log file handling has been a bit of an eyesore in Nova since The Big Eventlet Merge™. Since then, the Ubuntu packages have simply piped stdout and stderr to a log file and restarted the workers when the log files needed rotating. I finally got fed up with this and resurrected the logdir option and after one futile attempt, I got the log files to rotate without even reloading the workers. Sanity restored.

With all this done, I could now reliably run all the instances I wanted. However, I’d noticed that they’d all be run sequentially. Our workers, while built on top of eventlet, were single-threaded. They could only handle one RPC call at a time. This meant that if the compute worker was handling a long request (e.g. one that involved downloading a large image, and postprocessing it with copy-on-write disabled), another user just wanting to look at their instance’s console output might have to wait minutes for that request to be served. This was causing my tests to take forever to run, so a’fixin’ I went. With that fixed, each worker can now (theoretically) handle 1024 (or any other number you choose) requests at a time.
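To illustrate why a pool of handlers matters, here is a small sketch using only the Python standard library. It uses OS threads as a stand-in for eventlet’s green threads, and the two handler names are hypothetical. The long request blocks until it is released, yet the short one still gets served first because it runs in its own pool slot.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

completed = []                    # order in which requests finish
image_ready = threading.Event()   # released by the short request


def spawn_instance():
    # Stands in for a long request, such as downloading a large image.
    image_ready.wait(timeout=5)
    completed.append("spawn_instance")


def get_console_output():
    # Stands in for a quick request from another user.
    completed.append("get_console_output")
    image_ready.set()


# Two slots here; the worker described above allows 1024.
with ThreadPoolExecutor(max_workers=2) as pool:
    slow = pool.submit(spawn_instance)
    fast = pool.submit(get_console_output)
    fast.result()
    slow.result()
```

With `max_workers=1` the quick request would sit in the queue until the long one had finished (here, until its five-second wait timed out), which is the behaviour the single-threaded workers had.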

To test this, I cranked up the concurrency of my tests so that up to 6 instances could be started at the same time on each host. This worked about 80% of the time. The remaining 20% of instances would entirely fail to be spawned. As could have been predicted, this was a new race condition that was uncovered because we suddenly had actual concurrency in the RPC workers. This time, iptables-restore would fail when trying to run multiple instances at the exact same time. I’ve been wanting to rework our iptables handling for a looong time anyway, so this was a great reason to get to work on that. By 2 AM between Friday and Saturday, I still wasn’t quite happy with it, so you’ll have to read the next post in this series to know how it all worked out.

A week into OpenStack’s third release cycle…

With OpenStack’s second release safely out the door last week, we’re now well on our way towards the next release, due out in April. This release will be focusing on stability and deployability.

To this end, I’ve set up a Jenkins (formerly Hudson) box that runs a bunch of tests for me. I’ve used Jenkins before, but never in this (unintentional TDD) sort of way and I’d like to share how it’s been useful to me.

I have three physical hosts. One runs Lucid, one runs Maverick, and one runs Natty. I’ve set them up as slaves of my Jenkins server (which runs separately on a cloud server at Rackspace).

I started out by adding a simple install job. It would blow away existing configuration and install afresh from our trunk PPA, create an admin user, download the Natty UEC image and upload it to the “cloud”. This went reasonably smoothly.

Then I started exercising various parts of the EC2 API (which happens to be what I’m most fluent in). I would:

  1. create a keypair (euca-create-keypair),
  2. find the image id (euca-describe-images with a bit of grep),
  3. run an instance (euca-run-instances),
  4. wait for it to go into the “running” state (euca-describe-instances),
  5. open up port 22 in the default security group (euca-authorize),
  6. find the ip (euca-describe-instances),
  7. connect to the guest and run a command (ssh),
  8. terminate the instance (euca-terminate-instances),
  9. close port 22 in the security group again (euca-revoke),
  10. delete the keypair (euca-delete-keypair).

I was using SQLite as the data store (the default in the packages) and it was known to have concurrency issues (it would timeout attempting to lock the DB), so I wrapped all euca-* commands in a retry loop that would try everything up to 10 times. This was good enough to get me started.
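A retry wrapper of that kind might look roughly like the sketch below. This is not the script I actually used (that was shell around the euca-* commands); the function name, the `delay` parameter and the flaky example operation are all made up for illustration.

```python
import time


def retry(operation, attempts=10, delay=0.0, retry_on=(Exception,)):
    """Call operation() until it succeeds, giving up after `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # give the other writer time to finish
    raise last_error


calls = {"count": 0}


def flaky_operation():
    # Fails twice, like a command that hits a locked database, then works.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("database is locked")
    return "ok"


result = retry(flaky_operation)
```

To wrap a real command, `operation` could be something like `lambda: subprocess.check_call(["euca-describe-instances"])`, which raises when the command exits non-zero.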

So, pretty soon I would see instances failing to start. However, once Jenkins was done with them, it would terminate them, and I didn’t have anything left to use for debugging. I decided to add the console log to the Jenkins output, so I just added a call to euca-get-console-output. The console logs revealed that every so often, instances would fail to get an IP from dnsmasq. The syslog had a lot of entries from dnsmasq refusing to hand out the IP that Nova asked it to, because it already belonged to someone else. Clearly, Nova was recycling IPs too quickly. I read through the code that was supposed to handle this several times, and it looked great. I tried drawing it on my whiteboard to see where it would fall through the cracks. Nothing. Then I tried logging the SQL for that specific operation, and it looked just fine. It wasn’t until I actually copied the SQL from the logs and ran it in sqlite3’s CLI that I realised it would recycle IPs that had just been leased. It took me hours to realise that sqlite didn’t compare these as timestamps, but as strings. They were formatted slightly differently, so it would almost always match. An 11 character patch later, this problem was solved. 1½ days of work. -11 characters. That’s about -1 character an hour. Rackspace is clearly getting their money’s worth having me work for them. I could do this all day!
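The string comparison trap can be shown with Python’s sqlite3 module. The table, column names and the two timestamp formats below are assumptions chosen for illustration (the real query and the real patch are not reproduced here). One value uses a space between date and time, the other a "T", and since a space sorts before "T", the plain comparison says a lease from 12:00 is older than a cutoff of 11:50 on the same day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leases (ip TEXT, leased_at TEXT)")
# Hypothetical row: an address leased at 12:00, stored with a space separator.
conn.execute("INSERT INTO leases VALUES ('10.0.0.2', '2011-02-11 12:00:00')")

# Hypothetical cutoff: anything leased before 11:50 may be recycled.
cutoff = "2011-02-11T11:50:00"

# Plain text comparison: ' ' sorts before 'T', so the fresh lease matches.
naive = conn.execute(
    "SELECT ip FROM leases WHERE leased_at < ?", (cutoff,)
).fetchall()

# Normalising both sides with datetime() compares actual points in time.
fixed = conn.execute(
    "SELECT ip FROM leases WHERE datetime(leased_at) < datetime(?)", (cutoff,)
).fetchall()

conn.close()
```

Here `naive` contains the freshly leased address and `fixed` is empty, which is the difference between recycling an IP that is still in use and leaving it alone.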

That got me a bit further. Instances would now reliably come up, one at a time. I expanded out a bit, trying to run two instances at a time. This quickly blew up in my face. This time I made do with a 4 character patch. Awesome.

At this point, I’d had so many problems with sqlite locking that I got fed up. I was close to just replacing it with MySQL to get it over with, but then I decided that it just didn’t make sense. Sure, it’s a single file and we’re using it from different threads and different processes, but we’re not pounding on it. They really ought to be able to take turns. It took quite a bit of Googling and wondering, but eventually I came up with a (counting effectively changed lines of code) 4 line patch that would tell SQLAlchemy not to hold connections to sqlite open. Ever. That totally solved it. I was rather surprised, to be honest. I could now remove all the retry loops, and it’s worked perfectly ever since.
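The idea of never holding a connection open can be sketched with the standard library alone. This is not the SQLAlchemy patch itself (SQLAlchemy does ship a NullPool class that opens and closes a connection per use, but the patch is not shown here); the helper name, table and file name are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading
from contextlib import closing


def run_in_fresh_connection(path, statement, params=()):
    """Open a connection, run one statement, commit and close again."""
    with closing(sqlite3.connect(path, timeout=30)) as conn:
        with conn:  # commits on success, rolls back on error
            return conn.execute(statement, params).fetchall()


db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
run_in_fresh_connection(db_path, "CREATE TABLE events (worker INTEGER, n INTEGER)")


def worker(worker_id):
    # Each insert takes its turn on the file and lets go of it straight away.
    for n in range(20):
        run_in_fresh_connection(
            db_path, "INSERT INTO events VALUES (?, ?)", (worker_id, n)
        )


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

rows = run_in_fresh_connection(db_path, "SELECT COUNT(*) FROM events")
```

Opening a connection per statement costs a little each time, but for a workload that is not pounding on the database it means nobody sits on the file longer than needed.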

So far, so good. Then I decided to try to go even more aggressive. I would let the three boxes all target a single one, so they’d all three run as clients against the same single-box “cloud”. I realised that because I used private addressing, I had to expand my tests and use floating IPs to be able to reach VMs from another box. Having done so, I realised that this didn’t work on the box itself. A 4 line patch (really only 2 lines, but I had to split them for pep8 compliance) later, and I was ready to rock and roll.

It quickly turned out that, as I had suspected, my 4 character patch earlier wasn’t broad enough, so I expanded a bit on that (4 lines modified).

Today, though, I found that a surprising number of VMs were failing to boot, ending up with the dreaded:

General error mounting filesystems.
A maintenance shell will now be started.
CONTROL-D will terminate this shell and reboot the system.
Give root password for maintenance
(or type Control-D to continue):

I tried changing the block device type (we use virtio by default, so I tried ide and scsi), I tried not using copy-on-write images, I tried disabling any code that would touch the images. Nothing worked. I blamed the kernel, I blamed qemu, everything. I replaced everything, piece by piece, and it still failed quite often. After a long day of debugging, I ended up looking at mountall. It seems Plymouth often segfaults in these settings (where the only console is a serial port), and when it does, mountall dies, killed by SIGPIPE. One 5 line (plus a bunch of comments) patch to mountall later (it is still pending review), I can now run hundreds of VMs in a row, and 5-10-ish in parallel, with no failures at all.

So, in the future, Jenkins will provide me with a great way to test drive and validate my changes, making sure that I don’t break anything, but right now, I’m extending the tests, discovering bugs and fixing them as I extend the test suite, very test-driven-development-y. It’s quite nice. At this rate, I should have pretty good test coverage pretty soon and be able to stay confident that things keep working.

I also think it’s kind of cool how much of a difference this week has made in terms of stability of the whole stack, with only 19 lines of code touched. :)

Moving duplicity (and Deja-Dup) backups

In my last blog post I said that I had moved my backups from an external disk to Rackspace Cloud Files and promised I’d explain how.

Ok, so why bother? I had about 100 GB of data that was being backed up. I didn’t want to upload 99% of that, have my wifi go bonkers, and then have to start over (because Duplicity apparently isn’t very good at resuming). So, instead I wanted to make the initial backup to an external drive (the backup wouldn’t fit on my laptop’s hard drive) and defer copying it to Rackspace as time and connectivity permitted.

That was simple enough.

Once the first, full backup was made, I wanted incremental backups to go directly to Cloud Files, so I needed to get Deja-Dup to realise that there was already a backup on there.

This was the trickier bit.

When you ask Duplicity to interact with a particular backup location, it calculates a hash of the URI of it and looks that up in its cache to see if it knows about it already. If you’ve made a backup with deja-dup, you can go and look in $HOME/.cache/deja-dup. This is what I had:

soren@lenny:~$ ls -l $HOME/.cache/deja-dup/
drwxr-xr-x 2 soren soren 4096 2011-01-14 18:09 4e33cf513fa4772471272dbd07fca5be

You see a directory named after the hash of the URI of the backup location I used, namely “file:///media/backup” (the MD5 sum of which is 4e33cf513fa4772471272dbd07fca5be).

Inside this directory, we find:

soren@lenny:~$ ls -l /home/soren/.cache/deja-dup/4e33cf513fa4772471272dbd07fca5be/
-rw------- 1 soren soren 750938885 Jan 14 15:47 duplicity-full-signatures.20110113T170937Z.sigtar.gz
-rw------- 1 soren soren    653487 Jan 14 15:47 duplicity-full.20110113T170937Z.manifest

It contains a manifest and a signature file. These files have no record of the backup location. That information exists only in the name of the directory. Essentially, all I needed to do was to rename the directory to match the Cloud Files location. Being a bit cautious, I decided to copy it instead. The URI for a container on Cloud Files looks like “cf+http://containername”. Knowing this, it was as simple as:

soren@lenny:~$ echo -n 'cf+http://lenny' | md5sum
2f66137249874ed1fdc952e9349912d4 -
soren@lenny:~$ cd $HOME/.cache/deja-dup
soren@lenny:~/.cache/deja-dup$ cp -r 4e33cf513fa4772471272dbd07fca5be 2f66137249874ed1fdc952e9349912d4

The -n option to echo is essential. Without it, I’d have been calculating the MD5 sum of the URI with a trailing newline.
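The same calculation in Python, for anyone who prefers it to md5sum. This is a sketch based on the behaviour described above (directory name = MD5 of the URI string); the function name is made up, and I have not checked how Duplicity computes it internally beyond what the directory names show.

```python
import hashlib


def cache_dir_name(backup_uri):
    """Return the MD5 hex digest of the URI, as used for the cache directory."""
    return hashlib.md5(backup_uri.encode("utf-8")).hexdigest()


old_dir = cache_dir_name("file:///media/backup")
new_dir = cache_dir_name("cf+http://lenny")

# The equivalent of forgetting -n: a trailing newline changes the digest.
with_newline = cache_dir_name("cf+http://lenny\n")
```

Because the string is passed in directly, there is no stray newline to worry about unless you add one yourself.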

Before I ran deja-dup again, I made sure the two files above were copied to Cloud Files. If I hadn’t, the first time duplicity would talk to Cloud Files, it would realise that these files don’t exist on the expected backup location, hence the local cache of them must be invalid, so it would delete them. This happened to me the first time, so making a copy rather than just renaming the directory turned out to be a good idea.

All that was left to do now was to change my backup location in Deja-Dup. This should be simple enough, so I won’t go into detail about that.

The best part about this, I think, is that it wasn’t until 5-6 days later that my upload of the initial full backup finished. However, in the meantime, I was able to do incremental backups just fine, because all it needs to do that is the signature files from the previous runs.

Oh, and to actually upload the files, I used the “st” tool from Swift. Something like this:

soren@lenny:~$ cd /media/backup
soren@lenny:/media/backup$ st -A https://auth.api.rackspacecloud.com/v1.0 -U soren -K 6e6f742061206368616e636521212121 upload lenny *

It only took me 20 years..

tl;dr: I now have daily backups of my laptop, powered by Rackspace Cloud Files (powered by Openstack), Deja-Dup, and Duplicity.

I’ve been using computers for a long time. If memory serves, I got my first PC when I was 9, so that’s 20 years ago now. At various times, I’ve set up some sort of backup system, but I always ended up

  • annoyed that I couldn’t actually *use* the biggest drive I had, because it was reserved for backups,
  • annoyed because I had to go and connect the drive and do something active to get backups running, because having the disk always plugged into my system might mean the backup got toasted along with my active data when disaster struck,
  • and annoyed at a bunch of other things.

Cloud storage solves the hardest part of this. With Rackspace Cloud Files, I have access to an infinite[1] amount of storage. I can just keep pushing data, Rackspace keeps it safe, and I pay for exactly how much space I’m using. Awesome.

All I need is something that can actually make backups for me and upload them to Cloud Files. I’ve known about Duplicity for a long time, and I also knew that it’s been able to talk to Cloud Files for a while, but I never got into the habit of running it at regular intervals, and running it from cron was annoying, because maybe I didn’t have my laptop on when it wanted to run, and if I wasn’t logged in, my homedir would be encrypted anyway, etc. etc. Lots of chances for failure.

Enter Deja-Dup! Deja-dup is a project spearheaded by my awesome, former colleague at Canonical, Mike Terry. It uses Duplicity on the backend, and gives me a nice, really simple frontend to get it set up. It has its own timing mechanism that runs in my GNOME desktop session. This means it only runs when my laptop is on and I’m logged in. Every once in a while, it checks how long it’s been since my last backup. If it’s more than a day, an icon pops up in the notification area that offers to run a backup. I’ve only been using this for a day, so it’s only asked me once. I’m not sure if it starts on its own if I give it long enough.

A couple of caveats:

  • Deja-dup needs a very fresh version of libnotify, which means you need to either be running Ubuntu Natty, use backported libraries, or patch Deja-dup to work with the version of libnotify in Maverick. I opted for the latter approach.
  • I have a lot of data. Around 100GB worth. Some of it is VMs, some of it is code, some of it is various media files. Duplicity doesn’t support resuming a backup if it breaks halfway, and I “only” have 8 Mbit/s upstream bandwidth. That meant I had to stay connected to the Internet for 28 hours straight (in a perfect world) and not have anything unexpected happen along the way. I wasn’t really interested in that, so I made my initial backup to an external drive and I’m now copying the contents of that to Rackspace at my own pace. I can stop and resume at will. The tricky part here was to get Deja-Dup to understand that the backup it thinks is on an external drive really is on Cloud Files. I’ll save that for a separate post.
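For the curious, the 28 hours figure is plain arithmetic. The sketch below assumes decimal units (1 GB = 1000 MB) and a link that runs at its full 8 Mbit/s with no protocol overhead, which is the "perfect world" part; the function name is made up.

```python
def upload_hours(size_gb, upstream_mbit_per_s):
    """Hours needed to push size_gb through a link of the given speed."""
    size_megabits = size_gb * 1000 * 8        # GB -> MB -> megabits
    seconds = size_megabits / upstream_mbit_per_s
    return seconds / 3600


hours = upload_hours(100, 8)
```

That comes out just short of 28 hours; any overhead or a slower link only makes it longer.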

[1]: Maybe not actually infinite, but infinite enough.

Openstack Nova in Maverick

Ubuntu Maverick was released yesterday. Big congrats to the Ubuntu team for another release well out the door.

As you may know, both Openstack storage (Swift) and compute (Nova) are available in the Ubuntu repositories. We haven’t made a proper release of Nova yet, so that’s a development snapshot, but it’s in reasonably good shape. Swift, on the other hand, should be in very good shape and be production ready. I’ve worked mostly on Nova, so that’s what I’ll focus on.

So, to get to play with Nova in Maverick on a single machine, here are the instructions:

sudo apt-get install rabbitmq-server redis-server
sudo apt-get install nova-api nova-objectstore nova-compute \
                nova-scheduler nova-network euca2ools unzip

rabbitmq-server and redis-server are not stated as dependencies of Nova in the packages, because they don’t need to live on the same host. In fact, as soon as you add the next compute node (or API node or whatever), you’ll want to use a remote rabbitmq server and a remote database, too. But, for our small experiment here, we need a rabbitmq server and a redis server (it’s very likely that the final release of Nova will not require Redis, but for now, we need it).

A quick explanation of the different components:

  • rabbitmq-server is a messaging system that implements AMQP. Basically, it’s a server that passes messages around between the other components that make up Nova.
  • nova-api is the API server (I was shocked to learn this, too!). It implements a subset of the Amazon EC2 API. We’re working on adding the rest, but it takes time. It also implements a subset of the Rackspace API.
  • nova-objectstore stores objects. It implements the S3 API. It’s quite crude. If you’re serious about storing objects, Swift is what you want. Really.
  • nova-compute is the component that runs virtual machines.
  • nova-network is the network worker. Depending on configuration, it may just assign IPs or it could work as the gateway for a bunch of NAT’ed VMs.
  • nova-scheduler is the scheduler (another shocker). When a user wants to run a virtual machine, they send a request to the API server. The API server asks the network worker for an IP and then passes off handling to the scheduler. The scheduler decides which host gets to run the VM.
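The request flow described above (API server, network worker, scheduler, compute host) can be sketched as a toy program. Everything in it is hypothetical: the class and function names, the topic names, and in particular the "least loaded host" policy, which is only a placeholder since the scheduler’s real policy is not described here. An in-memory set of queues stands in for RabbitMQ.

```python
from collections import deque


class Bus:
    """Toy stand-in for the message broker: named first-in, first-out queues."""

    def __init__(self):
        self.queues = {}

    def cast(self, topic, message):
        self.queues.setdefault(topic, deque()).append(message)

    def drain(self, topic):
        pending = self.queues.get(topic, deque())
        while pending:
            yield pending.popleft()


def api_run_instance(bus, instance_id, free_ips):
    # The API server obtains an IP (standing in for the network worker)...
    ip = free_ips.pop(0)
    # ...and hands the request off to the scheduler.
    bus.cast("scheduler", {"instance": instance_id, "ip": ip})


def scheduler_step(bus, load):
    # Placeholder policy: pick the host currently running the fewest VMs.
    for message in bus.drain("scheduler"):
        host = min(load, key=load.get)
        load[host] += 1
        bus.cast("compute." + host, message)


bus = Bus()
load = {"host-a": 1, "host-b": 0}
api_run_instance(bus, "i-00000001", ["10.0.0.2", "10.0.0.3"])
scheduler_step(bus, load)
assigned = list(bus.drain("compute.host-b"))
```

The point of the sketch is only that the components never call each other directly; each one reads from its queue and puts a message on the next one.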

Once it’s done installing (which should be a breeze), you can create an admin user (I name mine “soren” for obvious reasons):

sudo nova-manage user admin soren

and create a project (also named soren) with the above user as the project admin:

sudo nova-manage project create soren soren

Now, you’ll want to get a hold of your credentials:

sudo nova-manage project zipfile soren soren

This yields a nova.zip in the current working directory. Unzip it..

unzip nova.zip

and source the rc file:

. novarc

And now you’re ready to go!

Let’s just repeat all that in one go, shall we?

sudo apt-get install rabbitmq-server redis-server
sudo apt-get install nova-api nova-objectstore nova-compute \
                nova-scheduler nova-network euca2ools unzip
sudo nova-manage user admin soren
sudo nova-manage project create soren soren
sudo nova-manage project zipfile soren soren
unzip nova.zip
. novarc

That’s pretty much it. Now your cloud is up and running, you’ve created an admin user and retrieved the corresponding credentials and put them in your environment.
This is not much fun without any VM’s to run, so you need to add some images. We have some small images we use for testing that you can download here:

wget http://c2477062.cdn.cloudfiles.rackspacecloud.com/images.tgz

Extract that file:

tar xvzf images.tgz

This gives you a directory tree like this:

|-- aki-lucid
|   |-- image
|   `-- info.json
|-- ami-tiny
|   |-- image
|   `-- info.json
`-- ari-lucid
    |-- image
    `-- info.json

As a shortcut, you could just extract this directly in /var/lib/nova and change the permissions appropriately, but to get the full experience, we’ll use euca-* to get these images uploaded.

euca-bundle-image -i images/aki-lucid/image -p kernel --kernel true
euca-bundle-image -i images/ari-lucid/image -p ramdisk --ramdisk true
euca-upload-bundle -m /tmp/kernel.manifest.xml -b mybucket
euca-upload-bundle -m /tmp/ramdisk.manifest.xml -b mybucket
out=$(euca-register mybucket/kernel.manifest.xml)
[ $? -eq 0 ] && kernel=$(echo $out | awk -- '{ print $2 }') || echo $out

out=$(euca-register mybucket/ramdisk.manifest.xml)
[ $? -eq 0 ] && ramdisk=$(echo $out | awk -- '{ print $2 }') || echo $out

euca-bundle-image -i images/ami-tiny/image -p machine  --kernel $kernel --ramdisk $ramdisk
euca-upload-bundle -m /tmp/machine.manifest.xml -b mybucket
out=$(euca-register mybucket/machine.manifest.xml)
[ $? -eq 0 ] && machine=$(echo $out | awk -- '{ print $2 }') || echo $out
echo kernel: $kernel, ramdisk: $ramdisk, machine: $machine

Alright, so we have images!

Now, we just need a keypair:

euca-add-keypair mykey > mykey.priv
chmod 600 mykey.priv

Let’s run a VM!

euca-run-instances $machine --kernel $kernel --ramdisk $ramdisk -k mykey

This should respond with some info about the VM, among other things, the IP.

In my case, it was

ssh -i mykey.priv root@


I’ll leave it to someone else to provide similar instructions for Swift.

OpenStack is open for business

Moments ago Rackspace announced the OpenStack project. Not only is this awesome news in and of itself, it also means that I can finally blog about it :)

Rackspace’s IaaS offering consists of two parts: Cloud Servers and Cloud Files. Incidentally, OpenStack (so far, at least) has two main components to it: A “compute” component called “Nova” and a “storage” component called “Swift”. Swift is the software that runs Rackspace’s Cloud Files today. Nova was initially developed by NASA and is not currently in use at Rackspace, but will eventually replace the existing Cloud Servers platform.

Last week, we held a design summit in Austin, TX, USA, with a bunch of people from companies all around the world who all showed up to see what we were up to and to help out by giving requirements, designing the architecture or writing patches. The amount of interest was astounding!

I’m sure others will be blogging at length about all that stuff, so I’d like to touch upon some of the ways in which Nova differs from the alternatives out there. I’ll leave it to someone else to talk about Swift.

  • Nova is written in Python and uses Twisted.
  • Nova is completely open source. There’s no secret sauce. We won’t ever limit functionality or performance so that we can sell you an enterprise edition. It’s all released under the Apache license, so it’s conceivable that some company might write proprietary, for-pay extensions, but it won’t be coming from us. Ever. This is true for Swift as well, by the way.
  • Nova currently uses Redis for its key-value store.
  • Nova can use either LDAP or its key-value store for its user database.
  • Nova currently uses AMQP for messaging, which is the only mechanism with which the different components of Nova communicate.
  • The physical hosts that will run the virtual machines all have a component of Nova running on them. It takes care of setting up disk space and other parts of the virtual machine preparation.
  • It supports the EC2 query API.
  • The Rackspace API is in the works. I expect this will be the basis for the “canonical” API of Nova in the future, but any number of API’s could be supported.

I cannot explain how excited I am about this. Let me know what you think!