OpenStack design tenets – Part 2


In my last post, I talked about how we’ve deviated from our original design tenets. I’d like to talk a bit about how we can move forward.

I guess first of all I should point out that I think the design tenets are sound and we’re doing it wrong by not following them, so the solution isn’t to just throw away the design tenets or replace them with new, shitty ones.

I should also point out that my criticism does not apply to Swift. Swift mostly gets it right.

If we want OpenStack to scale, to be resilient in the face of network failures, etc., we need to start going through the various components and see how they violate the design tenets. It’s no secret that I think our central data store is our biggest embarrassment. I cannot talk about our “distributed architecture” and then go on to talk about our central datastore while keeping a straight face.

I don’t believe there’s anything you can do to MySQL to make it acceptable for our use case. That goes for any flavour of MySQL, including Galera. You can make it “HA” in various ways, but at best you’ll be compartmentalising failures. A network partition will inevitably render a good chunk of your cloud unusable, since it won’t be able to interact completely with its datastore.

What we need is a truly distributed, fault-tolerant, eventually consistent data store. Things like Riak and Cassandra spring to mind. And, while we’re at it, I think it’s time we stop dealing with the data store directly from the individual projects and instead abstract it away as a separate service that we can expose publicly as well as consume internally. I know this community enjoys defining our own API’s with our own semantics, but I think we’d be doing ourselves a horrible disservice by not taking a good, hard look at AWS’s database services and working out how we can rework our datastore usage to function under the constraints these services impose.
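
To make that concrete, here is a minimal sketch of the sort of abstraction I mean, assuming DynamoDB-style constraints (access by primary key, conditional writes, no joins). The class and method names below are hypothetical, not an existing OpenStack API.

class DataStore(object):
    """Hypothetical key/value interface in the spirit of DynamoDB-style
    constraints: access by primary key only, no joins, no ad-hoc queries."""

    def get(self, table, key):
        """Fetch a single item by primary key, or None if it doesn't exist."""
        raise NotImplementedError

    def put(self, table, key, item):
        """Unconditionally write an item."""
        raise NotImplementedError

    def put_if_absent(self, table, key, item):
        """Write an item only if the key isn't already taken.
        Returns True if the write won, False otherwise."""
        raise NotImplementedError

    def delete(self, table, key):
        """Remove an item by primary key."""
        raise NotImplementedError


# A project such as Nova would code against this interface instead of issuing
# SQL. A Riak- or Cassandra-backed implementation can live behind it, and the
# same API could be exposed publicly as a service.
def allocate_floating_ip(store, tenant_id, address):
    # Conditional write: only claim the address if nobody else holds it.
    return store.put_if_absent('floating_ips', address, {'tenant_id': tenant_id})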

I’m delighted to learn that we also have a queueing service in the works. As awesome as RabbitMQ is(!), it’s still a centralised component. ZeroMQ would probably solve a lot of this as well, but having an actual queueing service that we can expose publicly as well as consume internally makes a ton of sense to me.
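
As a rough illustration, a worker consuming from such a service might look like this. None of the names below belong to a real client library or to the service actually being built; it’s just a sketch of post/claim/delete semantics with at-least-once delivery.

import time


class QueueClient(object):
    """Placeholder client for a tenant-facing queueing service reached over
    HTTP. Purely illustrative."""

    def post(self, queue, message):
        """POST a message to the named queue."""
        raise NotImplementedError

    def claim(self, queue, limit=10, ttl=60):
        """Claim up to 'limit' messages for 'ttl' seconds; anything not
        deleted before the claim expires gets redelivered."""
        raise NotImplementedError

    def delete(self, queue, message_id):
        """Acknowledge a message by deleting it."""
        raise NotImplementedError


def worker_loop(client, queue, handle):
    # At-least-once delivery means handle() must be idempotent: a message may
    # be seen twice if a claim expires before it is deleted.
    while True:
        messages = client.claim(queue)
        for msg in messages:
            handle(msg)
            client.delete(queue, msg['id'])
        if not messages:
            time.sleep(1)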

If we make these changes, that’ll take us a long way. What else do you think we should do?

7 thoughts on “OpenStack design tenets – Part 2”

  1. Soren Post author

    Ryan,

    I intentionally didn’t mention MongoDB :)

    Ok then. Enlighten me. Where can I read about Wikipedia’s database architecture so that we can have an actual conversation rather than just exchanging snarky, sarcastic comments?

  2. Soren Post author

    Let me just guess… You’re sharding based on language and have a whole bunch of read-only slaves to offload the master?

  3. Pingback: Cool blog posts about OpenStack and others | Cloudistic

  4. Mink

    Actually, I like this idea. Cassandra’s idea that every node is equal gives you less complexity when your cloud grows beyond the simplest installation. Also, with the new CAS implementation in the latest Cassandra release you can use the database in those cases where you really need atomic updates (e.g. usernames). (A sketch of that compare-and-set follows the comment thread below.)

  5. Ryan Lane

    That is indeed how we’re sharding. It works well to a point. English Wikipedia is obviously a problem and it’s definitely a problem across datacenters. Facebook is a better example of how to make this work. The Google system I linked is probably one of the best examples to work from right now.

    My only point is that it’s possible to properly scale MySQL and that a number of very large organizations do this. Of course, all of these large organizations also use a number of NoSQLs as well, when that use is appropriate.

    I decided to make the point through sarcasm since that was the writing style used for the two posts ;).

  6. Soren Post author

    > That is indeed how we’re sharding.

    Ok.

    First of all, I don’t believe sharding is an appropriate angle of attack for what we’re doing in OpenStack. At all. If we shard by e.g. initial letter of the tenant name, users whose resources are unavailable won’t give a ¤!% that other users whose tenant names start with another letter can still access their stuff. That’s just not good enough.

    I keep making this point, but it seems to never get across: This isn’t about scalability in terms of having *capacity* to handle large amounts of users, VM’s, networks, IP’s or any other resource. This is about scalability in terms of having *reliability* in handling large amounts of users, VM’s, etc.

    The question is: When something breaks, what sort of service degradation is acceptable? Is it ok to limit it to a particular subset of users? Is it ok to limit it to a particular network segment? Is it ok to limit it to a certain type of resource? Is it ok to fall back to read-only access to various things? Essentially, I think the answer to all those questions is “no”.

    If a compute node breaks, you should only lose access to the VM’s running on that compute node.

    If a networking issue splits your datacenter in half, IMO the only acceptable service degradation is failure to communicate between VM’s running in each half.

    Etc.

    VM’s on either side of a network partition should remain able to query/request/delete resources. This might be idealistic, but that’s what we should be aiming towards. A Galera based datastore for instance would only allow the half which happens to hold the quorate masters to remain fully functional. The rest would be unable to do anything at all.
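
For reference, the compare-and-set Mink mentions in comment 4 looks roughly like this with the DataStax Python driver; the keyspace, table and values are made up for illustration.

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('demo')

# IF NOT EXISTS turns the insert into a Paxos-backed compare-and-set, so two
# concurrent attempts to register the same username cannot both succeed.
result = session.execute(
    "INSERT INTO users (username, email) VALUES (%s, %s) IF NOT EXISTS",
    ('soren', 'soren@example.com'))

# was_applied reflects the [applied] column Cassandra returns for
# conditional writes.
if result.was_applied:
    print("username registered")
else:
    print("username already taken")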
