The ever awesome James Page posted an article on an experiment conducted by the Ubuntu Server team:
It’s a fine post and I’m happy that they did this experiment. However, I find myself shaking my head in disbelief at these findings. I don’t question their accuracy, I’m just disappointed with the results.
First of all, 4 years into a project whose original intent was a service provider focused platform for running public clouds, a 640 node cluster (1536 cores) shouldn’t be a rarity. If anything, it should be an unusually *small* OpenStack cloud. It’s thought provoking to me that they had to disable Neutron, tweak RabbitMQ settings, etc. for it to work at all.
Let’s look at the numbers, though. The 168,000 instances were spawned on a cluster that grew halfway through the experiment, so I’m going to ignore that particular experiment. I’m going to guess the numbers aren’t going to be prettier at that larger scale anyway.
So, apparently, they got 75,000 instances running on 374 compute nodes in 6 hours and 33 minutes (= 393 minutes). That’s an average of 191 instances launched per minute.
They got 100,000 instance running in 10 hours and 49 minutes (= 649 minutes). That’s an average of 154 instances launched per minute. That’s a rather significant drop from 191. From 6 hours and 33 minutes to 10 hours and 49 minutes is 4 hours, 16 minutes = 256 minutes. Those last 25,000 instances were launched at an average rate of 98 per minute. That’s almost half the launch rate of the first 75,000. Wow.
Looking at it another way, 374 nodes each with 4 cores gives us a total of 1496 cores. 649 minutes, 1496 cores = 970,904 core minutes. With 100,000 instances launched, that’s an average of 9.7 core minutes per instance launch.
9.7 minutes. That’s embarassing. 30 seconds would be acceptable, but something like 10 seconds should be perfectly possible measured from when the launch request is sent until its state is RUNNING, and then another 20 seconds to finish the boot sequence and start listening for SSH connections.