Monitoring today sucks. Big time. It sucks so bad, it’s not even funny. The amount of time spent configuring stuff and dealing with problems when it’s already too late, and the number of things your monitoring system could be monitoring but isn’t, are both staggering. I’ll be spending a couple of posts whining about this. Who knows? Maybe I’ll have a solution up my sleeve by the end of it.
Active vs. passive monitoring
Active monitoring. It sounds cool. Way cooler than passive. Most of the time, if you have a choice between an active and a passive something, you go with the active one, right? Well, not this time.
The number of times I’ve seen people set up their monitoring system to hit an HTTP URL specially crafted to be useless, one that simply responds to the probe as quickly as possible, is ridiculous. It’s certainly active, but it’s almost entirely useless. Sure, if this is a service no one uses, it’s probably fine. But if this is a service with almost any sort of real-world use, then in the customary 5 minutes between each of these “pings” there will have been dozens, scores, if not hundreds or thousands of actual requests. Requests that actually did something. Requests that exercised your service at least to some extent. Sadly, this information is almost universally ignored.
Telling Apache to log the amount of time it took to serve a request is trivial. Collecting this information is trivial. Feeding that data to your monitoring system (if not on a per-request basis, just a maximum request time over the last 10 seconds would be a vast improvement) really shouldn’t be too hard. So why don’t you?
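To make that a bit more concrete, here is a rough sketch of the "maximum request time over the last 10 seconds" idea. It assumes Apache is logging with `%D` (time taken to serve the request, in microseconds) as the last field, for example via `LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed`. The format nickname `timed`, the function name and the regex are all made up for illustration, and how you ship the result to your monitoring system is left out entirely.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed log line layout: common log format with %D (microseconds) appended,
# e.g. 127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 15234
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*\s(?P<usec>\d+)\s*$')

WINDOW_SECONDS = 10


def max_request_time_per_window(lines, window=WINDOW_SECONDS):
    """Return {window_start_epoch: slowest_request_in_seconds}."""
    worst = defaultdict(float)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        when = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        # Round the timestamp down to the start of its window.
        bucket = int(when.timestamp()) // window * window
        seconds = int(match.group("usec")) / 1_000_000
        worst[bucket] = max(worst[bucket], seconds)
    return dict(worst)
```

Feed it the lines of an access log (or the tail of one) and you get one number per 10-second window: the slowest real request your users actually experienced in that window.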