
production ready

A few thoughts on what it means for software to be production ready. Or rather, what, if any, information is conveyed to me when I’m told that something is used in production. Millions of users can’t be wrong!

Some time ago, I worked with a framework. It doesn’t matter which; the bugs have all been fixed, and I don’t think it was remarkable. But our team picked it because it was production ready, and then I discovered it wasn’t quite so ready.

Egregious performance because of a naive N^2 algorithm for growing a buffer.

A timezone library that could handle DST, but couldn’t handle the absence of DST. It would crash in such exotic locales as Arizona, which doesn’t have DST.

A mail library that didn’t escape leading dots, so a line consisting of a single dot terminated the SMTP message early.

Egregious performance on some platforms due to using the wrong threading primitives.

Bizarre database connection bugs for some queries that I can’t at all explain.

Buffer overflows in the Unicode processing functions.

Lots of other assorted issues, but I picked the above because they were all standalone problems fixed with just a few lines of code. And of course, the kind of thing you’d definitely expect to have already been flushed out in a production system.
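To make the first item concrete, here is a minimal Python sketch of how growing a buffer can go quadratic. The framework’s actual code isn’t shown here, so the fixed 16-byte step is an assumption, and bytes_copied, fixed_step, and doubling are hypothetical names used only for illustration.

```python
def bytes_copied(total, grow):
    """Count bytes copied while appending `total` bytes one at a time
    to a buffer that is reallocated, and fully copied, whenever it fills up."""
    capacity = 1
    size = 0
    copied = 0
    for _ in range(total):
        if size == capacity:
            capacity = grow(capacity)
            copied += size  # a reallocation copies everything stored so far
        size += 1
    return copied


def fixed_step(capacity):
    # Assumed naive policy: add a constant amount of room each time.
    return capacity + 16


def doubling(capacity):
    # Geometric policy: double the room each time.
    return capacity * 2


naive = bytes_copied(100_000, fixed_step)
geometric = bytes_copied(100_000, doubling)
print(f"fixed step copies {naive} bytes, doubling copies {geometric} bytes")
```

With a fixed step, the number of reallocations grows with the data and each one copies the whole buffer, so the total work is quadratic. With doubling, the total copied stays within a small constant multiple of the final size. Switching from one policy to the other is the kind of change that fits in a few lines.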

Now, we built our product on top of this. Some of the bugs were caught internally. Others were discovered by customers, who were of course a little dismayed. Like, how could you possibly ship this? Indeed. We were doing testing, quite a bit really, but when every possible edge case has a bug, it’s hard to find them all.

Before making our big bet, there was a validation prototype. It was a success, everything seemed to work, even all the functions we needed to use. But the validation didn’t use production scale data. It was tested with east coast time and west coast time, but not Arizona time. An email was sent, but not one consisting of “.”. It was demonstrated on multiple platforms, but not performance profiled. Basic database queries were made, but not all of them. Text was demonstrated for English and Êürø languages, but not others. And so on and so forth.
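The single-dot email is a cheap case to write down. What follows is a minimal sketch of SMTP dot-stuffing as described in RFC 5321 section 4.5.2, assuming the body already uses CRLF line endings. The names dot_stuff and data_payload are hypothetical, not the framework’s API.

```python
CRLF = "\r\n"


def dot_stuff(body):
    """Prefix an extra dot to every line that starts with a dot."""
    lines = body.split(CRLF)
    return CRLF.join("." + line if line.startswith(".") else line for line in lines)


def data_payload(body):
    """Return the escaped body followed by the SMTP end-of-data marker."""
    return dot_stuff(body) + CRLF + "." + CRLF


# A message whose second line is a lone dot, the case the prototype never sent.
message = "first line" + CRLF + "." + CRLF + "last line"
payload = data_payload(message)
```

Without the stuffing step, the server would treat the lone dot as the end of the message and try to read “last line” as an SMTP command.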

In short, everything was “works for me” quality. But is that really production quality?

There are some obvious contenders for the title of today’s most “production ready” software, but it’s a more general phenomenon. People who have success don’t know what they don’t know, what they didn’t test, what unused features will crash and burn.

Posted 11 Nov 2016 20:11 by tedu Updated: 11 Nov 2016 20:11