Author: Sven Gregori / Source: Hackaday
Let’s be honest, no one likes to see their program crash. It’s a clear sign that something is wrong with our code, and that’s a truth we don’t like to see. We try our best to avoid such a situation, and we’ve seen how compiler warnings and other static code analysis tools can help us to detect and prevent possible flaws in our code, which could otherwise lead to its demise.
But what if I told you that crashing your program is actually a great way to improve its overall quality? Now, this obviously sounds a bit counterintuitive, after all we are talking about preventing our code from misbehaving, so why would we want to purposely break it?Wandering around in an environment of ones and zeroes makes it easy to forget that reality is usually a lot less black and white. Yes, a program crash is bad — it hurts the ego, makes us look bad, and most of all, it is simply annoying. But is it really the worst that could happen? What if, say, some bad pointer handling doesn’t cause an instant segmentation fault, but instead happily introduces some garbage data to the system, widely opening the gates to virtually any outcome imaginable, from minor glitches to severe security vulnerabilities. Is this really a better option? And it doesn’t have to be pointers, or anything of C’s shortcomings in particular, we can end up with invalid data and unforeseen scenarios in virtually any language.
It doesn’t matter how often we hear that every piece of software is too complex to ever fully understand it, or how everything that can go wrong will go wrong. We are fully aware of all the wisdom and cliches, and completely ignore them or weasel our way out of it every time we put a /* this should never happen */
comment in our code.
So today, we are going to look into our options to deal with such unanticipated situations, how we can utilize a deliberate crash to improve our code in the future, and why the average error message is mostly useless.
When Things Go Wrong
Let’s stick with a scenario where we end up with unexpected garbage data. How we got in such a situation could have many reasons: bad pointer handling, uninitialized variables, accessing memory outside defined boundaries, or a bad cleanup routine for outdated data — to name a few. How such a scenario ends, depends of course on the checks we perform, but more importantly, exactly what data we’re dealing with.
In some cases the consequences will be fairly obvious and instant, and we can look into it right away, but in the worst case, the garbage makes enough sense to remain undetected at first. Maybe we are working with valid but outdated data, or the data happens to be all zeroes and a NULL
check in the right spot averts the disaster. We might even get away with it altogether. Well, that is, until the code runs in a whole different environment for the first time.
Everything is easier with an example, so let’s pretend we collect some generic data that consists of a time stamp and a value between 0 and 100 inclusive. Whenever the data’s time stamp is newer than the previous one, we shall do something with the value.
123456789 | struct data { // data timestamp in seconds since epoch time_t timestamp; // new data value in range [0, 100] uint8_t value; }; void do_something( struct data *data) { // make sure data isn't NULL if (data != NULL) { // make sure data is newer than the previous if (data->timestamp > last_timestamp) { // make sure value is in valid range if (data->value <= 100) { // do something with the value ... } else { // this should never happen [TM] } // update timestamp last_timestamp = data->timestamp; } } } |
This seems like a reasonable implementation: no accidental NULL
dereferencing, and the logic matches the description. That should cover all the bases — and it probably does, until we end up with a pointer that leads to a bogus time stamp thousands of years from now, causing all further value processing to be skipped until then.
Often times, a problem like this gets fixed by adjusting the validation check. In our example, we could include the current time and make sure that time differences are within a certain period, and we should be fine. Until we end up in a situation where the time stamp is fine, but the value isn’t. Maybe we see a lot of outliers, so we add extra logic to filter them out, or smoothen them with some averaging algorithm.
As a result, the seemingly trivial task of checking that the data is newer and within a defined range exploded in overall complexity, potentially leading to more corner cases we haven’t thought about and we need to deal with at a later point. Not to mention that we ignore the simple fact that we are dealing with data that shouldn’t be there in the first place. We’re essentially treating the symptoms and not the cause.
Crash Where Crashing Is Due
The thing is, by the time we can tell that our data isn’t as expected, it’s already too late. By working around the symptoms, we’re not only introducing unnecessary complexity (which we most likely have to drag along to every other place the data is passed on to), but are also covering up the real problem hiding underneath. That hidden problem won’t disappear by ignoring it, and sooner or later it will cause real consequences that force us to debug it for good. Except, by that time, we may have obscured its path so well that it takes a lot more effort to work our way back to the origin of the problem.
Worst case, we never get there, and instead, we keep on implementing workaround after workaround, spinning in circles, with the next bug just waiting to happen. We tiptoe around the issue for the sake of keeping the program running, and ignore how futile that is as a long-term solution. We might as well give up and abort right here and now — and I say, you should do exactly that.
Sure, crashing our program is no long-term solution either, but it also isn’t meant to be one. It is meant as indicator that we ended up in a situation that we didn’t anticipate, and…
The post Crash your code – Lessons Learned From Debugging Things That Should Never Happen™ appeared first on FeedBox.