“Failure is not an option”, made famous by the film Apollo 13, became so popular that Gene Kranz, the flight director played in the film by actor Ed Harris, used it as the title of his memoir.
Arguably the biggest failure of all time was the O-ring failure that led to the fatal destruction of the Space Shuttle Challenger on January 28, 1986. The Rogers Commission investigated the incident and discovered endemic organisational issues within NASA. Whilst “Failure is not an option” was the way the engineers and controllers behaved, the Commission found a discrepancy between these groups and management.
As IT increasingly works its way into every aspect of our lives, “Failure is not an option” takes on real meaning. Safety systems such as flight control, autopilots, traffic light management, rail power management and the much-hyped driverless car all have safety costs clearly aligned to the risk to human life.
At what point is there an assessment of the cost of failure of your everyday, garden-variety IT project?
There is a cost to addressing failure
Large companies looking at IT solutions with a safety consequence make a (hopefully) scientific assessment of how much they should spend to meet safety standards. At some stage in the spending cycle, the returns diminish to the point where the extra money is not worth spending. Stopping at that point is an informed decision not to spend more on improving safety or addressing the risk of failure. The language will be something like “we have built the system to five nines (99.999%)”. This means they stopped spending on addressing a failure once it was only likely to occur once in every 100,000 times the system is used.
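The arithmetic behind the "nines" can be sketched in a few lines of Python. This is only an illustration of the reading used above, where reliability is counted per use; the function names are hypothetical and not taken from any standard.

```python
def reliability_percent(nines: int) -> float:
    """Reliability implied by a number of nines, e.g. 5 -> 99.999 (percent)."""
    return 100 * (1 - 10 ** -nines)


def uses_per_failure(nines: int) -> int:
    """Roughly how many uses to expect between failures, e.g. 5 -> 100000."""
    return 10 ** nines


# Each extra nine cuts the expected failure rate by a factor of ten.
for n in range(2, 6):
    print(
        f"{n} nines: {reliability_percent(n):.3f}% reliable, "
        f"about 1 failure in every {uses_per_failure(n):,} uses"
    )
```

Each additional nine buys a tenfold reduction in failures, which is why the spending decision gets harder to justify the further up the scale you go.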
For your garden-variety IT solution, these assessments are usually not made. Sure, there is a risk register or maybe even a risk mitigation plan, but is there an upfront assessment of the chances of failure and what the business can tolerate? From a cost-benefit view, it's probably not required in most projects. And what would you do with the information? Buy the more expensive option?
Looking back at #censusfail, it was an embarrassment, but no one died. The question remains, though: will the impact of the failure, and why it occurred, be properly assessed? Sure, a Minister and an IT company will cop some stick for the failure, but where is the “Rogers Commission” to go beyond the high-level symptoms? Who looks at the root cause? Probably no one, and the cycle will be repeated.
Lemons to lemonade
Procurement, IT projects and start-ups have similar constraints: fixed requirements and limited budget. More often than not, the business is focused on “Failure is not an option”. This, however, does not always translate to proper management of expectations around scope, budget and, of course, time to market or completion.
When you look at cheaper options, do you consider that “it's only cheaper if it works”? I challenge you to do your own “Rogers Commission” on budget blowouts or poor user adoption. Look beyond the usual suspects. Did you take the cheaper option in the hope that it would work as well as, or better than, the more expensive option? Did the final solution end up costing more?
If you are honest in answering these questions, you might find that the more expensive option would have been the right decision, and that insight will help you make better decisions in the future. And in the long term, if you can learn from it, your failure could end up being the best decision you've made.
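One way to run the “it's only cheaper if it works” test on your own projects is to compare expected total cost rather than sticker price. The sketch below is a rough illustration only: the function name is hypothetical, and every figure in it is made up for the example rather than drawn from any real project.

```python
def expected_total_cost(price: float, p_failure: float, cost_of_failure: float) -> float:
    """Purchase price plus the cost of failure weighted by its probability."""
    return price + p_failure * cost_of_failure


# Illustrative, made-up figures: the cheaper option is assumed to fail more often.
COST_OF_FAILURE = 250_000  # assumed rework, delay and lost adoption

cheaper = expected_total_cost(price=100_000, p_failure=0.30, cost_of_failure=COST_OF_FAILURE)
dearer = expected_total_cost(price=150_000, p_failure=0.05, cost_of_failure=COST_OF_FAILURE)

print(f"Cheaper option, expected total cost: {cheaper:,.0f}")
print(f"More expensive option, expected total cost: {dearer:,.0f}")
print("Cheaper option wins" if cheaper < dearer else "More expensive option wins")
```

With these assumed numbers the cheaper option ends up costing more once the chance of failure is priced in. Your own figures may well point the other way, which is the reason to do the assessment upfront.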