Zero-Bug Software Development
Perhaps one of the simplest process decisions that a software development team needs to make is also the most controversial: “What is our procedure for dealing with bugs?”
Is it really practical to aspire to have a Zero-Bug policy? And is that even desirable?
With the benefit of significant experience in this area, I can tell you that not only is a Zero-Bug policy absolutely achievable for your team, it is also easier to achieve than you may think.
However, let me start by setting the record straight:
Zero-Bug does not mean bug-free code production; it means striving to eradicate all known bugs.
It is impossible for developers to continuously produce bug-free, production-ready code. Bugs will always exist. This article is about getting to a state of zero known bugs, and that is absolutely possible.
A Case in Point
I recall working at the BBC about 10 years ago in London. I had just joined as an Agile coach, and I spent the first 2 weeks closely observing the 22-person team dynamics and following the work from start to finish.
The team was unable to achieve stability in their sprint velocity. Having looked back at their history, I could see that in some sprints they would deliver a high number of story points, and in other sprints they would deliver no points at all.
I observed the team constantly dealing with “small issues” at the expense of valuable work. The product owners and stakeholders were frustrated that they were not seeing the features they asked for, and fingers were being pointed in all directions.
It became apparent to me that the “small issues” were a combination of different bug types. Some bugs were due to implementation defects, but far more were the result of incorrect or missing specifications.
Yet, all three bug types were seen as equal, and it fell on the delivery team to deal with them — often urgently — sometimes because higher-ups were not happy, and other times because users were unable to receive the intended value from the system.
I also observed that due to the sheer number of bugs being dealt with, the team became desensitized to them.
After the observation period was over, I enforced the Zero-Bug approach.
It took 1 sprint to put this approach into action. In the following 2 sprints, the velocity increased by a few points, and over the span of 4 sprints it stabilized. The level of unplanned work was more predictable, the quality of the product improved overall, and of course all of this led to a happier team with boosted morale.
So how does this voodoo magic work? It’s actually a simple system to implement. Allow me to tell you how and why.
How Zero-Bug Works
You start by getting the product owners to label the issues using a very strict classification. The issue has to be one of the following four types:
- Critical Issues
- Bugs
- Features
- Improvements
You then instruct the development team to prioritize the resolution of these issue types.
Here’s how you classify and prioritize each of the issue types, using a shop front as an example:
Critical Issues
A critical issue is like the shop being on fire. If you don’t put out the fire, you will no longer have a shop!
Classification:
Consumers are no longer receiving the value they are entitled to, or money / time is being wasted at an unacceptable rate.
Resolution:
Stop what you’re doing, and fix the issue immediately.
Bugs
Bugs are like water leaks. If you leave them too long, they can spoil your merchandise and slow down your business.
Classification:
The system is not behaving as specified, but consumers are able to receive the value they’re entitled to; or the rate of money / time wasted resulting from the issue is acceptable for the short-term.
Resolution:
Finish what you’re doing, and then fix it.
Features
Features are like products and the means to support the sale of the products. You need both to stay in business.
Classification:
New functionality that does not yet exist in the system.
Resolution:
Work on these in the backlog priority order.
Improvements
Improvements are like keeping the shop clean and modern, and providing customer delight.
Classification:
An enhancement to existing functionality or to an existing system.
Resolution:
Work on these in the backlog priority order.
That’s it. I told you it’s simple!
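To make the rules concrete, here is a minimal Python sketch of the four issue types and how each one is handled. The names `IssueType`, `RESOLUTION`, `interrupts_current_work`, and `jumps_the_backlog` are hypothetical and chosen only for illustration; they do not come from any particular tracking tool.

```python
from enum import Enum


class IssueType(Enum):
    """The four allowed labels. Every issue must carry exactly one."""
    CRITICAL = "critical issue"
    BUG = "bug"
    FEATURE = "feature"
    IMPROVEMENT = "improvement"


# Resolution rule for each type, paraphrased from the descriptions above.
RESOLUTION = {
    IssueType.CRITICAL: "stop current work and fix immediately",
    IssueType.BUG: "finish current work, then fix",
    IssueType.FEATURE: "work in backlog priority order",
    IssueType.IMPROVEMENT: "work in backlog priority order",
}


def interrupts_current_work(issue_type: IssueType) -> bool:
    """Only a critical issue pre-empts work that is already in progress."""
    return issue_type is IssueType.CRITICAL


def jumps_the_backlog(issue_type: IssueType) -> bool:
    """Critical issues and bugs are both handled before any backlog item."""
    return issue_type in (IssueType.CRITICAL, IssueType.BUG)
```

Note that there is deliberately no priority field on a bug in this sketch. Whether something is a bug is a yes-or-no question, and the answer alone decides when it gets worked on.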
Why Zero-Bug Works
The concept of critical issues is nothing new and is often referred to as a P1 in many teams. When you see a fire, you put it out.
The crux and the controversial part of this system is that all bugs take priority over all new feature development or improvements. Bug priorities like P2s or P3s have no place in this Zero-Bug approach.
Or in other words, it is either a bug, or it isn’t. If you can live with it, it’s not a bug — it’s an improvement.
The classification of a bug is binary.
Either the issue is a bug and therefore takes priority, or as is more often the case, the issue can be reclassified as an improvement or even a new feature, and it can be prioritized in the backlog.
Here’s a visual to help you classify new issues as they are raised by your team members:
Classifying the 3 Types of Bugs
As I mentioned above, and in my “Preventing Software Bugs from Ever Occurring” article, there are 3 types of bugs:
- Implementation defects
- Incorrect specifications
- Missing specifications
Any of these bug types can be classified as a critical issue. Any of them can also be reclassified as an improvement or a new feature. Here are some reclassification guidelines that I like to use:
- Can the implementation defect be lived with?
  e.g. Web font is being downloaded when it should be embedded in the app
  Reclassify Bug → Improvement
- Is the incorrect specification causing us a loss or a potential loss?
  e.g. Specification states track clicks count, but it should be track spending
  Reclassify Bug → Critical
- Does the missing specification imply new functionality?
  e.g. Users not able to edit and share their profile details on social networks
  Reclassify Bug → New Feature
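The three guideline questions can be sketched as a small decision function. This is an illustration under assumptions, not a definitive rule set: the `BugReport` fields and the `reclassify` function are hypothetical names, the questions are applied to any reported bug regardless of its type, and the order in which they are checked (loss first, then new functionality, then "can we live with it") is my own choice for the sketch.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    """A reported bug plus the answers to the three guideline questions."""
    title: str
    can_live_with_it: bool = False           # Can the defect be lived with?
    causes_loss: bool = False                # Is it causing a loss or potential loss?
    implies_new_functionality: bool = False  # Does it imply new functionality?


def reclassify(report: BugReport) -> str:
    """Return the label the issue should carry after review.

    The check order is an assumption: a loss outranks everything else.
    """
    if report.causes_loss:
        return "critical"
    if report.implies_new_functionality:
        return "feature"
    if report.can_live_with_it:
        return "improvement"
    # None of the guidelines applied, so it stays a bug and takes priority.
    return "bug"


# The three examples from the guidelines above.
web_font = BugReport("Web font downloaded instead of embedded", can_live_with_it=True)
tracking = BugReport("Spec tracks click count instead of spending", causes_loss=True)
profile = BugReport("Users cannot edit and share profile details",
                    implies_new_functionality=True)

print(reclassify(web_font), reclassify(tracking), reclassify(profile))
```

Anything that falls through all three questions remains a bug, which is what keeps the bugs column small and meaningful.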
Because the development team is instructed to prioritize bugs over all other work, the stakes are raised for the product managers to correctly reclassify bugs, as they know the developers will not work on anything else until all bugs are cleared.
By enforcing a strict set of classification and handling rules, you get prioritization discipline for free.
Forming the Team Quality Standards
When the product manager or team decides whether or not a bug can be reclassified as an improvement, that decision process implicitly states the team quality standards.
For instance, a product owner who emphasizes high-quality visuals might have a low tolerance for design discrepancies and would not put up with calling them improvements. Instead, they would classify these discrepancies as bugs.
Being consistent with the classification system implicitly creates the team quality standards.
The reclassification approach allows you to continually readjust and adapt expectation vs reality, while maintaining a structured delivery approach that puts your team quality standards first.
How to Start Today
You start by going through all the existing issues and classifying them using the above system. If you have hundreds or thousands of issues, this might be a good time to archive them and start fresh. Don’t worry, you can always move issues from the archive to the backlog as you need to.
The development team does not need to wait until the entire classification exercise is done before they start squashing bugs; they can get started as soon as there are a few bugs classified.
The development team must not start working on any other items in the backlog until all bugs are cleared. NO EXCEPTIONS! If this rule is compromised, the technique will not work. It is this rule that raises the stakes for product managers to prioritize new work correctly.
Do not allow anyone and everyone to place tickets in the bugs column. This can quickly get unruly. Instead, you can create an “unclassified” bucket where anyone can post issues they find. The classification of bugs can then be made by the product managers. (It’s also possible for other team members to do this; just be consistent and organized about it.)
Let’s visualize what this approach might look like on a workflow board:
Critical issues come first → then what’s in progress → then bugs → then the items in the backlog.
Feel free to reprioritize the order of critical issues and bugs within their respective columns. This can sometimes help manage a crisis. As long as the team stays true to the priority order and the classification of issues, this system will make you go faster at the same time as continually maintaining your quality standards.
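The priority order on the board can be sketched in a few lines of Python. This is a minimal illustration with assumed names: the column keys (`critical`, `in_progress`, `bugs`, `backlog`, `unclassified`) and the `next_item` function are hypothetical, and each column is assumed to be a list already sorted by the priority agreed within that column.

```python
from typing import Optional

# Columns in the order the team must drain them.
# "unclassified" is deliberately absent: nothing is worked on until
# it has been classified and moved into one of these columns.
PRIORITY_ORDER = ("critical", "in_progress", "bugs", "backlog")


def next_item(board: dict[str, list[str]]) -> Optional[str]:
    """Return the next ticket to work on, or None if nothing is actionable."""
    for column in PRIORITY_ORDER:
        items = board.get(column, [])
        if items:
            # Within a column, the first ticket is the highest priority.
            return items[0]
    return None


board = {
    "unclassified": ["Odd spacing on checkout page"],
    "critical": [],
    "in_progress": [],
    "bugs": ["Basket total ignores discount code"],
    "backlog": ["Add wishlist feature"],
}

print(next_item(board))  # Basket total ignores discount code
```

Because the backlog is only reached once the bugs column is empty, the "no exceptions" rule is built into the order itself rather than left to individual judgment.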
Won’t This Slow Us Down?
Quite the opposite actually, as was evident in the BBC case I mentioned above.
Imagine you’re in the fast lane on a highway and you’re moving at 100mph. You realize that you need to get to the exit but you’re going too fast to safely cross. So you speed to the next exit at 120mph, hope that you don’t miss another exit or hit traffic, and then speed back to the intended exit.
Now imagine you’re on the same highway again but you’re doing 80mph. You realize you’re about to miss the exit, but you have time to correct your course and actually make it to the exit.
Who do you think would make it to the final destination first, and with fewer chances of getting into an accident or being pulled over?
It’s exactly the same with the Zero-Bug policy. It slows your team down just enough and in just the right places. The speed increase may not be visible in the day-to-day, but it will be apparent in the throughput over time.
Don’t take my word for it. Try it, measure it, and see for yourself.
This is Not a New Concept
When I came up with this concept, I was quite proud to have done it all by myself. As it turns out, the idea of Zero-Bug is not a new one. Like many of the forgotten old-school wisdoms, it has in fact been around since the 60s!
Philip Crosby, the legendary quality expert, coined the term Zero-Defect when working at the Martin Company (now part of Lockheed Martin), where it was claimed they achieved “a 54% defect reduction in defects in hardware under government audit”.
The Zero-Defect technique was initially used in aerospace manufacturing in the 60s, and was then applied in automotive manufacturing in the 90s.
There are big similarities between software delivery and the manufacturing industry. For example, the popular Agile management modality Kanban originated from the Toyota Production System. What this tells us is that we can look to these manufacturing processes for inspiration in software development, and Zero-Bug is one of those inspirations.
One criticism of the Zero-Defect approach is around the extreme cost of meeting the standard. And this can indeed be true if it is implemented incorrectly. However, in the Zero-Bug policy I’ve directly addressed this problem through the reclassification of bugs to improvements or features. This allows the cost to be controlled through the team quality standards.
Final Thoughts
It is a fact that bugs will always exist in software, and it is a fact that it’s impossible to produce code without bugs.
Zero-Bug does not mean bug-free code production; it means striving to eradicate all known bugs.
I will leave you with one more quote from another legendary quality expert that really sums it up for me:
Let me know if you have any questions or thoughts in the comments below.