Six Reasons You Should Cult Around Code Coverage

I know, and genuinely believe, that high code coverage is a weak signal of software quality. However, low code coverage is a strong signal that some types of quality are lacking. Similarly, a downward trend in code coverage is a strong signal that the team may well be cutting corners.

So I’m going to suggest that we adopt a cult approach of enforcing 100% code coverage for everything. As a thought experiment, that is. To be clear, I’m not presently doing that on this project.

      thresholds: {
        lines: 97.5,
        branches: 90.6,
        functions: 100,
        statements: 97.5,
      },

It’s not far off… and it’s been way lower than this. In some corners of the project it has sat at a full 100% for a while. Let’s say, though, that I decided to enforce 100%.
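
What would enforcing that look like? A minimal sketch, assuming the snippet above comes from a Vitest config (its thresholds shape matches Vitest’s coverage options; the tool is an assumption on my part):

    // vitest.config.ts: a sketch of enforcing 100% across the board
    // (assumes Vitest with its v8 coverage provider)
    import { defineConfig } from 'vitest/config';

    export default defineConfig({
      test: {
        coverage: {
          provider: 'v8',
          thresholds: {
            lines: 100,
            branches: 100,
            functions: 100,
            statements: 100,
          },
        },
      },
    });

With that in place, any new line, branch, or function that lands without a covering test fails the build.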

Some people might think I’d gone mad, but here are six reasons why it might have virtuous outcomes.

We Know the Important Code is Covered

With 2.5% of lines uncovered (as above), how do we know which 2.5% it is? Is it dumb boilerplate, or critical business logic? And as the code grows, the number can hold steady while the holes move. Say the project has 1,000 lines with 25 uncovered (97.5%), and I add 40 new lines of which one critical line is untested, while one of my new tests incidentally covers an old uncovered line: coverage ticks up to 97.6%, yet a brand-new important line now sits untested.

While it’s behaviour, not code, that we want to protect, at 100% the proxy metric of coverage at least tells us that every line of that behaviour is exercised by some test.

As this is a clickbaity title, I’m going to level with you here and clarify that coverage alone is not a measure of the quality of the testing. We’ll come to that in a bit.

There’s Peer Pressure To Write Tests

When you enforce a minimum coverage threshold of any amount, track it as a team, and fail the build on regressed coverage, there’s a general sense of someone looking over your shoulder as you write code and tests, especially if that threshold is 100%.

If we think someone’s going to check that we’re coding and testing in the right ratio, then there’s an additional conscience at work. So long as that peer pressure comes from the desire to build appropriate tests for appropriately well-behaved code, the metric is just a conscience. At 100% coverage it’s also a black-and-white check: we’ll know when we’ve done enough.

Library Updates are Less Scary

Some of the code we write is really bland and banal. We know it works before we run it the first time, and we’ll soon find out if it doesn’t. Maybe we don’t need to test code that merely uses a library; the library itself is well tested.

However, libraries update, and there are breaking changes. Sometimes those changes are so subtle that they don’t break at compile time (or the equivalent). In dynamic or otherwise type-spicy languages, a breakage may only surface deep into the deploy/run cycle.

Where we’ve got useful behavioural assertions around all our code, we can experiment with library updates and let the passing (or failing) tests tell us whether the upgrade is safe.
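
For example, a thin wrapper of our own can be pinned down behaviourally, so a subtle change in the library underneath fails a fast unit test rather than surfacing in production. A minimal sketch, with made-up module names:

    // price.test.ts: hypothetical behavioural pin on our own wrapper
    import { describe, it, expect } from 'vitest';
    import { formatPrice } from './price'; // hypothetical module that wraps a formatting library

    describe('formatPrice', () => {
      // Asserts the observable output, not how the library is called,
      // so a breaking library update shows up as a red test right here.
      it('formats pence as pounds with two decimal places', () => {
        expect(formatPrice(1999, 'GBP')).toBe('£19.99');
      });
    });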

We Write More Tests and Get Better at It

If we MUST write tests, we have a choice:

  • Pay homage to the Testing Gods because the boss says so
  • Embrace testing and what it can do for us and make writing tests effective and fun

I’ll be honest with you: I’ve seen teams, especially offshore vendors, do a ton of the first. I’ve also seen people apply testing AFTER coding, to code which doesn’t like being tested, trying to do the second but achieving only the first.

My hope is that teams will tire of bad ways of doing things, especially when they’re not easy, and gravitate towards effective methods which yield actual benefits and are worth the time put into them. That you’re reading this far into this post suggests you may be in that category, looking for ways to achieve more effective unit testing.

Keep trying.

The Code Coverage Tool Helps Reveal Surprises

At extreme coverage, we end up with a feedback loop from the code coverage tool.

Let’s say I write code test-first, with a coverage check that blows up if coverage drops by even a microscopic amount. My first feedback loop is the test going red. It should go red, and if it doesn’t, that’s a surprise. My next feedback loop is writing just enough code to make it go green, and seeing it go green. The last feedback loop is the code coverage tool not exploding owing to a drop in coverage.

If coverage drops when I thought I was testing the code as I wrote it, then something surprising has happened in the tests or the implementation. Such surprises are good places to go looking for bugs.
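
A hypothetical example of that kind of surprise: I believed my tests drove out every line, yet the coverage report flags a branch they never reach.

    // discount.ts: hypothetical code I thought I had written test-first
    export function applyDiscount(total: number, code: string): number {
      if (code === 'SAVE10') {
        return total * 0.9;
      }
      // If no test ever reaches this line, the coverage tool flags it:
      // either a test is missing, or this is behaviour nobody asked for.
      return total;
    }

Either way, the flagged line deserves a look before it ships.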

We’ll Tire of Culting and “Do It Properly”

I’ve touched on this already, but let’s look at it under the spotlight.

If you both enforce the practice and inspire the aspiration to get its benefits through better ways of achieving the metric, there’s a better-than-even chance that people will converge on making the best use of their efforts.

In this instance “Do It Properly” means:

  • Writing test code without horrible test smells
  • Achieving and protecting the desired behaviour of the system
  • Increasing confidence that changes within the system won’t break the system
  • Reducing the need for clumsy and fragile higher level integration/system/end-to-end tests
  • Avoiding the punishment where a minor code change causes a bunch of implementation-heavy unit tests to fail for no good reason (see the sketch after this list)
  • Speeding the team up
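
To make that last point concrete, here’s a hypothetical pair of tests for the same behaviour: the first is implementation-heavy and fails on harmless refactors, the second asserts observable behaviour and survives them. All module and function names are made up:

    import { it, expect, vi } from 'vitest';
    import { registerUser, inMemoryUserRepo } from './users'; // hypothetical module

    // Implementation-heavy: asserts HOW the work is done, so restructuring
    // the repository calls breaks it even when behaviour is unchanged.
    it('calls save with the right arguments', async () => {
      const repo = { ...inMemoryUserRepo(), save: vi.fn() };
      await registerUser(repo, 'ada@example.com');
      expect(repo.save).toHaveBeenCalledWith({ email: 'ada@example.com' });
    });

    // Behavioural: asserts WHAT the system now does, and only fails
    // when the observable behaviour actually regresses.
    it('makes a registered user findable by email', async () => {
      const repo = inMemoryUserRepo();
      await registerUser(repo, 'ada@example.com');
      expect(await repo.findByEmail('ada@example.com')).toBeDefined();
    });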

Conclusion

Culting is a sort of sleepwalking that I don’t really recommend. However, there can be virtuous outcomes of enforcing things that, if done properly, can be constructive practices.

The hope is that the team really wants to get the benefits from their test automation practice, and that the enforcement of rules can act as a temporary set of training wheels to get everyone up to the same high level of realising those benefits.

At 100% code coverage, there’s some small degree of waste, but there’s also an easy boolean standard.

If we don’t cheat, then we get better software practice into the bargain.

Postscript: it’s been pointed out in the comments that adding mutation testing can help convert lots of test coverage into a meaningful proof that the behaviour is “locked down” by assertions.
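
A minimal sketch of what that might look like here, assuming StrykerJS with its Vitest runner (an illustrative setup, not this project’s actual one):

    // stryker.config.mjs: a minimal StrykerJS sketch (assumed setup)
    export default {
      testRunner: 'vitest',                         // via @stryker-mutator/vitest-runner
      mutate: ['src/**/*.ts'],                      // the source files to mutate
      thresholds: { high: 90, low: 80, break: 75 }, // fail the run below `break`
    };

Stryker makes small changes (“mutants”) to the source and re-runs the tests; any mutant that survives marks behaviour your assertions don’t actually lock down.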
