It has been stated, quite accurately, that high code coverage is a weak metric of software quality. Doesn’t that just suck!? We spend all that time measuring code coverage to find where we’re missing out on testing, only to be told that the better it is, the less it could mean!
The good news is that code coverage is a strong metric too. When it’s low, it’s a very good indicator that your code quality is poor. It might not be, but how can you be sure that a team that can’t be bothered to write tests is doing a great job of clean implementation?
So perhaps code coverage is proportional to how much people give a damn about composing great software… unless they’re cheating, or cargo-culting the testing.
Let’s assume that a team has been motivated to write better and better tests to drive the benefits those tests give them:
- Early warning of bugs introduced by change
- Easy get-it-right-first-time by driving accurate requirements in from tests
- Driving good design decisions in from the outside
- Creating small independent modules which are both easy to test and neat to compose into working software
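The loop behind that list can be sketched in a few lines. This is a hedged illustration, not anyone’s real code: the `slugify` helper and its test are hypothetical names invented for the example. The point is the ordering — the test pins the requirement down first, and the implementation exists only to satisfy it, which is what keeps the module small and independently testable.

```python
import re

# Written FIRST: the test states the requirement before any code exists.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# Written SECOND: the minimal implementation that makes the test pass.
def slugify(text: str) -> str:
    # Lowercase, then keep only runs of letters/digits, joined by hyphens.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

test_slugify()
```

Because every line of `slugify` was written to turn a failing test green, none of it is untested by construction — which is exactly how high coverage falls out of the process rather than being chased afterwards.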
If you’re doing that, then you’ll get higher code coverage and sensibly higher quality code.
So what if you push it further? What if you aim for 96%+ coverage? Is there anything more to get out of TDD?
First, you’ve probably GOT to do TDD to get that sort of coverage. You won’t easily retrofit tests onto code that wasn’t born from TDD and reach that figure. Given that TDD drives the benefits above, aiming for huge coverage is a way to force you to do TDD. Win!
Next, if you expect every line and branch to be covered by tests, checked incrementally a test at a time, then nearly every uncovered line represents a bug. At huge coverage, each red line or branch is a surprise, since surely it was only written to fulfil the needs of a test.
This isn’t just theory. A couple of years back, I built a couple of thousand lines of code at 97% code coverage in a very short period of time. Nearly every time a line went red through lack of coverage it showed me a mistake I’d made: either a test that wasn’t testing the right thing, or a twist in the logic that wasn’t executing the way I’d intended.
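Here’s a small sketch of what such a red line can reveal. The `discount` function and the requirement are hypothetical, invented for the example; the pattern is real. The tests below pass, so a test-only view says everything is fine — but a per-line coverage report would show the discount branch going red for inputs the requirement says should take it, exposing the logic twist.

```python
# Hypothetical requirement: members OR orders over 100 get 10% off.
# The code accidentally uses `and`, so two of the three qualifying
# input shapes never reach the discount line.
def discount(price: float, is_member: bool) -> float:
    if is_member and price > 100:   # bug: should be `or`
        return price * 0.9
    return price

# Both tests pass, yet the `* 0.9` line only executes for one input shape.
assert discount(150.0, True) == 135.0   # both conditions true: covered
assert discount(50.0, True) == 50.0     # member alone: discount line never runs
```

Run a suite like this under a coverage tool and the discount line shows as only partially exercised; chasing that red line is what surfaces the `and`/`or` mistake, exactly the kind of "twist in the logic" described above.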
You can get these benefits at lower coverage, and there are some structural things in some languages which are not worth forcing a test to cover, so the achievable figure may fall slightly short of the extreme I’m quoting. However, from experience, here’s what I’d say.
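One concrete example of a structural line not worth forcing a test to cover (the function is hypothetical; the `# pragma: no cover` marker is a real coverage.py convention): a platform-specific branch can only ever execute on one platform per test run, so excluding it explicitly is more honest than either inflating the target or writing a contorted test.

```python
import sys

def newline() -> str:
    # Only one of these branches can run on any given machine,
    # so the other is excluded from coverage rather than faked.
    if sys.platform == "win32":  # pragma: no cover - platform-specific branch
        return "\r\n"
    return "\n"
```

Marking such lines keeps the coverage report meaningful: every remaining red line is back to being a surprise worth investigating.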
Genuinely earned high code coverage is a strong metric of tests adding value.