27 May 2011

What's in a number?


Imagine this discussion.


Developer 1: In the past week our code coverage went up from x% to y%.


Developer 2: That's pretty good. There is still quite a bit of dead code left over from the recent refactoring, so I think we'll get to z% by next week.


Developer 3: I still think we should aim for 100%.


User/Analyst/Sponsor: What does that mean?


Developer 1: It shows how much of the whole code base we can actually test as part of our build process.


User/Analyst/Sponsor: Ok, and why is that important?


Developer 2: It's important to know how much is not covered.



User/Analyst/Sponsor: Oh, so you know how much more work you have to do before you get to 100%?


Developer 3: Well, not exactly. Our policy is to aim for z%.


User/Analyst/Sponsor: Hmmm, and what's the use of z%? What does that mean to us?


Developers: What does it mean to you? Well it means that... hmm... that you know... it means... well... er...




That's not quite a transcript of my direct experience, but I recently had to explain code coverage to non-coders, and I realised how little thought some developers have put into what code coverage means in practical terms outside the developers' domain.

I think most developers should be familiar with the concept of code coverage: put simply, it is the number of lines of code that get executed as part of the test suite, expressed as a percentage of the total number of lines of code written. Full coverage (100%) means that my tests manage to execute (cover) all of my code. Anything less than full coverage means that there is some code that no test ever exercises. And I'm not just talking about unit tests: I am also including integration tests and automated UI tests in the picture.
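To make that concrete with a deliberately tiny, made-up example (the module, function, and test names here are hypothetical, and I'm assuming a Python line-coverage tool such as coverage.py running over a pytest suite):

    # pricing.py -- hypothetical module under test
    def discounted_price(price, is_member):
        if is_member:
            return price * 0.9   # members get 10% off
        return price             # everyone else pays full price

    # test_pricing.py -- the only test we have so far
    from pricing import discounted_price

    def test_member_discount():
        # This test only ever executes the "member" branch.
        assert discounted_price(100, True) == 90

Running something like "coverage run -m pytest" followed by "coverage report" would show that the non-member line is never executed, so even though the only test we have passes, the module is only partially covered.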

There have been countless debates on how much coverage is good coverage, along with various tools and methodologies, so I won't go into a lot of detail here, and I don't want to entertain anyone with yet another post on the miracles of test-driven development (TDD). The main concern here is that we cannot measure how "important" one line of code is with respect to another. If we could, we would be able to come up with an "ideal" coverage target and be happy with that. Unfortunately, in practice, when I have anything less than full coverage I cannot tell whether the "uncovered" code will eventually bite me with a vengeance, or whether it will ever get noticed at all. It all depends on usage.

What's the meaning of an x% code coverage to me as a developer? In all practical terms, if it's less than 100%, it means nothing. So either I strive for full coverage, or why should I bother at all? That is the basis upon which code coverage is sometimes ignored altogether, or used only to produce colorful block charts for project status reports.


Now what does that mean to testers? Testers usually understand the concept of code coverage, and many are wise enough to avoid taking it as a measure of the quality of the deliverables. Why is that? Quite simply, my automated tests might well execute 100% of my code base, but that doesn't mean they actually test the right thing in the correct way.


So what does code coverage mean to me as a tester? In general terms, a code coverage of x% means that, for every new feature I add to the product, or every bug I fix, I have roughly an x% degree of confidence that I am not going to break existing features or resurrect old bugs (provided my tests actually test behavior - not just classes and methods - and that each new bug is reproduced with a test before it gets fixed).
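As a hypothetical sketch of that last point (the module and function names below are invented for illustration), a regression test written to reproduce a reported bug before fixing it might look like this:

    # test_orders_regression.py -- hypothetical regression test,
    # written to reproduce a reported bug *before* the fix goes in
    from orders import order_total   # hypothetical module under test

    def test_empty_order_totals_to_zero():
        # Bug report: an empty order made order_total() blow up instead of
        # returning 0. This test fails while the bug is present, passes once
        # the bug is fixed, and keeps that behavior covered from then on.
        assert order_total([]) == 0

Once a test like this is part of the suite, the behavior behind the original bug report stays covered, which is where the confidence I mentioned above comes from.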

But what's the meaning of it to users and analysts? When they see the regular progress report on the system being developed, what information does code coverage convey to them?


The meaning of code coverage to me as a user is actually more important than one might think. Roughly speaking, a code coverage of x% means that with every action I perform on the system there is roughly a (100-x)% probability that something will go wrong.
To put it slightly differently (or perhaps more precisely), an x% code coverage means there is roughly an x% probability that my interaction with the system produces the expected result because that's what it was programmed to do. Conversely, there is roughly a (100-x)% probability that my interaction with the system produces the expected result by chance. The degree of criticality of the system then dictates the degree of acceptance or tolerance for failure (or tolerance to chance), which we can match against code coverage.
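To put some made-up numbers on that heuristic: at 90% coverage, roughly one interaction in ten relies on code that no test has ever exercised; at 99%, roughly one in a hundred. Whether one in ten or one in a hundred is acceptable depends on how critical the system is, and that is a question users and analysts can actually answer.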