I thought I’d put together a catalogue of testing smells (having discussed making tests difficult in the past). There may be others; please feel free to suggest them in the comments. While I may focus on Java (and possibly the occasional JS) examples, these should be pretty universal.
Here are some smells I’ve found:
- Everything is a property – where a test class stores what should be temporary local variables as instance variables
- Missing parameterised test – when you did it the long way round because you didn’t bring in parameterisation
- Test body is somewhere else – when the test method calls another method entirely with no other implementation in the test method – often a sign of missing parameterised test
- Two for the price of one – sometimes a sign of missing parameterised tests – where a single test case is testing two use cases with the same set up
- Integration test masquerading as unit test – where there are too many layers involved in making a unit test, so it runs too long
- The Parasite – a test which should be written stand-alone, but depends on the running of a previous test
- Curdled Test Fixtures – where there’s an inappropriate union of tests in the same fixture, or splitting into multiple fixtures where one would be better
- Where Does This One Go? – similar to the Curdled Test Fixture, this is caused by many tests having the same entry point into the software, even though they represent different use cases. It can be a symptom of using integration-level tests to test low-level things, or weakly defining the rules for each test fixture.
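To make the missing parameterised test concrete, here is a minimal framework-free sketch (a real suite would use something like JUnit 5’s `@ParameterizedTest`); the `Pluraliser` example and all names in it are invented for illustration:

```java
import java.util.Map;

// Sketch of the "missing parameterised test" smell and its cure.
public class PluraliserTest {

    // The code under test: a deliberately trivial English pluraliser.
    static String pluralise(String word) {
        if (word.endsWith("s")) {
            return word + "es";
        }
        return word + "s";
    }

    // Smell: three near-identical tests, differing only in their data.
    static void pluralisesCat() { check("cat", "cats"); }
    static void pluralisesDog() { check("dog", "dogs"); }
    static void pluralisesBus() { check("bus", "buses"); }

    // Remedy: one test body driven by a table of cases, which is what a
    // parameterised test runner would do for us.
    static void pluralisesAll() {
        Map<String, String> cases = Map.of(
                "cat", "cats",
                "dog", "dogs",
                "bus", "buses");
        cases.forEach(PluraliserTest::check);
    }

    static void check(String input, String expected) {
        String actual = pluralise(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("pluralise(" + input + "): expected "
                    + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        pluralisesAll();
        System.out.println("all cases pass");
    }
}
```

The table form also makes Two for the Price of One harder to commit: each row is visibly one use case.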
Setup and Teardown Patterns
Consider the different types of test resource and how they can be misused:
- The First and Last Rites – where there’s some ritual/boilerplate at the start and end of most test bodies, suggesting a lack of common setup/teardown code
- Oversharing on setup – where every test sets up a lot of shared data which only some tests need
- Share the world – where the test sets up all its resources at the start of the fixture, leading to either bleeding of state between tests, or extra work to keep things tidy
- Test setup is somewhere else – where the test method just does the assertions, not the given/when part; this can be acceptable in the case of several tests on a single shared expensive resource setup, but seldom is at other times
- Well, My Setup Works – a test does not share enough of the setup code used in production, so the test setup can deviate from the production code meaningfully, or at best duplicates production code unnecessarily
- Herp Derp – words and comments in test code or names that add nothing – for example, a comment reading // now do the assertion directly above an assertion
- Hidden Meaning – where something that should be part of the execution of the test, and appear in a test report, is hidden in a comment – essentially comment instead of name
- Over refactoring of tests – where you can’t read them because they’ve been DRYed out to death
- Boilerplate hell – where you can’t read the test because there’s so much code, perhaps a case of missing test data factory
- Absence of why – where the code of the test just IS and does nothing to explain the use case
- Half a helper method – where there’s a utility method to help a test do its job, yet all calls to it are immediately followed by the exact same code. This is because the method is only doing half the job it should, so your test has more boilerplate in it.
- What are we Testing? – where the test data, or the way we produce it, is not self-explanatory for the use case. This is the general case of many of the below smells, but also includes how test data is represented in the code.
- Second guess the calculation – where, rather than using concrete test data with a known expected answer, the test has to calculate the correct answer before it can assert it
- Missing test data factory – where every test has its own way of making the same test example data
- Unworldly test data – where the test data is in a different style to real-world data e.g. time processing based on epoch milliseconds near 0, rather than on sensible timestamps that would be used in the real world
- Ground zero – where the lack of testing with 0 is the source of a lot of bugs.
- It looks right to me – where the test data for negative cases makes the test hard to understand
- Invalid test data – when the test data would not be valid if used in real life – does this make the test invalid or not?
- Wheel of fortune – where random values in the test can lead to error – see also It Passed Yesterday
- Identity Dodgems – where each test case shares some sort of global resource – perhaps a database, or a singleton data store, so needs to choose identifiers carefully in order to avoid collisions with other tests. Better in this case to use a central ID generator, to avoid accidental collisions, or each test having to be aware of all other tests’ choice of IDs.
- Chatty logging – often a substitute for self-explanatory assertions or well defined test names, the test writes lots of data to the console or logs in order to explain test failures outside of the assertions.
- Over exertion assertion – where the implementation of an assertion is heavy and in the body of the test, rather than in an assertion library
- Bumbling assertions – where there was a more articulate assertion available, but we chose a less sophisticated one and kind of got the message across. E.g. testing exceptions the hard way, or using an equality check on a list’s size rather than a dedicated list size assertion.
- Assertion diversion – where the wrong sort of assert is used, thus making a test failure harder to understand
- Equality Sledgehammer Assertion – c.f. Assert The World – where the interesting behaviour we are trying to prove is a subset of asserting the equality of everything; for example, just knowing the count would be enough, but we assert all values in all rows. Don’t go too far the other way, though, and end up with a Blinkered Assertion. The likely cause of this smell is lack of imagination when writing an assertion, landing on equals by default.
- Celery data – usually quite Stringy – where the data read from the system under test is in a format which is hard to make meaningful assertions on – for example raw JSON Strings.
- Conditional assertions – potentially a case of over exertion or diversion – the choice of assertion in a test appears to be a runtime choice, leading to tests whose objectives are harder to understand/guarantee.
- Fuzzy assertions – where lack of control for the system under test, causes us not to be able to predict the exact outcome, leading to fuzzy or partial matching in our assertions
- Accidental test framework – related to over exertion asserts, where there’s an ad-hoc bit of what should be a test library; this also includes home-made shallow implementations of deep problems, like managing resources such as databases or files, and manually pumping framework primitives rather than using the framework as a whole.
- Assertion Chorus – aka missing custom assertion method – where a series of assertions repetitively perform a long winded routine to test something.
- Over-eager Helper – where there’s a helper method that probes the system and then performs an assertion, rather than return its result for the caller to assert.
- The True Believer – just enough tests to convince the author that the code must surely be right, not that it most likely isn’t wrong
- Assert the world – where the assertions prove everything, even uninteresting stuff.
- Circumstantial evidence – where the assertions are looking at things which are not direct proof of the behaviour
- Happenstance testing – where assertions are trying to lock down implementation details that are not directly important and might validly change, for example the exact wording of an exception
- Blinkered assertions – where the assertions are blind to the fact that the whole answer is wrong, because they’re focusing on a subset of the detail.
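Several of the data and assertion smells above share the same cures: a test data factory and a custom assertion method. Here is a minimal framework-free sketch; the `Customer` type and all names are invented for illustration:

```java
import java.util.Objects;

// Sketch of a test data factory and a custom assertion method, curing
// "missing test data factory" and "assertion chorus".
public class CustomerAssertions {

    // Plain data class standing in for production code.
    record Customer(String name, String email, boolean active) {}

    // Test data factory: one well-named place to build a realistic example,
    // instead of every test assembling its own copy by hand.
    static Customer anActiveCustomer() {
        return new Customer("Ada Lovelace", "ada@example.com", true);
    }

    // Custom assertion method: replaces a chorus of field-by-field asserts
    // repeated across tests, and produces a readable failure message.
    static void assertIsActiveWithEmail(Customer customer, String expectedEmail) {
        if (!customer.active()) {
            throw new AssertionError("expected an active customer: " + customer);
        }
        if (!Objects.equals(expectedEmail, customer.email())) {
            throw new AssertionError("expected email " + expectedEmail
                    + " but was " + customer.email());
        }
    }

    public static void main(String[] args) {
        Customer customer = anActiveCustomer();
        assertIsActiveWithEmail(customer, "ada@example.com");
        System.out.println("assertions pass");
    }
}
```

Note the assertion checks only the fields the use case cares about – neither an Equality Sledgehammer nor, since it still checks the whole behaviour in question, a Blinkered Assertion.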
Mocks and Hooks Madness
- Overmocking – where tests are testing situations that are guaranteed to pass as they’re whitebox tested against perfect mocks that do not indicate anything to do with reality. See also How Mocks Ruin Everything.
- Mock madness – where even near-primitive values like POJOs are being mocked, just because.
- Making a mockery of design – where pure functions have to be dependency injected so they can be mocked.
- Remote Control Mocking – where a class that depends on a service is tested with that service’s complex dependencies mocked, rather than the service itself being mocked.
- Hooks everywhere – where the production code has awkward backdoors in it to allow test-time rewiring or intercepting – a.k.a. Testing Causes an Abstraction Virus
- The telltale heart – where the production code is repeatedly calculating and returning values that are only used at test time.
- Is There Anybody There? – the flickering test that occasionally breaks a build – bad test or bad code?
- It was like that when I got here – ignoring the preparation of pre- and post-test state, leading to all manner of shenanigans.
- Repeatedly re-reading the inputs – where some data that could be made immutable and loaded once is read for every instance of a test
- The painful clean-up – where every test needs to build or clean up an expensive resource, like a database, as the separation of tests is weak, or the test is too large
- I wrote it like this – testing the known implementation rather than the outcome of that implementation.
- Contortionist testing – this is really a design smell. You’re probably adding tests after the code was written and are required to bend over backwards to construct those tests owing to poorly designed code. This especially involves NEEDing to use mocking of static functions or types.
- The Hans Moretti Sword Box – named after the sword box illusion – where the more tests we add to the code, each with its accompanying testable entry point, the harder it becomes to avoid leaving small uncovered pathways of production code between them.
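To contrast Overmocking and Making a mockery of design with the alternatives, here is a framework-free sketch: a pure function is tested directly with no injection, and a collaborator at a genuine boundary is replaced by a small hand-written fake rather than a mock of its every dependency. All names here are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: test pure functions directly; fake only real boundaries.
public class GreetingServiceTest {

    // A pure function: no dependency injection needed to test it.
    static String greetingFor(String name) {
        return "Hello, " + name + "!";
    }

    // A collaborator boundary worth substituting at test time.
    interface Mailer {
        void send(String recipient, String body);
    }

    // Production-style code that uses the boundary.
    static void sendGreeting(Mailer mailer, String name) {
        mailer.send(name, greetingFor(name));
    }

    // A hand-rolled fake records calls; no mocking framework, and no need
    // to mock the mailer's own dependencies ("Remote Control Mocking").
    static class FakeMailer implements Mailer {
        final List<String> sent = new ArrayList<>();
        public void send(String recipient, String body) {
            sent.add(recipient + ": " + body);
        }
    }

    public static void main(String[] args) {
        // Pure function: direct assertion, nothing to wire up.
        if (!"Hello, Ada!".equals(greetingFor("Ada"))) {
            throw new AssertionError("unexpected greeting");
        }
        // Side-effecting code: observe behaviour through the fake.
        FakeMailer mailer = new FakeMailer();
        sendGreeting(mailer, "Ada");
        if (!mailer.sent.equals(List.of("Ada: Hello, Ada!"))) {
            throw new AssertionError("unexpected mail: " + mailer.sent);
        }
        System.out.println("ok");
    }
}
```

Because the fake is real (if simple) code, the test exercises the actual interaction rather than a perfect mock’s restatement of it.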
Please feel free to complain about your own testing smells in the comments below. I plan to flesh out examples of the above in due course.