How Mocks Ruin Everything

Despite being an ardent advocate of TDD, mocking, and Mockito (which is totally awesome), I’m going to argue against mocks.

It had to happen. I’ve found really clear situations where mocks have made my work harder, and it’s got to stop!

Why?

When used incorrectly, mocks do not test your code

Yep. Mocks can be the fool’s gold of TDD.

What do good tests achieve?

Back to basics then. What’s it all about?

  • Really exercise the components under test
  • Specify what the component is expected to do
  • Speed up development and fault resolution
  • Prevent waste – because developing without the right tests is probably slower and less efficient overall

Why do we mock?

  • Create boundaries in tests
  • Speed of execution
  • Simulate the hard-to-simulate, especially those pesky boundary cases and errors

What is a mock?

For the purposes of this discussion, let’s bundle in all non-real test objects:

  • Pretend test data
  • Stubs – which receive calls and do nothing much
  • Mocks – which can employ pre-programmed behaviours and responses to inputs, despite being just test objects
  • Fakes – implementations that simulate the real thing, but are made for testing – e.g. an in-memory repository
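To make the distinction concrete, here’s a minimal sketch (the UserRepository interface and class names are invented for illustration) showing a stub that does nothing much next to a fake that really works, in memory:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface, for illustration only
interface UserRepository {
    void save(String id, String name);
    Optional<String> findName(String id);
}

// Stub: receives calls and does nothing much
class StubUserRepository implements UserRepository {
    public void save(String id, String name) { /* ignored */ }
    public Optional<String> findName(String id) { return Optional.empty(); }
}

// Fake: a working in-memory implementation made for testing
class FakeUserRepository implements UserRepository {
    private final Map<String, String> store = new HashMap<>();
    public void save(String id, String name) { store.put(id, name); }
    public Optional<String> findName(String id) {
        return Optional.ofNullable(store.get(id));
    }
}

class TestDoublesDemo {
    public static void main(String[] args) {
        UserRepository fake = new FakeUserRepository();
        fake.save("1", "Alice");
        System.out.println(fake.findName("1").orElse("missing")); // Alice
    }
}
```

The fake behaves like the real thing for the purposes of a test, which is what makes it more than a stub.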

How can mocks go wrong?

Your mock starts to work against you if:

  • It makes your tests implementation-focused, rather than behaviour-focused
  • Mocking requires some obscure practices
  • Mocking becomes a boilerplate ritual performed before testing

Implementation focused?

You’ll know if you’re doing this. If you do it before writing the code, it feels like trying to write the code backwards, via tests that predict the lines of code you need. If you do it after the fact, you’ll find yourself trying to contrive situations to exercise individual lines or statements, or you’ll find yourself pasting production code into your tests to make a mock behave the way you want.

Obscure

Mocks have the power to bypass the real code, so we may find ourselves using the mocks to generate an alternative reality where things kind of work because the mocks happen to behave in a way which gives a sort of an answer. This seems to happen when the thing you’re mocking is quite complex.

Ritual

If all tests begin with the same pasted code, then there’s something odd about your test refactoring and mocking.

So What’s The Solution?

  • You ARE allowed to use real objects in tests
    • Mock at heavy interface boundaries
    • Refactor your code so more of your real algorithms and services can be used at test time
  • You SHOULD test with real-life data
    • Your fancy date algorithm may work fine with 4th July 2006, but if that’s not the sort of date your users will use, come up with more real-life ones
    • Make the test data describe real-world use cases so people can better understand the intent of the module
  • Add mocks when you must
    • Add spies to real objects to simulate hard-to-reach places as a preference to total mocking
  • Consider using fakes
    • Complex behaviour of simple things tests best if you can produce a fake implementation – this might even allow for changes in how the code under test uses the services it depends on
  • Test-first means that you delay thinking about the implementation
  • Test behaviour, not lines of code
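As a sketch of the spy suggestion (this assumes Mockito on the classpath; the failure mode is invented for illustration), you can wrap a real object and override only the hard-to-reach behaviour, leaving everything else real:

```java
import static org.mockito.Mockito.*;

import java.util.ArrayList;
import java.util.List;

class SpyExample {
    public static void main(String[] args) {
        // A real object does most of the work...
        List<String> realList = new ArrayList<>();
        List<String> spyList = spy(realList);

        // ...and only the hard-to-simulate behaviour is overridden
        doThrow(new IllegalStateException("disk full"))
            .when(spyList).clear();

        spyList.add("item");                // real ArrayList behaviour
        System.out.println(spyList.size()); // the real size: 1
        // spyList.clear() would now throw IllegalStateException
    }
}
```

Everything not explicitly stubbed still runs the real code, which keeps the test closer to reality than a total mock.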

In Conclusion

Test what you mean to test. Write tests first about important behaviour. Try to forget any knowledge about the implementation in the tests. Within reason, be proud to weave together a handful of collaborating objects to spin up a unit test.

Posted in Java, tdd

Mockito Argument Matching in Java 8


When using mocks, we often want to check the inputs to a function that was called on the mock. It’s probably a subject for another post whether you should rely on doing this, or whether you should make more code that just returns something, rather than calls something, but let’s agree that you will, at some point, want to check how a mock’s function was called.

With Mockito as our mocking framework of choice, here’s the hard way:

// given some test execution has happened

// construct a fully-blown replica of what you think will have
// been passed into your function under test
SomeObject expectedInput = new SomeObject( ... );

verify(myMock).myMethod(eq(expectedInput));

Why is that hard?

Well, you need to predict a perfect replica of the input, which binds your test to the exact implementation. In some cases, certain fields, which are less interesting to the behaviour you’re testing, have to be specified so that the equals comparison works. Worse still, some fields that get arbitrary values at runtime have to have specific values pushed into them to make the test repeatable – timestamps, for instance.

In short, the above doesn’t always work well, so we often resort to the use of argument captors.

// given some test execution has happened

// Find out how the method got called
ArgumentCaptor<SomeObject> captor = ArgumentCaptor.forClass(SomeObject.class);
verify(myMock).myMethod(captor.capture());

// read the thing you're interested in from the captor
assertThat(captor.getValue().getInterestingProperty(), is("expected"));

Put mildly, this sucks! You have three activities to do with the captor, all to enable you to check one fact. You have to construct one, use it to capture the value, and then read the value from it after the Mockito verify method is called. The test also reads oddly. It’s become:

  • When the test was executed
  • Given this captor
  • Check the method was called and capture
  • And check the captured value was expected

Since Mockito 2.1, used with Java 8, there has been a neater technique. You can use the argThat matcher with a verify call to inline your check in the verify method:

// given some test execution has happened

// Find out how the method got called
verify(myMock).myMethod(argThat(
        someObject -> someObject.getInterestingProperty().equals("expected")
    ));

This, I think, makes for a more straightforward test. Verify that the method was called with an argument that matches a certain filter.

The reason this is possible is that Mockito moved away from using Hamcrest internally, replacing it with its own strongly-typed ArgumentMatcher interface, which is essentially a functional interface. This means you can replace it with a lambda.

An extra bonus of this technique is that you can also use argThat within the when or given constructs in Mockito. This means you can neatly specify how your mock will behave based on nuances of the input:

when(myMock.myMethod(argThat(
        someObject -> someObject.getInterestingProperty().equals("expected")
    ))).thenReturn(42);

For more information on this please see the Mockito JavaDoc.

Posted in Java, tdd

Automated Testing is a Broad Church


I’ve had the pleasure to be working on an automated testing framework recently. This seems to solve a problem that we’ve been having with Cucumber. I will write a more detailed piece on this in the near future, but here’s the elevator pitch.

JUnit and Mockito tests tend to be too technical, so we can’t use them as an acceptance test framework directly.

Cucumber is the go-to technology for BDD/ATDD but implementing it can be cumbersome (cucumbersome perhaps? or an encucumbrance? – who knows!?).

In short, if you want the Given/When/Then and documentation friendly features of Cucumber you have to pay the price of using Cucumber.

What’s Wrong with Cucumber?

Nothing necessarily. Once you’re dealing with lots of similarly documented specs, especially if they’re simple, Cucumber can be a real boon.

However, for Cucumber to work you have to phrase your Gherkin right, implying but not implementing the test script. Then you need to write your glue code just right, and then you need one or two tiers of test execution code. This means you may have to cross three or four layers of software/script in various languages/styles to get to the code which reaches out to the system under test.

This is usually good, until you need to remember the outcome of one step in order to use it to verify a later one. At this point, you’ve no way of clearly putting that into the software layers. It ends up somewhere in the Orchestration or World code. It’s hinted at by the Gherkin and glue-code. It’s obscure and it’s caused entirely by the Achilles heel of Cucumber.

Cucumber’s Achilles Heel

To connect your spec with test execution code you have two degrees of separation. Plaintext Gherkin, used at runtime, plus whichever glue-code is kicking about. For tricky cases, this often obscures the intent of the spec/test implementation.

What can we do?

How about we write tests in Java but use the BDD syntax to structure them and report on them? With this in mind there are a few frameworks that offer just this:

Oleaster and Ginkgo4j both try to be equivalents of Jasmine and Mocha. Please see my post for more on these and other BDD frameworks in Java.

I have been working on Spectrum with its founder, Greg Haskins. In the current live release, there’s support for Jasmine/Mocha/RSpec like tests. In the next release (soon) there will be decent support for Gherkin syntax, and some rather neat ways of weaving in your favourite JUnit test frameworks (Spring, Mockito etc) via JUnit Rules.

Have a look.

The Right Test for the Right Job

Success comes not from finding the right tool, but from using the right tool for the job at hand. Where Cucumber succeeds, you should use it. It’s very helpful. Once it gets hard, change tactics.

Posted in Java, tdd

Obvious Mistakes

As a team leader who also writes code I have to worry about code several times over.

  • The coding standards we adhere to – they must be disciplined but not overbearing or pernickety
  • Every line of code the team writes – the objective is a decent product made a different way. While peer review is the way to ensure everyone takes collective pride in the work, the tech lead doesn’t get to stop worrying.
  • Every line of code I write – what kind of a person doesn’t act in the way they demand in others?

Recently, I’ve been noticing issues in the way we’ve been working. It only takes a few minor cases of letting our standards or techniques slip for our efforts to become counterproductive. As team lead, I could point the finger at the individuals who happened to write the lines of code that I came to worry about. I don’t need to. As a contributor to that code, I can find recent examples where, for no reason other than a combination of bad luck and time pressure, I dropped some balls. I then discovered the effect these minor slips had, and I’d like to confess to them.

Making mistakes is no big issue; learning from them is a great opportunity. I hope others will find this useful.

Name it after the implementation

We had the need for a hashing key which could be easily predicted. We settled on a date with two random digits after it. This would hash well, but you could, for any given date, predict the hash key to within 100, which is easily searchable. So we did a lot of talking about random.

Two bad things happened, both a consequence of the word random getting stuck in our discussion when we were really making a predictable hash key.

Firstly, the code got filled with the word random, which others were asking about – why is it random? How will that work?

Secondly, we made the code depend on a random number generator. Given that we were trying to make a hashing function which was going to be used for persistence, the ad-hoc random number generator, coupled with whatever Java happens to do with hashCode implementations, could best be described as something which works that way for now.

All of this was a consequence of thinking random first, rather than seeing that, statistically, those two digits would be random, when what we were really trying to make was a stable hash key. In the end, I switched it to a two-digit sample from an MD5 hash. This I tested for statistical variability and it was fine.
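A minimal sketch of that final approach (the method names are mine, not the production code): derive the two digits deterministically from an MD5 hash of the date itself, so the key is stable for a given date but still well spread:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashKeys {
    // Derive a stable two-digit suffix from the date itself,
    // rather than from a random number generator.
    public static int twoDigitSample(String dateCode) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(dateCode.getBytes(StandardCharsets.UTF_8));
            // map the first byte into the range 00..99
            return (digest[0] & 0xFF) % 100;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static String hashKey(String dateCode) {
        return String.format("%s%02d", dateCode, twoDigitSample(dateCode));
    }

    public static void main(String[] args) {
        // always the same value for the same date
        System.out.println(hashKey("20161130"));
    }
}
```

No runtime randomness, so the persisted keys stay stable; the statistical spread comes from the hash.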

Human-friendly complexities

What’s the best way to represent a day code for computing? We ended up with YYYYMMDD as an 8 digit number. That’s definitely a day. Isn’t it? My mistake was to try to process this as a number. Given all the days between 20161101 and 20161202, you can just increment the day number, right?

Clearly not.

20161130 leads to 20161131 (?) and then 20161142 and 20161199 – these are valid numbers but they’re not valid days.

While in a later part of this article I’m going to argue the opposite, it was clear at this point that the unnecessarily human-friendly, more complex-to-constrain number would, if kept, have led to more code around it to manipulate it. Totally misleading. I zapped it and replaced it with epoch day.
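The difference is easy to demonstrate with java.time (class and method names here are invented for illustration): incrementing the YYYYMMDD number walks off the calendar, while going via epoch day always lands on a real day:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DayCodes {
    private static final DateTimeFormatter YYYYMMDD =
        DateTimeFormatter.ofPattern("yyyyMMdd");

    // Naive: treat the day code as a number and add one
    public static long nextDayCodeNaive(long dayCode) {
        return dayCode + 1; // 20161130 -> 20161131, which is not a date
    }

    // Safe: go via epoch day, where +1 always means "tomorrow"
    public static long nextDayCodeSafe(long dayCode) {
        LocalDate date = LocalDate.parse(Long.toString(dayCode), YYYYMMDD);
        return Long.parseLong(date.plusDays(1).format(YYYYMMDD));
    }

    public static void main(String[] args) {
        System.out.println(nextDayCodeNaive(20161130L)); // 20161131 - not a real date
        System.out.println(nextDayCodeSafe(20161130L));  // 20161201
    }
}
```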

What unit tests?

I promise you. I practice test-first development. I do it a lot. I teach it. I believe in it. I’ve used it to help me out of situations where I couldn’t get something to work, and the high discipline incremental nature of it has given me revelations.

I jumped into a system wide refactor, algorithm change, and rework of some core code. Did I run the tests? Apparently not. I didn’t feel I had time, or wanted to make time, to run the tests – surely what I was doing was just going to work, right? I had all the answers in my head.

In fairness to me, I didn’t get too far out… but then I actually ran the tests. That’s when I discovered the problem below about typing. My code worked in seconds, other code worked in milliseconds. They don’t match. It didn’t work, however sure I’d been that it would.

At this point, I felt very privileged to work with a team who have been trying to adopt the principle that I forgot for a couple of rabid hours. One of them had written some brilliant step-by-step unit tests for each feature. I got them to work one after the other by fixing the code, and they guided me to my destination perfectly. Just think how much use they would have been 4 hours sooner!

Strong typing replaced by…

I wanted to represent a search time range. I had an object which happened to store the time internally as a long. This is fine. It serialises. Long is a common way to store time. It’s milliseconds, right? Or seconds? Epoch seconds? In UTC? Or epoch milliseconds? Surely it’s encapsulated…?

A friend of mine complained that we weren’t always using strong typing; we were sometimes using String typing – where the info is just sitting there in the string in the right form, if you know how to interpret it. What of abstraction and encapsulation? In the above situation, we didn’t have strong types, we had Long types.

Why was this an issue? Couldn’t we just choose milliseconds or seconds and have it work again? Two other things:

  • Occasional use of Joda time to help with the object
  • A module which manipulated the inputs to put into a time range object because it knew how to do things in milliseconds

I felt shocked. Here we are with Java 8, and we’re trying to operate on raw numbers from outside a class which is all about time, in an environment where the finest time library in the world is baked into the language!

This was a poor design decision of mine, taken literally by the team around me. I refactored my way out of it by introducing an external interface entirely composed of Java 8 time classes. The internals of the class remained as Long, because that serialises in a compact way. The outside world was not allowed to push Long values in any more, and a number of helper methods were absorbed into the time range class itself, following the tell, don’t ask principle.

Users of this weakly typed object are left now with no doubt how to interpret what it means.
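A sketch of the shape this refactoring takes (the TimeRange class here is illustrative, not the production code): the long stays inside for compact serialisation, but the public face speaks only java.time:

```java
import java.time.Duration;
import java.time.Instant;

class TimeRange {
    // Internals stay as long (epoch millis) for compact serialisation
    private final long startMillis;
    private final long endMillis;

    public TimeRange(Instant start, Instant end) {
        this.startMillis = start.toEpochMilli();
        this.endMillis = end.toEpochMilli();
    }

    // The outside world only ever sees java.time types
    public Instant getStart() { return Instant.ofEpochMilli(startMillis); }
    public Instant getEnd() { return Instant.ofEpochMilli(endMillis); }

    // Behaviour absorbed into the class: tell, don't ask
    public boolean contains(Instant instant) {
        return !instant.isBefore(getStart()) && instant.isBefore(getEnd());
    }

    public Duration duration() {
        return Duration.ofMillis(endMillis - startMillis);
    }

    public static void main(String[] args) {
        TimeRange range = new TimeRange(
            Instant.parse("2016-11-30T00:00:00Z"),
            Instant.parse("2016-12-02T00:00:00Z"));
        System.out.println(range.duration().toDays()); // 2
    }
}
```

Callers can no longer wonder whether a number means seconds or milliseconds; the question simply doesn’t arise.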

Unrealistic tests

The above weak typing is ameliorated if there are some decent examples of real world usage in the unit tests. If, however, you’re just dealing with simple values, you may be tempted to write tests in terms of unrealistic numbers like 1234, or 0. These can test for things like equality and comparison, but they also give no indication of what sorts of real-world usage might happen and whether the code would work predictably with real world numbers.

For example, if I tested my YYYYMMDD day calculator purely in terms of numbers like 11111111, I would not notice that there appear to be 70 days between the end of November and early December. 

In many cases, there’s no such thing as a good or bad input to an algorithm (I do a lot of property bag tests using the String Jim). However, if you have an abstraction and you don’t test it with real-world inputs, you’re missing the “tests document the code” opportunity and may be sitting on “it only works in theory” problems.

Code bomb

There are two ways to interpret it when someone makes a lot of code and passes it to you. On the one hand, they’ve just delivered something of value that you should be able to use. On the other hand, it can feel like a hit and run. The code may or may not be fit for purpose. It may or may not have nuances that you can understand right away. It may be a boost or a few hours of head scratching waiting to happen.

On the whole, parachuting your code onto someone else’s to do list is definitely something to do with caution. I know what it’s like to see people really get a productivity boost from having an answer handed to them. I know what it’s like for some unimportant detail in that to steal time. I know how it feels receiving a batch of someone else’s incomplete work. In short, it’s a thing I’d like to see less of.

Documentation is for Wimps

My firmly held view is that documentation is not to be an input to development, but more of an output. I don’t think that technical documentation is innately valuable, especially where it can get out of date with the code. However, you need to leave something for the next person who needs to be able to use what you’ve made. That next person may well be you!

I value:

  • JavaDoc – public APIs to have their semantics described
  • Code review – just accounting for your changes to someone else in a discussion, especially where you comment on your own code to explain why you did it that way, can really help you see your work from the outside and make some last-minute refinements or simplifications
  • High level diagrams – if you can’t draw a diagram of your system on the back of a beermat you don’t understand it – if you can create high level diagrams as part of development then that is very helpful. They seldom get that far out of date. Extra points if you can get the diagrams to be generated automatically as an output of your work.
  • User guide – if you make a feature but it relies on the developer knowing exactly where to find it, and the exact semantics of using it… well, you’ve failed. There should be a human-friendly interface; some of that may be a start-up script or a how-to document, or just a well-documented public API entry point.

I’ve not been strict enough regarding documentation. I really don’t want to force people into being authors… but the definition of done has documentation mentioned and I’ve been less focused on where the minimum threshold actually is. One of my mistakes came about because I had no documentation to guide me.

Summary

In the cut and thrust of development, it’s no surprise that sometimes one’s standards slip. The aim should be to commit to work which can be achieved at a sustainable pace. That’s no guarantee that there won’t be blips. The simple, provable truth, though, is that dropping discipline when under pressure more often results in a spiral of rework: the poorer techniques appear to be less effort, but they result in more confusion.

I’ve reminded myself of a few useful points here. I hope others find this useful too.

Posted in Uncategorized

So you want to write BDD tests?

Before you can write BDD tests, you probably need to know what they are. Short answer – BDD tests try to black-box test something according to its behaviour. One of the most common languages is Gherkin. In principle, Gherkin is a natural-language technique for describing features and scenarios in terms of how they appear to an observer. You’ll know it’s Gherkin if you see a lot of:

  • Feature – to describe a capability
  • Scenario – to describe an example of a feature
  • Given – setup
  • When – execution
  • Then – expectations

Other BDD specification languages also exist. RSpec style test frameworks tend to use:

  • Describe – to describe a capability in terms of its behaviour
  • It – to describe a scenario
  • Expect – to verify things during testing that scenario

RSpec is Ruby based. It has a JavaScript counterpart – Jasmine. Jasmine has proved brilliant, especially when used in conjunction with Karma for testing Angular and other bits of JavaScript.

This post is concerned with Java and the JVM. There’s a cavalcade of possible tools you can use for BDD testing on the JVM. Here’s a quick round-up of tools I’ve heard of:

  • Cucumber JVM – one of the leading products – based on Gherkin – uses plaintext feature files and wires up test code using reflection and regular expressions
  • JBehave – also based on Gherkin – similar feature set to Cucumber JVM
  • Spock – this relies on Groovy and feels like a blend of code and script – it has Gherkin-like when and then labels.
  • Oleaster – this uses RSpec/Jasmine like syntax and requires Java 8. It includes a port of the “expect” framework you’d find in Jasmine.
  • Spectrum – this is intended to be a polyglot, principle-of-least-surprise framework. It uses RSpec/Jasmine syntax and also supports Gherkin syntax, all expressed in Java 8. No expectations framework is supplied, so you can take your pick of JUnit’s, AssertJ’s or Hamcrest’s as you prefer.
  • Ginkgo4j – a port of Ginkgo to Java, including direct support for Spring. Ginkgo seems to be very similar to RSpec in its syntax.
  • JGiven – this takes a completely different approach, encouraging you to create your own DSL for using in the tests.
  • ColaTests – a Gherkin based JUnit runner where you write the steps and tests directly in the test class in Java annotated with Gherkin syntax.
  • Specsy – intended for Scala, this can also be used as a Java 8 lambda-based test framework, with examples and support for Groovy and Scala to boot. It’s very lightweight and has strong support for parallel testing and for controlling shared state between tests.

The above is intended to be a rundown of tools that are out there – please comment with any omissions or errors and I will try to update the list.

Full disclosure – I’ve contributed to the Spectrum framework.

Posted in bdd, Java, tdd

How to make your Unit Tests harder

This is written about JUnit in Java, but much of this applies to other test frameworks. I’m going to tell you a bunch of ways to screw up your tests. You can probably guess how to write better ones – do the exact opposite.

Unit testing should be easy. It should be resistant to unimportant change and it should be sensitive to important change. Unit test frameworks like JUnit and Mockito make it easy to write test cases, assertions and mocks. So how can you make all the testing mistakes to make you question why you bothered writing tests in the first place?

Here’s how.

Ignore the entry and exit state of a test

If you totally avoid worrying about who has to put the objects or resources into the right state before a test, and what state those objects or resources will be in afterwards, then you can guarantee that your tests will only run successfully if nobody changes the order of anything or shares those same resources for future tests.

Example – one test unzips a file into a temp directory and another one uses that same file since it’s probably there already.

Example – we use something like Spring to build our context and then do some stateful things with the beans, assuming no other test minds about the state change.

Manage Temporary Files Ourselves

Why use things like JUnit’s TemporaryFolder rule when you can just write to the local file system with your own ad-hoc techniques for writing temporary files? Even better, why not use the src folder and its descendants for keeping these files – what could possibly go wrong? Don’t worry, we can stop these temp files from being checked in with a suitable gitignore file, so really, what’s the harm? Apart from the fact that every developer’s machine will be telling them that their temp files are actually fixed resources that are part of every workspace, always…
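For contrast, here’s roughly what the non-sarcastic version looks like with JUnit 4’s TemporaryFolder rule (test and file names invented) – the rule creates a fresh folder before each test and deletes it afterwards, so nothing lands in src:

```java
import java.io.File;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class TempFilesTest {
    // JUnit creates the folder before each test and deletes it after
    @Rule
    public TemporaryFolder temp = new TemporaryFolder();

    @Test
    public void writesOutput() throws Exception {
        File output = temp.newFile("output.txt");
        // ... exercise the code under test against 'output' ...
        // no clean-up code needed, and no stray files in the workspace
    }
}
```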

Mock a POJO

Yeah. Mock simple objects. The simpler they are, the more you can really mock them. Sure, your map may have a putter and a getter, but just mock the calls to the getter. For goodness sake, don’t just instantiate an object with the right values in it and use it.

Make a function into an object and mock it

Spring lets you turn everything into a bean. This means that discrete functions can be inside beans, with interfaces, and then mocked. While there may be reasonable points where the ecosystem is so chaotic that this is actually a good thing to lock down for a test, why not do it always? Then you can have unrealistic data examples flowing through complex chains of instantiation and dependency injection, rather than have a nice static function whose behaviour you can easily predict and whose presence will allow you to focus entirely on the input data to the test, rather than all the bits of micro implementation you have to mock to make the test run.

Make your tests a mirror of the implementation

You may even need to paste bits of the implementation code into the test to be able to successfully predict every last value that flows through every microscopic node of your code…

Only ever whitebox test based on implementation

Ignore the basics of “what’s the behaviour to the outsider?” and make every test a deep dive with god-like knowledge of how the whole implementation works, so that any small change in that implementation requires test rewriting.

Never change your implementation to make it more testable

If that class does so much that it’s hard to test, then work your backside off to make the test that’s hard to do, rather than find an easier test boundary by moving a few responsibilities around.

In Conclusion

Make testing harder so you can keep yourself busier and less successful more slowly!

Posted in Uncategorized

How to Learn a Language

I received some marketing email relating to a Java programming course from JavaCodeGeeks.com. The email contained this (probably made up) horror story.

so I’ve been learning java for over 3 years. and I’ve given up due to feelings over sadness because of the sheer size of the language as well as every time I learn, I always feel like the resource I’m using isn’t the best and I end up switching (this has been going on for a long time). That has led me to not gaining any actual knowledge in Java, all I know is basic syntax and a little about classes and methods. I am currently using Head First Java and John Purcell’s Cave of Programming courses to learn while also building my own projects, but even now I feel like I’m not using the best resource available and I just want to give up. I am so confused and I feel like I won’t ever get this language down to a solid level of understanding. I’m very lost.

The conclusion, unsurprisingly, was that the course on offer would help you get beyond this sort of issue. However, the answer is a lot simpler than “go on yet another course”. The biggest clue for this person’s problem is in this part of the quote:

I end up switching (this has been going on for a long time). That has led me to not gaining any actual knowledge in Java

If you are going to make any progress in any technology you need to do one thing. So this is my…

One sure-fire trick to get you to understand any programming language

Build something non-trivial using one language with one set of frameworks/libraries. Start with a pre-cooked example, turn that into your full blown application and finish the thing.

You’ll have to pick up the skills you need to do this along the way, and having set a technology choice in stone, you’ll build a body of knowledge, rather than thrash within a swirling void of possibilities.

It’s that easy.

It’s also that hard.

Posted in Uncategorized