Automated Testing is a Broad Church

I’ve had the pleasure of working on an automated testing framework recently. It seems to solve a problem that we’ve been having with Cucumber. I will write a more detailed piece on this in the near future, but here’s the elevator pitch.

JUnit and Mockito tests tend to be too technical so we can’t use them as an acceptance test framework directly.

Cucumber is the go-to technology for BDD/ATDD but implementing it can be cumbersome (cucumbersome perhaps? or an encucumbrance? – who knows!?).

In short, if you want the Given/When/Then and documentation friendly features of Cucumber you have to pay the price of using Cucumber.

What’s Wrong with Cucumber?

Nothing necessarily. Once you’re dealing with lots of similarly documented specs, especially if they’re simple, Cucumber can be a real boon.

However, for Cucumber to work you have to phrase your Gherkin right, implying but not implementing the test script. Then you need to write your glue code just right, and then you need one or two tiers of test execution code. This means you may have to cross 3 or 4 layers of software/script in various languages/styles to get to the code which reaches out to the system under test.

This is usually good, until you need to remember the outcome of one step in order to use it to verify a later one. At this point, you’ve no way of clearly putting that into the software layers. It ends up somewhere in the Orchestration or World code. It’s hinted at by the Gherkin and glue-code. It’s obscure and it’s caused entirely by the Achilles heel of Cucumber.

Cucumber’s Achilles Heel

To connect your spec with test execution code you have two degrees of separation. Plaintext Gherkin, used at runtime, plus whichever glue-code is kicking about. For tricky cases, this often obscures the intent of the spec/test implementation.

What can we do?

How about we write tests in Java but use the BDD syntax to structure them and report on them? With this in mind there are a few frameworks that offer just this:

Oleaster and Ginkgo4j both try to be equivalents of Jasmine and Mocha. Please see my post for more on these and other BDD frameworks in Java.

I have been working on Spectrum with its founder, Greg Haskins. In the current live release, there’s support for Jasmine/Mocha/RSpec like tests. In the next release (soon) there will be decent support for Gherkin syntax, and some rather neat ways of weaving in your favourite JUnit test frameworks (Spring, Mockito etc) via JUnit Rules.

Have a look.
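
To make the idea concrete, here is a minimal sketch in plain Java of steps written as lambdas. It is not Spectrum’s API: the step helper and the basket scenario are hypothetical, invented purely for illustration. The point is that the outcome of one step sits in an ordinary local variable where a later step can see it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-DSL for illustration only; real frameworks such as
// Spectrum provide their own, richer equivalents.
final class GivenWhenThenSketch {
    // Collects the step descriptions so they can be reported after the run.
    private static final List<String> report = new ArrayList<>();

    // Runs one named step and records it for reporting.
    static void step(String keyword, String description, Runnable body) {
        body.run();
        report.add(keyword + " " + description);
    }

    static List<String> runScenario() {
        report.clear();
        // State shared between steps is just a variable, visible in one place.
        List<Integer> basket = new ArrayList<>();
        int[] total = new int[1];

        step("Given", "a basket with two items", () -> {
            basket.add(3);
            basket.add(4);
        });
        step("When", "the total is calculated", () ->
            total[0] = basket.stream().mapToInt(Integer::intValue).sum());
        step("Then", "the total is the sum of the items", () -> {
            if (total[0] != 7) {
                throw new AssertionError("expected 7 but was " + total[0]);
            }
        });
        return new ArrayList<>(report);
    }

    public static void main(String[] args) {
        runScenario().forEach(System.out::println);
    }
}
```

There is no glue code and no runtime text matching here, so the spec and its implementation cannot drift apart.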

The Right Test for the Right Job

Success comes not from finding the right tool, but from using the right tool for the job at hand. Where Cucumber succeeds, you should use it. It’s very helpful. Once it gets hard, change tactics.

Posted in Java, tdd

Obvious Mistakes

As a team leader who also writes code I have to worry about code several times over.

  • The coding standards we adhere to – they must be disciplined but not overbearing or pernickety
  • Every line of code the team writes – the objective is a decent product made a different way. While peer review is the way to ensure everyone takes collective pride in the work, the tech lead doesn’t get to stop worrying.
  • Every line of code I write – what kind of a person doesn’t act in the way they demand of others?

Recently, I’ve been noticing issues in the way we’ve been working. It only takes a few minor cases of letting our standards or techniques slip for our efforts to become counterproductive. As team lead, I could point the finger at the individuals who happened to write lines of code that I came to worry about. I don’t need to. As a contributor to that code, I can find recent examples where, for no reasons other than a combination of bad luck and time pressure, I dropped some balls. I then discovered the effect these minor slips had, and I’d like to confess to them.

Making mistakes is no big issue; learning from them is a great opportunity. I hope others will find this useful.

Name it after the implementation

We had the need for a hashing key which could be easily predicted. We settled on a date with two random digits after it. This would hash well, but you could, for any given date, predict the hash key to within 100, which is easily searchable. So we did a lot of talking about random.

Two bad things happened. Both were a consequence of the noun random getting stuck into our discussion, when we were really making a predictable hash key.

Firstly, the code got filled with the word random, which others were asking about – why is it random? How will that work?

Secondly, we made the code depend on a random number generator. Given that we were trying to make a hashing function which was going to be used for persistence, the ad-hoc random number generator, coupled with whatever Java happens to do with hashCode implementations, could best be described as “something which works that way for now”.

All of this was a consequence of thinking random first, rather than seeing that statistically those two digits would be random, but we were trying to make a stable hash key. In the end, I switched it to be a two digit sample from an MD5 hash. This I tested for statistical variability and it was fine.
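
A minimal sketch of that kind of stable suffix follows. The names are hypothetical, and what gets fed into the hash (here a record id) is my assumption for illustration rather than a description of the real code.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

final class StableSuffix {
    // Derives a repeatable two-digit suffix (00 to 99) from a seed string.
    static String twoDigitSuffix(String seed) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(seed.getBytes(StandardCharsets.UTF_8));
            // Treat the digest as a non-negative integer and sample it modulo 100.
            int sample = new BigInteger(1, digest).mod(BigInteger.valueOf(100)).intValue();
            return String.format(Locale.ROOT, "%02d", sample);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Assumed key layout: the day code followed by the two-digit suffix.
    static String hashKey(String dayCode, String recordId) {
        return dayCode + twoDigitSuffix(recordId);
    }

    public static void main(String[] args) {
        System.out.println(hashKey("20161130", "order-12345"));
    }
}
```

The same input always gives the same suffix on any JVM, which is what a persisted key needs, while the suffixes still spread out statistically across the 100 buckets.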

Human-friendly complexities

What’s the best way to represent a day code for computing? We ended up with YYYYMMDD as an 8 digit number. That’s definitely a day. Isn’t it? My mistake was to try to process this as a number. Given all the days between 20161101 and 20161202, you can just increment the day number, right?

Clearly not.

20161130 leads to 20161131 (?) and then 20161142 and 20161199 – these are valid numbers but they’re not valid days.

Although in a later part of this article I’m going to argue the opposite, it was clear at this point that the human-friendly but harder-to-constrain number would, if kept, have led to more code around it to manipulate it. Totally misleading. I zapped it and replaced it with epoch day.
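
As a sketch of the difference (the class and method names are hypothetical), java.time rejects the impossible day codes and makes “the next day” a simple increment once you work in epoch days:

```java
import java.time.DateTimeException;
import java.time.LocalDate;

final class DayCodes {
    // Splits a YYYYMMDD number into its parts; throws DateTimeException
    // if the parts do not form a real calendar date.
    static LocalDate toDate(int yyyymmdd) {
        return LocalDate.of(yyyymmdd / 10000, (yyyymmdd / 100) % 100, yyyymmdd % 100);
    }

    static boolean isValidDay(int yyyymmdd) {
        try {
            toDate(yyyymmdd);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    // Epoch day: the number of days since 1970-01-01, safe to increment.
    static long toEpochDay(int yyyymmdd) {
        return toDate(yyyymmdd).toEpochDay();
    }

    static int fromEpochDay(long epochDay) {
        LocalDate date = LocalDate.ofEpochDay(epochDay);
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    public static void main(String[] args) {
        // Incrementing the raw number walks straight into an invalid day.
        System.out.println(isValidDay(20161130 + 1));
        // Incrementing the epoch day crosses the month boundary correctly.
        System.out.println(fromEpochDay(toEpochDay(20161130) + 1));
    }
}
```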

What unit tests?

I promise you. I practice test-first development. I do it a lot. I teach it. I believe in it. I’ve used it to help me out of situations where I couldn’t get something to work, and the high discipline incremental nature of it has given me revelations.

I jumped into a system wide refactor, algorithm change, and rework of some core code. Did I run the tests? Apparently not. I didn’t feel I had time, or wanted to make time, to run the tests – surely what I was doing was just going to work, right? I had all the answers in my head.

In fairness to me, I didn’t get too far out… but then I actually ran the tests. That’s when I discovered the problem below about typing. My code worked in seconds, other code worked in milliseconds. They don’t match. It didn’t work. It would have worked.

At this point, I felt very privileged to work with a team who have been trying to adopt the principle that I forgot for a couple of rabid hours. One of them had written some brilliant step by step unit tests for each feature. I got them to work one after the other by fixing the code and they guided me to my destination perfectly. Just think how much use they would have been 4 hours sooner!

Strong typing replaced by…

I wanted to represent a search time range. I had an object which happened to store the time internally as a long. This is fine. It serialises. Long is a common way to store time. It’s milliseconds, right? Or seconds? Epoch seconds? In UTC? Or epoch milliseconds. Surely it’s encapsulated…?

A friend of mine complained that we weren’t always using strong typing, we were sometimes using String typing – where the info is just sitting there in the string in the right form if you know how to interpret it. What of abstraction and encapsulation? In the above situation, we didn’t have strong types, we had Long types.

Why was this an issue? Just choose milliseconds or seconds and it would work again? Two other things:

  • Occasional use of Joda time to help with the object
  • A module which manipulated the inputs to put into a time range object because it knew how to do things in milliseconds

I felt shocked. Here we are with Java 8 and we’re trying to operate on raw numbers from outside a class which is all about time in an environment where there’s the finest time library in the world baked into the language!

This was a poor design decision of mine taken literally by the team around me. I refactored my way out of it by introducing an external interface entirely composed of Java 8 time classes. The internals of the class remained as Long because that serialises in a compact way. The outside world was not allowed to push Long values in any more and a number of helper methods were absorbed into the time range class itself using the tell, don’t ask principle.

Users of this once weakly typed object are now left in no doubt about how to interpret what it means.
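
A minimal sketch of the shape I mean follows. It is not the production class: the method names and the half-open interpretation of the range are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;

final class TimeRange {
    // Stored as epoch milliseconds (UTC) because a long serialises compactly.
    private final long startMillis;
    private final long endMillis;

    private TimeRange(long startMillis, long endMillis) {
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    // The only way in is via java.time types, so the unit is never in doubt.
    static TimeRange between(Instant start, Instant end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end is before start");
        }
        return new TimeRange(start.toEpochMilli(), end.toEpochMilli());
    }

    Instant start() {
        return Instant.ofEpochMilli(startMillis);
    }

    Instant end() {
        return Instant.ofEpochMilli(endMillis);
    }

    Duration length() {
        return Duration.ofMillis(endMillis - startMillis);
    }

    // Callers tell the range what they need instead of pulling raw longs out.
    // Assumption: the start is inclusive and the end is exclusive.
    boolean contains(Instant moment) {
        long millis = moment.toEpochMilli();
        return millis >= startMillis && millis < endMillis;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2016-11-30T00:00:00Z");
        TimeRange day = TimeRange.between(start, start.plus(Duration.ofDays(1)));
        System.out.println(day.contains(Instant.parse("2016-11-30T12:00:00Z")));
        System.out.println(day.length());
    }
}
```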

Unrealistic tests

The above weak typing is ameliorated if there are some decent examples of real world usage in the unit tests. If, however, you’re just dealing with simple values, you may be tempted to write tests in terms of unrealistic numbers like 1234, or 0. These can test for things like equality and comparison, but they also give no indication of what sorts of real-world usage might happen and whether the code would work predictably with real world numbers.

For example, if I tested my YYYYMMDD day calculator purely in terms of numbers like 11111111, I would not notice that there appear to be 70 days between the end of November and early December. 

In many cases, there’s no such thing as a good or bad input to an algorithm (I do a lot of property bag tests using the String Jim). However, if you have an abstraction and you don’t test it with real world inputs, you’re missing the “tests document the code” opportunity and may be sitting on “it only works in theory” problems.
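
Here is a small sketch of how the choice of sample values hides or exposes the day code flaw. The two methods are hypothetical stand-ins for a naive numeric calculation and a calendar-aware one.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class RealisticDayTests {
    // The flawed approach: treat YYYYMMDD codes as plain numbers.
    static long naiveDaysBetween(int from, int to) {
        return to - from;
    }

    // The calendar-aware answer, for comparison.
    static long calendarDaysBetween(int from, int to) {
        return ChronoUnit.DAYS.between(toDate(from), toDate(to));
    }

    private static LocalDate toDate(int yyyymmdd) {
        return LocalDate.of(yyyymmdd / 10000, (yyyymmdd / 100) % 100, yyyymmdd % 100);
    }

    public static void main(String[] args) {
        // Unrealistic sample: both approaches agree, so the flaw stays hidden.
        System.out.println(naiveDaysBetween(11111111, 11111112)
            + " vs " + calendarDaysBetween(11111111, 11111112));
        // Realistic sample across a month boundary: the approaches disagree.
        System.out.println(naiveDaysBetween(20161130, 20161201)
            + " vs " + calendarDaysBetween(20161130, 20161201));
    }
}
```

A test written only with the first pair of values would pass against the broken implementation.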

Code bomb

There are two ways to interpret it when someone makes a lot of code and passes it to you. On the one hand, they’ve just delivered something of value that you should be able to use. On the other hand, it can feel like a hit and run. The code may or may not be fit for purpose. It may or may not have nuances that you can understand right away. It may be a boost or a few hours of head scratching waiting to happen.

On the whole, parachuting your code onto someone else’s to do list is definitely something to do with caution. I know what it’s like to see people really get a productivity boost from having an answer handed to them. I know what it’s like for some unimportant detail in that to steal time. I know how it feels receiving a batch of someone else’s incomplete work. In short, it’s a thing I’d like to see less of.

Documentation is for Wimps

My firmly held view is that documentation is not to be an input to development, but more of an output. I don’t think that technical documentation is innately valuable, especially where it can get out of date with the code. However, you need to leave something for the next person who needs to be able to use what you’ve made. That next person may well be you!

I value:

  • JavaDoc – public APIs to have their semantics described
  • Code review – just accounting for your changes to someone else in a discussion, especially where you comment on your own code to explain why you did it that way, it can really help you see your work from the outside and make some last minute refinements or simplifications
  • High level diagrams – if you can’t draw a diagram of your system on the back of a beermat you don’t understand it – if you can create high level diagrams as part of development then that is very helpful. They seldom get that far out of date. Extra points if you can get the diagrams to be generated automatically as an output of your work.
  • User guide – if you make a feature but it relies on the developer knowing exactly where to find it, and the exact semantics of using it… well, you’ve failed. There should be a human-friendly interface; some of that may be a start-up script or a how-to document, or just a well documented public API entry point.

I’ve not been strict enough regarding documentation. I really don’t want to force people into being authors… but the definition of done has documentation mentioned and I’ve been less focused on where the minimum threshold actually is. One of my mistakes came about because I had no documentation to guide me.


In the cut and thrust of development, it’s no surprise that sometimes one’s standards slip. The aim should be to commit to work which can be achieved at a sustainable pace. That’s no guarantee that there won’t be blips. The simple provable truth, though, is that dropping discipline when under pressure more often results in a spiral of rework as the poorer techniques appear to be less effort, but result in more confusion and rework.

I’ve reminded myself of a few useful points here. I hope others find this useful too.

Posted in Uncategorized

So you want to write BDD tests?

Before you can write BDD tests, you probably need to know what they are. Short answer – BDD tests try to black-box test something according to its behaviour. One of the most common languages is Gherkin. In principle, Gherkin is a natural language based technique for describing features and scenarios in terms of how they appear to an observer. You’ll know it’s Gherkin if you see a lot of:

  • Feature – to describe a capability
  • Scenario – to describe an example of a feature
  • Given – setup
  • When – execution
  • Then – expectations

Other BDD specification languages also exist. RSpec style test frameworks tend to use:

  • Describe – to describe a capability in terms of its behaviour
  • It – to describe a scenario
  • Expect – to verify things during testing that scenario

RSpec is Ruby based. It has a JavaScript counterpart – Jasmine. Jasmine has proved brilliant, especially when used in conjunction with Karma for testing Angular and other bits of JavaScript.

This post is concerned with Java and the JVM. There is a cavalcade of possible tools you can use for BDD testing on the JVM. Here’s a quick round-up of tools I’ve heard of:

  • Cucumber JVM – one of the leading products – based on Gherkin – uses plaintext feature files and wires up test code using reflection and regular expressions
  • JBehave – also based on Gherkin – similar feature set to Cucumber JVM
  • Spock – this relies on Groovy and feels like a blend of code and script – it has Gherkin-like when and then labels.
  • Oleaster – this uses RSpec/Jasmine like syntax and requires Java 8. It includes a port of the “expect” framework you’d find in Jasmine.
  • Spectrum – this is intended to be a Polyglot and Principle of Least Surprise framework. It uses RSpec/Jasmine syntax and also supports Gherkin syntax, all expressed in Java 8. No expectations framework is supplied; you can take your pick of JUnit’s, AssertJ’s or Hamcrest’s as you prefer.
  • Ginkgo4j – a port of Ginkgo to Java, including direct support for Spring. Ginkgo seems to be very similar to RSpec in its syntax.
  • JGiven – this takes a completely different approach, encouraging you to create your own DSL for using in the tests.
  • ColaTests – a Gherkin based JUnit runner where you write the steps and tests directly in the test class in Java annotated with Gherkin syntax.
  • Specsy – intended for Scala, this can also be used as a Java 8 lambda-based test framework, with examples and support for Groovy and Scala to boot. It’s very lightweight and hugely supports parallel testing and control of sharing state between tests.

The above is intended to be a rundown of tools that are out there – please comment with any omissions or errors and I will try to update the list.

Full disclosure – I’ve contributed to the Spectrum framework.

Posted in bdd, Java, tdd

How to make your Unit Tests harder

This is written about JUnit in Java, but much of this applies to other test frameworks. I’m going to tell you a bunch of ways to screw up your tests. You can probably guess how to write better ones – do the exact opposite.

Unit testing should be easy. It should be resistant to unimportant change and it should be sensitive to important change. Unit test frameworks like JUnit and Mockito make it easy to write test cases, assertions and mocks. So how can you make all the testing mistakes to make you question why you bothered writing tests in the first place?

Here’s how.

Ignore the entry and exit state of a test

If you totally avoid worrying about who has to put the objects or resources into the right state before a test, and what state those objects or resources will be in afterwards, then you can guarantee that your tests will only run successfully if nobody changes the order of anything or shares those same resources for future tests.

Example – one test unzips a file into a temp directory and another one uses that same file since it’s probably there already.

Example – we use something like Spring to build our context and then do some stateful things with the beans, assuming no other test minds about the state change.

Manage Temporary Files Ourselves

Why use things like JUnit’s TemporaryFolder rule when you can just write to the local file system with your own ad-hoc techniques for writing temporary files? Even better, why not use the src folder and its descendants for keeping these files – what could possibly go wrong? Don’t worry, we can stop these temp files from being checked in with a suitable gitignore file, so really, what’s the harm? Apart from the fact that every developer’s machine will be telling them that their temp files are actually fixed resources that are part of every workspace always…
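
For contrast, this is roughly the housekeeping that a rule like TemporaryFolder takes off your hands. The sketch uses only the JDK rather than JUnit, and ScratchDirectory is a hypothetical name: files go under the operating system’s temp location and are removed when the test is done.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class ScratchDirectory implements AutoCloseable {
    private final Path root;

    private ScratchDirectory(Path root) {
        this.root = root;
    }

    // Creates a fresh directory under the OS temp location, never under src.
    static ScratchDirectory create() {
        try {
            return new ScratchDirectory(Files.createTempDirectory("test-scratch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path root() {
        return root;
    }

    Path newFile(String name, String content) {
        try {
            return Files.write(root.resolve(name), content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes children before parents by walking the tree in reverse order.
    @Override
    public void close() {
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (ScratchDirectory scratch = ScratchDirectory.create()) {
            Path file = scratch.newFile("data.txt", "hello");
            System.out.println(Files.exists(file));
        }
    }
}
```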

Mock a POJO

Yeah. Mock simple objects. The simpler they are, the more you can really mock them. Sure, your map may have a putter and a getter, but just mock the calls to the getter. For goodness sake, don’t just instantiate an object with the right values in it and use it.
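
For the record, the non-sarcastic version looks something like this sketch, where displayName is a hypothetical method under test and the map is simply built with real values:

```java
import java.util.HashMap;
import java.util.Map;

final class RealObjectOverMock {
    // Hypothetical code under test: looks up a display name, falling back to the id.
    static String displayName(Map<String, String> names, String id) {
        return names.getOrDefault(id, id);
    }

    public static void main(String[] args) {
        // No mocking framework needed: build the real object and use it.
        Map<String, String> names = new HashMap<>();
        names.put("u1", "Ada");

        System.out.println(displayName(names, "u1"));
        System.out.println(displayName(names, "u2"));
    }
}
```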

Make a function into an object and mock it

Spring lets you turn everything into a bean. This means that discrete functions can be inside beans, with interfaces, and then mocked. While there may be reasonable points where the ecosystem is so chaotic that this is actually a good thing to lock down for a test, why not do it always? Then you can have unrealistic data examples flowing through complex chains of instantiation and dependency injection, rather than have a nice static function whose behaviour you can easily predict and whose presence will allow you to focus entirely on the input data to the test, rather than all the bits of micro implementation you have to mock to make the test run.

Make your tests a mirror of the implementation

You may even need to paste bits of the implementation code into the test to be able to successfully predict every last value that flows through every microscopic node of your code…

Only ever whitebox test based on implementation

Ignore the basics of “what’s the behaviour to the outsider?” and make every test a deep dive with god-like knowledge of how the whole implementation works, so that any small change in that implementation requires test rewriting.

Never change your implementation to make it more testable

If that class does so much that it’s hard to test, then work your backside off to make the test that’s hard to do, rather than find an easier test boundary by moving a few responsibilities around.

In Conclusion

Make testing harder so you can keep yourself busier and less successful more slowly!

Posted in Uncategorized

How to Learn a Language

I received some marketing email relating to a Java programming course. The email contained this (probably made up) horror story.

so I’ve been learning java for over 3 years. and I’ve given up due to feelings over sadness because of the sheer size of the language as well as every time I learn, I always feel like the resource I’m using isn’t the best and I end up switching (this has been going on for a long time). That has led me to not gaining any actual knowledge in Java, all I know is basic syntax and a little about classes and methods. I am currently using Head First Java and John Purcell’s Cave of Programming courses to learn while also building my own projects, but even now I feel like I’m not using the best resource available and I just want to give up. I am so confused and I feel like I won’t ever get this language down to a solid level of understanding. I’m very lost.

The conclusion, unsurprisingly, was that the course on offer would help you get beyond this sort of issue. However, the answer is a lot simpler than “go on yet another course”. The biggest clue for this person’s problem is in this part of the quote:

I end up switching (this has been going on for a long time). That has led me to not gaining any actual knowledge in Java

If you are going to make any progress in any technology you need to do one thing. So this is my…

One sure-fire trick to get you to understand any programming language

Build something non-trivial using one language with one set of frameworks/libraries. Start with a pre-cooked example, turn that into your full blown application and finish the thing.

You’ll have to pick up the skills you need to do this along the way, and having set a technology choice in stone, you’ll build a body of knowledge, rather than thrash within a swirling void of possibilities.

It’s that easy.

It’s also that hard.

Posted in Uncategorized

A SureFire Classpath Fail!

Sadly, even the best plugins can work against you. We’ve been having a problem recently with Spring, Maven and SureFire. Specifically, it seemed that some of our unit tests ran absolutely fine with Spring and properties files when run via Eclipse, but got different behaviour when run via Maven. Even more specifically, the workaround seemed to be to duplicate values from .properties files that were in one jar file by putting them into a .properties file in the module whose tests were failing under Maven – usually by adding more properties to files in src/test/resources.

What this seemed to imply was that the classpath, when running via Eclipse, was set in such a way that the full module dependency structure was harnessed when looking for properties to evaluate in the @Value("${someproperty}") annotations, but was not being utilised at test time by Maven. Notably, these properties were used from the correct places at runtime in production.

Aaagh. What undocumented hell is this?

The solution is quite simple. You must not use the system class path in surefire if you want this to work. Here’s a snippet from our surefire configuration in the Maven pom.xml file:
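
A minimal sketch of such a configuration, assuming the key in question is Surefire’s useSystemClassLoader parameter and the standard plugin coordinates (the rest of the plugin configuration is omitted):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Assumed key: stop Surefire running tests on the system class loader -->
    <useSystemClassLoader>false</useSystemClassLoader>
  </configuration>
</plugin>
```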


As you can see, we’ve set this key to false. Now everything works. That’s it. You don’t want a default classloader: you want the one with all your dependencies in. That allows the Spring class path based dependencies to work correctly.

There was nothing about this on StackOverflow and it took months before we found the answer. Here it is now for posterity.

Posted in Uncategorized

When you want to disable a slow running test

A quick one. The code for this is here on GitHub.

When testing your software with automated unit tests, you want to be thorough to get code coverage and genuinely cover system behaviour, yet some of the things you want to test can take a fair bit of time to run through tests.

This poses a dilemma:

  • Make the test feedback process as quick as possible
  • Make the test process as thorough as possible

The best answer is to organise your code so that it can be thoroughly tested fast. If that’s not possible you may need a multi-tier automated test. The quick things can be tested straight away with a first-line of feedback, ideally within 5 minutes. Slower things should follow, with the whole end-to-end feedback in under 20 minutes if possible.

This raises a question – how and when do we disable long-running tests? And, given that we’re often using a combination of Maven and within-IDE test runners, how can we disable a test in all contexts EXCEPT the one where it should be run? In my view moving a long-running test into the integration tests, just because you don’t like it, isn’t a great solution. Worse than that, you can jump through hoops trying to break things into categories, only to find that your IDE test runner is running them anyway.

A solution that worked for us recently was to disable certain JUnit tests based on a system property. If the property is not set, the test just doesn’t run. Simple as that. This can be done easily in JUnit by an assumption failure. Simpler still, a JUnit rule can take the whole test class and make all of its tests abort if the system property isn’t set.
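
As a rough illustration of the check involved, the decision boils down to comparing a system property with an expected value. This is not the rule from GitHub, and the class and method names are hypothetical. In JUnit 4 the result could be passed to Assume.assumeTrue so that the test is skipped rather than failed.

```java
final class SystemPropertyGate {
    // True only when the named system property is set to exactly the expected value.
    static boolean isEnabled(String key, String expectedValue) {
        return expectedValue.equals(System.getProperty(key));
    }

    public static void main(String[] args) {
        // Run with -Drun.test=true to take the slow path.
        if (isEnabled("run.test", "true")) {
            System.out.println("running slow test");
        } else {
            System.out.println("skipping slow test");
        }
    }
}
```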

The code on GitHub is the JUnit rule I created. It is illustrated by this unit test:

import org.junit.Rule;
import org.junit.Test;
/**
 * Quick example of how to use this using a simpler test than that used to verify
 * that the mechanism works
 * The test within this class will only work if the system property "run.test" is set to "true"
 * To demo this running try setting the JVM option -Drun.test=true, which you can do in the maven
 * command line, or via your IDE's invocation of the junit runner
 */
public class ExampleTest {
	@Rule
	public RequireSystemProperty property = new RequireSystemProperty("run.test", "true");

	@Test
	public void systemPropertyDependentTest() {
		// do some sort of test we only want to do when a system property is set...
	}
}

As you can see – the presence of this rule effectively stops that test running unless you specifically configure the JVM that’s running it with the system property that would activate it. We use this on our CI server in the integration test build where we enable everything and the build takes maybe 12-15 minutes. On the front-line CI build, we want the build time to be about 5 minutes, so we don’t run integration tests, or code coverage tools, or slow-running things.

Hope this is useful.

Posted in Uncategorized