Let’s cut to the chase. Where possible, create scripts and ideally full automation to perform data extraction for the purposes of acceptance testing.
You may think you can get away with a single grand manual extraction at some point in the project, but one extract is frequently not enough.
Perfecting Acceptance on Replatforming
We’ve had the benefit of a super user who asks the right questions. Such users make great product owners, especially in a situation where a team is replatforming software they didn’t write, with the aim of making it better without changing the business outcomes.
In the absence of the right unit tests and other test automation, a key technique is to gather enough representative samples of real-world use cases, each paired with its current outcome, to use as an acceptance test. If the new system gets the same answer for all of them, surely it’s right.
The same need arises in any project that’s intended to operate over existing business data, whether or not there’s a previous system to pull known-correct answers from.
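To make the paired-sample idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Sample` shape, the `find_mismatches` helper, and the `new_price` function standing in for the replatformed logic. The point is only that each real-world case carries the answer the current system gave, and the acceptance check reports every case where the new system disagrees.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Sample:
    """One real-world case paired with the answer the current system gave."""
    case_id: str
    inputs: dict[str, Any]
    expected: Any


def find_mismatches(
    samples: Iterable[Sample],
    new_system: Callable[[dict[str, Any]], Any],
) -> list[tuple[str, Any, Any]]:
    """Return (case_id, expected, actual) for every sample the new system gets wrong."""
    mismatches = []
    for sample in samples:
        actual = new_system(sample.inputs)
        if actual != sample.expected:
            mismatches.append((sample.case_id, sample.expected, actual))
    return mismatches


# Hypothetical stand-in for the replatformed calculation.
def new_price(inputs: dict[str, Any]) -> Any:
    return inputs["quantity"] * inputs["unit_price"]


# Made-up samples; in practice these come from the extract.
samples = [
    Sample("order-1", {"quantity": 2, "unit_price": 5}, 10),
    Sample("order-2", {"quantity": 3, "unit_price": 4}, 13),  # legacy answer differs
]

print(find_mismatches(samples, new_price))
# → [('order-2', 13, 12)]
```

A mismatch is not automatically a defect in the new system. It may equally be a legacy quirk worth raising with the product owner.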
A Manual Extract Makes us Reluctant
Imagine we did the up-front extract the hard way, and then the expert starts asking questions of it. How does it break down into major types? What if we biased it to include lots of use cases of each type, rather than the usual balance of production data? What if we chose a different time period to sample?
In all these cases, the manual extraction team tries to argue that their hard-won test set is the best one to move forward with; no need to waste time redoing work that was done.
That’s not always the right outcome for the stakeholder, and it’s not necessarily the right outcome for the project. It’s just the path of least resistance when it’s costly to get a new data set.
When Things Change
The more fixed the extracted data seems, the less readily we respond to changes in requirements. Perhaps, during the project, the production workflow has changed. Even short projects can be affected by seasonal changes in usage, or new business rules coming in.
In the case of a replatforming project, it’s not uncommon to discover a major production bug in the legacy system, which needs fixing before the replatforming, and needs regression testing in the new version too.
All Roads Lead to Cheap Extraction
If part of the first extract is to find malleable ways to do lots of extracts repeatably (without going crazy and making a generic extract-anything tool), then all of the above concerns go away.
We embrace a fresh data set, or tweak the parameters to rebalance our existing one. We find it easy to produce metrics about the data we’re extracting. We consider refreshing the test data set a solved problem, and we avoid overcommitting to an initial theory about our data that becomes less and less valid as the project proceeds.
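As a rough sketch of what a cheap, repeatable extract might look like, the Python below takes the time window and a per-type cap as parameters, so rebalancing the sample or moving to a different period is a one-line change. The `cases` table, its columns, and the `extract_cases` and `type_breakdown` helpers are all hypothetical, and an in-memory SQLite database stands in for whatever the production store really is.

```python
import sqlite3
from collections import Counter


def extract_cases(conn, start, end, per_type_limit):
    """Return up to per_type_limit cases of each type created in [start, end)."""
    rows = conn.execute(
        "SELECT id, case_type, created_on, outcome FROM cases "
        "WHERE created_on >= ? AND created_on < ? ORDER BY id",
        (start, end),
    )
    taken = Counter()
    sample = []
    for row in rows:
        case_type = row[1]
        # Cap each type so common cases don't crowd out rare ones.
        if taken[case_type] < per_type_limit:
            taken[case_type] += 1
            sample.append(row)
    return sample


def type_breakdown(sample):
    """Count the extracted cases per type: a simple metric about the extract."""
    return Counter(row[1] for row in sample)


# Made-up data standing in for production.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (id INTEGER PRIMARY KEY, case_type TEXT, "
    "created_on TEXT, outcome TEXT)"
)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?, ?)",
    [
        (1, "refund", "2024-01-05", "approved"),
        (2, "refund", "2024-01-09", "rejected"),
        (3, "refund", "2024-01-20", "approved"),
        (4, "renewal", "2024-01-11", "approved"),
        (5, "refund", "2024-02-02", "approved"),
    ],
)

sample = extract_cases(conn, "2024-01-01", "2024-02-01", per_type_limit=2)
print([row[0] for row in sample])    # → [1, 2, 4]
print(dict(type_breakdown(sample)))  # → {'refund': 2, 'renewal': 1}
```

When the expert asks for a different period or a different balance, the answer is a new pair of arguments rather than a new manual effort.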
How to Extract Test Data
Start with a bunch of well-written SQL queries in a README file. If the tooling to turn their results into useful test inputs isn’t available, then buy or build it. If the extract needs composing into a regular test process, then choose the tooling wisely to avoid deployment problems. Make the data easy for both machines and humans to read.
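One possible way to keep the extract readable by both machines and people is one JSON object per line, with sorted keys so that successive extracts diff cleanly. This is only a sketch: the field names and the `to_json_lines` and `from_json_lines` helpers are made up for illustration, and a CSV with a header row would serve just as well.

```python
import json


def to_json_lines(cases):
    """Serialise cases one per line with sorted keys, so files diff cleanly."""
    return "".join(json.dumps(case, sort_keys=True) + "\n" for case in cases)


def from_json_lines(text):
    """Parse the same format back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Made-up cases; in practice these are the rows the SQL queries return.
cases = [
    {"id": 1, "case_type": "refund", "outcome": "approved"},
    {"id": 4, "case_type": "renewal", "outcome": "approved"},
]

text = to_json_lines(cases)
print(text, end="")
# → {"case_type": "refund", "id": 1, "outcome": "approved"}
# → {"case_type": "renewal", "id": 4, "outcome": "approved"}
```

The format needs nothing beyond the standard library to read or write, and a reviewer can open it in any text editor.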
Above all else, don’t hold a gun to your own head with a weird data standard you don’t need.