Automate or Script the Extract

Let’s cut to the chase. Where possible, write scripts, and ideally full automation, to extract data for acceptance testing.

You may think you can get away with one grand manual extraction at some point in the project, but it frequently doesn’t work out that way.

Perfecting Acceptance on Replatforming

We’ve had the benefit of a super user who asks the right questions. Such users make great product owners, especially when a team is replatforming software they didn’t write, with the aim of making it better without changing the business outcomes.

In the absence of the right unit tests and other test automation, a key technique is to gather enough representative samples of real-world use cases, paired with their current outcomes, to use as an acceptance test. If the new system gets the same answer for all of these, surely it’s right.
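One way to automate this comparison, sometimes called a golden master or characterisation test, is to replay each extracted input through the new system and list every case where it disagrees with the recorded legacy outcome. This is a minimal sketch, not a prescribed harness: `Case`, `find_mismatches` and the toy `new_pricing` rule are hypothetical stand-ins for your own extracted data and replatformed code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Case:
    """One extracted real-world example: its inputs and the legacy system's outcome."""
    case_id: str
    inputs: dict[str, Any]
    legacy_outcome: Any


def find_mismatches(cases: Iterable[Case],
                    new_system: Callable[[dict[str, Any]], Any]) -> list[tuple[str, Any, Any]]:
    """Run every case through the new system; return (id, expected, actual) for each disagreement."""
    mismatches = []
    for case in cases:
        actual = new_system(case.inputs)
        if actual != case.legacy_outcome:
            mismatches.append((case.case_id, case.legacy_outcome, actual))
    return mismatches


# Usage: a hypothetical pricing rule stands in for the replatformed code.
cases = [
    Case("c1", {"qty": 2, "unit_price": 10}, 20),
    Case("c2", {"qty": 5, "unit_price": 3}, 15),
    Case("c3", {"qty": 0, "unit_price": 7}, 0),
]


def new_pricing(inputs):
    return inputs["qty"] * inputs["unit_price"]


print(find_mismatches(cases, new_pricing))  # → []
```

An empty list only says the new system agrees with the legacy one on these samples; how much that proves depends on how representative the samples are.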

The same problem arises in any project intended to operate over existing business data, whether or not there’s a previous system to pull correct example answers from.

A Manual Extract Makes Us Reluctant

Imagine we did the up-front extract the hard way, and then the expert starts asking questions of it. How does it break down into major types? What if we biased it so that it included lots of use cases of different types, rather than the usual balance of production data? What if we chose a different time period to sample?
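If the extract is a script, each of these questions becomes a parameter rather than a rework. A minimal sketch, assuming the extracted rows are dicts with hypothetical `case_type` and `created` fields: it keeps only rows inside a chosen time window, then caps how many rows of each type are kept, so rare types are over-represented relative to production. A fixed random seed makes any given set of parameters reproducible.

```python
import random
from collections import Counter, defaultdict
from datetime import date


def sample_cases(rows, per_type, start, end, seed=0):
    """Return up to `per_type` rows of each case type created in [start, end).

    Capping every type at the same size evens out the mix instead of mirroring
    production volumes; the fixed seed means the same parameters always give
    the same sample, so an extract can be regenerated on demand.
    """
    by_type = defaultdict(list)
    for row in rows:
        if start <= row["created"] < end:
            by_type[row["case_type"]].append(row)
    rng = random.Random(seed)
    sample = []
    for case_type in sorted(by_type):  # visit types in a stable order
        group = by_type[case_type]
        sample.extend(rng.sample(group, min(per_type, len(group))))
    return sample


# Usage: 90 "standard" and 10 "refund" cases in January, plus one refund in March.
rows = (
    [{"id": i, "case_type": "standard", "created": date(2023, 1, 15)} for i in range(90)]
    + [{"id": 90 + i, "case_type": "refund", "created": date(2023, 1, 20)} for i in range(10)]
    + [{"id": 100, "case_type": "refund", "created": date(2023, 3, 1)}]
)
picked = sample_cases(rows, per_type=5, start=date(2023, 1, 1), end=date(2023, 2, 1))
print(Counter(row["case_type"] for row in picked))
```

Changing `per_type`, the window or the seed and rerunning gives a differently balanced set in seconds, rather than a repeat of the manual effort.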

In each case, the team that did the manual extraction tries to argue that their hard-won test set is the best one to move forward with: why waste time redoing work that’s already been done?

That’s not always the right outcome for the stakeholder, and it’s not necessarily the right outcome for the project. It’s just a line of least resistance when it’s costly to get a new data set.

When Things Change

The more fixed the extracted data seems, the less able we are to respond to changing requirements. Perhaps, during the project, the production workflow has changed. Even short projects can be affected by seasonal changes in usage, or by new business rules coming in.

In a replatforming project, it’s not uncommon to discover a major production bug in the legacy system, which needs fixing before the replatforming and needs regression testing in the new version too.

All Roads Lead to Cheap Extraction

If part of the first extract is finding flexible ways to produce lots of extracts repeatably (without going crazy and building a generic extract-anything tool), then all of the above concerns go away.

We embrace a fresh data set, or tweak the parameters to rebalance our existing one. We find it easy to produce metrics about the data we’re extracting. We treat refreshing the test data set as part of the solved problem space, and we avoid overcommitting to an initial theory about our data that becomes less and less valid as the project proceeds.
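As an example of the kind of metric that becomes cheap, a sketch like the following compares the mix of case types in an extract with the mix in production, so a deliberately rebalanced set is visibly different rather than accidentally so. The `case_type` field and the sample rows are assumptions for illustration.

```python
from collections import Counter


def type_mix(rows, key="case_type"):
    """Return each type's share of `rows` as a fraction between 0 and 1."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {t: n / total for t, n in sorted(counts.items())}


def mix_report(production, extract, key="case_type"):
    """List (type, production share, extract share) for every type seen in either set."""
    prod, ext = type_mix(production, key), type_mix(extract, key)
    return [(t, prod.get(t, 0.0), ext.get(t, 0.0)) for t in sorted(set(prod) | set(ext))]


# Usage: production is 90% "standard", while a rebalanced extract is an even split.
production = [{"case_type": "standard"}] * 90 + [{"case_type": "refund"}] * 10
extract = [{"case_type": "standard"}] * 5 + [{"case_type": "refund"}] * 5
for case_type, prod_share, ext_share in mix_report(production, extract):
    print(f"{case_type:<10} production {prod_share:6.1%}  extract {ext_share:6.1%}")
```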

How to Extract Test Data

Start with a bunch of well-written SQL queries in a README file. If the tooling to turn their results into useful test inputs isn’t available, buy or build it. If the extract needs to become part of a regular test process, choose that tooling carefully to avoid deployment problems. Make the output easy for both machines and humans to read.
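A minimal sketch of turning one such README query into a test input, assuming the database is reachable from Python. Here the standard-library `sqlite3` module and an in-memory table stand in for the real database, and the `cases` table, its columns and `extract_to_csv` are hypothetical. The output is CSV with a header row, which spreadsheets, diff tools and test harnesses can all read.

```python
import csv
import io
import sqlite3

# Hypothetical query copied from the README; table and column names are illustrative.
EXTRACT_SQL = """
SELECT id, case_type, created, outcome
FROM cases
WHERE created >= :start AND created < :end
ORDER BY id
"""


def extract_to_csv(conn, out, start, end):
    """Run the extract query with named parameters; write rows as CSV with a header. Returns the row count."""
    cursor = conn.execute(EXTRACT_SQL, {"start": start, "end": end})
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # column names as the header
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Usage against a throwaway in-memory database standing in for production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER, case_type TEXT, created TEXT, outcome TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?, ?)",
    [(1, "standard", "2023-01-15", "approved"),
     (2, "refund", "2023-01-20", "rejected"),
     (3, "standard", "2023-02-03", "approved")],
)
buf = io.StringIO()
n = extract_to_csv(conn, buf, "2023-01-01", "2023-02-01")
print(n)
print(buf.getvalue())
```

Named parameters (`:start`, `:end`) let the same query be rerun for a different time window without editing the SQL. When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends.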

Above all else, don’t hold a gun to your own head with a weird data standard you don’t need.

One comment

  1. Another personal preference is to not reinvent the wheel. I tend to use tools (NetMRI, SolarWinds, etc.) that can run a batch of show commands across devices and output a zip of the results, one file per device. Ideally they echo each show command and put predictable separators between blocks, which makes individual show blocks easy to locate or extract. Those tools already have the device inventory and credentials in them. By using them, my scripting can focus on analysis of the data, and I don’t have to leave my laptop at the customer site overnight to gather the raw data. Where appropriate, my Python (previously Perl) scripts produce tab-separated output, because that imports easily into Excel for sorting on different columns, etc. The point is to spend my limited coding time getting the job done, leveraging other tools where possible.
