ETL Testing in an Agile World

by | Nov 13, 2018 | Agile, Data Quality, Data Testing

Embracing agility requires the evolution of ETL testing

The pace of change in business is exceeding the speed at which many teams operate. This problem isn’t limited to trendy Silicon Valley shops. It also exists in industries not necessarily known for their lightning pace, like financial services, insurance, and healthcare. To address this problem, many organizations mandate Agile transformations. To “be agile” is to generate fast feedback so we can learn quickly and avoid losing time pursuing the wrong course. But agility is more than merely learning and changing course. It is reducing the time between the two and minimizing the impact of the course change.

Yet, ETL testing can be a lengthy process involving significant manual setup and testing for each separate business need. The clear chasm between this reality and the agile mandate leaves many ETL testers alarmed and unsure how to comply.

Data is both a strategic asset and a growing problem

Data is fast becoming the currency of the 21st century. It’s hard to find a business still in operation that doesn’t need data to stay in business. In a 2018 data quality study, Experian found that “95% of C-level executives believe that data is an integral part of forming their business strategy.” That’s up a whopping 15% from the same study last year. This means that ETL processes are more important than ever.

Despite data’s growing importance, we continue to see news headlines about data security failures and glaring data inaccuracies. This isn’t sensationalism for the sake of filling 24-hour news cycles. Experian’s study found that C-level executives share similarly grave concerns:

  • 89% say that inaccurate data undermines their ability to give their customers the experience they crave.
  • 84% say that a sustained growth in the sheer volume of data makes it difficult to meet regulatory obligations.

Is your (lack of) ETL testing killing your data quality?

For organizations to realize their strategic goals, we must mitigate data quality and management problems. The most obvious place to look is at your ETL processes.

While it is possible that the source data is incorrect, it’s more likely that your ETL transformations are introducing errors. The probability increases with the number of transformations you have. The fix isn’t to avoid touching the data but, rather, to find ways to make these necessary transformations safer by finding problems sooner. The way to do this is through frequent and thorough testing of ETL applications.

Work small and test often

When you think about ETL, speed and agility probably aren’t the first words that come to mind, but they can be. First, break down work into small, valuable pieces, each with its own specific test cases.

Once those tests pass, run broader regression tests to ensure that the new code integrates well with everything else. This frequent, broad testing can prevent future data inaccuracies that can plague your organization. But, if you haven’t noticed, that’s a lot of testing!
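To make “small pieces, each with its own test cases” concrete, here is a minimal sketch in Python. The transformation `standardize_state` and its record layout are hypothetical, invented purely for illustration; the point is only that a single small transformation can carry its own fast, specific tests.

```python
def standardize_state(record):
    """Hypothetical transformation: trim and upper-case a state code.

    Returns a new record so the source row is never modified in place.
    """
    cleaned = dict(record)
    cleaned["state"] = record["state"].strip().upper()
    return cleaned


# Specific test cases that travel with this one small piece of work.
def test_trims_and_uppercases_state():
    assert standardize_state({"id": 1, "state": " il "})["state"] == "IL"


def test_leaves_other_fields_untouched():
    assert standardize_state({"id": 7, "state": "ny"})["id"] == 7


def test_does_not_modify_the_source_row():
    source = {"id": 1, "state": " il "}
    standardize_state(source)
    assert source["state"] == " il "
```

Tests this small run in milliseconds, which is what makes it realistic to run them on every change before the broader regression suite.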

If I were to walk over to a group of ETL testers and give them a mandate to test more often, they’d rail at me about how labor intensive it is to do all the manual testing required. They’d share how they can barely meet their commitments as it is and that they often have to do only the minimal amount of testing possible to meet compressed deadlines.

“Manual testing under compressed deadlines is the leading cause of data quality problems,” says Compact Solutions CEO Pankaj Agrawal.

In my experience, compressed deadlines result, primarily, in the compression of the later stages of a process. You see teams speed through development, making too many temporary choices and accruing work that needs to be done later when other things are a priority. Even worse, you see teams reduce the amount of testing they perform, sometimes significantly, or skip it altogether, resulting in too many post-deployment issues that could have been avoided. If you wonder why there is so much executive concern, here’s your answer. Fortunately, these situations are easily prevented with better ways of working, starting with doing smaller amounts of work more often and making testing easier and faster.

Performing quicker ETL testing is necessary to achieve agility. But, before you give up, take a cue from the DevOps movement and embrace the mantra “if something is hard to do, do it more often.”

Not too long ago, it was as hard to build and deploy software as it is to do ETL testing now. Teams realized that the only way to make fixing the pain a priority was to do it over and over. We don’t need to be masochists to make the ETL testing process faster, but we can learn from the DevOps journey and identify key strategies to mitigate our ETL testing pain.

3 strategies for pain-free ETL testing

  • Centrally manage test cases

Repeatability is the cornerstone of testing. If we can repeat our tests in exactly the same way time after time and get the same results, we can have faith that the results are accurate. If every tester maintains their own copy of their test cases, we can’t guarantee that tests will be comparable across different testers. This means we only have confidence in our results if we deal with a single tester. Using tools that allow you to centrally store and manage your test cases is a smart step to build more trust in your ETL test process.

The best ETL tools also let you create and maintain coding standards for your ETL applications. These standards make it easier to move quickly in the development phase and allow teams to rely on lint tests to ensure code meets the standards set by the team.
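As a rough illustration of what a lint test against a team standard might look like, the sketch below checks ETL job names against a naming rule. The rule itself (a `stg_`, `dim_`, or `fact_` prefix followed by lowercase snake_case) and the function name `lint_job_names` are assumptions made up for this example, not something prescribed by any particular tool.

```python
import re

# Hypothetical team standard: a layer prefix, then lowercase snake_case.
JOB_NAME_PATTERN = re.compile(r"(stg|dim|fact)_[a-z][a-z0-9_]*")


def lint_job_names(job_names):
    """Return the job names that violate the naming standard, in input order."""
    return [name for name in job_names if not JOB_NAME_PATTERN.fullmatch(name)]


violations = lint_job_names(["stg_orders", "LoadCustomers", "fact_sales_2018"])
if violations:
    print("Naming standard violations:", ", ".join(violations))
```

Because a check like this never touches data, it finishes almost instantly, and because the rule lives in one shared place, every tester applies exactly the same standard.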

  • Configure automated tests and triggers

If you try to take all the manual testing in traditional ETL work and stuff it into an Agile process, you’ll have a sure-fire recipe for disaster. As in the DevOps world, ETL teams must use automation to ease the pain of testing.

If the goal is fast feedback, failing a build definitely meets the criteria! You can’t integrate all your tests into your build process because that would take too long. But you can integrate your quickest and most crucial ones. Your quickest tests will be those that test your ETL code against your coding standards.

Full regression tests and other test cases that must process data can run asynchronously, perhaps on a nightly basis. Your frequency will depend on the rate of change in your team. Regardless of how often they are run, it’s crucial for the results to be available to the team quickly.

Smartly decoupling your automated test plans allows you to test the right things at the right times and get feedback as quickly as possible.
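One way to picture this decoupling is to tag each test with the stage it belongs to, then let the build trigger run only the quick ones while a nightly trigger runs the rest. The sketch below is a deliberately tiny, hand-rolled version of that idea; the stage names, the `stage` decorator, and the two placeholder tests are all hypothetical, and a real team would more likely lean on the tagging features of its test runner or ETL testing tool.

```python
from collections import defaultdict

# Maps a stage name ("build", "nightly") to the tests registered for it.
_REGISTRY = defaultdict(list)


def stage(name):
    """Decorator that registers a test function under a stage name."""
    def register(test_func):
        _REGISTRY[name].append(test_func)
        return test_func
    return register


@stage("build")
def test_job_names_are_lowercase():
    # Quick, data-free check: safe to run on every build.
    assert all(name.islower() for name in ["stg_orders", "dim_customer"])


@stage("nightly")
def test_row_counts_match_after_load():
    # Stand-in values; a real test would query the source and the target.
    source_rows, target_rows = 3, 3
    assert source_rows == target_rows


def run_stage(name):
    """Run every test registered for a stage; return {test name: passed}."""
    results = {}
    for test_func in _REGISTRY[name]:
        try:
            test_func()
            results[test_func.__name__] = True
        except AssertionError:
            results[test_func.__name__] = False
    return results
```

Calling `run_stage("build")` from the build and `run_stage("nightly")` from a scheduled job keeps the build fast while the data-heavy checks still run regularly.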

  • Focus testers on definitions, not scripts

Often, people who test ETL applications have to manually create complex scripts and SQL queries. They then need access to the servers where the scripts need to run and the command-line prowess to execute them. It’s daunting enough for one scenario, but this process repeats for each business need, each time reflecting the specifics of the systems in use. It can take time to gain the experience and expertise required to do this well. In other words, ETL testing has become a specialty. This means that it’s difficult for just anyone to join in and contribute. Welcome to the birth of a dependency.

Probabilities show us that, if each dependency has only an even chance of being ready when we need it, every dependency we introduce halves our chances of delivering on time. Fortunately, there’s a bright side to this: every dependency we remove doubles them.
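The arithmetic behind that halving is worth spelling out. It only holds under a simplifying assumption that is ours, not a measured fact: each dependency is independent and has a 50% chance of being ready on time. Under that assumption, the chance that everything lines up is 0.5 multiplied by itself once per dependency. The function name and default below are illustrative only.

```python
def on_time_probability(dependencies, p_each=0.5):
    """Chance that every dependency is ready on time.

    Assumes the dependencies are independent and that each one is
    ready on time with probability p_each (0.5 is an assumption).
    """
    return p_each ** dependencies


for count in range(4):
    print(count, "dependencies:", on_time_probability(count))
```

With three such dependencies the chance of an on-time delivery is already down to one in eight, and removing a single one brings it back up to one in four.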

Today’s best ETL testing automation tools allow you to streamline testing by having testers focus on defining expected data inputs, outputs, and triggers and letting the tool handle the creation and execution of the ETL scripts themselves. This means more team members can set up testing and there’s less chance of waiting on any one person to make progress. It’s never been easier to improve your chances!
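To show the difference between writing a definition and writing a script, here is a small sketch in which the tester supplies only input rows and expected output rows, and a generic runner does the rest. Everything in it is hypothetical: `EtlCheck`, `run_check`, and the `to_cents` transformation are illustrative names, and this is not how any specific product implements the idea.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class EtlCheck:
    """A declarative test definition: what goes in and what should come out."""
    name: str
    input_rows: list
    expected_rows: list


def to_cents(row):
    """Hypothetical transformation under test: dollar string to integer cents."""
    return {
        "order_id": row["order_id"],
        "amount_cents": int(Decimal(row["amount"]) * 100),
    }


def run_check(check, transform):
    """Generic runner: apply the transformation and compare with expectations."""
    actual_rows = [transform(row) for row in check.input_rows]
    return {
        "name": check.name,
        "passed": actual_rows == check.expected_rows,
        "actual_rows": actual_rows,
    }


# The tester writes only this definition, with no SQL or shell scripting.
amounts_check = EtlCheck(
    name="amounts are converted to cents",
    input_rows=[{"order_id": 1, "amount": "12.50"}],
    expected_rows=[{"order_id": 1, "amount_cents": 1250}],
)

result = run_check(amounts_check, to_cents)
print(result["name"], "->", "PASS" if result["passed"] else "FAIL")
```

Because a definition is just data, it can be stored centrally and reviewed by anyone on the team, not only by the person who knows the servers and the command line.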

Take the next steps

ETL testing, despite its painful past, has a bright future. Start by taking stock of how your team gets the job done. Take an especially deep look at the testing portion of the process. Are people manually creating test cases and/or storing them locally? Is your ETL testing primarily manual? Does it seem like you’re always bottlenecked because you have to wait on the right specialist? If you answered yes to one or more of those questions, you’ve got some work to do, and the right tools can help you take the first steps to agility.

Learn more about automated testing in an agile environment from the people who have decades of experience building and testing ETL applications. Learn about TestDrive or reach out to talk about how TestDrive might be right for you.


