Wednesday, May 23, 2007

Unit Testing SSIS

SSIS packages are almost like a mini application in their own right. On most occasions there is one or more input items of data that may consist of either a single variable value or an entire data set. This will follow a transformation or validation process before a final output that could again be in a number of different formats.

Unit testing these packages should involve minimal change to the structure or behaviour of the package itself. The risk of influencing the code behaviour through the testing process is as great as an incorrect deployment or missing a particular testing scenario.

The most important factor in the process of testing a package is to understand how it will react in controlled circumstances and be in a position to test the success of the anticipated result. Testing this using the natural output of the package, for the reasons discussed previously, will provide the most robust results.

Due to some of the current debug limitations of SSIS and taking into account the need to keep the package structure and design static; it is only really possible to effectively test the control flow of a package whilst remaining ‘hands off’.

Let's look at a simple package example;

The same type of output would be taken from the other standard control flow tasks. The execute SQL task would have an initial state, a completion state and a resulting success state based on the other outputs. The initial state of n rows in the destination table before and 0 rows after the step has executed. This is measured by examining the row count in the table before and after execution and comparing the value with the expected result. For the archive file system task all states would be measured using a different mechanism and so on and so forth.

Essentially this means that whatever the task there may be numerous methods of gathering the data necessary to confirm whether the test process has been successful. Simplifying the process of measuring the results of a test would make applying a standard testing mechanism far easier to implement.

Currently packages provide the ability to perform some manual logging as I've posted about in past. This can be used to establish whether tasks have completed successfully or not but where a measurement is needed to confirm this, this type of logging is lacking. For example, truncating a table will not provide you with row count confirming the number of rows affected or the subsequent amount of rows left in the table whilst a delete statement would. It would not be wise to change all truncates to deletes to allow this information to bubble up to the logging process and therefore allow capture of the state of the task before and after execution.

There are a number of different ways of trying to perform strict, robust SSIS unit testing of which I've generalised them into 3 options.

First is to create a more robust testing methodology that has processes particularly for unit testing SSIS packages based on the control flow. The success criteria should be based on use cases defined at the package design stage. Testing each of the steps within the control flow of the package with documented pre-execution criteria, expected output and the subsequent result.

This does not lend itself to any kind of automated testing approach and would involve stepping through the package control flow and logging the results.

The second option allows for the automation of tests and the capture of results but would require another application to control the execution of the process and establish pre-execute criteria in addition to measuring the post-execute result.

Using the unique identifier for each of the control flow tasks, during the pre execute and post execute process of each step a lookup would be made to either a database or configuration file containing the method call that would initiate the processes necessary to prepare the execution and subsequently validate and measure the success on post execute.

A change like this would mean integrating a procedure call, perhaps using the CLR, to execute such tasks based on an internal variable indicating that unit test, or debug mode was enabled within the package. Whilst providing a number of advantages in automated testing and capture of the test results there would be a great deal of work required in the preparation of the test. This would all have to be completed in addition to that suggested in the first option as the pseudo code necessary to design each of the test criteria would be based on the use cases defined at the package design stage.

The final option would be to remove the more advanced mechanism from option 2. The pre analysis and use case definition would still be required but in this option additional test scripts would be placed in the pre and post execution events of the package. This would mean embedding code into the package that would only be used during unit testing that could possibly be switched off using a similar variable to that in the second option.

Thought it would be possible to automate a great deal of the testing for the package it would mean changing the structure or behaviour of the package directly and increase the danger of introducing problems with script tasks that have previously been experienced on 32bit to 64bit transitions.

So there you go. There are ways of doing SSIS unit testing. There are even ways of automating the process but it's not a quick win. It doesn't remove the need to establish good formal testing strategies and it certainly isn't going to appear out of a box, well not soon anyway.

1 comment:

Anonymous said...

Hi, Steve. If you are still interested in unit testing SSIS, you might want to take a look at www.codeplex.com/ssisunit. It's an open sourced unit testing framework for SSIS that I've developed, and I'd like to hear any feedback that you have on it.