Wednesday, October 04, 2006

Simple SSIS Logging - Part 3, Bad, Bad Rows..!

It’s all well and good knowing that there has been a failure in your package which has lead to a bleary eyed support technician getting phoned at 3 in the morning but sometimes it’s not really necessary to get Barry (we’ll call him Barry for the purposes of this post) out of bed. Your package may be bullet proof and work perfectly but the nature of data is that it will change and can quite often come back and bite you on the backside.

Take for example early arriving facts. When you start integrating a number of systems it is possible for things to start happening that don’t conform to the world according to the system owner. Take the example of an in-house order system operating completely separately to the companies CRM system. During the analysis phase of the data warehousing project it was determined that CRM was the source of Customer ID therefore that’s where customers get created and that’s where customers get maintained. But a new customer phones up and makes an order before the customers details are completed in the CRM mechanism due to one thing or another. Your data warehouse is quite likely to be collecting that fact row before the relevant CRM data has been correctly populated and passed on to the customer dimension. So what happens in your SSIS package when looking up the Customer key for that dimension, well not a hell of a lot.

To be fair there are numerous techniques for dealing with early arriving facts. Some involve the population of the Customer ID and retrieval of the key so that when the correct dimension member arrives in the dimension table, its attributes are populated automatically. The problem is that things aren’t always as clear cut as that. You have managed to capture the early arriving fact data and correctly allocate it to members in the Customer dimension but when board reports run that specify a particular customer type the data is incorrect, the numbers don’t roll up and for 24 hours you could be running around like a headless chicken trying to figure out where the missing data has gone, oh that’s until the next day when the customer dimension member is correctly updated and things mysteriously fix themselves .

So where does logging fit into all of this? Well knowing that we have an early arriving fact would be a good idea. So we have to log the early arriving fact and let the data pass through to allow it into the fact table.

This looks quite simple but when you’re trying to handle a number of dimensions and any of these could have early arriving facts then you can get have a lot of data destinations. In addition to all this there is also the fact that each package could have a different error table at each dimension lookup making logging, which should really be centralised and in a defined structure, a little reactive and difficult to deal with. Dealing with this may not be as difficult as it seems though and for the first time I’m going to mention a customer SSIS task that has been written with this in mind. A colleague of mine has written a component for just this type of situation. Whilst I would have liked to have completed all 3 parts of this Simple SSIS Logging piece without referring to any external code, this really does make life a lot easier. The full source code is available on the site and I recommend its download.

The Xmilfy component (to give it its officially christened name) will turn a series of columns and convert them into a single XML column that has some obvious benefits. In the above example the data is then written to a single table in my SSIS logging database which has a data type of XML allowing the data to be viewed in a number of ways. Extra information can be added into the data flow that specifies where the error occurred, the number of rows etc, again creating detailed and useful reporting. Want to know the number of early arriving for a particular package or even a particular dimension, easily done using the data in the table. Finally a warning can be raised using script to inform Barry, remember him, that some data problems occurred but now it’s a warning and can be handed over to the department in charge of wielding the large stick that is used to stop sales people logging sales before the customer is correctly created. Furthermore the system can be automated to alert said department taking Barry completely out of the loop. But wait a second, Barry is now worried about losing his job but luckily he’s been learning SSIS in his spare time.

It’s ok people, Barry is fine…!

No comments: