Monday, December 11, 2006

Making The Client Happy - The Real Reason For Yesterday's Post That Rapidly Turned Into Some Kind Of Rant...!

If there's one thing that users hate it's being kept in the dark. So communication is essential and, to be honest, BI is all about communication. Turning a series of 0s and 1s, via OLAP or reporting, into something tangible that an end user can get real benefit from is what it's all about.

So we have a process that may rely on another process outside of the control of the BI infrastructure. This is a typical scenario and, due to the 24-hour worldwide nature of a lot of the BI implementations I've worked with, that process may be smack bang in the middle of someone's working day.

Scenario established, what are the problems we have here? Well, first there's the problem of processing whilst people are working, but with the performance tuning techniques available to us in the Microsoft BI space that shouldn't have too much of an impact on your user base. Really you just want to deliver new data to the user as quickly as possible.

The real problem here is a user not knowing when their new data is available. Does that balance include yesterday's transactions yet or not? That's a typical question in finance departments at month end, and something that gets asked a lot more regularly than you would like.

So communication is a problem, or the lack of it anyway (see the way I did that, started off talking about it and brought it round again to the subject, you're impressed, I can tell..!).

What can be done about it? There are a number of clever little things we should be able to do as part of our BI profession. Use our skill set and let the consumers of our data know the data is there to be consumed. First method: deploy a report to the users at the end of processing, or send an informational e-mail. Database Mail in SQL Server 2005 makes this a pretty simple task these days. The mail could contain a standard Reporting Services link that takes the user base to a report telling them what has happened in the data processing. The problem with this, though, is that it relies on a patient user base who will happily stare at the mailbox until that little shining light of hope gets sent from "SQLSVR978BSMNT" (I do miss the days when there were fewer servers and they had names based on characters from The Simpsons or something like that).
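If you've not used Database Mail before, the call itself is only a few lines; something like this rough sketch (the mail profile, recipients and wording below are placeholders of mine, not anything real):

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'BI Notifications',            -- an assumed Database Mail profile
    @recipients   = 'finance.users@yourcompany.com', -- an assumed distribution list
    @subject      = 'Daily warehouse load complete',
    @body         = 'Processing has finished. The data processing summary report is now available on the Reporting Services portal.';

Drop that in an Execute SQL task at the end of the process and the mail goes out as soon as the data is ready.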

OK, so your client / user isn't patient, or it's 4:30 on the Friday before Christmas (topical huh..!), either works, and you don't want to have to go and answer 20 calls from finance departments around the world demanding to know where the data is. Let's look at what they need to know. Take a typical end-to-end process; it may go something like this:

  • Extract data from one or more source systems.
  • Process said new data into the warehouse / mart / etc.
  • Process data into cube.
  • Distribute reports (or warm the cache, it's useful to have this catch-all at the end)

Now this is a high level generalisation of what may be going on, but being able to let the users know this much gives them a massive boost in their perception of the system. Even just knowing that the source system data has not been extracted empowers the user, if they are aware of where the data is coming from, and why shouldn't they be, to get on the phone and ask New York why their figures haven't been entered yet.

I've already talked about making this happen by sending reports out but let's face it, this isn't the ideal way of doing it when there are numerous users and numerous steps within the process to communicate. So we make the user pro-active. Go to the portal or website and find out what the hold up is or where the problem lies. All we're doing is taking something that has been happening in customer service departments for years and applying it to the BI systems that we put in place, and to be honest, with our experience we really should know how to provide someone with beneficial information in a number of formats.
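As a rough sketch of what I mean (the table and column names here are mine, invent your own), all the portal page really needs is one narrow status table that each step of the overnight process updates as it goes:

CREATE TABLE dbo.ProcessStatus
(
    ProcessStep varchar(50),   -- 'Source extract', 'Warehouse load', 'Cube process', 'Report distribution'
    StepStatus  varchar(20),   -- 'Waiting', 'Running', 'Succeeded', 'Failed'
    LastUpdated datetime
)

-- The portal page, or a Reporting Services report, just reads it straight back
SELECT ProcessStep, StepStatus, LastUpdated
FROM dbo.ProcessStatus
ORDER BY LastUpdated

Each SSIS package flips its own row as it starts and finishes, and the users get their answer without picking up the phone.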

So what is my point in all this? There has been an idea bubbling around my head for a few months now on this subject: visualising end-to-end processes for clients, mostly based on SSIS and the simple logging model I've posted about previously. The idea was to use Visio to expose data in a diagrammatic format to the users, showing them something similar to the SSIS control flow, you know, greens, reds, yellows. This became all the more interesting after the recent UK BI user group at Reading where David Parker was presenting on visualising data in Visio, essentially the same topic.

There, that's the idea, now I'm going to put it into practice. Stay tuned, and if anyone else has had any experience in doing what I've described then I would be really interested in hearing about it.

This Week I am Mostly Listening To...



Raconteurs - Broken Boy Soldiers

Thursday, November 30, 2006

Making The Client Happy...

Being a consultant involves meeting a lot of different people in a lot of different situations. Most of the people I meet are by and large pretty nice, but not all. I shall stop there for fear of incriminating myself :).

One thing is consistent throughout everything I do in my job though, and that is that I try to deliver the best solution possible within the constraints placed upon me, whether that be ambitious requirements, time, money, equipment, coffee, etc, etc. I do take a certain pride in what I do and the best thing to come out of it at the end is the successful delivery of a project, with the users grateful for the spanking new BI implementation my colleagues and I have put in place. And hey, if you get to play around with a few extra toys then that's a bonus...

This may counter the myth that as a consultant my role is to bleed the client dry and ensure that the people that command my time have enough cash to bathe in asps milk, ordering people to remove all of the blue M&Ms from large jars made from empty Playstation 3 carcasses whilst drinking fine wine made from the grapes of pandas! (In my head I know what I mean).

I do what I do because I'm interested in it and it pays fairly well at the moment. Ideally I would be a tester at a major games developer, but I can't see that happening until the mortgage is paid off, so until then I'll just keep BI-ing my heart out in the interest of the client.

There was a technical reason for this post but I think my rant went on long enough, so I'll perhaps share that tomorrow when I've had a chance to investigate further...

Evening After The Night Before....

Last night I had the pleasure of attending the 2nd UK BI User Group and happened to ask Chris Webb when the next one might be and he answered probably about 6 months. Now go onto one of the IT job boards and type in BI, SSIS, OLAP or any of the many acronyms that relate to this kind of thing and you'll be overloaded with agencies and other organisations offering large sums of money and incentives to come and join them.

What's my point?

Well, with that level of interest in BI over the last 18 months, and that interest not looking like it will wane in the foreseeable future, I think there are a couple of things that should happen....

Firstly, more people should attend. Now that may be down to location, as TVP (Microsoft UK, Reading) isn't the easiest of places to get to, but MS kindly offer their place so who's going to refuse it?

Secondly, the meetings should be more regular. Things are moving pretty fast, people are coming out with new tips, tricks and ways of adapting and implementing the MS BI platform, so surely there are plenty more things that can be discussed; one look at a PASS event shows there are a hell of a lot of ideas out there.....

I should do my bit as well I suppose. I could make some turkey sandwiches to take, there might be some leftovers in the next month or so..!

Tuesday, October 17, 2006

Failure - Data Source Does not Exist..! - Update

I got asked to post the demo project I had mentioned in the original post a little while ago, but unfortunately I had lost it. Luckily I was able to pull enough from memory to build it again and make it available for your pleasure. Whilst not written to premier coding standards, it certainly provides some direction on how this kind of thing is achievable.

The project is here, any questions then mail me at the usual address.

Apologies for the delay :)

Sunday, October 15, 2006

This Week I am Mostly Listening To...



Morning Runner - Wilderness is Paradise Now

Thursday, October 05, 2006

A Note From The Editor....

Just a little clarification here. I am not a professional dolphin trainer, nor am I an amateur one, although I did once swim with a dolphin in the Florida Keys and found it to be quite an unfulfilling experience, as the dolphin was in a pool that I would consider to be the dolphin equivalent of one of those Japanese capsule hotels.

The freelance dolphin trainer title comes from a time in my youth when life consisted of doing the things from this song below.

Thank you for your attention in this matter.

Editor out..!

Next Gen Blogging

I really wish I could blog more, but I do find that the things I want to blog about tend to take a large chunk of time, so when things come along that make blogging a bit more streamlined, I'm first in the queue. Firstly, the blogging support in Word 2007 is pretty good but I found it a little restrictive on occasion, so now I've switched to Windows Live Writer.

I'm not going to write anything about it but point you in the direction of this, where you can get it, and the video below. Direct from YouTube and embedded with a couple of clicks from Live Writer.

Enjoy....

Wednesday, October 04, 2006

Simple SSIS Logging - Part 3, Bad, Bad Rows..!

It’s all well and good knowing that there has been a failure in your package which has led to a bleary-eyed support technician getting phoned at 3 in the morning, but sometimes it’s not really necessary to get Barry (we’ll call him Barry for the purposes of this post) out of bed. Your package may be bullet proof and work perfectly, but the nature of data is that it will change and can quite often come back and bite you on the backside.

Take for example early arriving facts. When you start integrating a number of systems it is possible for things to start happening that don’t conform to the world according to the system owner. Take the example of an in-house order system operating completely separately to the company’s CRM system. During the analysis phase of the data warehousing project it was determined that CRM was the source of Customer ID, therefore that’s where customers get created and that’s where customers get maintained. But a new customer phones up and makes an order before the customer’s details are completed in the CRM mechanism due to one thing or another. Your data warehouse is quite likely to be collecting that fact row before the relevant CRM data has been correctly populated and passed on to the customer dimension. So what happens in your SSIS package when looking up the Customer key for that dimension? Well, not a hell of a lot.

To be fair there are numerous techniques for dealing with early arriving facts. Some involve the population of the Customer ID and retrieval of the key so that when the correct dimension member arrives in the dimension table, its attributes are populated automatically. The problem is that things aren’t always as clear cut as that. You may have managed to capture the early arriving fact data and correctly allocate it to members in the Customer dimension, but when board reports run that specify a particular customer type the data is incorrect. The numbers don’t roll up, and for 24 hours you could be running around like a headless chicken trying to figure out where the missing data has gone, that is until the next day when the customer dimension member is correctly updated and things mysteriously fix themselves.
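For illustration only (the table and column names below are invented, not from any particular project), that "populate the Customer ID and retrieve the key later" technique boils down to something like this running before the fact load:

-- Create a placeholder dimension row for any CustomerID that has arrived
-- in the staged facts but hasn't come through from CRM yet
INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, CustomerType, IsInferred)
SELECT s.CustomerID, 'Unknown', 'Unknown', 1
FROM staging.FactOrders AS s
LEFT JOIN dbo.DimCustomer AS d ON d.CustomerID = s.CustomerID
WHERE d.CustomerID IS NULL
GROUP BY s.CustomerID

-- When CRM eventually delivers the customer, the dimension load updates the
-- placeholder row (IsInferred = 1) instead of inserting a new one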

So where does logging fit into all of this? Well knowing that we have an early arriving fact would be a good idea. So we have to log the early arriving fact and let the data pass through to allow it into the fact table.

This looks quite simple, but when you’re trying to handle a number of dimensions, any of which could have early arriving facts, you can end up with a lot of data destinations. In addition to all this there is also the fact that each package could have a different error table at each dimension lookup, making logging, which should really be centralised and in a defined structure, a little reactive and difficult to deal with. Dealing with this may not be as difficult as it seems though, and for the first time I’m going to mention a custom SSIS task that has been written with this in mind. A colleague of mine has written a component for just this type of situation. Whilst I would have liked to have completed all 3 parts of this Simple SSIS Logging piece without referring to any external code, this really does make life a lot easier. The full source code is available on the site and I recommend its download.

The Xmilfy component (to give it its officially christened name) will take a series of columns and convert them into a single XML column, which has some obvious benefits. In the above example the data is then written to a single table in my SSIS logging database which has a data type of XML, allowing the data to be viewed in a number of ways. Extra information can be added into the data flow that specifies where the error occurred, the number of rows, etc., again creating detailed and useful reporting. Want to know the number of early arriving facts for a particular package or even a particular dimension? Easily done using the data in the table. Finally, a warning can be raised using script to inform Barry, remember him, that some data problems occurred, but now it’s a warning and can be handed over to the department in charge of wielding the large stick that is used to stop sales people logging sales before the customer is correctly created. Furthermore, the system can be automated to alert said department, taking Barry completely out of the loop. But wait a second, Barry is now worried about losing his job; luckily he’s been learning SSIS in his spare time.

It’s ok people, Barry is fine…!
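Purely as a sketch of the idea (the real component defines its own output; the table, column names and XML shape here are my own inventions), the central log table and a couple of the queries you get out of it might look something like this:

CREATE TABLE dbo.EarlyArrivingFactLog
(
    LogID       int IDENTITY(1,1) PRIMARY KEY,
    PackageName nvarchar(128),
    LookupName  nvarchar(128),    -- which dimension lookup the row fell out of
    LoggedAt    datetime DEFAULT GETUTCDATE(),
    RowData     xml               -- the failed row, pivoted into a single XML column
)

-- Early arriving rows per package and per dimension lookup
SELECT PackageName, LookupName, COUNT(*) AS EarlyRows
FROM dbo.EarlyArrivingFactLog
GROUP BY PackageName, LookupName

-- And the business key can be pulled back out of the XML payload if needed
SELECT RowData.value('(/row/CustomerID)[1]', 'int') AS CustomerID
FROM dbo.EarlyArrivingFactLog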

This Week I am Mostly Listening To...



Boy Kill Boy - Civilian

Friday, September 29, 2006

SQL Server 2005 Metadata

Hidden in the depths of Microsoft’s useful download division in Redmond is a handy pair of utilities you may have discovered that analyse dependencies within and across SSIS and Analysis Services. The first tool, the Dependency Analyzer, examines the metadata contained within SSIS packages and Analysis Services objects, collating their inter-dependencies and writing the results to a pre-defined database structure.

The second tool, the Dependency Viewer, provides a graphical interface displaying relationships between the analysed objects.


The amount of data is considerable and has some quite powerful possibilities. There has been a distinct gap in the MS BI product set, dependency and data analysis being two key areas that their competitors eulogise about at length. This appears to be the beginning of some gap plugging before the release of Katmai (the next version of SQL Server). In the meantime all the source code, examples and even some reporting interfaces are provided in the download.

I’m quite looking forward to plugging this tool into some of my end-to-end 2005 projects and looking at the results. Once I've got some good examples of how everything integrates together I'll post some of the results.

Dynamic Partition Creation and Processing – Part 2

The first part of this post discussed how to identify whether a partition existed and how to create a partition dynamically. Although this post is a little on the teaching-you-to-suck-eggs side, I thought it would be better to write it for completeness. Processing a partition can be performed using the exact same techniques that are used for dynamic partition creation, hence the suck-eggs comment.

First let’s look at the XMLA query for processing a partition similar to the one created in the previous post.

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <Object>
        <DatabaseID>Adventure Works DW</DatabaseID>
        <CubeID>Adventure Works DW</CubeID>
        <MeasureGroupID>Measure_Group_Name</MeasureGroupID>
        <PartitionID>200609</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
      <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
    </Process>
  </Parallel>
</Batch>

Note there’s no mention of the partition name anywhere. The partition is processed using its ID and that is all we are really interested in. Replacing just that value in the Source expression in an SSIS Analysis Services Execute DDL Task will point us to the correct partition for processing.

Of course, if the entire cube structure is built on the same partition naming strategy then creating a single script to process all of this period's partitions is relatively easy. The added bonus of being able to process partitions in parallel can prove a huge advantage in reducing the time taken to update the cube on whatever schedule you have.

These examples are based on cubes structured with monthly or period based partitions. Partition strategy is an important subject for both cube querying and processing performance. I’ll leave that to another post, but I just wanted to state that employing a rules mechanism that, say, merges period based partitions into a single year partition whilst creating new partitions on the fly does not need to be a massively complex process. The methods I’ve discussed in these two posts are probably some of the simpler ways of implementing this.

Tuesday, September 26, 2006

This Week I am Mostly Listening To...



The Young Knives - Voices of Animals and Men

Dynamic Partition Creation and Processing – Part 1

Processing cubes in SSIS is pretty simple if all you want to do is pick a partition and process it, but if that partition is time based, which is quite likely, then the partition you process each day, week or month will change. So what are your options?

First, you can change your package each month, or use a variable with the partition name. Probably not the best of options, as someone is going to have to physically make the change each month or come up with some clever method of changing your variable using something like VBA, Excel, etc. Inefficient if you ask me, and defeating the object. Couple this with creating a year’s worth of partitions, or creating them every time a new one is required, and the whole task becomes quite laborious.

So what else can be done? My personal favourite is to use dynamic XMLA, but this has a few pitfalls of its own. First let’s look at the XMLA script for creating a partition and some of the key tags you have to be aware of.

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ParentObject>
    <DatabaseID>Adventure Works DW</DatabaseID>
    <CubeID>Adventure Works DW</CubeID>
    <MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
  </ParentObject>
  <ObjectDefinition>
    <Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <ID>Internet_Sales_2001</ID>
      <Name>Internet_Sales_2001</Name>
      <Annotations>
        <Annotation>
          <Name>AggregationPercent</Name>
          <Value>20</Value>
        </Annotation>
      </Annotations>
      <Source xsi:type="QueryBinding">
        <DataSourceID>Adventure Works DW</DataSourceID>
        <QueryDefinition>SELECT * FROM [dbo].[FactInternetSales]
          WHERE OrderDateKey &lt;= '184'</QueryDefinition>
      </Source>
      <StorageMode>Molap</StorageMode>
      <ProcessingMode>Regular</ProcessingMode>
      <Slice>[Date].[Calendar].[July 2006]</Slice>
      <ProactiveCaching>
        <SilenceInterval>-PT1S</SilenceInterval>
        <Latency>-PT1S</Latency>
        <SilenceOverrideInterval>-PT1S</SilenceOverrideInterval>
        <ForceRebuildInterval>-PT1S</ForceRebuildInterval>
        <AggregationStorage>MolapOnly</AggregationStorage>
        <Source xsi:type="ProactiveCachingInheritedBinding">
          <NotificationTechnique>Server</NotificationTechnique>
        </Source>
      </ProactiveCaching>
      <EstimatedRows>1013</EstimatedRows>
      <AggregationDesignID>AggregationDesign</AggregationDesignID>
    </Partition>
  </ObjectDefinition>
</Create>

Above is the create script of one of the partitions from the Adventure Works cube. There are some obvious elements in there, but the ones that you need to be aware of when building your script dynamically are as follows:

ID – The most important one. Even though this is the same as the name of the partition there are a couple of things to remember. Firstly, they don’t have to be the same; secondly, by default it’s named after the partition name. Why is this important? When you first create a measure group in your Analysis Services project (in Visual Studio) the default partition is named after the measure group name, and so is the ID. The ID is the unique identifier of a partition within a measure group, so if your partitions are named MeasureGroup1_200608 and MeasureGroup1_200609 you may find the IDs are MeasureGroup1 and MeasureGroup1_200609, and processing a partition in XMLA is based on the partition ID.

The partition ID is something you can’t change in an Analysis Services project, well I haven’t found a way yet so in my little world it can’t be changed..! The first thing I would do is create a duplicate partition in the cube design and name it what I would like the ID to be. Once created, you can change the name to whatever you like, the ID won’t be changed.

The Name element becomes pretty arbitrary but you still need to ensure that every partition you create has a different name, again if you want it to be the same as the ID then name it accordingly.

The final two are the QueryDefinition and Slice. This is where you may have to step back and think about the overall design of cubes and their dimension structures. For the purpose of this post I’m talking about partitions being sliced by date, because in my experience that’s how most cubes will get sliced. Where it’s suitable I would try to make the key of the date dimension a meaningful value, i.e. 20060926 = 26th September 2006. Before we go into multiple discussions about why this is a good or bad idea, stop; it’s something I do, I know its pros and cons, but I’ve found it to be beneficial on a hell of a lot of occasions, this example being one of them.

My QueryDefinition value will be the query that forms the dataset my partition is to be processed from. Generally you’ll see something with a where clause like “WHERE DateKey BETWEEN 20060900 AND 20060999” if the partition is query bound. If the partition is table bound, more than likely you’re using views to distinguish the different data sets. Finally, the partition slice. Even though this is documented as being for information purposes only, there have been some well documented grey areas when not populating it. My recommendation is to put a value in anyway, even if it is for information purposes. The slice value for, hmmmm, let’s say the September 2006 partition in my cube would look something like “<Slice>[Date].[Calendar].[September 2006]</Slice>” or ideally “<Slice>[Date].[Calendar].&[200609]</Slice>”. Using the key for the slice makes it clear that you need to be considering this kind of thing when designing your dimensions at the data mart as well as the OLAP stage, as we’re making the month level of the dimension a critical value.

Looking at those key elements of the XMLA statement for creating a partition we get something like this;

<ID>200609</ID>
<Name>Measure_Group_Name_200609</Name>
<QueryDefinition>
  SELECT *
  FROM FactTable
  WHERE DateKey BETWEEN 20060900 AND 20060999
</QueryDefinition>
<Slice>[Date].[Calendar].&[200609]</Slice>

Each of the elements has the common value of “200609”, which can be replaced, allowing partitions to be created before processing, in this example at month end. To actually execute this code within an SSIS package I populate the Source expression of an Analysis Services Execute DDL task. This can be done via either direct input or a variable (selected by the SourceType property). The expression takes the variable, or direct input XMLA, and evaluates it based on any parameters you need to input. The parameters in this case would replace the relevant references with “200609” to correctly create the new partition.

A couple of things to remember when creating an expression of XMLA like this: the script will have to be in double quotes and any double quotes within the script will have to be escaped as ‘\”’. That’s a backslash followed by a double quote.

Finally, before you run the XMLA it would be good practice to know whether the partition already exists before you attempt to create it. AMO (Analysis Management Objects) is the answer to that. The script below, kindly mixed, matched and cribbed from those lovely Project REAL chaps, will populate an SSIS Boolean variable with the answer to whether the partition exists or not. Don’t forget to add AMO as a reference in the SSIS script task.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports Microsoft.AnalysisServices

Public Class ScriptMain

    Public Sub Main()

        Dim WellIsItThere As Boolean
        Dim ASConnection As ConnectionManager
        Dim ASServer As String
        Dim ASDatabase As String
        Dim ASCube As String
        Dim ASMeasureGrp As String
        Dim ASPartition As String

        ASPartition = Now().ToString("yyyyMM") ' This will give us "200609"
        ASConnection = Dts.Connections("AnalysisServer") ' The name of the Analysis Services connection manager in the package
        ASServer = CStr(ASConnection.Properties("ServerName").GetValue(ASConnection))
        ASDatabase = CStr(ASConnection.Properties("InitialCatalog").GetValue(ASConnection))
        ASCube = "Adventure Works" ' How you populate this is really up to you
        ASMeasureGrp = "Internet Sales" ' And again the population of this is up to you

        WellIsItThere = IsPartitionThere(ASServer, ASDatabase, ASCube, ASMeasureGrp, ASPartition)

        Dts.Variables("PartitionExists").Value = WellIsItThere
        Dts.TaskResult = Dts.Results.Success

    End Sub

    Public Function IsPartitionThere(ByVal sServer As String, _
            ByVal sDatabase As String, ByVal sCube As String, _
            ByVal sMeasureGroup As String, _
            ByVal sPartition As String) As Boolean

        ' By default, we will assume that it isn't there.
        ' Only if we get all of the way to the end and everything is
        ' found will we set it true.
        Dim bIsPartitionThere As Boolean = False
        Dim oServer As New Microsoft.AnalysisServices.Server

        ' Connect to the server and start scanning down the object hierarchy
        oServer.Connect(sServer)

        Dim oDB As Database = oServer.Databases.FindByName(sDatabase)
        If oDB Is Nothing Then
            MsgBox("Did not find expected database: " & sDatabase, _
                MsgBoxStyle.OkOnly, "Error looking for partition")
            GoTo Done
        Else
            Dim oCube As Cube = oDB.Cubes.FindByName(sCube)
            If oCube Is Nothing Then
                MsgBox("Did not find expected cube: " & sCube, _
                    MsgBoxStyle.OkOnly, "Error looking for partition")
                GoTo Done
            Else
                Dim oMG As MeasureGroup = _
                    oCube.MeasureGroups.FindByName(sMeasureGroup)
                If oMG Is Nothing Then
                    MsgBox("Did not find expected Measure Group: " & _
                        sMeasureGroup, _
                        MsgBoxStyle.OkOnly, _
                        "Error looking for partition")
                    GoTo Done
                Else
                    '-- This is the real test -- to see if the partition
                    '   is really there (Find works on the partition ID)
                    Dim oPart As Partition = _
                        oMG.Partitions.Find(sPartition)
                    If Not oPart Is Nothing Then
                        ' This is the only place to set the value to TRUE
                        bIsPartitionThere = True
                    End If
                End If
            End If
        End If

Done:
        oServer.Disconnect() ' Disconnect from the server -- we are done

        Return bIsPartitionThere

    End Function

End Class

Monday, September 18, 2006

This Week I am Mostly Listening To...



Captain - This is Hazelville

Tuesday, September 12, 2006

Simple SSIS Logging - Part 2, Devil in the Detail

Ok so now you know when a package process starts, when it finishes and its final status (See Part 1). What we don’t have is the ability to accurately analyse the individual steps within a package and make effective decisions based on the information available.

First you will have to decide what you are looking to track within the system. Obviously errors are the most important element, as fast resolution is essential. Next should be warnings. The warnings generated within SSIS are key to identifying areas that may have performance issues. A common warning in SSIS would be where a column is brought in from a data source but subsequently not used throughout the package. SSIS will raise a warning at every transformation informing a user that these unused columns may hinder performance, although they do not affect the final outcome of the package.

For these examples we’re writing our log to a database so it’s necessary to have a structure to hold it. A nice simple structure should let you know the type of alert, timestamp, source task and any other descriptive information (such as error code and error description).

When you have the table structure you can write an insert statement. The most sensible option is of course a stored procedure with a number of parameters serving your needs, something like this:

usp_WriteSSISEvent @ServerName, @EventType, @SourcePackage, @SourceTask, @ErrorCode, @ErrorDescription, @EventTimeStamp
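To give that some shape, here's one possible version of the table and procedure; the column names and sizes are just my assumptions, so trim them to fit:

CREATE TABLE dbo.SSISEventLog
(
    EventID          int IDENTITY(1,1) PRIMARY KEY,
    ServerName       nvarchar(128),
    EventType        nvarchar(50),      -- OnError, OnWarning, OnPreExecute, OnPostExecute
    SourcePackage    nvarchar(128),
    SourceTask       nvarchar(128),
    ErrorCode        bigint NULL,
    ErrorDescription nvarchar(2000) NULL,
    EventTimeStamp   datetime
)
GO

CREATE PROCEDURE dbo.usp_WriteSSISEvent
    @ServerName nvarchar(128), @EventType nvarchar(50),
    @SourcePackage nvarchar(128), @SourceTask nvarchar(128),
    @ErrorCode bigint = NULL, @ErrorDescription nvarchar(2000) = NULL,
    @EventTimeStamp datetime
AS
INSERT INTO dbo.SSISEventLog
    (ServerName, EventType, SourcePackage, SourceTask, ErrorCode, ErrorDescription, EventTimeStamp)
VALUES
    (@ServerName, @EventType, @SourcePackage, @SourceTask, @ErrorCode, @ErrorDescription, @EventTimeStamp)
GO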

So how do these parameters get populated? Well, for the most control over what gets written and where (and it’s pretty simple to boot, hence the title), I’m first going to put an Execute SQL task into the OnError event handler. After setting the connection manager, set the SQL statement via an expression. If you’ve not done this before it’s a very useful thing to know about, but to cut a long story short it’s possible to set any number of task properties via an expression.

The expression will effectively be the constructed SQL statement with your parameters populated via a combination of system variables and direct input. Here’s the statement for an OnError event based on the above stored procedure example.

"EXECUTE [usp_WriteSSISEvent]
@ServerName = '" + @[System::MachineName] + "'
,@EventType = '" + "OnError" + "'
,@SourcePackage = '" + @[System::PackageName] + "'
,@SourceTask = '" + @[System::SourceName] + "'
,@ErrorCode = " + (DT_WSTR, 15)@[System::ErrorCode] + "
,@ErrorDescription = '" + @[System::ErrorDescription] + "'
,@EventTimeStamp = '" + (DT_WSTR, 30)GETUTCDATE() + "'"

Yeah I know it looks a bit of a mess but trust me, this works.

The pain with this method is that it’s not possible to populate the @EventType via a system variable, meaning that you have to populate it manually for each event type you wish to record. The other gotcha is that the system variables differ depending on the event type. For example, of the 4 I usually use, OnError, OnWarning, OnPostExecute and OnPreExecute, only errors and warnings have the error code and description. Something to be aware of.

If you want to see it working then I have an example project all ready to send, just mail me at mailto:steve@stevemchugh.com?subject=Simple%20SSIS%20Logging%20Part%20Deux and I’ll send you a copy. It has the database structure as a script so there’s not too much effort involved in having a look.

Yet Another Tip as Cunning as a Fox Who's Just Been Appointed Professor of Cunning at Oxford University

SSIS package deployment, in my opinion, is a right royal pain in the Aris (I'm sure there's some kind of Wiki out there for cockney rhyming slang, there certainly is for Blackadder quotes).

This became evident whilst trying to successfully deploy password protected packages to folder locations on SSIS servers. So again I came up with a plan to make my life a little more bearable. A plan as cunning as a fox that used to be professor of cunning at Oxford University, but has moved on, and is now working for the UN at the high commission of international cunning planning..!

All my SSIS projects of significant size now include a package called sys_DeployPackages. This does something very simple: it deploys packages. A couple of variables, such as the source folder and the destination, the password I want to use to encrypt the packages, and the location folder on the SSIS server, and hey presto. Put the variables into a config file and then you can deploy the same files to another server with just a double click.

All this with just the joining of the Execute Process task and dtutil.exe, a combination so cunning you could brush your teeth with it.

This Week I am Mostly Listening To…



The Dears – Gang of Losers

Another Little Tip

I don’t know if you sometimes have a package that contains lots of execute package tasks. Lord knows I do and often too. This is especially the case when pulling a number of tables out of a source system into a staging database.

Well, all those individual packages have to have their properties set and their names changed, and quite frankly double clicking on each of those tasks and waiting for them to spring into life is something I’m not that keen on.

As an expert in trying to make life a tad easier, I now set the package name property via an expression that cunningly grabs the Integration Server path from a variable along with the task name, the combination of which is the full location of the package on the Integration Server.

All I have to do then is copy and paste the task X amount of times and change the name of the task to the package I want to execute. Lazy I hear you say, and there was me thinking I was working “smarter”…!

Wednesday, August 30, 2006

A little tip...

...you know in SSIS when you have to create a parameter-based OLE DB Command (UPDATE tablename SET column1 = ?, column2 = ?, etc)? Well, an easy way of getting the SQL statement together without the pain of typing those ? in yourself is this.

Right click on the table in SQL Management Studio and create a script to update the table you’re looking to update. Then do a Ctrl+H (find and replace), which gives you the option to replace based on a wildcard search. Search for “= <*>” and replace with “= ?” and voila, one update script ready for pasting into the OLE DB Command.

Mind you it doesn’t help when you then have almost 100 columns to map out :(

Thursday, August 03, 2006

Simple SSIS Logging - Part 1, The Load ID

Logging in SSIS gives you plenty of options. You have the built in event handling as well as the purpose built logging options. How this is used tends to differ depending on the project you’re working on, yet in the majority of cases you’ll find yourself implementing something very similar every time, and then you’ll have to implement something very similar into every package on every project. This is time consuming, it’s a pain, and you can lose out on a lot of information that, to be honest, you will need later on for administration.

Let me break it down into a few simple key elements that I believe are important in BI based SSIS projects. Firstly you need to be able to capture the start and end time of a particular process. Secondly, absolute traceability of the data from the source system interfaces through to the warehouse storage itself. Finally, organising the output from any logging into a known structure for later analysis.

The concept of a Load ID is fairly familiar to anyone moving data out of a source into a staging database or, if you’re skipping the middle man, straight into a warehouse. The Load ID can be a unique identifier based on the specific process execution that extracted the data, or it can identify a number of loads to different tables; in the second case the source dataset itself becomes part of the identifier.

Typically the start load process will populate a known table in a logging environment somewhere and return a Load ID to the package. On a recent project this was built into its own component that had properties such as the connection. The returned Load ID will get tagged onto each of the records that are written in the Data Flow Task. Using this information, along with the data source or file, you have full end-to-end coverage of where a record came from and where it has ended up (for the purposes of this, say, the staging database).

Obviously there’s a little more to this than just putting a script at the top and bottom of the control flow but not much more. The structure of the table that both provides the Load ID and logs the start time, end time and also the status of a particular load is pretty simple.

[Image: the load logging table structure (click to enlarge)]
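In case the image doesn't come through, here's an assumed version of that table along with the start and end calls the package makes; the real structure in the screenshot may well differ a little:

CREATE TABLE dbo.LoadLog
(
    LoadID     int IDENTITY(1,1) PRIMARY KEY,
    SourceName nvarchar(128),                 -- the data source or file being loaded
    StartTime  datetime DEFAULT GETUTCDATE(),
    EndTime    datetime NULL,
    LoadStatus varchar(20) DEFAULT 'Running'  -- Running, Succeeded, Failed
)
GO

-- Called at the top of the control flow; hands the new Load ID back to the package
CREATE PROCEDURE dbo.usp_StartLoad @SourceName nvarchar(128), @LoadID int OUTPUT
AS
INSERT INTO dbo.LoadLog (SourceName) VALUES (@SourceName)
SET @LoadID = SCOPE_IDENTITY()
GO

-- Called at the bottom of the control flow to close the load off
CREATE PROCEDURE dbo.usp_EndLoad @LoadID int, @LoadStatus varchar(20)
AS
UPDATE dbo.LoadLog
SET EndTime = GETUTCDATE(), LoadStatus = @LoadStatus
WHERE LoadID = @LoadID
GO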

There’s a bunch of other information you can attach to this table such as the package name, source server, pretty much anything you like. The important thing is capturing the core data and the real power comes from the reporting available from such a narrow dataset. Add the environment type and you can make comparisons between development, test and production performance. Add in a process identifier and you can see if one of your processes for transforming BI data is affecting a daily load task performed on one of your transactional systems.
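By way of example, and assuming the sketched table above has had an Environment column added to it, comparing load performance across environments becomes a simple query:

SELECT Environment, SourceName,
       AVG(DATEDIFF(second, StartTime, EndTime)) AS AvgDurationSeconds
FROM dbo.LoadLog
WHERE LoadStatus = 'Succeeded'
GROUP BY Environment, SourceName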

Although this provides a very simple mechanism for logging and tracking SSIS packages, that’s all it does. For even more detailed information and to track problem areas within a package, a whole new level of logging is required. That I’ll go into in another post. I am of the view that structuring logging in this fashion not only provides benefits for administration after us consultant types have cleared off site, but also speeds up implementation when you have a few simple SSIS components and a single logging database to deploy. All this while still applying the K.I.S.S.* rule to the process.

* Keep It Simple Stupid

Wednesday, August 02, 2006

Failure - Data Source Does Not Exist..!

One of the SSIS interfaces I have been working on for a project pulls data from 10 identically structured databases. The task we had was to pull data for a particular day, supplied via a parent variable, and place the data into a single file for import into the staging database of the BI infrastructure.

The problem was that if one of the data sources was down, the WAN connection was playing up, or there was any other problem that would cause one of the data sources to fail, the entire package would fail. We didn’t want this behaviour because the client wanted as much data as possible rather than a complete failure.

For starters, the OLE DB source wasn’t going to help us in this situation, so we had to go for a scripted data source and wrap the AcquireConnection code in a simple Try-Catch statement. This worked well and does the trick. We can run the entire package without any problems, building up the connection string from variables that are passed into the script and connecting to the required single data source. But then another requirement popped up.

How do I know when a data source has failed? I don’t want the package to fail, but a warning would be nice and would enable us to capture the event in our logging mechanism. Enter the FireWarning method. This little diamond, along with its cousins FireError, FireInformation and FireProgress, allows you to place custom entries into the SSIS logging.

I have an example package built on Adventure Works if anyone wants it and, although quite simple, I have come across a number of instances, connecting to multiple data sources (including flat file connections), where this behaviour is needed.

Secret Source of Pleasure..

... or not so secret if you check out some of my other links over there -->

I am the proud owner of a launch day purchased Xbox 360 and in my spare time enjoy the odd thrash on there. My recommendations at the moment are

Geometry Wars - for a quick thrash on the machine.
Table Tennis - so simple yet so enjoyable.
Street Fighter II - coming to Xbox Live Arcade today, god I wasted so much of my time at college on that game..!

If anyone fancies a round or two on anything my gamertag is one1eg (that's a one in there).

See you online.

Steve

This Week I am Mostly Listening To...


The Rifles - No Love Lost

Other Places to go...

In my first proper contribution to this year's blogging (yeah, yeah, I know) I'm going to give a quick overview of some of the blogs I regularly read and why.

First up is Mark Hill (http://markhill.org/blog). I've known Mark for a number of years and worked with him for almost all that time. His blog is, in his own words, "a ramble through Microsoft BI technologies". Give the guy a little credit here, it's not a ramble but quite an insightful commentary on Microsoft’s BI platform and essential reading if you're looking for information on the problems you'll face implementing large scale SSIS and Analysis Services systems, especially if you're looking to put 64 bit servers within your architecture.

Next is Thiru Sutha's SQL Server 2005 BI blog (http://tsutha.blogspot.com/). Again someone I've known and worked with for a number of years, Thiru has been working on the same projects as Mark Hill and has significantly contributed to stabilising 64 bit SSIS. Definitely worth a viewing, and it contains some good example SSIS packages for your pleasure in addition to postings on known SSIS bugs.

Chris Webb (http://cwebbbi.spaces.live.com/) is a pretty well known individual in the world of Microsoft business intelligence and is your first port of call if you need to know anything about using MDX in the field. His collaboration on the latest version of MDX Solutions (along with George Spofford amongst others) gives the book a lot more weight and makes it an excellent guide for anyone looking to enhance their MDX knowledge.

Off the BI bandwagon slightly here, The Joy of Code (http://www.thejoyofcode.com/) is a collaborative blog that includes Josh Twist, a colleague I’ve worked with recently on a SQL Server 2005 BI architecture implementation. Josh is a .Net developer by trade but was a huge benefit to the project’s custom SSIS component development. A number of the SSIS components he worked on are available there with example code for download, some of which I’ll discuss in further detail later along with some recommended applications.

Finally, another SSIS orientated blog, from Jamie Thomson (http://blogs.conchango.com/jamiethomson). I have, let’s say, borrowed plenty of SSIS code from this site recently, and his postings on things like what you could do in DTS but aren’t as obvious in SSIS are very useful for anyone making that transition.

Well, that’s it for now. Hopefully you’ll find these sites as useful as I have.

Steve

Confession of a Freelance Dolphin Trainer...

Forgive me Blogger for I have sinned.

I've slacked in the blogging world in recent months, so I thought I would get my ass into gear and, whilst my time isn't completely monopolised with the daily grind, contribute to the amount of information currently in existence.

The next few posts will be general updates on what is happening in my world, be it work related or not.....

Steve