Friday, September 29, 2006

SQL Server 2005 Metadata

Hidden in the depths of Microsoft’s useful download division in Redmond is a handy pair of utilities that analyse dependencies within and across SSIS and Analysis Services. The first tool, the Dependency Analyzer, examines the metadata contained within SSIS packages and Analysis Services objects, collates their inter-dependencies and writes the results to a pre-defined database structure.

The second tool, the Dependency Viewer, provides a graphical interface displaying the relationships between the analysed objects.


The amount of data captured is considerable and opens up some quite powerful possibilities. There has been a distinct gap in the MS BI product set here; dependency and data analysis are two key areas that competitors eulogise about at length. This appears to be the beginning of some gap plugging before the release of Katmai (SQL Server 2007). In the meantime, all the source code, examples and even some reporting interfaces are provided in the download.

I’m quite looking forward to plugging this tool into some of my end-to-end 2005 projects and looking at the results. Once I've got some good examples of how everything integrates together I'll post some of the results.

Dynamic Partition Creation and Processing – Part 2

The first part of this post discussed how to identify whether a partition exists and how to create a partition dynamically. Although this post is a little on the suck-eggs side, I thought it would be better to write it for completeness. Processing a partition uses exactly the same techniques as dynamic partition creation, hence the suck-eggs comment.

First let’s look at the XMLA query for processing a partition similar to the one created in the previous post.

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <Object>
        <DatabaseID>Adventure Works DW</DatabaseID>
        <CubeID>Adventure Works DW</CubeID>
        <MeasureGroupID>Measure_Group_Name</MeasureGroupID>
        <PartitionID>200609</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
      <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
    </Process>
  </Parallel>
</Batch>

Note there’s no mention of the partition name anywhere. The partition is processed using its ID, and that is all we are really interested in. Replacing just that value in the Source expression of an SSIS Analysis Services Execute DDL Task will point us at the correct partition for processing.
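For illustration, here's roughly what that Source expression could look like, assuming a string variable User::PeriodKey holding the current period ("200609"). The variable name is my own invention, and note every double quote inside the script has to be escaped with a backslash (the escaping rules are covered in Part 1):

"<Batch xmlns=\"http://schemas.microsoft.com/analysisservices/2003/engine\">
<Parallel>
<Process>
<Object>
<DatabaseID>Adventure Works DW</DatabaseID>
<CubeID>Adventure Works DW</CubeID>
<MeasureGroupID>Measure_Group_Name</MeasureGroupID>
<PartitionID>" + @[User::PeriodKey] + "</PartitionID>
</Object>
<Type>ProcessFull</Type>
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
</Process>
</Parallel>
</Batch>"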

Of course, if the entire cube structure is built on the same partition naming strategy then creating a single script to process all of this period's partitions is relatively easy. The added bonus of being able to process partitions in parallel can prove a huge advantage in reducing the time taken to update the cube on whatever schedule you have.
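To sketch that out, XMLA will happily take multiple Process commands inside a single Parallel element, so one script can process this period's partition in every measure group at once (the measure group IDs below are made up):

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>Adventure Works DW</DatabaseID>
        <CubeID>Adventure Works DW</CubeID>
        <MeasureGroupID>Measure_Group_One</MeasureGroupID>
        <PartitionID>200609</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
    <Process>
      <Object>
        <DatabaseID>Adventure Works DW</DatabaseID>
        <CubeID>Adventure Works DW</CubeID>
        <MeasureGroupID>Measure_Group_Two</MeasureGroupID>
        <PartitionID>200609</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
  </Parallel>
</Batch>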

These examples are based on cubes structured with monthly or period-based partitions. Partition strategy is an important subject for both cube querying and processing performance, and I'll leave that to another post, but I just wanted to state that employing a rules mechanism that, say, merges period-based partitions into a single year partition whilst creating new partitions on the fly does not need to be a massively complex process. The methods I've discussed in these two posts are probably some of the simpler ways of implementing it.

Tuesday, September 26, 2006

This Week I am Mostly Listening To...



The Young Knives - Voices of Animals and Men

Dynamic Partition Creation and Processing – Part 1

Processing cubes in SSIS is pretty simple if all you want to do is pick a partition and process it, but if that partition is time-based, which is quite likely, then the partition you process each day, week or month will change. So what are your options?

First, you can change your package each month, or use a variable holding the partition name. Probably not the best of options, as someone is going to have to physically make the change each month or come up with some clever method of changing the variable using something like VBA, Excel, etc. Inefficient if you ask me, and defeating the object. Couple this with creating a year's worth of partitions up front, or creating them every time a new one is required, and the whole task becomes quite laborious.

So what else can be done? My personal favourite is to use dynamic XMLA, but this has a few pitfalls of its own that you have to be aware of. First let's look at the XMLA script for creating a partition and some of the key tags within it.

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ParentObject>
    <DatabaseID>Adventure Works DW</DatabaseID>
    <CubeID>Adventure Works DW</CubeID>
    <MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
  </ParentObject>
  <ObjectDefinition>
    <Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <ID>Internet_Sales_2001</ID>
      <Name>Internet_Sales_2001</Name>
      <Annotations>
        <Annotation>
          <Name>AggregationPercent</Name>
          <Value>20</Value>
        </Annotation>
      </Annotations>
      <Source xsi:type="QueryBinding">
        <DataSourceID>Adventure Works DW</DataSourceID>
        <QueryDefinition>SELECT * FROM [dbo].[FactInternetSales]
          WHERE OrderDateKey &lt;= '184'</QueryDefinition>
      </Source>
      <StorageMode>Molap</StorageMode>
      <ProcessingMode>Regular</ProcessingMode>
      <Slice>[Date].[Calendar].[July 2006]</Slice>
      <ProactiveCaching>
        <SilenceInterval>-PT1S</SilenceInterval>
        <Latency>-PT1S</Latency>
        <SilenceOverrideInterval>-PT1S</SilenceOverrideInterval>
        <ForceRebuildInterval>-PT1S</ForceRebuildInterval>
        <AggregationStorage>MolapOnly</AggregationStorage>
        <Source xsi:type="ProactiveCachingInheritedBinding">
          <NotificationTechnique>Server</NotificationTechnique>
        </Source>
      </ProactiveCaching>
      <EstimatedRows>1013</EstimatedRows>
      <AggregationDesignID>AggregationDesign</AggregationDesignID>
    </Partition>
  </ObjectDefinition>
</Create>

Above is the create script for one of the partitions from the Adventure Works cube. There are some obvious elements in there, but the ones you need to be aware of when building your script dynamically are as follows:

ID – The most important one. Even though it is the same as the name of the partition here, there are a couple of things to remember. Firstly, they don't have to be the same; secondly, by default the ID is taken from the partition name. Why is this important? When you first create a measure group in your Analysis Services project (in Visual Studio) the default partition is named after the measure group, and so is its ID. The ID is the unique identifier of a partition within a measure group, so if your partitions are named MeasureGroup1_200608 and MeasureGroup1_200609 you may find the IDs are MeasureGroup1 and MeasureGroup1_200609 – and processing a partition in XMLA is based on the partition ID.

The partition ID is something you can't change in an Analysis Services project – well, I haven't found a way yet, so in my little world it can't be changed..! The first thing I do is create a duplicate partition in the cube design and name it what I would like the ID to be. Once it's created, you can change the name to whatever you like; the ID won't change.

The Name element then becomes pretty arbitrary, but you still need to ensure that every partition you create has a different name; again, if you want it to be the same as the ID then name it accordingly.

The final two are the QueryDefinition and Slice. This is where you may have to step back and think about the overall design of your cubes and their dimension structures. For the purpose of this post I'm talking about partitions sliced by date, because in my experience that's how most cubes get sliced. Where suitable, I try to make the key of the date dimension a meaningful value, i.e. 20060926 = 26th September 2006. Before we go into multiple discussions about whether this is a good or bad idea: stop. It's something I do, I know its pros and cons, and I've found it beneficial on a hell of a lot of occasions, this example being one of them.

My QueryDefinition value will be the query that forms the dataset my partition is processed from. Generally you'll see something with a where clause like "WHERE DateKey BETWEEN 20060900 AND 20060999" if the partition is query bound. If the partition is table bound, more than likely you're using views to distinguish the different data sets. Finally, the partition slice. Even though this is documented as being for information purposes only, there are some well-documented grey areas when it isn't populated. My recommendation is to put a value in anyway, even if it is just for information purposes. The slice value for, hmmmm, let's say the September 2006 partition in my cube would look something like "<Slice>[Date].[Calendar].[September 2006]</Slice>" or, ideally, "<Slice>[Date].[Calendar].&[200609]</Slice>". Using the key for the slice makes it clear why you need to consider this kind of thing when designing your dimensions at the data mart stage as well as the OLAP stage, as we're making the month level of the dimension a critical value.

Looking at those key elements of the XMLA statement for creating a partition, we get something like this:

<ID>200609</ID>
<Name>Measure_Group_Name_200609</Name>
<QueryDefinition>
SELECT *
FROM FactTable
WHERE DateKey BETWEEN 20060900 AND 20060999
</QueryDefinition>
<Slice>[Date].[Calendar].&[200609]</Slice>

Each of the elements has the common value of "200609", which can be replaced to allow partitions to be created before processing, in this example at month end. To actually execute this code within an SSIS package I populate the Source expression of an Analysis Services Execute DDL Task. This can be done either via direct input or a variable (selected by the SourceType property). The expression takes the variable, or direct input XMLA, and evaluates it based on any parameters you need to input. The parameters in this case would replace every "200609" reference to correctly create the new partition.

A couple of things to remember when building an expression of XMLA like this: the script will have to be wrapped in double quotes, and any double quotes within the script will have to be escaped as ‘\”’. That's a backslash followed by a double quote.
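Putting that together, here's a minimal sketch of what the Source expression for the create script might look like. The User::PeriodKey variable, measure group, fact table and aggregation design names are all illustrative, and I've trimmed the partition definition down to the key elements (note the & in the slice has to be escaped as &amp; to keep the XMLA valid XML):

"<Create xmlns=\"http://schemas.microsoft.com/analysisservices/2003/engine\">
<ParentObject>
<DatabaseID>Adventure Works DW</DatabaseID>
<CubeID>Adventure Works DW</CubeID>
<MeasureGroupID>Measure_Group_Name</MeasureGroupID>
</ParentObject>
<ObjectDefinition>
<Partition xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">
<ID>" + @[User::PeriodKey] + "</ID>
<Name>Measure_Group_Name_" + @[User::PeriodKey] + "</Name>
<Source xsi:type=\"QueryBinding\">
<DataSourceID>Adventure Works DW</DataSourceID>
<QueryDefinition>SELECT * FROM FactTable WHERE DateKey BETWEEN " + @[User::PeriodKey] + "00 AND " + @[User::PeriodKey] + "99</QueryDefinition>
</Source>
<StorageMode>Molap</StorageMode>
<Slice>[Date].[Calendar].&amp;[" + @[User::PeriodKey] + "]</Slice>
<AggregationDesignID>AggregationDesign</AggregationDesignID>
</Partition>
</ObjectDefinition>
</Create>"

Populate User::PeriodKey once per run (Now().ToString("yyyyMM") in a Script Task does the job, as in the script below) and both the create and process scripts pick up the right period.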

Finally, before you run the XMLA it would be good practice to know whether the partition already exists before you attempt to create it. AMO (Analysis Management Objects) is the answer to that. The script below, kindly mixed, matched and cribbed from those lovely Project REAL chaps, will populate an SSIS Boolean variable with the answer to whether the partition exists or not. Don't forget to add AMO as a reference in the SSIS Script Task.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports Microsoft.AnalysisServices

Public Class ScriptMain

    Public Sub Main()

        Dim WellIsItThere As Boolean
        Dim ASConnection As ConnectionManager
        Dim ASServer As String
        Dim ASDatabase As String
        Dim ASCube As String
        Dim ASMeasureGrp As String
        Dim ASPartition As String

        ASPartition = Now().ToString("yyyyMM") ' This will give us "200609"
        ASConnection = Dts.Connections("AnalysisServer") ' The name of the connection manager you're going to connect to in the package
        ASServer = CStr(ASConnection.Properties("ServerName").GetValue(ASConnection))
        ASDatabase = CStr(ASConnection.Properties("InitialCatalog").GetValue(ASConnection))
        ASCube = "Adventure Works" ' How you populate this is really up to you
        ASMeasureGrp = "Internet Sales" ' And again the population of this is up to you

        WellIsItThere = IsPartitionThere(ASServer, ASDatabase, ASCube, ASMeasureGrp, ASPartition)

        Dts.Variables("PartitionExists").Value = WellIsItThere
        Dts.TaskResult = Dts.Results.Success

    End Sub

    Public Function IsPartitionThere(ByVal sServer As String, _
            ByVal sDatabase As String, ByVal sCube As String, _
            ByVal sMeasureGroup As String, _
            ByVal sPartition As String) As Boolean

        ' By default, we will assume that it isn't there.
        ' Only if we get all of the way to the end and everything is
        ' found will we set it to True.
        Dim bIsPartitionThere As Boolean = False
        Dim oServer As New Microsoft.AnalysisServices.Server

        ' Connect to the server and start scanning down the
        ' object hierarchy
        oServer.Connect(sServer)

        Dim oDB As Database = oServer.Databases.FindByName(sDatabase)
        If oDB Is Nothing Then
            MsgBox("Did not find expected database: " & sDatabase, _
                MsgBoxStyle.OkOnly, "Error looking for partition")
            GoTo Done
        Else
            Dim oCube As Cube = oDB.Cubes.FindByName(sCube)
            If oCube Is Nothing Then
                MsgBox("Did not find expected cube: " & sCube, _
                    MsgBoxStyle.OkOnly, "Error looking for partition")
                GoTo Done
            Else
                Dim oMG As MeasureGroup = _
                    oCube.MeasureGroups.FindByName(sMeasureGroup)
                If oMG Is Nothing Then
                    MsgBox("Did not find expected Measure Group: " & _
                        sMeasureGroup, MsgBoxStyle.OkOnly, _
                        "Error looking for partition")
                    GoTo Done
                Else
                    ' -- This is the real test -- to see if the partition
                    ' is really there
                    Dim oPart As Partition = oMG.Partitions.Find(sPartition)
                    If Not oPart Is Nothing Then
                        ' This is the only place we set the value to True
                        bIsPartitionThere = True
                    End If
                End If
            End If
        End If

Done:
        oServer.Disconnect() ' Disconnect from the server -- we are done
        Return bIsPartitionThere

    End Function

End Class

Monday, September 18, 2006

This Week I am Mostly Listening To...



Captain - This is Hazelville

Tuesday, September 12, 2006

Simple SSIS Logging - Part 2, Devil in the Detail

OK, so now you know when a package starts, when it finishes and its final status (see Part 1). What we don't have is the ability to accurately analyse the individual steps within a package and make effective decisions based on the information available.

First you will have to decide what you are looking to track within the system. Obviously errors are the most important element, as fast resolution is essential. Next come warnings. The warnings generated within SSIS are key to identifying areas that may have performance issues. A common warning in SSIS is where a column is brought in from a data source but subsequently not used anywhere in the package. SSIS will raise a warning at every transformation informing you that these unused columns may hinder performance, although they do not affect the final outcome of the package.

For these examples we're writing our log to a database, so it's necessary to have a structure to hold it. A nice simple structure should record the type of alert, a timestamp, the source task and any other descriptive information (such as error code and error description).

When you have the table structure you can write an insert statement. The most sensible option is of course a stored procedure with a number of parameters serving your needs, something like this:

usp_WriteSSISEvent @ServerName, @EventType, @SourcePackage, @SourceTask, @ErrorCode, @ErrorDescription, @EventTimeStamp
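For reference, a minimal sketch of the table and procedure could look like this (the table name, column names and sizes are my own illustrative choices):

CREATE TABLE dbo.SSISEventLog (
    EventLogID INT IDENTITY(1, 1) PRIMARY KEY,
    ServerName NVARCHAR(128) NOT NULL,
    EventType NVARCHAR(50) NOT NULL,
    SourcePackage NVARCHAR(260) NOT NULL,
    SourceTask NVARCHAR(260) NOT NULL,
    ErrorCode INT NULL,
    ErrorDescription NVARCHAR(MAX) NULL,
    EventTimeStamp DATETIME NOT NULL
)
GO

CREATE PROCEDURE dbo.usp_WriteSSISEvent
    @ServerName NVARCHAR(128),
    @EventType NVARCHAR(50),
    @SourcePackage NVARCHAR(260),
    @SourceTask NVARCHAR(260),
    @ErrorCode INT = NULL,                    -- only errors and warnings supply these two
    @ErrorDescription NVARCHAR(MAX) = NULL,
    @EventTimeStamp DATETIME
AS
INSERT INTO dbo.SSISEventLog
    (ServerName, EventType, SourcePackage, SourceTask, ErrorCode, ErrorDescription, EventTimeStamp)
VALUES
    (@ServerName, @EventType, @SourcePackage, @SourceTask, @ErrorCode, @ErrorDescription, @EventTimeStamp)
GO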

So how do these parameters get populated? Well, for the most control over what gets written and where (and it's pretty simple to boot, hence the title), I first put an Execute SQL Task into the OnError event handler. After setting the connection manager, set the SQL statement via an expression. If you've not done this before it's a very useful thing to know about; to cut a long story short, it's possible to set any number of task properties via expressions.

The expression is effectively the constructed SQL statement with the parameters populated via a combination of system variables and direct input. Here's the statement for an OnError event based on the stored procedure example above.

"EXECUTE [usp_WriteSSISEvent]

@ServerName = '" + @[System::MachineName] + "'

,@EventType = '" + "OnError" + "'

,@SourcePackage = '" + @[System::PackageName] + "'

,@SourceTask = '" + @[System::SourceName] + "'

,@ErrorCode = " + (DT_WSTR, 15)@[System::ErrorCode] + "

,@ErrorDescription = '" + @[System::ErrorDescription] + "'

,@EventTimeStamp = '" + (DT_WSTR, 30)GETUTCDATE() + "'"

Yeah, I know it looks a bit of a mess, but trust me, this works.

The pain with this method is that it's not possible to populate @EventType via a system variable, meaning you have to populate it manually for each event type you wish to record. The other gotcha is that the available system variables differ depending on the event type. For example, of the four I usually use (OnError, OnWarning, OnPostExecute and OnPreExecute), only errors and warnings have the error code and description. Something to be aware of.

If you want to see it working then I have an example project all ready to send, just mail me at mailto:steve@stevemchugh.com?subject=Simple%20SSIS%20Logging%20Part%20Deux and I’ll send you a copy. It has the database structure as a script so there’s not too much effort involved in having a look.

Yet Another Tip as Cunning as a Fox Who's Just Been Appointed Professor of Cunning at Oxford University

SSIS package deployment, in my opinion, is a right royal pain in the Aris (I'm sure there's some kind of wiki out there for cockney rhyming slang; there certainly is for Blackadder quotes).

This became evident whilst trying to successfully deploy password-protected packages to folder locations on SSIS servers. So again I came up with a plan to make my life a little more bearable. A plan as cunning as a fox that used to be professor of cunning at Oxford University, but has moved on, and is now working for the UN at the high commission of international cunning planning..!

All my SSIS projects of significant size now include a package called sys_DeployPackages. This does something very simple: it deploys packages. A couple of variables, such as the source folder, the destination, the password I want to use to encrypt the packages and the location folder on the SSIS server, and hey presto. Put the variables into a config file and you can deploy the same files to another server with just a double click.

All this with just the joining of the Execute Process Task and dtutil.exe, a combination so cunning you could brush your teeth with it.
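To give a flavour of what the Execute Process Task ends up running, here's a hypothetical dtutil command line; the paths, server name and password are made up, and protection level 3 is EncryptAllWithPassword:

dtutil /FILE "C:\Deploy\LoadStaging.dtsx" /ENCRYPT FILE;"\\SSISSERVER\Packages\LoadStaging.dtsx";3;MyP@ssw0rd /Quiet

In sys_DeployPackages the task's Arguments property is simply built with an expression from the source, destination and password variables.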





This Week I am Mostly Listening To…



The Dears – Gang of Losers

Another Little Tip

I don't know if you sometimes have a package that contains lots of Execute Package Tasks. Lord knows I do, and often too. This is especially the case when pulling a number of tables out of a source system into a staging database.

Well, all those individual tasks have to have their properties set and their names changed, and quite frankly double clicking on each of them and waiting for them to spring into life is something I'm not that keen on.

As an expert in trying to make life a tad easier, I now set the package name property via an expression that cunningly grabs the Integration Server path from a variable along with the task name, the combination of which is the full location of the package on the Integration Server.
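For what it's worth, the property expression on each Execute Package Task's PackageName property can be as simple as this, assuming a User::SSISFolder variable holding the folder path on the server (the variable name is my own, and "\\" is the escaped backslash):

@[User::SSISFolder] + "\\" + @[System::TaskName]

Name each task after the package it runs, so a task called Load_Customers under a \Staging folder resolves to \Staging\Load_Customers.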

All I have to do then is copy and paste the task X amount of times and change the name of each task to the package I want it to execute. Lazy, I hear you say; and there was me thinking I was working “smarter”…!