IN2BI, Microsoft Business Intelligence
data, stories & insights
rss | email | twitter

Building a data warehouse while the source systems are still in development

Some years ago the client of a contractor I worked for made some major investments in their IT-landscape. They replaced their ERP - and CRM system and of course BI/Reporting was part of the acceptance criteria for the new system. It was a big project and a disaster! ERP, CRM and BI consultants from the contractor where running in and out. Discussing requirement details with the client and adapting the systems to these wishes. For the BI team it was especially hard. When we build something on Monday chances were slim that it still worked on Thursday. We depended upon the ERP and CRM team to communicate their changes and provide us with correct test data. And there was no love wasted between the teams. I was very glad when the opportunity arose to leave this war hole and move on. And I did.

Trending

Nevertheless it seems to become a trend to build/adapt the BI system before a new source system has been launched. And make it part of the acceptance criteria for the new system. This of course offers the client an easy way to verify the new system by comparing the reports they work with. In my previous three projects (part) of the source system was still in development and I would probably have gone crazy keeping up with all the changes if I hadn’t been able to automate the work.

Agility is the name of the game

image

In these situations I have found that you (as in an agility drill) need three things:

 

Testing Framework

A testing framework has a complicated ring to it. But if you keep it simple and focus on the results the users expect to see it will be easy and the advantages are great:

  • You’ll notice any errors early in the process.
  • You can correct the errors before the users’ starts testing.
  • You can pinpoint any changes made in the source system that ruin your reports.
  • You’ll gain confident in your solution and sleep better.

I’ll describe a method that has worked for me. Key to success herein is starting early!

First get hard copies and digital versions of reports the users are using now. Ask them to highlight the important measures. Start entering the tests in an Excel spreadsheet and use the digital versions of the reports to copy any relevant data) I use the following format where I keep values in the column testName unique

testName testType criterion1 criterion2 expectedValue expression
InternetSales201401 DWH 201404 Internet 125.035  

Now the difficult/laborious part: start entering SQL statement in the expression column that will return the expected value. Use [criterion1] and/or [criterion2] as placeholders in these statements. The values in the criterion1 and criterion2 column will replace these placeholders at execution time. You will now be able to copy the expression to similar test with different values for the criteria. Example expression:

SELECT CAST(SUM(f.sales) AS int) FROM factSales f INNER JOIN dimDate d ON f.DateKey = d,DateKey INNER JOIN dimChannel ch ON f.ChannelKey = ch.ChannelKey WHERE d.YearMonth = [criterion1] AND ch.ChannelName = ‘[criterion2]’ /* SQL code of example test expression */

Testing Framework: Tables

Import your Excel spreadsheet into a SQL Server table with the following definition:

CREATE TABLE DataTests ( testName nvarchar(64) , testType nvarchar(20) , criterion1 nvarchar(128) , criterion2 nvarchar(128) , expectedValue int , expression nvarchar(1024) ) /* SQL code to create the DataTests table */

Create an additional table to store the result of the tests that were performed. Use the following definition:

create table DataTestResults ( testName nvarchar(64) , testDate datetime default getdate() , actualValue int , expectedValue int ) /* SQL code to create the DataTestResults table */

Testing Framework: SSIS Package

Now create an SSIS Package that will perform the tests. It uses:

  • An Execute SQL Task to get a list of tests
  • A For Each Loop Container that is used to loop through the list with in it:
  • An Execute SQL Task to execute every test
  • An Execute SQL Task to log the result of every test

 

image

This package can be described with the following BIML (Business Intelligence Markup Language) snippet:

<Packages> <Package Name="PKG TestData" ConstraintMode="Linear"> <Variables> <Variable Name="TestList" DataType="Object" /> <Variable Name="testQuery" DataType="String" /> <Variable Name="testName" DataType="String" /> <Variable Name="expectedValue" DataType="Int32"> 0 </Variable> <Variable Name="actualValue" DataType="Int32"> 0 </Variable> </Variables> <Tasks> <!--Get list with tests to be performed—> <ExecuteSQL Name="SQL GetTestList" ConnectionName="META" ResultSet="Full"> <DirectInput> SELECT testName , testQuery = REPLACE(REPLACE(expression,'[criterion1]',criterion1),'[criterion2]',criterion2) , expectedValue FROM DataTests WHERE testType='DWH' </DirectInput> <Results> <Result Name="0" VariableName="User.TestList" /> </Results> </ExecuteSQL> <!--Loop through tests and perform them—> <ForEachAdoLoop Name="FELC Test" SourceVariableName="User.TestList" ConstraintMode="Linear" > <VariableMappings> <VariableMapping Name="0" VariableName="User.testName" /> <VariableMapping Name="1" VariableName="User.testQuery" /> <VariableMapping Name="2" VariableName="User.expectedValue" /> </VariableMappings> <Tasks> <!--Perform test—> <ExecuteSQL Name="SQL Execute Test" ConnectionName="DWH" ResultSet="SingleRow" > <VariableInput VariableName="User.testQuery" /> <Results> <Result Name="0" VariableName="User.actualValue" /> </Results> </ExecuteSQL> <!--Log test result—> <ExecuteSQL Name="SQL Log Test Result" ConnectionName="META" ResultSet="None"> <DirectInput> INSERT INTO DataTestResults (testName,actualValue,expectedValue) VALUES (?,?,?) </DirectInput> <Parameters> <Parameter Name="0" Direction="Input" DataType="String" VariableName="User.testName" /> <Parameter Name="1" Direction="Input" DataType="Int32" VariableName="User.actualValue" /> <Parameter Name="2" Direction="Input" DataType="Int32" VariableName="User.expectedValue" /> </Parameters> </ExecuteSQL> </Tasks> </ForEachAdoLoop> </Tasks> </Package> </Packages> */ BIML code to create the testing ssis package */

Conclusion

In this blog post I discussed the components of an agile system that are necessary when you build a data warehouse where the source systems are still in development:

  • An Alert System to inform you of changes in the source systems.
  • A Data Warehouse Automation Framework.
  • A Testing Framework.

I ended with a simple implementation of a testing framework that worked for me. Does this help you or did you implement something similar? … I love to hear from you!

» Similar Posts

  1. Logging in SSIS with BIML 1 – Logging Task Execution
  2. Logging in SSIS with BIML 2 – Logging Package Execution in the Control Flow
  3. BIML Script To Disable Indexes And Later Rebuild Indexes

» Trackbacks & Pingbacks

    No trackbacks yet.
Trackback link for this post:
http://blog.in2bi.eu/trackback.ashx?id=96

» Comments

    There are no comments. Kick things off by filling out the form below.

» Leave a Comment