IN2BI, Microsoft Business Intelligence
data, stories & insights
rss | email | twitter

TSQL: MERGE With EXCEPT to Handle Type 2 Loading

Type 2 Loading is a pattern that is often used to incrementally load dimensions in a data warehouse. But you can also use it for loading satellites in a data vault or loading tables in a historical/persistent staging area. It consist of three parts:

  1. Identify new rows and add them to the target database.
  2. Identify changed rows and add them to the target database.
  3. End date the old rows in the target database that have been replaced with those of step 2.

 

MERGE

There are several ways to implement a Type 2 loading pattern. In this blog post I’ll describe using the TSQL statement: MERGE. I’ve written earlier rather critical about using the SQL MERGE Statement (in Dutch) Since then I have come to like it and compared to other methods it seems to be a good performing option for handling slowly changing dimension type solutions. (See this study from Microsoft’s Matt Mason)

Warren Thornthwaite from the Kimball Group did a good explanation on using the SQL MERGE Statement for Slowly Changing Dimension in  Design Tip #107 Using the SQL MERGE Statement for Slowly Changing Dimension Processing.

In his example for processing type 2 dimension he uses a sample that is similar to the one below:

INSERT INTO dimCustomer (AuditID,StartDate,EndDate, CustomerID, CustomerName, Adress..)
SELECT @AuditID, GETDATE(),NULL,CustomerID, CustomerName, Adress,..
FROM

( MERGE DWH.dbo.dimCustomer tgt
USING STG.dbo.Customer src
ON src.CustomerID = tgt.CustomerID

WHEN NOT MATCHED THEN
INSERT VALUES (@AuditID, GETDATE(), NULL, CustomerID, CustomerName, Adress ..)

WHEN MATCHED
AND tgt.__EndDate IS NULL
AND (tgt.CustomerName <> src.CustomerName
AND (tgt.Adress <> src.Adress)
..
THEN
UPDATE SET tgt.EndDate = GETDATE()

OUTPUT $Action Action_Out, src.CustomerID, CustomerName, Adress ..
) AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE';

The outcome of the MERGE statement is used as source for the INSERT INTO statement at the top.
(Afore mentioned step 2: Identify changed rows and add them to the target database)
In the Merge statement there are two actions:
1.   Insert any new rows
3.   End date the rows in the source that have changed.

NULL <> NULL

A problem with this example is the implicit assumption that there are no NULL values stored in the columns. Because otherwise the highlighted expressions with nulls would yield to true (NULL<>NULL) and rows would unnecessary be added to the model.

There are however two SQL expressions that consider NULL Values as equal: INTERSECT and EXCEPT. So we can easily circumvent this problem by changing the highlighted rows into:

AND EXISTS (SELECT  tgt.CustomerName, tgt.Adress …
EXCEPT SELECT  src.CustomerName, src.Adress …)

» Similar Posts

  1. Four Approaches to Data Warehousing
  2. SQL2012 Windowing Functions In The Data Warehouse–1. Dimensions
  3. Dimensional modeling and Data Vault – a happy marriage?

» Trackbacks & Pingbacks

    No trackbacks yet.

» Comments

  1. knyazs avatar

    If you want to create and maintain MERGE statement for SCD quickly, use scdmergewizard.codeplex.com

    cheers

    knyazs — November 30, 2012 10:38 PM

Comments are closed