IN2BI, Microsoft Business Intelligence
data, stories & insights
rss | email | twitter

SQL2012 Windowing Functions In The Data Warehouse–1. Dimensions

Windowing functions, introduced in SQL Server 2005 and greatly enhanced in SQL Server 2012, have something magical: Within the context of one row in the result set you have access to the contents of the other rows of the result set.

With these functions you can greatly simplify the TSQL you write. Many complex queries with CTE’s, temp tables and sub queries can be rewritten to more simple, better maintainable and better performing queries.  In this post I’ll dive into some possibilities for dimensions.

In SQL 2008 and earlier

In SQL 2008 and earlier I will typically build a dimension table based upon the Type 2 Slowly Changing Dimensions system. And then use a view with a self join to present the user with the historical and/or current attribute values. I will use this small example of a customer dimension:

image

When Marco moved to Paris on July 20th 1988 and married Jose for both of them a new row was added with the new attributes and the EndDate of the old row was changed to the date of the change. This is how attributes changes are handled for Type 2 Slowly Changing Dimensions.

To consume the dimension information I will typically use a view in the model schema as a source for Analysis Services / PowerPivot / Report:

CREATE VIEW model.Customer AS
SELECT his.Id
  , his.Number
  , his.Name
  , his.City
  , CurrentCity = cur.City 
  , his.MaritalStatus
  , CurrentMaritalStatus = cur.MaritalStatus
FROM dim.Customer his
JOIN dim.Customer cur
  ON his.Number = cur.Number
WHERE cur.EndDate ='9999-12-31'

Which will result into:

image

 

In SQL 2012  Using the LAST_VALUE Function

If your not working for the  Oracle at Delphi the last value will typically be the current value of an attribute. So in 2012 this view can be replaced with:

CREATE VIEW model.Customer AS
SELECT Id
  , Number
  , Name
  , City
  , CurrentCity = LAST_VALUE(City)
    OVER(PARTITION BY Number ORDER BY StartDate
     ROWS BETWEEN UNBOUNDED PRECEDING
     AND UNBOUNDED FOLLOWING)
  , MaritalStatus
  , CurrentMaritalStatus = LAST_VALUE(MaritalStatus)
    OVER(PARTITION BY Number ORDER BY StartDate
     ROWS BETWEEN UNBOUNDED PRECEDING
     AND UNBOUNDED FOLLOWING)
FROM dim.Customer

Although the LAST_VALUE function seems a bit awkward to write due to the long window frame clause: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING” it has some nice advantages:

  • No need to use a self join, which will enhance performance.
  • No need to use the EndDate column.
  • Much easier to maintain. Because the purpose of the LAST_VALUE function will be more obvious for your successors.

 

SCD Type 0 with the FIRST_VALUE Function

Occasionally you may stumble upon a request/requirement to show the original value of an attribute. (e.g. the sales that landed the customer). In that case you can simply add a column using the FIRST_VALUE function:

FirstCity = FIRST_VALUE(City) 
  OVER(PARTITION BY Number ORDER BY StartDate)

 

Mapping Queries Using Windowing Functions

When you load fact tables you will want to lookup the surrogate keys of the dimensions. In the most simple variant you would use (in SQL 2008)

SELECT Number, Id FROM dim.Customer
  WHERE EndDate ='9999-12-31'

In SQL 2012, assuming you will not store the EndDate in your ETL process, you can use:

SELECT Number, Id FROM
(SELECT Number, Id, RowNumber = ROW_NUMBER() OVER(PARTITION BY Number ORDER BY StartDate DESC)
FROM dim.Customer) Sub
WHERE RowNumber=1

Unfortunately you will have to use the sub query construct here because it’s not yet possible to use Windowing Functions in the WHERE clause.

 

But why Would you not add an EndDate in the ETL Process?

If you don’t end date rows the ETL process gets much easier, faster and less error-prone: You don’t have the distinguish between new and changed rows you’ll just add both in the same way to the dimension table. And you don’t have to identify and update the ‘old’ rows.

And of course if you really need the EndDate you can just get if with the new LEAD function:

EndDate = LEAD(StartDate, 1, '9999-12-31')
    OVER(PARTITION BY Number ORDER BY StartDate)

 

More Information:

MSDN (Books on Line) about the OVER Clause

Jamie Thomson Debunking Kimball Effective Dates part 2 – Windowing Functions

Introduction blog series on Windowing Functions by Fabiano Amorim on simple talk

Blog post with a instructional video on the SQL 2012 Windowing Functions Leaving the Windows Open by Jeremiah Peschka

Book: Microsoft® SQL Server® 2012 High-Performance T-SQL Using Window Functions by Itzik Ben-Gan

» Similar Posts

  1. SQL2012 Windowing Functions In The Data Warehouse–2. Reporting
  2. Four Approaches to Data Warehousing
  3. Dimensional modeling and Data Vault – a happy marriage?

» Trackbacks & Pingbacks

  1. This is the second post of a diptych on the magical windowing functions in data warehouse scenarios. With these functions you can greatly simplify the TSQL you write. Many complex queries with CTE’s, temp tables and sub queries can be rewritten to simpler

» Comments

    There are no comments.

Comments are closed