IN2BI, Microsoft Business Intelligence
data, stories & insights
rss | email | twitter

SQL2012 Windowing Functions In The Data Warehouse–2. Reporting

This is the second post of a diptych on the magical windowing functions in data warehouse scenarios. With these functions you can greatly simplify the TSQL you write. Many complex queries with CTE’s, temp tables and sub queries can be rewritten to simpler, better maintainable and better performing queries.  In this post I’ll dive into some possibilities for reporting.

For the examples in this post I’ll use the Contoso Retail Data Warehouse database. A sample database for data warehouses provided by Microsoft.

Year To Date (YTD) Calculations

On the Internet you’ll find a lot of examples on using the running total technique to calculate year to date values. In this example I need the monthly sales and the YTD sales for every store.

SELECT CalendarMonth
  , StoreName
  , PeriodSales
  , SalesYTD = SUM(PeriodSales) OVER 
     (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth) FROM
  (
  SELECT CalendarYear
    , CalendarMonth
    , StoreName
    , PeriodSales = SUM(sal.SalesAmount)
  FROM FactSales sal
  JOIN DimDate dat ON sal.DateKey = dat.Datekey
  JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
  GROUP BY CalendarYear, CalendarMonth, StoreName
  ) SalesByMonth
ORDER BY StoreName, CalendarMonth

The sub query “SalesByMonth” aggregates the sales amount for every store per month. The windowing function SUM() OVER() calculates the YTD sales. Which will result in the required dataset:

image

 

The SUM(SUM()) OVER() Construction

Since you can use Windowing Functions over an aggregated we don’t need the sub query and we can simplify this query to:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , SalesYTD = SUM(SUM(SalesAmount)) OVER 
     (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth) FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarYear, CalendarMonth, StoreName
ORDER BY StoreName, CalendarMonth

The second SUM in: “SUM(SUM()) OVER() GROUP BY “ is used in conjunction with the GROUP BY clause to calculate the monthly sales first.
The first SUM in: SUM(SUM()) OVER() GROUP BY is then used in conjunction with the OVER clause to calculate the YTD sales.

 

Comparing to previous year

Adding the figures of the previous year as a comparison is a common reporting requirement. You can easily realize this by using the LAG function returning the results 12 months back in time. Building upon our earlier query:

SELECT *
  , PeriodSalesPrevYear = LAG(PeriodSales,12,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , YTDSalesPrevYear = LAG(SalesYTD,12,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
FROM
(
  SELECT CalendarMonth
    , StoreName
    , PeriodSales = SUM(SalesAmount)
    , SalesYTD = SUM(SUM(SalesAmount))
        OVER (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth)
  FROM FactSales sal
  JOIN DimDate dat ON sal.DateKey = dat.Datekey
  JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
  GROUP BY CalendarYear, CalendarMonth, StoreName
) Base
ORDER BY StoreName, CalendarMonth

Which results into:

image

 

How Do We Do Compared to the other stores?

In this example I use the RANK() Function to determine the store’s rank in the total monthly sales and the store’s sales as a percentage of the total monthly sales:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , StoreRank = RANK() OVER
     
(PARTITION BY CalendarMonth ORDER BY SUM(SalesAmount) DESC)
  , StoreShare = 100*SUM(SalesAmount)/
      SUM(SUM(SalesAmount)) OVER (PARTITION BY CalendarMonth)
FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarMonth, StoreName

image

 

Compare to (Average of) Previous Periods

In a recent client engagement a report which was used to audit the monthly invoice process gave a lot of troubles. The SQL query behind it was very difficult to comprehend and consisted of several sub queries. By using windowing functions our team was able to greatly simplify the query. The requirement can be restated/simplified to our example as: Give us the current month sales, the previous 3 and the average of those previous 3. This is the resulting query:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , SalesPrevPeriod1 = LAG(SUM(SalesAmount),1,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , SalesPrevPeriod2 = LAG(SUM(SalesAmount),2,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , SalesPrevPeriod3 = LAG(SUM(SalesAmount),3,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , AveragePrevPeriods = AVG(SUM(SalesAmount))
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth
      ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarYear, CalendarMonth, StoreName
ORDER BY StoreName, CalendarMonth

image

I especially like the way you can use the window frame clause to limit the average to 3 periods: ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING

More Information:

My previous post on using Windowing Functions focused on dimensions 

MSDN (Books on Line) about the OVER Clause

Introduction blog series on Windowing Functions by Fabiano Amorim on simple talk

Blog post with a instructional video on the SQL 2012 Windowing Functions Leaving the Windows Open by Jeremiah Peschka

Book: Microsoft® SQL Server® 2012 High-Performance T-SQL Using Window Functions by Itzik Ben-Gan

» Similar Posts

  1. SQL2012 Windowing Functions In The Data Warehouse–1. Dimensions
  2. SQL Server, SSIS, SSAS and SSRS on ONE Server
  3. Four Approaches to Data Warehousing

» Trackbacks & Pingbacks

    No trackbacks yet.

» Comments

    There are no comments.

Comments are closed