IN2BI, Microsoft Business Intelligence
data, stories & insights
rss | email | twitter

DWH DECK Part 1: Overview

This is the first part of a series introducing  the DWH DECK. A tool that will allow you to build and maintain your data warehouse better and faster. Read this series that acts as documentation and that will give you some insights into the program.

Or view this video:

  • Tue
    11
    Oct 11

    Data Vault Automation Conference

    in2bi was one of the sponsors of the Data Vault Automation Conference which was held in Utrecht on October 6th.

    Data Vault has become pretty popular in the Netherlands the last years. Due to the rigid modular approach of Data Vault modeling source data can easily be divided in the three standard table types with standard load mechanisms which lead to easy automation.

    Both aspects contribute to a lot of innovation in the Netherlands with respect to data warehouse automation.

    During the conference more than 100 participants where kept up to date by:

    You can download the slides of my session by clicking on the powerpoint logo.PowerPoint

  • Mon
    06
    Jun 11

    Satellite table template used in the DWH Deck

    The next version of the DWH Deck will support templates. This is the documentation of the satellite table template.

    Variables/placeholders used:

    @Database, the name of the data vault database e.g.: DWH_dv
    @MetaDatabase, the name of the meta database e.g.: DWH
    @BusinessArea, the name of the business area, implemented as schema e.g.: sales
    @Table, the name of the satellite table e.g.: Categories_hub_sat
    @ParentTable, the name of the related hub or link for this satellite
    @SourceTableName, the full identifying name of the related staging table e.g.: [DWH_st].[sales].[Northwind_dbo_Categories]
    @ModelTableID, is used to track the model table to the audit table
    @Sqn, the suffix that is used for the surrogate key in the link or hub (parent)table
    @SourceNumber, when there is more than one source for the link table
    @LookupParentSqn, a sql statement to find  the related parent identifier like:

    SELECT MAX([Categories_hub_sqn])
         FROM [sales].[Categories_hub] R WITH (NOLOCK)
         WHERE (st.[CategoryID = R.[CategoryID] OR (st.[CategoryID] IS NULL AND R.[CategoryID] IS NULL)

    @@ColumnList, a list of columns that are part of the satellite:

    [CategoryName]
    , [Description] 

    @@ColumnTableList, a list of columns with the data type defining the table:

    [CategoryName] varchar(64)
    , [Description] varchar(max)

    @@ColumnViewList, a list used in the view to get the related column info from the staging area

    [CategoryName] = st.[CategoryName]
    , [Description] = st.[Description]

    @@ColumnWhereList, a list of Where statements checking existents in target satellite table

    (src.[CategoryName] = trgt.[CategoryName] OR (src.[CategoryName] IS NULL AND trgt.[CategoryName] IS NULL)
    AND (src.[Description] = trgt.[Description] OR (src.[Description] IS NULL AND trgt.[Description] IS NULL)

     

    Satellite Table Template

    The SQL statement used to create the hub table:

    USE [@Database]
    GO

    --save old data (if any) into backup of table with prefix _
    IF EXISTS (SELECT 1 FROM [sys].[objects]
    WHERE object_id=OBJECT_ID(N'[@BusinessArea].[@Table]') AND type in (N'U'))
    BEGIN
    IF (SELECT COUNT(*) FROM [@BusinessArea].[@Table])>0
    BEGIN
    IF EXISTS (SELECT 1 FROM [sys].[objects]
    WHERE object_id=OBJECT_ID(N'[@BusinessArea].[_@Table]') AND type in (N'U'))
    DROP TABLE [@BusinessArea].[_@Table];
    SELECT * INTO [@BusinessArea].[_@Table]
    FROM [@BusinessArea].[@Table];
    END
    DROP TABLE [@BusinessArea].[@Table]
    END

    --create table
    CREATE TABLE [@BusinessArea].[@Table]
    ( [@ParentTable@Sqn] INT NOT NULL
    , [LoadDate] DATETIME NOT NULL
    , [LoadDateEnd] DATETIME
    , [AuditID] INT
    , @@ColumnTableList
    , CONSTRAINT [pk_@Table] PRIMARY KEY ([@ParentTable@Sqn],[LoadDate])
    , CONSTRAINT [fk_@Table] FOREIGN KEY ([@ParentTable@Sqn]) REFERENCES [@ParentTable
    );

    CREATE INDEX [ix_@Table] ON [@Database].[@BusinessArea].[@Table]
    ( [LoadDate]
    , [LoadDateEnd]);

    The part of the statement to drop and recreate the table is rather large (as in all data vault tables) This is helpful if you have accumulated some history in your link table and you made a change in the definition. All data is first stored in a new table with the prefix "_" before dropping the table. Once you have created your new link table you can still access the old data and import it manual from this _table to the newly created link table.

     

    Satellite View Template

    The SQL statement used to create a view which is used to import the data from the staging table:

    USE [@Database]
    GO

    --drop previous view if it exists
    IF EXISTS (SELECT 1 FROM [sys].[views]
    WHERE object_id = OBJECT_ID(N'[@BusinessArea].[vwImport_@Table@SourceNumber]'))
    DROP VIEW [@BusinessArea].[vwImport_@Table@SourceNumber];

    --create view
    USE [@Database]
    GO
    --@
    CREATE VIEW [@BusinessArea].[vwImport_@Table@SourceNumber]
    AS

    SELECT DISTINCT TOP 100 PERCENT src.*
    FROM (SELECT 
      [@ParentTable@Sqn]= ISNULL((@LookupParentSqn
       ),0)
    @@ColumnViewList
    FROM @SourceTableName st WITH (NOLOCK)) src

    --check if source (src) row exists in target (trgt)
    WHERE NOT EXISTS (SELECT 1
    FROM [@BusinessArea].[@Table] trgt WITH (NOLOCK)
    WHERE ISNULL(src.[@ParentTable@Sqn],0)=ISNULL(trgt.[@ParentTable@Sqn],0)
    AND @@ColumnWhereList))
    ORDER BY [@ParentTable@Sqn]


    Satellite ETL Template

    The SQL Statement to create a stored procedure that is used to import the data:

    USE [@Database]
    GO

    --drop previous etl-procedure if it exists
    IF EXISTS (SELECT * FROM [sys].[objects]
    WHERE object_id = OBJECT_ID(N'[@BusinessArea].[uspImport_@Table@SourceNumber]')
    AND type in (N'P', N'PC'))
    DROP PROCEDURE [@BusinessArea].[uspImport_@Table@SourceNumber]

    --create etl-procedure
    USE [@Database]
    GO
    --@
    CREATE PROCEDURE [@BusinessArea].[uspImport_@Table@SourceNumber]
    AS

    --get the audit_id
    DECLARE @AuditID INT
    INSERT INTO [@MetaDatabase].[dbo].[Audit] (StartDate, ModelTableID)
    VALUES( getdate(), @ModelTableID);
    SET @AuditID=@@IDENTITY
    DECLARE @DATE DATETIME;
    DECLARE @ROWSINSERTED INT;
    SET @DATE=GETDATE();

    --insert into target
    INSERT INTO [@BusinessArea].[@Table]
    ( [AuditID]
    , @@ColumnList
    ) SELECT
      @AuditID AS AuditID
    , @DATE AS LoadDate
    , * FROM [@BusinessArea].[vwImport_@Table@SourceNumber] ;

    -- get rowCount of inserted rows
    SET @ROWSINSERTED=@@ROWCOUNT

    --enddate the old entries of this satellite
    UPDATE [@BusinessArea].[@Table]
    SET LoadDateEnd=@DATE
    WHERE [@ParentTable@Sqn] IN
         (SELECT [@ParentTable@Sqn]
         FROM [@BusinessArea].[@Table]
         WHERE LoadDateEnd IS NULL
         GROUP BY [@ParentTable@Sqn]
         HAVING COUNT(*)>1 )
    AND LoadDate<@DATE

    -- update audit table with the number of rows
    UPDATE [@MetaDatabase].[dbo].[Audit]
    SET RowsInserted=@ROWSINSERTED, RowsUpdated=@@ROWCOUNT, EndDate=getdate()
    WHERE AuditID=@AuditID;

  • Wed
    01
    Jun 11

    Link table template used in the DWH Deck

    The next version of the DWH Deck will support templates. This is the documentation of the link table template.

    Variables/placeholders used:

    @Database, the name of the data vault database e.g.: DWH_dv
    @MetaDatabase, the name of the meta database e.g.: DWH
    @BusinessArea, the name of the business area, implemented as schema e.g.: sales
    @Table, the name of the link table e.g.: Sales_link
    @SourceTableName, the full identifying name of the related staging table e.g.: [DWH_st].[sales].[Northwind_dbo_Orders]
    @ModelTableID, is used to track the model table to the audit table
    @Sqn, the suffix that is used for the surrogate key in the link table
    @SourceNumber, when there is more than one source for the link table

    @@ColumnList, a list of surrogate key columns that identify the related hubs like:

    [Product_hub_sqn]
    , [Customer_hub_sqn]

    @@ColumnTableList, a list of columns with the data type defining the table:

    [Product_hub_sqn] int
    , [Customer_hub_sqn] int

    @@ColumnForeignKeyList, a list defining the foreign key relations to the hubs:

    CONSTRAINT fk_Orders_link1 FOREIGN KEY ([Product_hub_sqn]) REFERENCES [Product_hub]
    ,CONSTRAINT fk_Orders_link2 FOREIGN KEY ([Customer_hub_sqn]) REFERENCES [Customer_hub]

    @@ColumnViewList, a list that seeks the related surrogate key in the hub or satellite:

    [Product_hub_sqn] =(SELECT MAX(ref1.[Product_hub_sqn]) FROM Product_hub_ProductID ref1 WHERE ref1.ProductID=st.ProductID)
    , [Customer_hub_sqn] = (SELECT MAX(ref2.[Customer_hub_sqn]) FROM Customer_hub ref2 WHERE ref2.CustomerID=st.CustomerID)

    @@ColumnWhereList, a list of Where statements checking existents in target link table

    ISNULL(src.[Product_hub_sqn],0) = ISNULL(trgt.[Product_hub_sqn],0)
    AND ISNULL(src.[Customer_hub_sqn],0) = ISNULL(trgt.[Customer_hub_sqn],0)

    Link Table Template

    The SQL statement used to create the hub table:

    USE [@Database]
    GO

    --save old data (if any) into backup of table with prefix _
    IF EXISTS (SELECT 1 FROM [sys].[objects]
    WHERE object_id=OBJECT_ID(N'[@BusinessArea].[@Table]') AND type in (N'U'))
    BEGIN
    IF (SELECT COUNT(*) FROM [@BusinessArea].[@Table])>0
    BEGIN
    IF EXISTS (SELECT 1 FROM [sys].[objects]
    WHERE object_id=OBJECT_ID(N'[@BusinessArea].[_@Table]') AND type in (N'U'))
    DROP TABLE [@BusinessArea].[_@Table];
    SELECT * INTO [@BusinessArea].[_@Table]
    FROM [@BusinessArea].[@Table];
    END
    DROP TABLE [@BusinessArea].[@Table]
    END

    --create table
    CREATE TABLE [@BusinessArea].[@Table]
    ( [@Table@Sqn] INT NOT NULL IDENTITY(1,1)
    , [AuditID] INT
    , @@ColumnTableList
    , CONSTRAINT [pk_@Table] PRIMARY KEY
    ( [@Table@Sqn])
    , @@ColumnForeignKeyList);

    CREATE UNIQUE INDEX [ux_@Table]
    ON [@Database].[@BusinessArea].[@Table]
    ( @@ColumnList)
    INCLUDE ([@Table@Sqn]);

    The part of the statement to drop and recreate the table is rather large (as in all data vault tables) This is helpful if you have accumulated some history in your link table and you made a change in the definition. All data is first stored in a new table with the prefix "_" before dropping the table. Once you have created your new link table you can still access the old data and import it manual from this _table to the newly created link table.

    The index uses the surrogate key in the "include" part. Since most queries to this table will need this key its covered in the index.

    Link View Template

    The SQL statement used to create a view which is used to import the data from the staging table:

    USE [@Database]
    GO

    --drop previous view if it exists
    IF EXISTS (SELECT 1 FROM [sys].[views]
    WHERE object_id = OBJECT_ID(N'[@BusinessArea].[vwImport_@Table@SourceNumber]'))
    DROP VIEW [@BusinessArea].[vwImport_@Table@SourceNumber];

    --create view
    USE [@Database]
    GO
    --@
    CREATE VIEW [@BusinessArea].[vwImport_@Table@SourceNumber]
    AS
    SELECT src.* FROM (SELECT
    @@ColumnViewList
    FROM @SourceTableName st WITH (NOLOCK)) src

    --check if source (src) row exists in target (trgt)
    WHERE NOT EXISTS (SELECT 1
    FROM [@BusinessArea].[@Table] trgt WITH (NOLOCK)
    WHERE (@@ColumnWhereList)
    )


    Link ETL Template

    The SQL Statement to create a stored procedure that is used to import the data:

    USE [@Database]
    GO

    --drop previous etl-procedure if it exists
    IF EXISTS (SELECT * FROM [sys].[objects]
    WHERE object_id = OBJECT_ID(N'[@BusinessArea].[uspImport_@Table@SourceNumber]')
    AND type in (N'P', N'PC'))
    DROP PROCEDURE [@BusinessArea].[uspImport_@Table@SourceNumber]

    --create etl-procedure
    USE [@Database]
    GO
    --@
    CREATE PROCEDURE [@BusinessArea].[uspImport_@Table@SourceNumber]
    AS

    --get the audit_id
    DECLARE @AuditID INT
    INSERT INTO [@MetaDatabase].[dbo].[Audit] (StartDate, ModelTableID)
    VALUES( getdate(), @ModelTableID);
    SET @AuditID=@@IDENTITY

    --insert into target
    INSERT INTO [@BusinessArea].[@Table]
    ( [AuditID]
    , @@ColumnList
    ) SELECT @AuditID, * FROM
    [@BusinessArea].[vwImport_@Table@SourceNumber] ;

    -- update audit table with the number of rows
    UPDATE [@MetaDatabase].[dbo].[Audit]
    SET RowsInserted=@@ROWCOUNT, EndDate=getdate()
    WHERE AuditID=@AuditID;

  • Tue
    31
    May 11

    Hub table template used in the DWH Deck

    The next version of the DWH Deck will support templates. This is the documentation of the hub table template.

    Variables/placeholders used:

    @Database, the name of the data vault database e.g.: DWH_dv
    @MetaDatabase, the name of the meta database e.g.: DWH
    @BusinessArea, the name of the business area, implemented as schema e.g.: sales
    @Table, the name of the hub table e.g.: Product_hub
    @SourceTableName, the full identifying name of the staging table e.g.: [DWH_st].[sales].[Northwind_dbo_Products]
    @ModelTableID, is used to track the model table to the audit table
    @Sqn, the suffix that is used for the surrogate key in the hub table
    @SourceNumber, when there is more than one source for the hub

    @@ColumnList, a list of columns that identify the business key like:

    [ProductCode]
    , [Color]

    @@ColumnTableList, a list of columns with the data type defining the table:

    [ProductCode] char(6)
    , [Color] char(2)

    @@ColumnViewList, a list of columns used in the view:

    [ProductCode]=src.[ProductCode]
    ,[Color]=src.[Color]

    @@ColumnWhereList, a list of Where statements checking existents in target

    src.[ProductCode] = trgt.[ProductCode] OR (src.[ProductCode] IS NULL AND trgt.[ProductCode] IS NULL))
    AND (src.[Color] = trgt.[Color] OR (src.[Color] IS NULL AND trgt.[Color] IS NULL))

     

    Hub Table Template

    The SQL statement used to create the hub table:

    USE [@Database]
    GO

    --save old data (if any) into backup of table with prefix _
    IF EXISTS (SELECT 1 FROM [sys].[objects]
    WHERE object_id=OBJECT_ID(N'[@BusinessArea].[@Table]') AND type in (N'U'))
    BEGIN
    IF (SELECT COUNT(*) FROM [@BusinessArea].[@Table])>0
    BEGIN
    IF EXISTS (SELECT 1 FROM [sys].[objects]
      WHERE object_id=OBJECT_ID(N'[@BusinessArea].[_@Table]') AND type in (N'U'))
    DROP TABLE [@BusinessArea].[_@Table];
    SELECT * INTO [@BusinessArea].[_@Table]
      FROM [@BusinessArea].[@Table];
    END
    DROP TABLE [@BusinessArea].[@Table]
    END

    --create table
    CREATE TABLE [@BusinessArea].[@Table]
    ( [@Table@Sqn] INT NOT NULL IDENTITY(1,1)
    , [AuditID] INT
    , @@ColumnTableList
    , CONSTRAINT [pk_@Table] PRIMARY KEY
    ( [@Table@Sqn]) );

    CREATE UNIQUE INDEX [ux_@Table]
      ON [@Database].[@BusinessArea].[@Table]
    ( @@ColumnList)
    INCLUDE ([@Table@Sqn]);

    The part of the statement to drop and recreate the table is rather large (as in all data vault tables) This is helpful if you have accumulated some history in your hub table and you made a change in the definition. All data is first stored in a new table with the prefix "_" before dropping the table. Once you have created your new hub table you can still access the old data and import it manual from this _table to the newly created hub table.

    The index uses the surrogate key in the "include" part. Since most queries to this table will need this key its covered in the index.

     

    Hub View Template

    The SQL statement used to create a view which is used to import the data from the staging table:

    USE [@Database]
    GO

    --drop previous view if it exists
    IF  EXISTS (SELECT 1 FROM [sys].[views]
      WHERE object_id = OBJECT_ID(N'[@BusinessArea].[vwImport_@Table@SourceNumber]'))
    DROP VIEW [@BusinessArea].[vwImport_@Table@SourceNumber];

    --create view
    USE [@Database]
    GO
    --@
    CREATE VIEW [@BusinessArea].[vwImport_@Table@SourceNumber]
    AS
    SELECT DISTINCT
    @@ColumnViewList
    FROM @SourceTableName src WITH (NOLOCK)

    --check if source (src) row exists in target (trgt)
    WHERE NOT EXISTS (SELECT 1
    FROM [@BusinessArea].[@Table] trgt WITH (NOLOCK)
    WHERE (@@ColumnWhereList)
    )


    Hub ETL Template

    The SQL Statement to create a stored procedure that is used to import the data:

    USE [@Database]
    GO

    --drop previous etl-procedure if it exists
    IF  EXISTS (SELECT * FROM [sys].[objects]
    WHERE object_id = OBJECT_ID(N'[@BusinessArea].[uspImport_@Table@SourceNumber]')
    AND type in (N'P', N'PC'))
    DROP PROCEDURE [@BusinessArea].[uspImport_@Table@SourceNumber]

    --create etl-procedure
    USE [@Database]
    GO
    --@
    CREATE PROCEDURE [@BusinessArea].[uspImport_@Table@SourceNumber]
    AS

    --get the audit_id
    DECLARE @AuditID INT
    INSERT INTO [@MetaDatabase].[dbo].[Audit] (StartDate, ModelTableID)
    VALUES( getdate(), @ModelTableID);
    SET @AuditID=@@IDENTITY

    --insert into target
    INSERT INTO [@BusinessArea].[@Table]
    ( [AuditID]
    , @@ColumnList
    ) SELECT @AuditID, * FROM
    [@BusinessArea].[vwImport_@Table@SourceNumber] ;

    -- update audit table with the number of rows
    UPDATE [@MetaDatabase].[dbo].[Audit]
    SET RowsInserted=@@ROWCOUNT, EndDate=getdate()
    WHERE AuditID=@AuditID;

  • Mon
    30
    May 11

    Staging table template used in the DWH Deck

    The next version of the DWH Deck will support templates. This is the documentation of the staging table template.

    Variables / placeholders used:

    @Database, the name of the staging database e.g.: DWH_st
    @MetaDatabase, the name of the meta database e.g.: DWH
    @BusinessArea, the name of the business area, implemented as schema e.g.: sales
    @Table, the name of the staging table e.g.:Northwind_dbo_Categories
    @SourceTableName, the full identifying name of the source table e.g.: [Server].[Northwind].[dbo].[Categories]
    @ModelTableID, is used to track the model table to the audit table
    @DeltaWhereClause, to determine the where clause if a delta column is selected
    @DeltaUpdateStatement, to record last delta value in the audit table

    @@ColumnList, a list of columns like:

      [CategoryID]
    , [CategoryName]
    , [Description]

    @@ColumnTableList, a list of columns with the data type defining the table:

      [CategoryID] int
    , [CategoryName] varchar(64)
    , [Description] varchar(max)

    @@ColumnViewList, a list of columns used in the view, to import the data from the source. Performing some 'light' transformations:

      [CategoryID] = CONVERT( int, ISNULL([CategoryID],0))
    , [CategoryName] = CONVERT( varchar(64), LTRIM(RTRIM([CategoryName])))
    , [Description] = CONVERT( varchar(max), [Description])

     

    Staging Table Template

    The SQL statement used to create the staging table:

    USE [@Database]
    GO

    --drop previous table
    IF EXISTS (SELECT 1 FROM [sys].[objects]
    WHERE object_id=OBJECT_ID(N'[@BusinessArea].[@Table]') AND type in (N'U'))
    DROP TABLE [@BusinessArea].[@Table];

    --create table
    CREATE TABLE [@BusinessArea].[@Table]
    ( AuditID INT
    @@ColumnTableList
    )

     

    Staging View Template

    The SQL statement used to create a view which is used to import the data from the source:

    USE [@Database]
    GO

    --drop previous view if it exists
    IF  EXISTS (SELECT 1 FROM [sys].[views]
      WHERE object_id = OBJECT_ID(N'[@BusinessArea].[vwImport_@Table]'))
    DROP VIEW [@BusinessArea].[vwImport_@Table];

    --create view
    USE [@Database]
    GO

    --@
    CREATE VIEW [@BusinessArea].[vwImport_@Table]
    AS
    SELECT
      @@ColumnViewList
    FROM @SourceTableName WITH (NOLOCK)

     

    Staging ETL Template

    The SQL Statement to create a stored procedure that is used to import the data:

    USE [@Database]
    GO

    --drop previous etl-procedure if it exists
    IF  EXISTS (SELECT * FROM [sys].[objects]
    WHERE object_id = OBJECT_ID(N'[@BusinessArea].[uspImport_@Table]')
    AND type in (N'P', N'PC'))
    DROP PROCEDURE [@BusinessArea].[uspImport_@Table]

    --create etl-procedure
    USE [@Database]
    GO

    --@
    CREATE PROCEDURE [@BusinessArea].[uspImport_@Table]
    AS

    --get the audit_id
    DECLARE @AuditID INT
    INSERT INTO [@MetaDatabase].[dbo].[Audit] (StartDate, ModelTableID)
    VALUES( getdate(), @ModelTableID);
    SET @AuditID=@@IDENTITY

    --get number of rows and truncate table
    DECLARE @RowsDeleted INT
    SET @RowsDeleted=(SELECT Count(*) FROM [@BusinessArea].[@Table])
    TRUNCATE TABLE [@BusinessArea].[@Table]

    --insert into target
    INSERT INTO [@BusinessArea].[@Table]
    ( [AuditID]
       ,@@ColumnList
    ) SELECT @AuditID, * FROM
    [@BusinessArea].[vwImport_@Table] @DeltaWhereClause;

    -- Update Audit Table with the number of rows

    UPDATE [@MetaDatabase].[dbo].[Audit]
    SET RowsInserted=@@ROWCOUNT, EndDate=getdate(), RowsDeleted=@RowsDeleted @DeltaUpdateStatement
    WHERE AuditID=@AuditID;

  • Mon
    23
    May 11

    New features in the next release of the DWH Deck

    The next release of the DWH Deck ...

    will support templates

    In the current version of the DWH Deck all SQL statements are build based upon the metadata. Future releases will support templates that you can change to fit your data warehouse / ETL framework. Working on these templates we will provide documentation as we work along..

    will support the creation of SSIS packages

    In the current version of the DWH Deck all ETL is handled with stored procedures. The next release will also support the creation of SSIS packages to handle the ETL. Providing better Error-handling possibilities.

    and more ...

     

    Current plan is to release next version in September 2011. (updated on July 4th 2011)

  • Mon
    02
    May 11

    Adding columns to your data warehouse model

    Adding an extra column to a 'source' table is a new feature in the DWH Deck. It's very straight forward: Select Add Column in menu Source and enter the Column name and a value.

    Add Column in DWH Deck

    But how would you use this?

    Well you could misuse this feature to implement some business rules. In this example I added the column Sales amount and changed the DWH data type to money and the DWH formulae to: [Quantity]*[UnitPrice]

    Extra Column Sales Amount

    But please don't use this feature like this. Implement your business rules in the data mart area.

    Another possible use is to construct a smart key that can be used as business key.

    This feature was however intended to help you implement the integration I mentioned in my blogpost on Integration in the data vault. It's especially helpful to store role playing source system information:

    Suppose you have two or more fiscal entities and your erp system stores the general ledger entries in a separate table or even a separate database for every entity. After you've imported the source table meta data and profile you'll change the DWH name for the table and all entities are integrated in the same table. By adding a column for each table with the name 'Fiscal entity' and values like: 'ABC Holding', 'ABC Construction' etc you have create a consolidated view on your financial information and you are still capable of reporting on individual entities.

  • Tue
    19
    Apr 11

    Integration in the data vault

    And how we handle it in the DWH Deck

    One of the key characteristics of using the Data Vault modeling and methodology is the shift of a large part of the ETL work upstream in the data flow. In pictures:

    simple etl schema traditioneel datawarehouse

    Fig 1. Simple ETL schema in a traditional data warehouse

     

    simple etl schema datawarehouse with datavault

    Fig 2. Simple ETL schema in a data warehouse with the data vault architecture

    This shift comes with several advantages like:

    • auditable/traceable: we can recreate the source on a given day/time
    • flexibility: supporting (possible) multiple versions of the truth (e.g. when business rule change)

    Dan Lindstedt in his article: Data Vault series 5 - Loading practices even argues that you shouldn't touch or change the data on the way into the Data Vault, but manipulate it according to business rules on the way out - to the Data Marts. However there is still some ETL from the sources to the data vault:

    • The typical conversion into the data vault structure with hubs, links and satellites.
    • Some acceptable minor changes and default values (article 2.1.3 and 2.1.4 of the Data Vault loading specification)
    • Integration based on business keys.

    This integration based on business keys is a very powerful  aspect and sometimes overlooked. Consider the example of a large wholesaler with warehouses around the globe. By using a uniform business key for 'product' a lot of the integration already happens in a smooth way in the data vault. Creating the data marts and answering questions like what's the total stock for product xyz gets a lot easier this way.
    For more information on this subject, see the blogpost of Ronald Damhof and the discussions in the comment area.

     

    So how do we handle this integration in the DWH Deck?

    1. You can identify the business keys as described in Part 6 of the series describing DWH Deck's functionality.
    2. And make sure you use the same DWHName in de table section of the main application screen for all tables that describe the same object (e.g. Product)

    When you do this the SQL statement to create the data vault table will only be generated once. The SQL statements to create the view and ETL stored procedure to import the data will be generated for each table.

  • Mon
    18
    Apr 11

    DWH DECK Part 7: Data Mart

    Adapted for version 1.1 on 18/04/2011

    This is the seventh part of a series I'm writing on the DWH DECK. A tool that will allow you to build and maintain your data warehouse better and faster. This series will act as documentation and will give you some insides into the way the program works.

    Overview of this series

    The data mart area is the area that your user will use to query the data warehouse. The architecture is based on Ralph Kimball's dimensional modeling concept (star schema):
     
    star schema - dimensional modelling 
    There are two types of tables in the data mart area:
     

    Fact tables
    A fact table that contains the measurements associated with a specific business process. In the DWH Deck this translates to: Every link table in the data vault with measures in the associated satellites will be the basis for a fact table. A fact table will contain:

    • References to dimension tables
    • All related measures (in DWH Deck: check the Measure checkbox)
    • All related Time indicators (in DWH Deck: check the Time indicator checkbox)

     dwh deck - data mart 
     

    Dimension Tables
    Dimensions describe the objects of the business, such as employee, customer etc. In the DWH Deck this translates to: Every hub in the data vault that has a relation with afore mentioned fact tables and its related satellites.
     
    Slowly changing dimensions
    Managing changes over the time has always been a major issue in data warehousing. Kimball uses the concept of slowly changing dimensions. For every attribute/column you can enter the change type:
    Type 1. Overwrite Old Data. (Default) Use this method if you don't care about keeping track of historical values.
    Type 2. Track History. This is a powerful technique for capturing attribute values over time. In the ETL process a new row in the dimension table is added and the previous row describing the object is end dated.

  • Mon
    18
    Apr 11

    DWH DECK Part 6: Data Vault

    Adapted for version 1.1 on 18/04/2011

    This is the sixth part of a series I'm writing on the DWH DECK. A tool that will allow you to build and maintain your data warehouse better and faster. This series will act as documentation and will give you some insides into the way the program works.
    Overview of this series
     
    Many Tables in your data warehouse
    Using the "simple" demo database Northwind with only 13 source tables DWH DECK will build 57-110 tables depending on your choices. A breakdown by table type:
    13 Staging tables
    10 Hub tables
    7 Link tables
    17-70 Satellite tables
    7 Dimension tables
    3 Fact tables

    At first hand this seems to be overkill. Especially the many relative small tables in the data vault will be met skeptical by some. The reason we do this is that a data warehouse should be built with the future in mind… Your company will make major changes in their erp and crm systems… They will acquire new companies… They will start new initiatives… And management will change (their focus) …And all this will lead to changes in your data warehouse.
     
    Utilizing a technique like the data vault with many 'small' tables will give you the flexibility to adapt better and faster to these changes. This is for me the most import reason for using the data vault method. But of course there are many more. Dan Lindstedt the inventor of the data vault describes these advantages in:
    http://danlinstedt.com/datavault/about-the-data-vault/
     
    The Data Vault in essence consists of three types of tables:
     
    Hub Entities (Hubs)
    A hub is a unique list of business keys. The business key is the key that business users use in their daily operations to describe an object. (e.g. account number for general ledger accounts, the VIN (Vehicle Identification Number) of cars)
    It's not always easy to determine the business key and sometimes you will have to use more than one key.
    (e.g. ZIP Code and House number or Last Name and First Name)
     
    By default the DWH Deck assigns the primary key of a table as a business key because these are unique. But that's often not the best choice! Speak to the business users and check the profile information of each column. Columns with distinct value percentage of 100% are good candidates for the business key of the table.
    You can easily change the business key by checking and unchecking the business key checkbox:
     
     
    Link Entities (Links)
    Links represent the relationships between hubs and other links. In the DWH Deck links are created based on the relationships in the source system. If these relationships are all well-defined and build in the source system you won't have to do anything. If not there are several ways to create them. (e.g. click the button "New Primary relation" or "New Foreign relation" and selected the relevant table and column to create the relationship.
     
    Satellite Entities (Satellites)
    Satellites contain descriptive information of the hubs and links. They provide context to the hubs and links. By default all candidate satellite columns are placed in their own satellite table. The DWH Deck uses the Satellite textbox for the satellite's table name. This results in a table for almost every column.
    Often it is better to group related satellites like address into the same table. This can be accomplished by using the same name in the Satellite Group textbox. (e.g. use address for street, housenumber, zipcode, city etc)