IN2BI, Microsoft Business Intelligence
data, stories & insights
rss | email | twitter

DWH DECK Part 5: Staging

Adapted for version 1.1 on 18/04/2011

This is the fifth part of a series I'm writing on the DWH DECK. A tool that allows you to build and maintain your data warehouse better and faster. This series will act as documentation and will give you some insides into the way the program  works.

Overview of this series

Pros and cons
There are several pros and cons for and against using a staging area to populate your data warehouse. Of course it's an extra step that could be abandoned but I decided for it based on the possibility to schedule your import of several sources on convenient times and to have a nice overview of the relevant data from different sources.
 
In this part I'll describe the impact on the choices you made in the main window on the import into the staging area.

Skipping tables and columns
When you check the Skip Table or Skip Column checkbox that table or column is not going to be imported into your data warehouse.
 
Delta column
Unlike the data vault and data mart area where we will only import the new or changed rows (by checking source against target) the staging area doesn't have this mechanism. And all data from the source is loaded each time. For some very large tables this can seriously affect the performance. For these tables look for a ‘Delta column’ like LastModifiedDate. The ETL to import handles the tracking of the maximum value in this column an will only import 'new' data. Of course you have to be sure that if data changes in this table the Delta column is also updated!

DataType, DWH Data type and DWHFormule
The DWH Deck does some minor transformations based on the input in these textboxes.
Consider the example of CategoryName:
 
 
The column will be converted into the DWHDataType and the formulae will be applied in the view that is used to import the source data:
 
CREATE VIEW [sales].[vwImport_Northwind_dbo_Categories]
AS
SELECT
  [CategoryID] = CONVERT( int, ISNULL([CategoryID],0))
, [CategoryName] = CONVERT( varchar(64), LTRIM(RTRIM([CategoryName])))
, [Description] = CONVERT( varchar(max), [Description])
FROM [YOUR_SERVER].[Northwind].[dbo].[Categories] WITH (NOLOCK)
 
The default values in DWHDataType, DWHFormulae and SkipColumn are based on the source columns data type and the values in the bridge table: dbo.DWHDataTypes like:

  • Unicode data type mapping to corresponding smaller data types (nvarchar to varchar)
  • Removing leading and trailing spaces (LTRIM(RTRIM(?)))
  • Replacing NULL Values for certain data types (ISNULL(?,0))
  • Converting dates to integer with format YYYMMDD

If you don't like these default values change the entries in the table dbo.DWHDataTypes before you import the source information.

» Similar Posts

  1. DWH DECK Part 3: Importing Source Information
  2. DWH DECK Part 6: Data Vault
  3. DWH DECK Part 7: Data Mart

» Trackbacks & Pingbacks

    No trackbacks yet.

» Comments

    There are no comments.

Comments are closed