IN2BI, Microsoft Business Intelligence
data, stories & insights
rss | email | twitter
  • Thu
    03
    Nov 11

    SQL Server 2012 Editions and Licensing

    image

    Microsoft published the licensing options and prices. Two aspects stand out:

    • A new edition: Business Intelligence (between Standard- and Enterprise Edition)
    • A move to core based licensing (from processor based)

    The main editions and the differences in functionality:


    Standard Edition:

    • Basic OLTP
    • Basic Reporting & Analytics
    • Programmability
    • Manageability
    • Basic High Availability


    Business Intelligence Edition:

    Standard Edition +:

    • Data Quality Services
    • Master Data Services
    • Power View
    • PowerPivot for Sharepoint Portal Server
    • Semantic model, advanced analytics


    Enterprise Edition

    Business Intelligence Edition +:

    • Advanced Security
    • Advanced High availability
    • ColumnStore indexing
    • Compression
    • Partitioning

     

    US Pricing:

    This is an overview of the expected prices:

    image

  • Mon
    24
    Oct 11

    SQL Server 2012 (Denali)

    Microsoft made significant Business Intelligence investments in SQL Server since 2000.

    The next (2012) release will also include many new BI features. In the next posts I will explore some of these features. This post will act as an overview of / index to these articles:

    • …..
  • Sat
    22
    Oct 11

    Sankey Diagrams in Google Analytics

    According to Visualization Expert Edward Tufte This Sankey diagram is “Probably the best statistical graphic ever drawn”

    image

    It tells the story of the losses suffered by Napoleon's army in the Russian campaign of 1812. Beginning at the Polish-Russian border, the thick band shows the size of the army at each position. The path of Napoleon's retreat from Moscow in the bitterly cold winter is depicted by the dark lower band.

    Google recently announced an addition to Google Analytics that uses the same visualization technique to show you how your users move across you site. The call it Flow visualization.

    Some pictures:

    image

    Browser – Product Catalog – Shopping Cart – Login – Order

    image

    Country (France selected) - Product Catalog – Shopping Cart – Login – Order

    A very nice addition with which we (website owners) get to play within the next weeks… Looking forward to that.

  • Mon
    17
    Oct 11

    TSQL Metadata Queries

    When you want to query SQL Server for the metadata you basically have two options:

    • Use the Catalog views
    • Use the Information Schema Views

    The last one are based on catalog view definitions in the ISO standard. You typically use them if you need to support more database systems.

    The first is one is recommended by Microsoft.

    In this blogpost I recapitulate some queries that I have used in the past and will probably use in the future.

    1. Get column information from all tables

    -----------------------------------------------------------
    -- Query: GetColumnInfoFromAllTables
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
    SELECT SchemaName = SCHEMA_NAME(T.schema_id)
       ,
    TableName = T.name
      
    ,ColumnName = C.name
      
    ,ColumnID = C.column_id
      
    ,DataType = ty.name
      
    ,MaxLength = C.max_length
      
    ,PRECISION = C.PRECISION
      
    ,Scale = C.scale
      
    ,CondensedDataType=(CASE
          
    WHEN (ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length=-1)
                  THEN ty.name + '(MAX)'
          
    WHEN (ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length>0)
                 THEN ty.name + '(' + CONVERT(VARCHAR(20),C.max_length) + ')'
          
    WHEN ty.name IN ('decimal','numeric')
                 THEN ty.name + '(' + CONVERT(VARCHAR(20),C.PRECISION) + ',' +
                      
    CONVERT(VARCHAR(20),C.scale) + ')'
          
    WHEN ty.name IN ('datetime2','datetimeoffset')
                
    THEN ty.name + '(' + CONVERT(VARCHAR(20),C.scale) + ')'
          
    ELSE ty.name
          
    END )
       ,
    IsIdentity = c.is_identity
      
    ,IsNullable = c.is_nullable
      
    ,ModifiedDate = T.modify_date
    FROM sys.tables T
    INNER JOIN sys.columns C ON C.OBJECT_ID=T.OBJECT_ID
    INNER JOIN sys.types ty ON C.system_type_id=ty.user_type_id
    ORDER BY SCHEMA_NAME(T.schema_id),T.Name,C.column_id

    This will return this result set:

    image

    Notice the join between sys.types and sys.columns on type.user_type_id. Otherwise you would get duplicate rows if there are user types defined.

    2. Get column information from all views

    Almost the same as 1. Replacing sys.tables with sys.views:

    -----------------------------------------------------------
    -- Query: GetColumnInfoFromAllViews
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
    SELECT SchemaName = SCHEMA_NAME(V.schema_id)
       ,
    ViewName = V.name
      
    ,ColumnName = C.name
      
    ,ColumnID = C.column_id
      
    ,DataType = ty.name
      
    ,MaxLength = C.max_length
      
    ,PRECISION = C.PRECISION
      
    ,Scale = C.scale
      
    ,CondensedDataType=(CASE
          
    WHEN (ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length=-1)
                
    THEN ty.name + '(MAX)'
          
    WHEN (ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length>0)
                
    THEN ty.name + '(' + CONVERT(VARCHAR(20),C.max_length) + ')'
          
    WHEN ty.name IN ('decimal','numeric') THEN ty.name + '(' + CONVERT(
                
    VARCHAR(20),C.PRECISION) + ',' + CONVERT(VARCHAR(20),C.scale) + ')'
          
    WHEN ty.name IN ('datetime2','datetimeoffset')
                
    THEN ty.name + '(' + CONVERT(VARCHAR(20),C.scale) + ')'
          
    ELSE ty.name
          
    END )
       ,
    IsIdentity = c.is_identity
      
    ,IsNullable = c.is_nullable
      
    ,ModifiedDate = V.modify_date
    FROM sys.views V
    INNER JOIN sys.columns C ON C.OBJECT_ID=V.OBJECT_ID
    INNER JOIN sys.types ty ON C.system_type_id=ty.user_type_id
    ORDER BY V.Name,C.column_id

    Resulting in:

    image

    3. Get details of foreign key constraints

    -----------------------------------------------------------
    -- Query: GetFKDetails
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
    SELECT FKName = f.name
      
    ,FKSchema = OBJECT_SCHEMA_NAME(f.parent_object_id)
       ,
    FKTable = OBJECT_NAME(f.parent_object_id)
       ,
    FKColumn = COL_NAME(f.parent_object_id,fc.parent_column_id)
       ,
    RefSchema = OBJECT_SCHEMA_NAME(f.referenced_object_id)
       ,
    RefTable = OBJECT_NAME(f.referenced_object_id)
       ,
    RefColumn = COL_NAME(fc.referenced_object_id,fc.referenced_column_id)
       ,
    ConstraintColumnID = fc.constraint_column_id
      
    ,ModifiedDate = f.modify_date
    FROM sys.foreign_keys f
    INNER JOIN sys.foreign_key_columns fc
      
    ON f.OBJECT_ID = fc.constraint_object_id

    returns:

    image

    4. Get details of the indices created on user tables

    -----------------------------------------------------------
    -- Query: GetIndexDetails
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
    SELECT IndexName = i.name
      
    ,SchemaName = OBJECT_SCHEMA_NAME(i.OBJECT_ID)
       ,
    ObjectName = OBJECT_NAME(i.OBJECT_ID)
       ,
    ColumnName = COL_NAME(i.OBJECT_ID,ic.column_id)
       ,
    IndexType = i.type_desc
      
    ,IndexID = i.index_id
      
    ,IndexColumnID = ic.index_column_id
      
    ,KeyOrdinal = ic.key_ordinal
      
    ,IsIncludedColumn = ic.is_included_column
      
    ,IsUnique = i.is_unique
      
    ,IsPrimaryKey = i.is_primary_key
    FROM sys.indexes i
    INNER JOIN sys.index_columns ic
      
    ON i.OBJECT_ID = ic.OBJECT_ID AND i.index_id=ic.index_id
    --limit to indexes on user tables
    INNER JOIN sys.tables t
      
    ON i.OBJECT_ID = t.OBJECT_ID

    which returns:

    image

  • Wed
    12
    Oct 11

    Biml versus CodeSmith and MyGeneration

    After a recent conference a participant pointed me to CodeSmith and MyGeneration as alternatives for BimlScript. And although I don’t have any experience with these tools from their website I understand he is right. You could use these tools in a similar fashion.

    Both are generic code generators and given a document with a certain xml or other format they’ll be able to loop through a dataset and inject values to the original document. So you could use a certain ssis package… Investigate the xml… Add placeholders… Loop through a dataset … replace the placeholders in the ssis package with the information from your dataset … and save the newly created documents.

    One important disadvantage of this approach is the scrutiny of the rather complex xml in an ssis package. Biml offers an easier/cleaner way to describe a package because it doesn’t need to hold design aspect of the BIDS interface of the package.

    An example of Biml with BimlScript:

    image

  • Tue
    11
    Oct 11

    Data Vault Automation Conference

    in2bi was one of the sponsors of the Data Vault Automation Conference which was held in Utrecht on October 6th.

    Data Vault has become pretty popular in the Netherlands the last years. Due to the rigid modular approach of Data Vault modeling source data can easily be divided in the three standard table types with standard load mechanisms which lead to easy automation.

    Both aspects contribute to a lot of innovation in the Netherlands with respect to data warehouse automation.

    During the conference more than 100 participants where kept up to date by:

    You can download the slides of my session by clicking on the powerpoint logo.PowerPoint

  • Mon
    10
    Oct 11

    SQL Bits 2011

    SQLBitsLogo

    From 29.09.2011 until 01.10.2011 I visited SQL Bits 9 in Liverpool. I had a great time: beautiful weather, well organized event and some great sessions… and I met some great- and nice people.

    In my own session “Creating a Meta Data Driven SSIS Solution with Biml” I tried to cover to much ground. I’ll try not to make that mistake again and focus more in depth on fewer aspects next time.

    You can download the slides of my session by clicking on the powerpoint logo.PowerPoint

  • Mon
    05
    Sep 11

    Dimensional modeling and Data Vault – a happy marriage?

    A lot of emphasis on the Internet lately has been placed on the difference between both methods. This blogpost from Dan Linstedt and this discussion on the Kimball forums are examples of this strange focus.

    A focus I don’t totally understand since both techniques serve different purposes and can happily live together in most data warehouse solutions:

    • Data Vault is all about the back-room, efficiently collecting, integrating and preserving data from the source systems.
    • Dimensional modeling is all about the front-room, publishing the organization’s data assets to effectively support decision making.

    From a Dimensional modeling (Kimball) perspective Data Vault is an implementation of a persistent staging area. Problems that some people have with this view/name seem to boil down to the definition of what a data warehouse is. I personally don’t have a problem with this name nor do I consider it to be less important. And yes it will qualify as a data warehouse.

    Furthermore Dimensional modeling is generally acknowledged to be the best way to publish data to the business users and has a wide-spread adoption. A focus for further acceptance of Data Vault could be that it is the best implementation of a persistent staging area.

    So let’s try to make this case … but first:

    What are the reasons for having a persistent staging area?

    Well …first of all it could be a demand from auditors or data governance initiatives. Possible driven by external regulations.(Sarbanes-Oxley, Basel I, Basel II, HIPAA, ..)

    A second reason which can be tied to the first has to do with keeping control of / reporting on data quality issues in the source systems and thus:

    • identify possible improvements in processes. (e.g. same data entered twice)
    • increase / decrease confidence in the data, information and decisions

    The third reason has to do with agility. The ability to respond to changes rapidly. Small changes like changing a dimension attribute from type 1 to 2 or adding additional attributes to a dimension. But also large changes like big organizational changes, mergers, new ERP implementations. By having a persistent staging area (preferably integrated on business keys) it’ll be far easier to respond to these changes and rebuild the data marts (with history). Resulting in quicker, better reporting after such a change.

    Of course you need to balance these advantages against the extra costs. Before we take a look at these extra costs and other raised disadvantages let’s examine some …

    Similarities between Data Vault and Dimensional modeling

    Let’s first focus on the methodology of both approaches. They both advocate a bottom-up iterative implementation: Identify scope, select business area, build the solution and .. focus on a new iteration.

    Both use a limited number of strict table types with a well-defined function: Dimension- and fact tables in Dimensional modeling and hub-, link- and satellite tables in Data Vault modeling. Where a hub tables and its satellites typically roll-up into a dimension and a link tables and its satellites typically roll-up into a fact table.

    Another “fit” is the use of a volatile staging area as a synchronization- and recovery point.

    Focus on the extra costs

    Some would argue that since you are building two data warehouses you’ll end up with twice the costs. That’s seems far fetched … remember both use a staging area and the existence of the Data Vault area will make it easier to construct the Dimensional model on top of that. Furthermore Data Vault modeling lends itself (as does Dimensional Modeling to some extent) to automation, which greatly reduces costs of building and maintaining the data warehouse solution.

    Others complain about the extra storage that is needed to support this solution. Why store an extra copy of the source? Well I hope the described reasons for having a persistent staging area answer these questions. However you should of course take the extra storage costs into account when evaluating your choices. But please look at opportunity costs: Do you need an extra DBA or an extra SAN to support the extra layer?

    A third complaint has to do with query performance. It is suggested that the extra joins introduced with Data Vault modeling will impact query performance. This of course depends on size, hardware, database and indexing strategy. But even if there is a significant impact in most cases it won’t have a major effect. After all we are only using these queries to populate the Dimensional model and this will only apply to new and changed rows.

    Conclusion

    Consider the combination of both methods when you are (re)building your data warehouse. In this blogpost I have given you some handles to evaluate this option. Do you need to keep control on data quality issues or do you want the agility to respond to changes quicker? Can you quantify these advantages? Compare them to the extra costs and make a better informed decision.

    And of course you can cont(r)act me if you want any help in this evaluation.

  • Mon
    22
    Aug 11

    SQLBits - Query across the Mersey

    I am very pleased my session "Creating a Meta Data Driven SSIS Solution with Biml" has been selected for SQLBits this year. And I liked to thank the voters and the organization to make this possible.

    SQLBits will be the largest SQL Server Conference in Europe this year with a lot of great technical sessions lined up on Friday September 30 and Saturday October 1. So if you didn't already do so register now and meet me in Liverpool.

    My talk on ETL Automation using Biml will expand on my earlier blog series on Creating a Meta Driven SSIS Solution with Biml.

     

    Liverpool_Waterfront_by_Night

  • Sat
    25
    Jun 11

    Creating a Meta Data Driven SSIS Solution with Biml - 5. Creating the Master Package

    In this last post in a series of 5 on creating a meta data driven ssis solution I''m going to create the master package. The master package will start the other table packages in the right order.

    10. Deploy the Table Packages

    Before you'll create the master package publish the packages to sql server:

    • In the Solution Explorer (BIDS) right click the project and select Properties from the context menu.
    • Select Deploy under Configuration Properties in the treeview.
    • Change the DeploymentType to SqlServerDestination and the DestinationServer to localhost.

    Project property Pages

    • In the Solution Explorer (BIDS) right click the project and select Deploy from the context menu.

     

    The Master Package

    I will group the Execute Package Tasks in Sequence Containers that correspond with the PackageGroup in the SsisPacakges Table.

    Final result will look like this:

    Masterpackage

    The Biml script

    The Biml Script uses two connection: one to the meta database and one to the MSDB database where the packages are stored. And before we loop through the rows in the SsisPackages table and create the tasks I will use Package Configurations to handle future deployment issues. To create this package I'll use the following script:

    <#@ template language="C#" hostspecific="true"#>
    <#@ import namespace="System.Data" #>
    <Biml xmlns="http://schemas.varigence.com/biml.xsd">
          <Connections>
                <OleDbConnection Name="metaDB"
                                       ConnectionString="Data Source=.;Initial Catalog=MyDwh_meta;Provider=SQLNCLI10.1;Integrated Security=SSPI;"
                                       CreatePackageConfiguration="true"></OleDbConnection>
                <OleDbConnection Name="msdb"
                                       ConnectionString="Data Source=.;Initial Catalog=msdb;Provider=SQLNCLI10.1;Integrated Security=SSPI;"
                                       CreatePackageConfiguration="true"></OleDbConnection>
          </Connections>
          <Packages>
                <Package Name="MasterPackage MyDwh" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
                      <!--Package Configurations-->
                      <PackageConfigurations>
                            <PackageConfiguration Name="MetaDB" >
                                 <EnvironmentVariableInput EnvironmentVariable="Northwind_Config"></EnvironmentVariableInput>
                                 <ConfigurationValues>
                                       <ConfigurationValue DataType="String"
                                                                     PropertyPath="\Package.Connections[metaDB].Properties[ConnectionString]"
                                                                     Name="metaDB"
                                                                     Value="Data Source=.;Initial Catalog=ROCWB_MIS_config;Provider=SQLNCLI10.1;Integrated Security=SSPI;">
                                       </ConfigurationValue>
                                 </ConfigurationValues>
                            </PackageConfiguration>
                            <PackageConfiguration Name="msdb" ConnectionName="metaDB">
                                 <ExternalTableInput Table="[dbo].[SsisConfiguration]" />
                            </PackageConfiguration>
                      </PackageConfigurations>
                      <!--Tasks-->
                      <Tasks>
                            <# string sPackageGroup=""; #>
                            <# string IsFirstRow="True"; #>
                            <# string sConn = "Provider=SQLNCLI10;Server=.;Initial Catalog=MyDwh_meta;Integrated Security=SSPI;"; #>
                            <# string sSQL = "SELECT PackageGroup, PackageName FROM SsisPackages"; #>
                            <# DataTable tblPackages = ExternalDataAccess.GetDataTable(sConn,sSQL); #>
                            <# foreach (DataRow row in tblPackages.Rows){ #>
                            <# if(sPackageGroup!=row["PackageGroup"].ToString()) { #>
                                 <# sPackageGroup=row["PackageGroup"].ToString();#>
                                 <# if(IsFirstRow=="True") { #>
                                       <# IsFirstRow="False"; #>
                                       <!--Start new container-->
                                       <Container Name="SEQC - <#=row["PackageGroup"]#>" ConstraintMode="Parallel">
                                             <Tasks>
                                 <# }else { #>
                                       <!--Close container and start new container-->
                                             </Tasks>
                                       </Container>
                                       <Container Name="SEQC - <#=row["PackageGroup"]#>" ConstraintMode="Parallel">
                                             <Tasks>
                            <# }}#>
                                       <ExecutePackage Name="EPT - <#=row["PackageName"]#>">
                                             <SqlServer ConnectionName="msdb" PackagePath="\<#=row["PackageName"]#>" />
                                       </ExecutePackage>
                            <# }#>     
                                 <!--Close container-->
                                 </Tasks>
                            </Container>
                      </Tasks>
                </Package>
          </Packages>
    </Biml>

     

    11. Final steps: Create the masterpackage

    • In Solution Explorer (BIDS) right click the project and select Add New Biml File
    • Rename the Biml File to Create master package.biml
    • Copy the biml script above to the new biml file
    • In Solution Explorer (BIDS) right click the Create master package.biml file and select Generate SSIS Packages.

     

    Final thoughts

    In this series we created a complete SSIS Solution with

    • a logging and lineage method
    • package configurations to easily port the package to another environment (development, test, production)
    • an enforced standard way to import data: (Try Bulk-load, Try any error row RowByRow, and write any remaining errors to an error file)
    • a master package that can start the packages in the correct order

    You could easily change this to fit your needs...