1 Dimensional Data Model Prototyping With PowerPivotSet Up Run ZoomIt Open Select Customer for Extract Demo.sql Open DealerMargin DAX.txt Open File Demo Extract Transform Steps – Blank
2 Scope Audience is BI professionals building an enterprise data warehouse solution Familiarity with SQL Server and allied technologies is assumed Familiarity with dimensional modeling techniques is assumed No previous experience with PowerPivot is required Not for power users who want to create local content Allied technologies – Excel, Access, etc. Based on Kimball approach – fact tables and dimensions
3 About Me – Eric Ness Data Warehouse Developer at C. H. Robinson Worldwide (Fortune 300 Transportation Logistics Company) Microsoft shop – use MS BI stack Used techniques on POC projects and data mart design Education Officer for DAMA-MN Earned CDMP in 2011
4 Outline Intro to PowerPivot Data Modeling ProcessImporting Data Into PowerPivot Creating Data Models In PowerPivot Business User Acceptance Proposed Data Warehouse Lifecycle
5 Introduction to PowerPivotMicrosoft In-Memory Analytics Tool Microsoft Excel Plug-In Pivot tables to view data Free download Self-service BI Combines Heterogeneous Data Sources Can analyze large volumes of data Similar level of expertise needed to Excel and Access Can be created without the help of the IT department Data compression not part of presentation Users may only see Pivot table, not all behind scenes work
6 Kimball Data Warehouse LifecycleProblem was that you had to get to the end of the first lifecycle before users could give you feedback Dimensional modeling was divorced from manipulating real data The Data Warehouse Lifecycle Toolkit, Ralph Kimball, Margy Ross, and Warren Thornthwaite, Wiley, 2008.
7 Reasons For PrototypingQuick Feedback From Users Users see results and can confirm needs met Discover data quality issues faster Identify all data sources and confirm connectivity Can start with next round of business requirement elicitation once prototype is accepted Can make changes and experiment quickly Promotes agility and iterative data warehouse construction Most work on back end, users don’t see Scrum story – speaker ??? DQ – may just maps one field to another without examining Too many nulls Data integrity issues If can connect with PowerPivot then can connect with SSIS Iterations shrink from six months down to weeks
8 Why was prototyping not common?Needed to build Extract-Transform-Load (ETL) to extract data from source systems Needed to create data structures in staging and presentation area to store data Changes to ETL or table design required much re-work Major impediment for previous prototyping was simulating ETL process
9 PowerPivot Simulates ETLExtract Text files Excel Worksheets SQL Server Transform DAX Load Full Refresh Any data source that supports ODBC DAX is expression language Do calculations and manipulations on imported data Load step isn’t needed because PowerPivot like doing full refresh of tables No merge necessary Restrict data set brought in for performance
10 ETL Steps Demo Import DimProduct.txt Import DimWeather from WorksheetImport DimCustomer from SQL Server Create DealerMargin Calculation Open File Demo Extract Transform Steps – Blank Open PowerPivot window PowerPivot 2012 vs 2008 R2 Show list of data connectors Data exported from AdventureWorks2012 Home - Import From Text – Select Data\DimProduct.txt – Tab separator – Show column selection – Rename DimProduct Data in Tab Can’t change data in PowerPivot tab Return to Excel Window – Go to DimWeather tab – Create Linked Table – Rename to DimWeather – Show Update All for refresh Go to PowerPivot window – Design tab – Existing Connections – Select localhost connection – Open – Write a Query – Copy and Paste from Select Customer for Extract Demo.sql – Rename DimCustomer – Finish Rules similar to views Can’t use temp tables DimProduct tab – Change data types of ListPrice and Dealer Price to Currency - Add new column – Copy and paste from DealerMargin DAX.txt
11 PowerPivot Simulates Data StoreCreate tables inside of PowerPivot Includes rudimentary data types Specify relationships (foreign keys) Always many-to-one From fact table to dimensions Doesn’t include all SQL Server data types
12 Data Model Gotchas Dimensions need to be clean otherwise PowerPivot won’t create relationship Role Playing Dimensions Can only have one active relationship between tables Can be modeled by importing the same dimension multiple times Can also use USERELATIONSHIP() function as part of measures Only appropriate for prototyping dimenionsal models Doesn’t support relational model Example Order, Ship and Deliver dates
13 Demo Creating composite key with DAX Using diagram view in PowerPivotOpen Demo Fact Table Filtering – Show filtering for Without Promotion – Show for With Promotion Choose all Touring-3000 Products, Austrailia country, one of promotion names Open Demo ProductSubCategory Composite Key – Show composite key in Dimension – Show composite Key in Fact – open diagram view – drag key from fact to dimension Describe calculation
14 Business User AcceptanceFinal step before beginning to build next iteration of data warehouse Display final prototype to users in a pivot table Gives users a good idea of what data will be available Can slice and dice in ad hoc way Can generate new ideas for next iteration of data warehouse development
15 Demo Viewing More Complex Data Model Inserting Pivot TableOpen Demo Diagram View – Pivot Table – Open PowerPivot Window – Go to diagram view – show more complex data model – Go back to data view - insert pivot table – play with data Format may not be as nice as final report Includes all dimensions Includes all measures
16 Importing Prototypes into SSAS 2012 – Just Say NoPrototype may be snowflaked May be using DAX calculations to simulate measures and composite keys No ETL exists to do key lookups and data transformation Data source will be relational data warehouse, not source systems Want to do DAX calculations in staging area Want DW layer between source systems and presentation layer
17 Proposed DW Lifecycle Elicit business requirementsIdentify data sources Assess data quality Build data model prototype using PowerPivot User sign-off Build data warehouse ETL and data store using SQL Server, SSIS and SSAS Use completed DW as source for future PowerPivot iterations
18 Proposed DW Lifecycle
19 Review Intro to PowerPivot Data Modeling ProcessImporting Data Into PowerPivot Creating Data Models In PowerPivot Business User Sign Off Proposed Data Warehouse Lifecycle
20 Thank You Eric Ness, Questions?
21 Additional Material
22 Examples From CH RobinsonUsed to create POC data models for two projects Navigator Metrics Many choices for fact table grain Needed to support Qlikview in-memory analytic tool Account Management Combine old and new data warehouses Business requirements unclear Give business users real numbers for one financial quarter to validate business rules NM Had to choose between one fact table supporting multiple grains or three fact tables Would support all measures and aggregations needed by Qlikview AM
23 Data Modeling: Design vs. DescribeData modeling is a design process – no fixed solution Graeme Simsion “Data Modeling: Theory and Practice”, Technics Publications Especially in dimensional modeling multiple possible solutions Fields in fact or dimensions Mini-dimensions Fact table grain Can use PowerPivot to play with possibilities Describe vs. design Typical process Gather requirements then build data model Assuming one correct model that will work when built creating a lot of risk Iterate through failures faster and with less rework
24 Tips For Designing Fact TablesCreating fake measure values using either SQL or DAX Can be used if heavy duty processing needed to calculate real values Need to communicate to business users that the data is only for testing Experiment with different grains Using Excel spreadsheet with filters How to leverage features of PowerPivot Create DAX measure – similar to Dealer Margin in previous example
25 Tips For Designing DimensionsKeys Use CheckSum/HashBytes to create surrogate keys in T-SQL Can be called on either strings or numbers CheckSum unique enough for the purposes of a prototype Composite Keys Used when there is a one-to-many relationship with sequence numbers Can be created using DAX Example OrderID OrderLineNumber Keys are main issue because there’s no ETL to do key assignments Need to reverse strings, ‘KS’ ‘NC’ have same checksum. SELECT CHECKSUM('KSSK'), CHECKSUM('NCCN') Looks like you can use HASHBYTES with MD5 algorithm to get unique values, then cast to BIGINT.
26 More Tips For Designing DimensionsSnowflaking okay in prototype Dimensions may come from different data sources Should be combined in final data model Kimball discourages snowflaking Says put all hierarchies in one dimension
27 Bridge Table Data Model PatternVery complex in PowerPivot Need to use DAX
28 Bridge Table Links Links