Wednesday, March 10, 2010

Devils and Angels of BI Architecture


Devil: You don't need to set up a separate star schema database


Angel: Yes you do. We don't want to  interfere with the operational systems


Devil: It's sooo much work


Angel: We want to create a simplified schema to benefit our end users


Devil: Phooeh. ETL is BORING


Devil: Just query the source system directly


Devil: What could be better than fresh data


Devil: C'mon the reports not that big


Devil: Everybody's doing it


Intrepid BI Architect: sigh...

Case in point:
 This is from an actual email exchange. Names have been changed to protect the innocent.






From: Brave and True BI Developer
Sent: Friday, March 05, 2010 9:33 AM
To: BI Technical Staff
Subject: SSRS Not Responding

Well, this is a new one.  I’m on-site at [Dunder-Mifflin] today to do a quick fix on a report and the SSRS instance seems to be refusing to actually render a report.  I click on a report that’s been untouched for months with an ultra simple SELECT behind it and the browser just spins with the “Reporting being generated” message in the middle of the browser.


From: Brave and True BI Colleague
Sent: Friday, March 05, 2010 9:34 AM
To: BI Technical Staff
Subject: RE: SSRS Not Responding

Check SQL Profiler as you run the report, verify that it is actually reaching the database (with the correct user information, etc.)


From: Brave and True BI Developer
Sent: Friday, March 05, 2010 10:00 AM
To: BI Technical Staff
Subject: RE: SSRS Not Responding

Got it.  There was a transaction open on a table that’s part of a filter on most of the reports. That’s all it was.

______________________________




[My emphasis]




This is why we go to the trouble of setting up an entirely separate system. There should never be a "transaction open on a table that’s part of a filter on most of the reports."


There are times even now when the voices of temptation ring really loud. Usually this occurs in the form of a client or colleague looking at me like I've got two heads when I tell them that we need to spend time building a separate redundant database and ETL to replace a currently running (but poorly performing or problematic) report.


So to all the intrepid BI Architects out there: fight the good fight!




Friday, February 5, 2010

The "Million Dollar" BI Architecture Diagram

Anyone who has worked with me or taken training from me has seen some variant of this diagram. It's not an original concept, although this version has been refined to include my personal experience with many Business Intelligence implementations, as well as my bias towards both the Kimball Business Dimensional methodology and using Microsoft SQL Server (especially Analysis Services) as part of successful deployments.




This diagram is conceptual, although I have designed systems that did have physical servers for each of the servers represented, and these worked very, very well.


I have lots to say about this, and will continue to do so on this blog. Perhaps the most significant concept to mention at the outset is that this is a recipe for success with Business Intelligence. Put another way, the level of success I've had with BI environments corresponds pretty closely with how well the architecture of these environments matched this diagram. (Also note that I said "a" recipe for success - not "the" recipe for success. I acknowledge that there are other approaches, but this is the one I've used successfully for over a decade.)


Two important concepts inherent in this diagram are:


a) each layer in this diagram has a primary purpose. From an architecture standpoint, designing these functional layers to do only a single thing very well creates a robust, yet simple, environment.


b) There are well-defined interfaces between each functional layer. Having clearly defined boundaries allows the processes to be moved from server to server (which enables important operational benefits like high-availability, disaster recovery, and performance tuning)


What are these primary purposes mentioned above, and how do they benefit the overall architecture?
  1. Source Systems  OLTP, Line of Business, ERP systems, and other sources - All BI systems are 100% dependent upon being able to source data from somewhere. The systems that run the business are our sources. All the rich data that we analyze and/or aggregate comes from there.
  2. Data Quality Extraction, Transformation, and Loading - Because the systems in #1 are generally concerned with running the business and not with pristine and perfect data, there must be intentional processes that clean and consolidate data from these sources. That happens in layer 2
  3. Single Vision Enterprise Data Warehouse - This is what an organization "knows" about itself, and enables a single vision of the truth. The is the consolidated, consistent historical data store, ideally becoming the "system of record" after the operational processes are no longer active. Note that under this model, this layer does not have to be tuned for interactive query performance - the primary purpose here is to act as a sound repository of enterprise information.
  4. Business Intelligence - Unified Dimensional Model - Nearly all successful implementations include a layer like this, either informally or formally. In my view, formally is the superior approach. In this layer, performance is improved by creating locally (or proximally) cached copies of information. This is where "Business Intelligence" is exposed - i.e. the metrics, KPI's that provide useful information. All presentation tools should source from this database.
  5. Presentation - By keeping the Business Intelligence in layer 4, the presentation tools really just become about delivering information to the various audiences in the format and vehicle most convenient for each. It allows different tools to be used as needed. 
BTW - I call this the "Million Dollar" BI Architecture Diagram because:

  • if you're an organization using BI this effectively, it should represent (at least) a million dollars in increased revenue or reduced cost. 
  • If you're a BI practitioner, it should be worth that in revenue or perhaps even more significantly, a million dollars in reduced stress - because it represents a plan that is proven and works.

Monday, January 25, 2010

Example Code to build and populate a Date Dimension


OK, I've found myself searching for this code a half dozen times over as many weeks. That tells me that I :
a) Need to keep better track of my source code
b) Need to share it on my blog


Caveats & Comments: 

  • This is "warts and all" utility code. It could be optimized, but for the few rows that a Date Dimension represents, why?
  • One of the few cases where Dr. Kimball and his recommends meaningful surrogate keys is in the case of the date dimension. Our keys here use a YYYYMMDD or YYYYQQ format. 
  • This builds a Date Snowflake, which provides correctly ordered keys at any level in the dimension. 
  • I usually rejoin these tables in either the DSV or a SQL View to create the structure SSAS actually consumes.
  • There is a stub for a Week Hierarchy, which sometimes is called for, and sometimes not.
  • This was built in and is compatible with SQL 2005. It runs in SQL 2008 as well. In 2008, you could change the date type from DateTime to Date and save yourself some formatting overhead down the road.









/* DDL Begin */
/****** Object:  Table [dbo].[dim_CalendarYear]    Script Date: 11/02/2009 12:34:56 ******/
CREATE TABLE [dbo].[dim_CalendarYear](
      [CalendarYearID] [int] NOT NULL,
      [CalendarYearName] [varchar](10) NULL,
 CONSTRAINT [PK_dim_CalendarYear] PRIMARY KEY CLUSTERED
(
      [CalendarYearID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

/****** Object:  Table [dbo].[dim_CalendarQuarter]    Script Date: 11/02/2009 12:34:56 ******/
CREATE TABLE [dbo].[dim_CalendarQuarter](
      [CalendarQuarterID] [int] NOT NULL,
      [CalendarQuarterNumber] [int] NULL,
      [CalendarQuarterName] [varchar](10) NULL,
      [CalendarYearID] [int] NULL,
 CONSTRAINT [PK_dim_CalendarQuarter] PRIMARY KEY CLUSTERED
(
      [CalendarQuarterID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

/****** Object:  Table [dbo].[dim_CalendarMonth]    Script Date: 11/02/2009 12:34:56 ******/
CREATE TABLE [dbo].[dim_CalendarMonth](
      [CalendarMonthID] [int] NOT NULL,
      [CalendarMonthNumber] [int] NULL,
      [CalendarMonthName] [varchar](20) NULL,
      [CalendarQuarterID] [int] NULL,
 CONSTRAINT [PK_dim_CalendarMonth] PRIMARY KEY CLUSTERED
(
      [CalendarMonthID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

/****** Object:  Table [dbo].[dim_CalendarDay]    Script Date: 11/02/2009 12:34:56 ******/
CREATE TABLE [dbo].[dim_CalendarDay](
      [TimeID] [int] NOT NULL,
      [SQLDate] [datetime] NULL,
      [CalendarMonthID] [int] NULL,
      [WeekID] [int] NULL,
 CONSTRAINT [PK_dim_Date] PRIMARY KEY CLUSTERED
(
      [TimeID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/* DDL End */


/* Populate Begin */

DECLARE  @StartSeedDate DATETIME
DECLARE  @EndSeedDate DATETIME

SET @StartSeedDate = '01 Jan 2007'
SET @EndSeedDate   = '31 Dec 2009'
                  
-- Set defaults if none are supplied
IF @StartSeedDate IS NULL
  SET @StartSeedDate = DATEADD(YYYY,-1,CONVERT(CHAR,GETDATE(),112))

IF @EndSeedDate IS NULL
  SET @EndSeedDate = DATEADD(YYYY,1,CONVERT(CHAR,GETDATE(),112))
                    
CREATE TABLE #TMP_DIMDATE (
  SQLDATE DATETIME NULL)

WHILE @StartSeedDate <= @EndSeedDate
  BEGIN
    INSERT #TMP_DIMDATE
          (SQLDATE)
    VALUES(@StartSeedDate)
    SELECT @StartSeedDate = DATEADD(DD,1,@StartSeedDate)
  END
 
-- Year
INSERT dbo.dim_CalendarYear
SELECT DISTINCT
      DATEPART(YEAR,SQLDATE) AS [CalendarYearID]
      , DATENAME(YEAR,SQLDATE) AS [CalendarYearName]
FROM   #TMP_DIMDATE
-- SELECT * FROM dbo.dim_CalendarYear

-- Quarter
INSERT dbo.dim_CalendarQuarter
SELECT DISTINCT
      DATEPART(YYYY,SQLDATE) * 100 + DATEPART(QUARTER,SQLDATE)     AS [CalendarQuarterID]
,     DATEPART(YYYY,SQLDATE) * 100 + DATEPART(QUARTER,SQLDATE)     AS [CalendarQuarterNumber]
,     CAST(DATEPART(YYYY,SQLDATE) AS char(4)) + ' Q' + CAST(DATEPART(QUARTER,SQLDATE) AS char(1))     AS [CalendarQuarterName]
,     DATEPART(YEAR,SQLDATE) AS [CalendarYearID]
FROM   #TMP_DIMDATE
-- SELECT * FROM dbo.dim_CalendarQuarter

-- Month
INSERT dbo.dim_CalendarMonth
SELECT DISTINCT
      DATEPART(YYYY,SQLDATE) * 100 + DATEPART(MONTH,SQLDATE)     AS [CalendarMonthID]
,     DATEPART(YYYY,SQLDATE) * 100 + DATEPART(MONTH,SQLDATE)     AS [CalendarMonthNumber]
,     DATENAME(YYYY,SQLDATE) + ' ' + DATENAME(MM,SQLDATE)     AS [CalendarMonthName]
,     DATEPART(YYYY,SQLDATE) * 100 + DATEPART(QUARTER,SQLDATE)     AS [CalendarQuarterID]
FROM   #TMP_DIMDATE
-- SELECT * FROM dbo.dim_CalendarMonth

-- Day
INSERT dbo.dim_CalendarDay
SELECT DISTINCT
      DATEPART(YYYY,SQLDATE) * 10000 + DATEPART(MONTH,SQLDATE) * 100 +  DATEPART(DAY,SQLDATE)     AS [TimeID]
,     SQLDATE
,     DATEPART(YYYY,SQLDATE) * 100 + DATEPART(MONTH,SQLDATE)     AS [CalendarMonthID]
,   NULL as [ShopLocalMonthID]
FROM   #TMP_DIMDATE
-- SELECT * FROM dbo.dim_CalendarDay


/* Populate End */


Friday, January 22, 2010

Is Business Intelligence your company's "Tree of Souls" ?

(Minor Avatar spoilers below - in case you're one of the 3 people who hasn't seen it yet)


I was driving to a clients this past week and just free-associating, listening to an NPR segment about the movie Avatar and also thinking about this upcoming Business Intelligence implementation.


In the movie, "Avatar", the "Tree of Souls" is a nexus where sentient beings can connect and understand the will of that planet's deity. At some point the "AllMother", through the "Tree of Souls", acts on the different populations to coordinate their efforts and accomplish something truly remarkable.


It occurred to me that that is what we are trying to do with Business Intelligence: allow people to connect to information and by virtue of the information revealed, align and influence their actions to accomplish something truly remarkable.


So you see, the movie Avatar is really about Business Intelligence.


Is Business Intelligence your company's "Tree of Souls" ?

Monday, December 14, 2009

Dec 8th, 2009 ITA BI Roundtable Presentation


On Dec 8th, I was fortunate to present an overview of SQL Server 2008 Integration Services to the Illinois Technology Association's Business Intelligence Roundtable. The actively-involved participants had many questions. Here are links to all of the prerequisite, supplemental, and follow-on resources we discussed:

Dimension Design (aka Star Schema) Methodology:



Code Examples:


While these examples are in SSIS 2005, they can be easily ported to SQL 2008. Most of the main techniques employed are still valid in SSIS 2008.

Development Utilities:

The Presentation:
  • There were only a few slides since I wanted to focus primarily on demonstrations, but you can find the PDF linked here:
If you attended and I missed posting something, please add a blog comment or email me via the contact information in the PDF.

Tom

Thursday, December 10, 2009

Creating great courseware

I was recently surveyed by Microsoft about the Microsoft Learning: Microsoft Certified Trainer program. One of the questions was about the Microsoft Official Curriculum (MOC). I've been using MOC in one form or another for about 15 years. Although I was only answering a question in the moment, I grabbed my comment about how to create courseware into a text file, and it seemed worth keeping. Rather than keep it in a text file, I'll "keep it" on my blog. My comment was: 
Creating great courseware is hard work, but not impossible: 1) Decide on learning objectives 2) Vet learning objectives with MCT's and experienced practitioners 3) CREATE LAB EXERCISES FIRST 4) Test lab exercises 5) Test them again 6) With lab exercises complete and FROZEN, composed supporting slides and course materials
I actually started writing courseware a few years before my first exposure to MOC. The method described above was how we did it. It makes sense because you want the practical exercises to support the learning objective and these are the hardest to get correct. Isn't it irritating when you are learning a new technology and the lab exercises don't work as designed?


I have noticed that there seem to be more problems with MOC courseware lately, especially with how labs work. I wonder if they follow the above methodology.


Don't get me wrong - the MOC courseware is not terrible - I still think there are a lot of good things to be learned and valuable experiences to be had in the courses I teach. But it seems to me that the overall quality used to be higher, and I'd really like to see the quality of the MOC improve. 


BTW, I don't teach every single MOC course. The above comments apply to course: 2778, 2779, 2780, 2784, 2790, 2791, 2792, 2793, 6158, 6231, 6232, 6234, 6235, 6236, 6317 and maybe a few others.

Followers