I have had the pleasure (or pain, depending on your point of view) of working on several large data warehousing projects over the years and I have found that there are many key ingredients to a successful project. Most of these are well documented by Ralph Kimball and Bill Inmon so I won't reiterate those here. However, as an architect I have come to really value the role of the Business Analyst in these projects. It is a no brainer that the Business Analyst is the go-to person for requirements but I believe that there should be a closer tie in the project between the architect(s) and the analyst(s). Often, we architects tend to guide questions by our knowledge of the data and infrastructure. This by itself may not lead to the best solution. While I have learned to put on the "business" hat over the years, having an analyst (or more than one) on the project who can focus on asking questions from a business perspective not only frees me (the architect) up to focus on making a scalable, maintainable and extensible solution. The resulting brainstorming with the analysts fosters an environment where we can say "how do we solve for this?" instead of "we can't do that because the data...." Most questions can be answered and most problems can be solved. Having the analysts and architects working closely on the same team (not just treating the analyst as a "customer") results in a solution that answers the business's questions more completely while still maintaining a good solution. BTW - every team needs a Workdog!
Many legacy Analysis Services (or OLAP Services) implementations use ProClarity as a querying tool and publishing framework. The server infrastructure of ProClarity allows crafty implementers to push much of the customization in KPI’s and additional measures into the “Presentation” layer of the infrastructure. This is easier to manage and change in many cases than creating named sets and calculated members in the cubes as this generally takes a SSAS administrator.
With Microsoft purchasing ProClarity and its inclusion into PerformancePoint leads to a quandary for many solution designers. While ProClarity still works well as it was designed and is available as a report type in PerformancePoint, the ProClarity views have a very different look and feel than the native PerformancePoint report views. Also, due to the fact that the report views are basically ProClarity Web Standard views, the drill-from-anywhere-to-anywhere capabilities are not available directly though the PerformancePoint interface for these views. Additionally, the library of calculated measures and sets created and published in ProClarity are not available to PerformancePoint and thus cannot be used in scorecards and report views.
In many cases, the optimum solution here is to build these ProClarity sets and measures into their respective cubes as named sets and calculated members. CAVEAT – this is not a silver bullet solution. There is a cost in terms of SSAS processing and resources for named sets and calculated measures so this should not be done without ample testing prior to implementation in production environments.
Let’s first talk about calculated measures. Existing ProClarity solutions may have a laundry list of calculated measures in their libraries. Many times these may contain nested measures (i.e. a calculated measure that uses another calculated measure) and as these get more they tend to not perform as well as desired. First, it is important to identify which measures will get heavy use in other applications such as PerformancePoint (both scorecards and reports) and Excel. Those that have a wide range of applicability justify their creation and maintenance as calculations in their respective cubes. These calculation (and set) definitions can be extracted from the ProClarity repository database for those of you who are familiar with it. This is not a “recommended” approach in much the same way as using the “sys” tables in production code for SQL Server is discouraged. As a rule, I like to take these calculations and run through extensive performance analysis using Query Analyzer and Profiler. For instance, let’s convert a calculation that reflects sales growth over the prior week:
[Measures].[Sales]/paralleperiod([Date].[Week].[Week],1,[Date].[Week].currentmember)
The query would look something like:
With member [Measures].[Sales Growth] as ‘[Measures].[Sales]/paralleperiod([Date].[Week].[Week],1,[Date].[Week].currentmember)’
Select [Org].members on rows
,[Date].[Week].&[01/01/2007 – 01/07/2007] on columns
From sales
Where [Measures].[Sales Growth]
Make sure to test the query using dimension that can take advantage of your cube partitioning strategy (in this case [Date]). Test the query with the measure as a filter, on rows and on columns. Checking profiler to make sure that the query does not cause a scan of all process partitions or excessive subcubing. Once you have settled on a calculation formula that performs well in these cases it is now time to create the calculated member through Business Intelligence Studio. When you create the calculated member in the cube, make sure you define the format and especially the non-empty behavior. If the calculation involves division, I tend to use one of the measures from the denominator as the non-empty behavior measure. I realize that this may not always be feasible if the resulting non-empty behavior is not correct so do not do this thoughtlessly.
Next, let’s talk about named sets. Many times users and/or publishers have created dimensional sets in ProClarity to ease reporting and facilitate exception-based metrics (I will talk to this in a later post). Putting these into the cube involves much the same process as the calculated measures. First I test the named set through Query Analyzer watching with Profiler. Let’s consider a named set for a dimension Sales Type that identifies gift card sales:
{[Sales Type].[Sales Type].&[Gift Card], [Sales Type].[Sales Type].&[Vendor Card]}
Next we use this in the query:
With set [Gift Card] as ‘{[Sales Type].[Sales Type].&[Gift Card], [Sales Type].[Sales Type].&[Vendor Card]}’
member [Measures].[Sales Growth] as ‘[Measures].[Sales]/paralleperiod([Date].[Week].[Week],1,[Date].[Week].currentmember)’
Select [Org].members on rows
,[Date].[Week].&[01/01/2007 – 01/07/2007] on columns
From sales
Where ([Measures].[Sales Growth], [Gift Card])
Move the set around in the query to test it from all angles and try using it as part of an aggregate measure as well:
With set [Gift Card] as ‘{[Sales Type].[Sales Type].&[Gift Card], [Sales Type].[Sales Type].&[Vendor Card]}’
member [Measures].[Gift Card Sales] as ‘sum([Gift Card],[Measures].[Sales])’
Select [Org].members on rows
,[Date].[Week].&[01/01/2007 – 01/07/2007] on columns
From sales
Where ([Measures].[Gift Card Sales])
If the results are satisfactory then create the set in the cube through Business Intelligence Studio. Keep in mind that sets are cached in memory for each connection so extremely large sets are not recommended in some cases as are too many sets.
I tend to define the named sets at the top of my calculations definition so they can be used in all the subsequent calculations, if necessary. The sets and calculations created in Business Intelligence Studio are appended to the cube definition as an MDXScript so order of creation in the Studio is important as you must define parent sets and calculations before their dependent sets and calculations. Adding and modifying these generally does not require cube reprocessing but I have seen it happen in rare instances. Make sure you test the cube modifications first in a non-production environment to make sure that they do not require reprocessing. You wouldn’t want to do something that would cause reprocessing the cube partitions as the source data may have been purged and unrecoverable (hopefully you have planned for this with a good archive process). Since named sets are cached in memory and calculated members are generally resolved in memory, often the combination of these results in better query performance. Using named sets in calculated measures often requires aggregate functions (such as SUM) which have a higher CPU requirement so consider the calculation mdx definition carefully to find the correct balance of query performance and scalability considering your server resources. On last advantage of calculated measures is that the mdx definition is static so you lessen the likelihood of the same measure being calculated multiple, different ways which can not only end up with different results but vastly different queries and performance.
With these high-use sets and calculations defined in the cube(s) in an optimum way they are now available to all applications that connect to the cube such as PerformancePoint, Excel and Reporting Services as well as ProClarity and the resulting query performance should be more predictable than when they were in the ProClarity repository and assembled on the fly by ProClarity.