Friday, July 24, 2015 #

DAX Studio 2.2.6 released

Just released an update to DAX Studio bundling up the following fixes and enhancements:

  • Support for the upcoming release of Power BI Desktop
  • Dynamic Ribbon resizing
  • Remembering of Window size and location
  • Fixed display of hierarchies in Excel 2013
  • Fixed quoting of keywords in table names
  • Fixed an error when duplicate column names are returned in a query
  • Fixed an issue when changing databases while connected to a Multi-Dim server
  • Fixed an issue when connecting as a non-admin

I’m particularly happy with the dynamic resizing that is part of the Fluent.Ribbon control. It took a bit of effort to figure out, but it provides a much better experience than having to scroll the ribbon when window size is reduced.

So on a wide-screen display you still get the full ribbon in all its glory:

image

Then on an older 1024px-wide screen a few of the labels in the Edit and Find groups drop off:

image

Things then dynamically shrink until the groups finally start to collapse into dropdown menus. So if you want to go split screen on a small display everything still works, even at 500px wide, without needing to scroll the ribbon.

image

Posted On Friday, July 24, 2015 7:27 AM | Comments (0)

Wednesday, June 24, 2015 #

Announcing DAX Studio 2.2.0

Today we released DAX Studio 2.2 which includes a number of big enhancements.

You can download the new release from http://daxstudio.codeplex.com/releases

Below is an outline of the new functionality you will find in this release.

Intellisense

DAX Studio now has intellisense support. It's a fairly basic implementation, but I think I've got it working fairly smoothly. Note that it’s based on the best practice of always prefixing columns with the table name and never prefixing a measure with a table name. So if you type ‘table name’[ you will get intellisense for all the columns in the ‘table name’ table, but if you just type [ you will be presented with a list of all the measures in the model.
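As a quick illustration of that convention, a query written this way might look like the following (the ‘Internet Sales’ table and [Total Sales] measure are just hypothetical names from a typical model, not something shipped with DAX Studio):

EVALUATE
SUMMARIZE (
    'Internet Sales',
    'Internet Sales'[Order Date],    // column reference: always table-prefixed
    "Sales", [Total Sales]           // measure reference: no table prefix
)

Typing 'Internet Sales'[ brings up the column list for that table, while typing [ on its own brings up the measure list.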

The intellisense brings up table, column and measure metadata information from the current model:

image (2)

As well as functions and keywords, including the descriptions for functions. The function metadata comes from the DISCOVER_FUNCTIONS DMV, so it should accurately reflect the capabilities of the current data source (so you will see some additional functions when connected to newer data sources such as Power BI Designer):

image (1)

Tracing (Query Plans & Server Timings)

One of the most significant new features is support for the tracing features (Server Timings & Query Plans) against PowerPivot. This has involved a significant amount of work re-building the internal code which handles this, but the end effect is that the Query Plan and Server Timings buttons are now enabled when connected to a PowerPivot model.

image

We've also altered the layout of the server timings information

image

I found that when reading the results I was always comparing Total Duration to Total CPU and looking at the ratio of these two figures, so I've added a little "parallelism factor" under the Total CPU. I also found that I was looking to compare FE vs SE duration and the number of SE Queries to the number of SE Cache hits, so I've moved them onto the same lines.
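For example, if a query reported a Total Duration of 500 ms and a Total CPU of 2,000 ms, the parallelism factor would show roughly x4.0, suggesting the storage engine scans were spread across about four cores on average (those figures are purely illustrative).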

Support for Multi-Dimensional SSAS Servers

Although it’s a little bit of a novelty, DAX Studio now properly supports connecting to multi-dimensional servers. This kind of worked previously if the default database for your multi-dimensional instance only had a single cube, but threw an error if it had multiple cubes. I’m now injecting a CUBE= parameter into the connection string when connected to a multi-dimensional server, so this should no longer be an issue.
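For illustration, the injected parameter means the connection string ends up looking something like the following (the server, database and cube names here are made up):

Data Source=localhost;Initial Catalog=Adventure Works DW;Cube=Adventure Works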

Support for Multiple Power BI Designer instances

We now have support for multiple instances of Power BI Designer, so if you have more than one copy of the designer open you can see the names of the .pbix files that you have open and choose which one to connect to.

image (3)

Highlighting of unnatural hierarchies

Unnatural hierarchies can have a negative impact on performance so we highlight those in the metadata pane.

image

Bug Fixes

There have been a number of minor bugs that have been fixed along the way, including a number that were not officially reported, but were discovered during testing.

  • FIXED - Doubling up of backslash characters when using DAX Formatter
  • FIXED - Unable to connect to SSAS servers when not an administrator
  • FIXED - Changing the connection for a query window would sometimes throw an error
  • FIXED - Source database changing on metadata refresh when connected to a SSAS server
  • FIXED - Cannot connect to a PowerPivot model where the file path contains single quotes

Known Issues

The following issue which was first identified in 2.1.1 still remains:

  • "Linked Table" output for the Excel add-in only works for PowerPivot models if there is at least one table in the model that was created by linking data from an Excel table. We have not been able to find a workaround to this yet, it looks like it might be due to a bug in the Excel object model.

Enjoy!

Posted On Wednesday, June 24, 2015 6:58 AM | Comments (1)

Wednesday, May 13, 2015 #

SSAS Tabular–per table LastProcessed dates

I saw a question yesterday on the MSDN forums asking how to go about exposing the LastProcessed datetime per table. Marco has a great post here (https://www.sqlbi.com/articles/last-process-date-in-ssas-tabular/) about a number of ways to get the last processed date at the database level. But none of these techniques work at the table level.

Having a look through the various DMVs that are available, none of them seemed to have this information apart from DISCOVER_XML_METADATA, which returns a large XML document that can be tricky to query. However, the information in this DMV is also used to populate the AMO object model, so it should be possible to use AMO to pull out the last processed datetime per table and then push it into the model using a variation of processAdd push-mode processing.
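As a rough sketch of the first part of that idea, the per-table dates can be read out of AMO with something like the following PowerShell (this is just an illustration using the database name from the command further down, not the full script linked below, which also handles the processAdd push):

# sketch only: list the LastProcessed date for each table via AMO
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null
$server = New-Object Microsoft.AnalysisServices.Server
$server.Connect("localhost")
$db = $server.Databases.FindByName("advwrksLastProcessed")
foreach ($dim in $db.Dimensions)
{
    # in a tabular model each table is exposed as an AMO Dimension
    "{0} : {1}" -f $dim.Name, $dim.LastProcessed
}
$server.Disconnect()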

I started by creating a new table in my database called LastProcessed using the following query as a source

SELECT
    CAST('' AS varchar(256)) AS TableName,
    GETDATE() AS LastProcessed
WHERE 1=2

This generates an empty table with a string column and a datetime column which will be the target for our processAdd push processing.

I was not exactly sure how to construct the XMLA for the processAdd command, so I created a simple SSIS package using a Dimension Processing Destination in a DataFlow task and captured the XMLA it produced using SQL Profiler. The command that Profiler captured was missing the <parameters> section that you would use for pushing a batch of data, so I adapted one of Greg Galloway’s excellent XMLA processAdd examples and then wrapped that in a PowerShell script.

The PowerShell script I created can be found here: https://gist.github.com/dgosbell/8f4e2a6aa022a6fdd288

It can be run with a simple command like the following:

.\update-ssaslastupdated -serverName "localhost" -databaseName "advwrksLastProcessed" -tableName "LastProcessed"

Once that succeeds you can either expose the whole table or you could hide it and build measures to just expose the dates for specific tables.
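For example, if you hide the table, a measure along the following lines could expose the date for a single table (just a sketch; the table and column names match the LastProcessed table created above):

Internet Sales Last Processed :=
CALCULATE (
    MAX ( LastProcessed[LastProcessed] ),
    LastProcessed[TableName] = "Internet Sales"
)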

Below is an example of querying this table in my adjusted Adventure Works database. You can see that ‘Internet Sales’ and ‘Reseller Sales’ have been processed at different times to the other tables in the database.

image

Posted On Wednesday, May 13, 2015 9:52 PM | Comments (0)

Friday, April 24, 2015 #

BIDS Helper 1.7.0 released

A few days ago the BIDS Helper team released v1.7.0, which you can download from here: https://bidshelper.codeplex.com/releases

This version comes with a few bug fixes, and some new functionality, but the most significant enhancements are to the BIML Engine thanks to our friends at Varigence.

I’ve copied and pasted the release notes below. Note that there are some possible breaking changes in the new BIML Engine, so if you have existing BIML scripts please read that section carefully.

====== Release Notes =======

Fixed Issues:

  • Fixes to all features so they work when the project is inside a solution folder.
  • Fixed Biml project connections in Visual Studio 2013.
  • Reports launch in the proper version of the Report Viewer control in Visual Studio 2012 and 2013.

Improvements
  • Biml updates to bring it up-to-date with Mist 4.0
  • Biml updates to support emitting SSIS 2014 packages from Visual Studio 2013.
  • Biml brings the Output window to the front, displays a progress bar, and displays a more useful errors dialog
  • Enabling the ability to add a translation for the default language in Tabular Translations Editor to make frictionless renaming easier.
  • Improvements to Tabular Actions Editor:
    • Adding button to test all MDX expressions
    • Ensuring the drillthrough column dropdowns only show columns from related tables
    • Adding a button to let you copy an existing action
    • Adding a button to populate the drillthrough action columns with all columns in related tables (since it's quicker to manually delete than manually add columns from this list)
    • Allowing MeasureGroupMeasures to apply to any action type
  • Printer Friendly Dimension Usage for Tabular

Biml Details

There are a huge number of changes to Biml in this release. Below are highlights of some of the more notable changes and those changes that were most requested by users.
BREAKING CHANGES
  • RootNode.PackageProjects has now moved entirely into RootNode.Projects, including all namespaces. This is done to better support other project types such as analysis services projects. The mitigation is to change all instances of the <PackageProjects> tag to the <Projects> tag.
    • MITIGATION: If backwards compatibility with older versions of Mist is required, the PackageProjects/Projects element name can be conditionalized based on the tools version.
  • Removed QuerySource and Destination. These were abstractions that wrapped OleDbSource/AdoNetSource and OleDbDestination/AdoNetDestination in a single interface that offered the intersection of their features and automatically chose the correct component type to emit based on the referenced connection type. These language elements were not particularly useful, were very narrowly used, and their removal will prevent unintentional use by new customers.
    • MITIGATION: Change any instances of QuerySource or Destination to the appropriate OleDbSource/AdoNetSource and OleDbDestination/AdoNetDestination
  • All instances of the word Hadron have been removed from the project. Most notably, namespaces that included “Hadron” have 1:1 replacements with “Biml”
    • MITIGATION: Empty classes were added in the removed namespaces to prevent code with legacy namespace imports from breaking scripts
  • AstDimensionNode and AstFactNode have been removed. SSAS metadata is now added underneath the AnalysisMetadata element of table objects.
    • This is unlikely to affect most BIDSHelper users, since SSAS emission is only supported in Mist.
  • AnalysisServices connection property OverriddenConnectionString is renamed to ConnectionString. The Provider property is removed as it is no longer needed
    • This is unlikely to affect most BIDSHelper users, since SSAS emission is only supported in Mist.

Biml Language Changes
  • Directives
    • Added code directive to reference C# or VB files with code directly from Biml files. This enables a variety of scenarios around creating more complex and reusable .NET code for use within BimlScripts
    • Nested include files now work
  • Other
    • Added ObjectTag property to all Biml API classes. This is essentially a .NET Dictionary object that enables storage of .NET objects in a similar way to annotations in Biml.
    • Added support for external script project files in the language
    • Added GetPropertyValue method to all Biml API objects. This provides another option to get object property values when doing dynamic code generation.
    • Added Parse methods to all Biml API objects. This enables easy parsing of Biml code snippets to create Biml API objects. It is the inverse of the GetBiml functionality.
    • ConstraintMode has been set to default to Parallel, so that it no longer needs to be specified for most package objects
    • Added BuildSettings object to AstRootNode to access build settings in a central location
    • Added FriendlyName convenience property to columns
    • Fixed Biml reference resolution code so that references are updated even when DisableAllChangedEventInfrastructure is set to true. This is essential to enable the use of utility methods like IsUsedInPrimaryKey within live scripts on tables also imported within the live scripts
    • Added an IsDatabaseCharSetUnicode override property to Oracle connections

Biml API Changes
  • Added timeout support to all of the database providers and external data access methods
  • Fixed issue where properties changed via API were not being reflected in GetBiml output
  • Added GetDataTable support for most major DbConnection types
  • Added AstRootNode convenience collections for FlatFileFormats, RawFileFormats, PackageProjects, and CubeProjects
  • Fixed issue where AstRootNode convenience collections did not support name-based indexing in Live mode
  • Fixed an issue where StructureEquals was producing incorrect results due to a missing negation operator in autogenerated code

Biml SSIS Emission Changes
  • SSIS 2014 Support
  • New Tasks and Components
    • CDC Control Task
    • CDC Source
    • CDC Splitter
    • DQS Cleansing
  • Added SSIS Path Annotations to dataflow and control flow paths
  • Added support to SSIS emitter for emitting external columns without the corresponding output columns. This is primarily used for database datatypes that are unsupported by SSIS.
  • Changed DbType to DataType in column specifications
  • Fixed issue where a variable defined in an event handler could not be referenced from a descendent of that event handler.
  • Fixed build crash when cache connections are used on lookups
  • Improved emission of connection managers so that the individual connection properties are supplied, rather than just the connection string property
  • Fixed code generation issue reported to support with the Fuzzy Lookup component
  • Added support for the new AutoPassThroughInputColumns property on Fuzzy Lookup to give it a bit more flexibility
  • Added ServerExecutionID to pre-defined variables list for packages
  • Added support for expressions on project connections
  • Added PackagePassword property to AstPackageNode for encryption support

Biml Errors/Warnings
  • Fixed an issue with ValidationReporter where it would occasionally report line numbers into our temp files. This usually happened with mismatched end braces
  • Improved validator for SSIS data flow columns to show the duplicate column name when the same column is mapped multiple times
  • Fixed an issue where transformer generated errors were not being reported correctly. They previously appeared as Null Reference Exceptions because of an issue in the reporting mechanism.
  • Fixed issue where wrong column name was being shown in Script Component column type mismatch errors
  • Clarified error message when “Build and Run” or “Build and Execute” was performed without the correct version of SSIS installed
  • Added an error for duplicate dataflow column nodes. This usually arises with "reserved" column names "ErrorCode" and "ErrorText"
  • Added a warning and an automatic patch of import statements that use the old Hadron namespaces

Posted On Friday, April 24, 2015 7:02 AM | Comments (0)

Friday, April 10, 2015 #

SSRS–adding a pipe delimited text export option in Sharepoint Integrated mode

We are currently looking to decommission an old reporting portal based on another technology stack and replace it with SSRS running in SharePoint integrated mode. However, one of the requirements was to support a “pipe delimited” CSV export format.

There are a number of posts you can find about changing rsreportserver.config when running SSRS in standalone mode, but I could not find much information about how to do this when running in SharePoint integrated mode.

The MSDN documentation lists the various cmdlets (https://msdn.microsoft.com/en-us/library/gg492249.aspx) but does not show examples of the parameter values, which is not much help as some of the parameters require XML fragments.

This blog post (http://blog.kuppens-switsers.net/it/customize-ssrs-extension-settings-sharepoint-integrated-mode/) got me close, and when combined with the information from this one (http://www.mssqltips.com/sqlservertip/3379/modify-sql-server-reporting-services-rendering-format-in-sharepointintegrated-mode/) I was able to come up with the following PowerShell script, which does what I needed.

You’ll need to run this from the SharePoint Management Shell (I logged on to the console of one of our app servers to do this) using an account that has farm admin rights.

$svrDirectives = @"
<OverrideNames>
    <Name Language='en-US'>CSV (Pipe Delimited)</Name>
</OverrideNames>
"@
$extConfig = @"
<DeviceInfo>
    <FieldDelimiter>|</FieldDelimiter>
    <NoHeader>false</NoHeader>
    <FileExtension>csv</FileExtension>
    <ExcelMode>False</ExcelMode>
    <Encoding>ASCII</Encoding>
</DeviceInfo>
"@
$typeName = "Microsoft.ReportingServices.Rendering.DataRenderer.CsvReport,Microsoft.ReportingServices.DataRendering"

$apps = get-sprsserviceapplication
foreach ($app in $apps)
{
  New-SPRSExtension -identity $app -ExtensionType "Render" -name "CSVPipeDelimited" -TypeName $typeName -ServerDirectives $svrDirectives -ExtensionConfiguration $extConfig
}
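If you want to confirm that the extension was registered, the same family of cmdlets includes Get-SPRSExtension, so something along these lines should list the render extensions (including the new CSVPipeDelimited one) for each service application:

foreach ($app in get-sprsserviceapplication)
{
  Get-SPRSExtension -identity $app -ExtensionType "Render"
}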

This code loops over all the service applications in the current farm, so even if you have a multi-tenancy type situation it *should* work, although if you are in that situation you should do your own testing. Our farm only has the one SSRS service application.

Posted On Friday, April 10, 2015 8:37 PM | Comments (2)

Wednesday, March 18, 2015 #

DAX Studio 2.1 Released

Today I am pleased to announce the release of the latest update to DAX Studio – v2.1.0

You can get it from the releases page on codeplex: http://daxstudio.codeplex.com/releases

Below is an outline of what’s new in 2.1. A big thanks to Daniele Perilli for his assistance with the graphics and Marco Russo for his work on the Query Plans and Server Timings tabs plus his help with testing this release.

UI Refresh

Thanks to assistance from Daniele Perilli on the graphics side we now have a lot more consistency in our ribbon. We’ve moved all the buttons onto the Home tab and moved the database dropdown to the metadata pane, which has enabled us to fit all the buttons you’d use regularly onto the Home ribbon.

image

Metadata Search

We now support searching in the metadata pane. If you hover over the little magnifying glass in the top right of the metadata pane the search control will fly out and as you type the metadata will filter down to only show objects containing the characters you’ve typed.

The icon changes to a green colour when a search is active, so that you know that you are not looking at the full set of metadata.

image

Search and Replace

We now support searching for text within a query document, including highlighting all matching text.

image

And we support replacing of text

image

Both search and replace include the ability to do case-sensitive searches, regular expressions, wildcards and full word matches. These dialogs use the same hotkeys as Visual Studio: Ctrl-F for Find and Ctrl-H for Replace.

Improved Server Timings Tab

We now show a much nicer view of the aggregate timing details as well as showing the detailed scan events with their timings. You also have the option of showing cache and internal events although these are hidden by default.

image

Improved Query Plan tab

The query plans are now pre-processed to make them easier to read, and the total number of records for each line is split out so that it can be clearly seen and so that you can sort by this column to find the operations which are traversing large numbers of rows.

image

Save Query Results to a File

Thanks to the codeplex user mogular who submitted an initial code sample for this feature. You can now export query results to either a comma-separated (csv) file or a tab-delimited (txt) file by choosing the File output option.

image

Using Semantic Versioning for releases

I’ve never really had a strict way of assigning version numbers, but I was not really happy with using the full 4-part version that is generated by .NET. So I’ve decided to start using a variation of Semantic Versioning for the releases. This uses a 3-part version number <major>.<minor>.<patch> where:

<major> – gets updated for significant or breaking changes

<minor> – gets updated for new features

<patch> – gets updated for fixes to existing features

and updates to a higher number reset the lower ones.
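So, for example, going from 2.1.1 to 2.2.0 indicates that new features were added, while a release that only fixed bugs in 2.2.0 would be numbered 2.2.1.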

Fixes

There have been numerous small fixes but some of the notable ones are:

  • Excel Add-in Regional Support – there was a bug in v2.0 which caused the Excel add-in to fail to load correctly on PCs that had regional settings that did not use a period (.) as the decimal separator. This has now been fixed and the Excel add-in should work regardless of the regional settings on the PC
  • New Version notification – this was actually silently failing in v2.0
  • Updates to syntax highlighting definitions
  • Improvements to the Excel “Linked Table” output 

Known Issues

  • "Linked Table" output for the Excel add-in only works for PowerPivot models if there is at least one table in the model that was created by linking data from an Excel table. We have not been able to find a workaround to this yet, it looks like it might be due to a bug in the Excel object model.
  • I’ve temporarily removed the “Quick Access” buttons in the title bar as they are not working consistently at the moment.

Posted On Wednesday, March 18, 2015 7:17 AM | Comments (4)

Monday, February 2, 2015 #

The Care and Feeding of SSAS in Production - SQL Saturday 365 Melbourne

We are not far away from SQL Saturday #365, which is happening on Feb 7th, 2015. If you live in or around Melbourne it’s worth considering making the effort to come along. We have a great line-up of 30 different sessions with speakers from 7 different countries, including 16 MVPs.

This year I’m doing a talk called “The Care and Feeding of Analysis Services in Production”. I was partly inspired by the “Accidental DBA” topics that I’ve seen people doing for the relational engine and figured it was time someone did something similar for SSAS.

I’m going to talk about both Tabular and Multi-Dimensional, so there should be something in there for everyone.

Below is the session abstract:

A lot of the information you'll find on Analysis Services is focused around the initial creation of databases and models, but once you have a solution deployed to production then what? In this session we will look at what it takes to run an Analysis Services server in production. What are the basics you need to know about how the server works, including things like threading, memory usage and locking? How can you monitor the health of your server? What tools can you use to find out what's happening on your server? We'll have a look at what you should be monitoring to make sure your system is running properly and run through what to investigate when things don't run as smoothly as you'd like.

Posted On Monday, February 2, 2015 6:31 AM | Comments (0)

Monday, December 8, 2014 #

DAX Studio v2–Christmas comes early

Read the following in your best movie trailer voice….

It’s been a long time coming….

19 months in the making….

5,000 lines of XAML code….

7,000 lines of C# code…

With the ability to run both inside of Excel and as a Standalone program

Version 2 of DAX Studio is finally here.

Maybe that’s a bit melodramatic, but maybe you get a hint of how exciting it is to finally be able to share this with you all. Version 2 is pretty close to a full re-write of the user interface, and in the process there have been a number of difficult hurdles to overcome. So it’s taken a bit longer than anticipated, but it’s finally ready.

Below is a screenshot of the new user interface, which I think you’ll agree looks pretty slick.

hero_image 

You can download the latest version from http://daxstudio.codeplex.com under the downloads tab. The documentation tab has also been updated to cover all the new features in v2.

Some of the exciting new features are:

  • An “Office 2013” ribbon window
  • True tabular metadata (modelled on the Power View metadata pane)
  • Integrated tracing support
  • Bracket matching
  • Version update notification
  • Comment / uncomment support
  • A single universal installer
  • plus many more

But probably more importantly the code has been re-architected in a more modular structure so it should be easier to extend and improve going forward. There are still plenty of features that I’d like to add and it should be possible to do some smaller releases now that the major re-structuring is complete.

enjoy :)

Posted On Monday, December 8, 2014 11:18 PM | Comments (6)

Monday, November 17, 2014 #

SQL Saturday #365–Melbourne Feb 7 2015

Things have been a little quiet around here recently; one of the reasons for that is that I’ve been hard at work as a committee member for SQL Saturday #365, which will be held in Melbourne, Australia on Feb 7th, 2015.

Following on from Melbourne’s first highly successful event earlier this year, next year's event promises to be bigger and better.

Located again at the Caulfield Campus of Monash University, the event will have a mixture of local, interstate and international speakers. And for the very first time there will be full day pre-con sessions available for a very reasonable price on the Friday before the event.

You can find out more information about the event here:
https://www.sqlsaturday.com/365/eventhome.aspx

You can register to attend here:
https://www.sqlsaturday.com/365/register.aspx

If you would like to present a session at SQL Saturday you can submit a proposal here:
https://www.sqlsaturday.com/365/callforspeakers.aspx

Pre Cons
This year there will also be 3 Pre-Con Full Day Sessions on the Friday. These present amazing value for money, with a full day of training by top experts for only $315 ($265 for early-bird registrations). The session details and registration links are as follows:

BIML Bootcamp
with Reeves Smith, Peter Avenant, Warwick Rudd and Paul Schmidt

Mastering Execution Plan Analysis
with Paul White

Practical Power BI
with Peter ter Braake

Full details of the PreCon sessions and how to register are in the links.

Posted On Monday, November 17, 2014 9:37 PM | Comments (0)

Monday, July 28, 2014 #

The perils of calculating an Average of Averages

I've seen questions around issues calculating averages come up a few times in various forums, and it came up again last week, so I feel there is some benefit in walking through the details of this issue. For many of you the following will be nothing new, but I'm hoping that this may serve as a reference that you can point to when you get requests for this sort of calculation.

The core issue here is really a fundamental mathematical one. Personally I see it surfacing most often in DAX and MDX as those are the languages that I spend a lot of time with, but also because their multi-dimensional nature means you need to be able to write generic calculations that will work regardless of how the end users slice and dice the data.

The discussions invariably start with a statement like the following:

"I have a calculated measure that an average, but my totals are calculating incorrectly"

There are 2 different issues I see relating to this.

The first one is trying to use the AVG() function in MDX. Basically if you want an average calculation that works with all your different dimensions then avoid this function. The AVG function in MDX calculates the average over a fixed set. You may be able to use it in a static query, but to calculate an average in your MDX script simply create the two base measures - a sum and a count, then divide the sum by the count. This is not as much of an issue in DAX as the built-in AVERAGE, AVERAGEA and AVERAGEX generally work as expected.
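As a sketch, the same sum-then-divide pattern expressed as DAX measures looks something like the following (the Fact table with its Amount and Count columns matches the sample data used below):

Total Amount := SUM ( Fact[Amount] )
Total Count := SUM ( Fact[Count] )
Average Amount := DIVIDE ( [Total Amount], [Total Count] )

Because the division happens after the sums, the same formula produces a correctly weighted average at the row, sub-total and grand-total levels.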

The other sort of question that I see is related to how the totals are calculated and the question is usually something like the following:

"I have an average measure calculated by doing sum / count - which produces the correct average for each row, but the total is calculated as "sum of sums" / "sum of counts" and my user wants to see it as the average of all the averages."

And to put it bluntly this requirement is invalid. You should never "total" a series of averages by averaging them. The easiest way to explain why this is the case is to illustrate with some data. So let's have a look at a few scenarios.

The first problem you will see with the "average of averages" approach is that it gives too much weight to outlying amounts.

Category     Amount    Count    Average
Bikes         1,000        1      1,000
Helmets      10,000    1,000         10
TOTAL        11,000    1,001        ???

Given the data above, how should we calculate the total average? If we do the "average of averages" approach we have:

(1000 + 10) / 2 = 505

If we take the SUM(Amount) / SUM(Count) approach we get the following:

11000 / 1001 = 10.99

This is an extreme example to prove a point, but which do you think is correct? Should the 1 bike we sold for $1000 skew the average to $505 or should the fact that it was just one product out of 1001 mean that the average should only be $10.99?

Your business user might be happy seeing a higher average amount, but what if the situation was reversed and we had sold 1000 bikes and just one helmet? This would make the "average of averages" still equal 505 while recalculating the average at the total level would give us $999.01 - I know which calculation I think is giving a better indication of the total average sales.

It's possible that you may be thinking at this point that this is not so much of a big deal for you because you don't have that sort of variability in your data. However that is only the start of the issues. If you are still unsure about the evils of averaging averages then read on because it only gets worse.

To show the next nasty side effect we need to look at just a little bit more data. Take the following 4 records for example where we have data split between 2 cities and 2 product categories

City         Category    Amount    Count
Melbourne    Bikes           18        3
Melbourne    Helmets         25        5
Seattle      Bikes           21        3
Seattle      Helmets         16        4

When we group the data by City we get the following results. The "Total" line is where the average is recalculated at the total level, whereas the "Avg of Averages" line is where I've taken the average of the 2 City averages.

City               Amount    Count    Average
Melbourne              43        8       5.38
Seattle                37        7       5.29
Avg of Averages                          5.34
Total                  80       15       5.33

Now let's have a look at what happens to the figures when we group the data by the product category. Notice that the Total line has remained unchanged, but the "Avg of Averages" is now different!

Category           Amount    Count    Average
Bikes                  39        6        6.5
Helmets                41        9       4.56
Avg of Averages                          5.53
Total                  80       15       5.33

This sort of behaviour - where the figures reported for total and sub-totals will vary depending on how the data is sliced and diced - will be the death of your BI project.

Trust - the most important "feature" of any BI project

I would argue that possibly the most important "feature" of any BI solution is trust. You can always add features and missing functionality, but it can be very difficult to win back the trust of your users once it's been lost. And nothing will erode the trust of your users faster than seeing inconsistent results.

It's not just straight Averages that are the issue

Any time you are mixing calculations that do sums and divisions you need to be careful of the order of operations. Ratios, percentages and moving averages are just a few examples of other calculation types for which you need to take care of the order in which you add and divide things.

Posted On Monday, July 28, 2014 7:18 AM | Comments (2)

Thursday, July 3, 2014 #

The case of the vanishing KPIs

I was contacted today with an interesting issue: we had a tabular model with some KPIs which were not showing up in Power View.

The first thing I checked was the version setting on the model. KPI support was not added to tabular models until SP1. If your model is set to a compatibility version of RTM (1100), Power View will detect this and will effectively not ask for metadata about the KPIs.

However, in this case when we checked the database properties from SSMS the compatibility setting appeared to be correctly set to SP1 (1103).

image

So the next thing I did was to open a profiler trace and look at the metadata queries that Power View executed as it started up. Excel treats SSAS Tabular models as if they were multi-dimensional models and queries the metadata using a number of different DISCOVER queries against different schema rowsets. When SSAS Tabular was developed a new schema rowset was introduced called DISCOVER_CSDL_METADATA which is what DAX clients like Power View use to populate their field browser windows.

Checking the command I could see that it was correctly requesting a version 2.0 recordset. If the model was set to a compatibility setting of RTM (1100) or if there was a problem detecting the compatibility setting of the model you may see a 1.0 in the version restriction. Version 1.0 CSDL will not include KPI information. This is so that client tools can specify the version of metadata which they know how to handle.

image

At this point it looks like Power View and SSAS are correctly talking to each other, but we are still no closer to understanding why the KPIs are visible in Excel, but not in Power View.

The next thing to look at was the actual response returned by the metadata query, to see if there was anything strange in there. To do that I took the RestrictionList and PropertyList elements from the profiler trace and inserted them into the Restrictions and Properties elements in the query below. I also had to remove the LocaleIdentifier and DataSourceInfo elements from the PropertyList as these related to a different session. Below is an example of a DISCOVER_CSDL_METADATA command which can be run from an XMLA window in SSMS.

<Discover xmlns="urn:schemas-microsoft-com:xml-analysis">

    <RequestType>DISCOVER_CSDL_METADATA</RequestType>
   
    <Restrictions>
        <RestrictionList xmlns="urn:schemas-microsoft-com:xml-analysis">
            <CATALOG_NAME>Adventure Works Tabular</CATALOG_NAME>
            <VERSION>2.0</VERSION>
        </RestrictionList>
    </Restrictions>
    <Properties>
        <PropertyList xmlns="urn:schemas-microsoft-com:xml-analysis">
            <Catalog>Adventure Works Tabular</Catalog>
            <Content>SchemaData</Content>
            <Format>Tabular</Format>
        </PropertyList>
    </Properties>
</Discover>

(You can simply replace the 2 references to the Catalog name in the XMLA above to run this against one of your models.)

When I searched in the results for "KPI" I came across the following interesting piece of data.

image

Notice the Hidden="true" attribute? It turns out that the original developer decided to hide the measures before creating the KPI which resulted in the KPI itself being hidden. Setting the Hidden property to false on the measure fixed this issue. Mystery solved.

So although the end solution turned out to be something simple, I thought it might be interesting to share the process.

A footnote

Note that we still have one minor issue: in Excel we can now see both the KPI and the measure, while in Power View we only see the KPI. My suspicion is that this may be a bug in the MDSCHEMA_MEASURES rowset which Excel uses to find out what measures a model has. My opinion is that, in order to be consistent with Power View, measures which are used for KPI values should not also be displayed as "normal" measures.
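If you want to check this yourself, that rowset can be queried directly from an MDX query window in SSMS with something along these lines, which should show whether the underlying measures are flagged as visible:

SELECT MEASURE_NAME, MEASURE_IS_VISIBLE
FROM $SYSTEM.MDSCHEMA_MEASURES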

Posted On Thursday, July 3, 2014 7:33 AM | Comments (0)

Friday, May 23, 2014 #

BI Survey 14

It's BI Survey time again :)

If you haven't done this before here is a little background on it from the guys that run it:

The BI Survey, published by BARC, is the world's largest and most comprehensive annual survey of the real world experiences of business intelligence software users. Now in its fourteenth year, The BI Survey regularly attracts around 3000 responses from a global audience. It provides an invaluable resource to companies deciding which software to select and to vendors who want to understand the needs of the market.


The Survey is funded by its readers, not by the participant vendors. As with the previous thirteen editions, no vendors have been involved in any way with the formulation of The BI Survey. Unlike most other surveys, it is not commissioned, sponsored or influenced by vendors.

Here is a link to the survey:

https://digiumenterprise.com/answer/?link=1981-ZYQSEY8B

If you take the survey you will get access to a summary of the results. By helping to promote the survey here I'll get access to some more detailed results including some country specific analysis so it will be interesting to see the results.

Posted On Friday, May 23, 2014 6:45 AM | Comments (0)

Friday, May 16, 2014 #

Function Folding in #PowerQuery

Looking at a typical Power Query query you will notice that it's made up of a number of small steps. As an example take a look at the query I did in my previous post about joining a fact table to a slowly changing dimension. It was roughly built up of the following steps:

  1. Get all records from the fact table
  2. Get all records from the dimension table
  3. Do an outer join between these two tables on the business key (resulting in an increase in the row count as there are multiple records in the dimension table for each business key)
  4. Filter out the excess rows introduced in step 3
  5. Remove extra columns that are not required in the final result set.

If Power Query was to execute a query like this literally, following the same steps in the same order, it would not be overly efficient, particularly if your two source tables were quite large. However, Power Query has a feature called function folding where it can take a number of these small steps and push them down to the data source. The degree of function folding that can be performed depends on the data source. As you might expect, relational data sources like SQL Server, Oracle and Teradata support folding, but so do some of the other sources like OData, Exchange and Active Directory.

To explore how this works I took the data from my previous post and loaded it into a SQL database. Then I converted my Power Query expression to source its data from that database. Below is the resulting Power Query expression, which I edited by hand so that the whole thing can be shown in a single expression:

let
    SqlSource = Sql.Database("localhost", "PowerQueryTest"),
    BU = SqlSource{[Schema="dbo",Item="BU"]}[Data],
    Fact = SqlSource{[Schema="dbo",Item="fact"]}[Data],
    Source = Table.NestedJoin(Fact,{"BU_Code"},BU,{"BU_Code"},"NewColumn"),
    LeftJoin = Table.ExpandTableColumn(Source, "NewColumn"
                                  , {"BU_Key", "StartDate", "EndDate"}
                                  , {"BU_Key", "StartDate", "EndDate"}),
    BetweenFilter = Table.SelectRows(LeftJoin, each (([Date] >= [StartDate]) and ([Date] <= [EndDate])) ),
    RemovedColumns = Table.RemoveColumns(BetweenFilter,{"StartDate", "EndDate"})
in
    RemovedColumns

If the above query was run step by step in a literal fashion you would expect it to run two queries against the SQL database doing "SELECT * …" from both tables. However a profiler trace shows just the following single SQL query:

select [_].[BU_Code],
    [_].[Date],
    [_].[Amount],
    [_].[BU_Key]
from
(
    select [$Outer].[BU_Code],
        [$Outer].[Date],
        [$Outer].[Amount],
        [$Inner].[BU_Key],
        [$Inner].[StartDate],
        [$Inner].[EndDate]
    from [dbo].[fact] as [$Outer]
    left outer join
    (
        select [_].[BU_Key] as [BU_Key],
            [_].[BU_Code] as [BU_Code2],
            [_].[BU_Name] as [BU_Name],
            [_].[StartDate] as [StartDate],
            [_].[EndDate] as [EndDate]
        from [dbo].[BU] as [_]
    ) as [$Inner] on ([$Outer].[BU_Code] = [$Inner].[BU_Code2] or [$Outer].[BU_Code] is null and [$Inner].[BU_Code2] is null)
) as [_]
where [_].[Date] >= [_].[StartDate] and [_].[Date] <= [_].[EndDate]

The resulting query is a little strange; you can probably tell that it was generated programmatically. But if you look closely you'll notice that every single part of the Power Query formula has been pushed down to SQL Server. Power Query itself ends up just constructing the query and passing the results back to Excel; it does not do any of the data transformation steps itself.

So now you can feel a bit more comfortable showing Power Query to your less technical colleagues, knowing that the tool will do its best to fold all the small steps in Power Query down to the most efficient query that it can against the source systems.

Posted On Friday, May 16, 2014 7:40 AM | Comments (0)

Monday, May 5, 2014 #

#PowerQuery – Joining to a Slowly Changing Dimension

I blogged previously about how to look up a surrogate key for a slowly changing dimension using DAX. This post is about how to do the same thing using Power Query.

I'm going to start off with the same 2 tables that I used in the previous blog post. One is a fact table and the other is my BU (Business Unit) table. I started by clicking on each table of data in Excel and choosing the "From Table" data source option.

image

And for each table I unchecked the "Load to worksheet" option and then clicked apply & save.

image

Once I had done that for both tables my Power Query tool pane looked like the following: I have two queries defined, but neither of them is loading any data directly.

image

Now that we have our two source queries we want to use the Merge option in Power Query to join them together

image

The Merge option in Power Query is how you join matching rows in two tables together. I chose "Fact" as my first table, as for each row in the Fact table I want to find the matching BU_Key from the BU table.

image

You'll notice that at this point we can only choose columns for an equality match; there are no options for us to test that the Date in Fact is between the StartDate and EndDate in the BU table.

When we click on OK we end up with a result like the following, which has our original rows from the Fact table plus a column called "NewColumn" containing the one or more rows from the BU table that matched on the BU_Code column.

image

If we click on the little double arrow button in the header of the NewColumn column you get the following options:

image

We can choose to either expand or aggregate the rows in the nested table. Because we want to look up the BU_Key we tick that, as well as the StartDate and EndDate columns, as we will need those later.

That gives us a result like the following:

image

Now we are getting close, but we still have one major issue. We now have 16 rows instead of our original 8 because each row in the Fact table is matching to multiple rows in the BU table, as we have not done any filtering based on the start and end dates yet. Clicking on the filter button at the top of the "Date" column, it initially looks like doing a date filter and choosing the "Between" option would be a solution.

image

But that only gives us the option to select fixed date values from our data, not references to another column.

image

One solution would be to put in fixed dates and then manually edit the filter in the formula bar, but I wanted to see how far I could get without resorting to doing any advanced editing. The solution I came up with involved some minor code, but it can be done without manually editing the formula.

What I ended up doing was inserting a new custom column which we can then use to filter out the rows we don't want. So from the "Insert" tab on the ribbon I chose the "Insert Custom Column" option:

image

Then I entered the following expression to create a new column called "DateFilter" which will return a value of True if the Date from the current Fact row was between the StartDate and EndDate from the BU table.

= ( ( [Date] >= [NewColumn.StartDate] ) and ( [Date] <= [NewColumn.EndDate] ) )

image

That gives us the following result:

image

Then to filter down to just the "True" values we just need to click on the dropdown in the header of the "DateFilter" column and select the "TRUE" value in our filter.

image

We are now back to our original 8 rows.

image

Then we just need to do a little clean-up. By holding the Ctrl key while clicking on the green columns above we can remove those columns. Then I renamed "NewColumn.BU_Key" to BU_Key, clicked on the "Date" column and set its type to Date (which somehow did not get detected correctly), and we end up with our finished table, which we could choose to load into Excel or directly into a Power Pivot model.

image

Below is the Power Query Formula that was created as a result of the above steps. (this is just the merge query excluding the 2 source queries for "BU" and "Fact")

let
    Source = Table.NestedJoin(Fact,{"BU_Code"},BU,{"BU_Code"},"NewColumn"),
    #"Expand NewColumn" = Table.ExpandTableColumn(Source
            , "NewColumn"
            , {"BU_Key", "StartDate", "EndDate"}
            , {"NewColumn.BU_Key", "NewColumn.StartDate", "NewColumn.EndDate"}),
    InsertedCustom = Table.AddColumn(
            #"Expand NewColumn", "DateFilter"
            , each ( ( [Date] >= [NewColumn.StartDate] ) and ( [Date] <= [NewColumn.EndDate] ) )),
    FilteredRows = Table.SelectRows(InsertedCustom, each ([DateFilter] = true)),
    RemovedColumns = Table.RemoveColumns(
            FilteredRows,{"BU_Code", "NewColumn.StartDate", "NewColumn.EndDate", "DateFilter"}),
    RenamedColumns = Table.RenameColumns(RemovedColumns,{{"NewColumn.BU_Key", "BU_Key"}})
in
    RenamedColumns

If you want to manually tweak things you can go into the Advanced Editor and edit the formula to combine all three queries into one, and you can also do away with the custom column and just do the between filtering inline. The following query shows the single-query solution.

let
    Fact1 = Excel.CurrentWorkbook(){[Name="Fact"]}[Content],
    BU1 = Excel.CurrentWorkbook(){[Name="BU"]}[Content],
    Join = Table.NestedJoin(Fact1,{"BU_Code"},BU1,{"BU_Code"},"NewColumn"),
    #"Expand NewColumn" = Table.ExpandTableColumn(Join
            , "NewColumn"
            , {"BU_Key", "StartDate", "EndDate"}
            , {"NewColumn.BU_Key", "NewColumn.StartDate", "NewColumn.EndDate"}),
    FilteredRows = Table.SelectRows(#"Expand NewColumn"
            , each ( ( [Date] >= [NewColumn.StartDate] ) and ( [Date] <= [NewColumn.EndDate] ) )),

    RemovedColumns = Table.RemoveColumns(
            FilteredRows,{"BU_Code", "NewColumn.StartDate", "NewColumn.EndDate"}),
    RenamedColumns = Table.RenameColumns(RemovedColumns,{{"NewColumn.BU_Key", "BU_Key"}}),
    ChangedType = Table.TransformColumnTypes(RenamedColumns,{{"Date", type datetime}})
in
    ChangedType

If you are curious you can download the workbook I used for this blog post from my OneDrive: http://1drv.ms/1mux5O5

Posted On Monday, May 5, 2014 9:53 PM | Comments (2)

Tuesday, April 22, 2014 #

Implementing Column Security with #SSAS Tabular and #DAX

Out of the box Analysis Services (both Tabular and Multi-dimensional) has great support for horizontal or row based security. An example of this is where you would give User1 access to all data where the Country is “Australia” and give User2 access to all data where the country = “United States”. This covers a large percentage of the security requirements that most people have.

But neither technology has great support for vertical or column based security. This sort of requirement is most common in privacy scenarios. One example of this would be a model with medical data. It may be acceptable to show all your users demographic data such as the state they live in or their gender. But only a specific subset of users should have access to see individual patient details such as their name or phone number.

One approach would be to simply create 2 models, one with the secure information and one without. While this works, it doubles your processing time and doubles any maintenance activities and takes up double the storage.

Looking at the features in SSAS you may be tempted to try using perspectives. At first glance they appear to do what we want - allowing us to hide a subset of columns. But perspectives are not a security feature. All they do is to show a subset of the metadata to the user, but the user still has to have access to the full model and the hidden columns are still fully query-able from DAX and MDX. Trying to use perspectives for security is like putting a "Keep Out" sign on your front door, but then not actually locking it…

To explore this issue further I created a very simple database in SQL Server which has a Patient table and a FactPatient table which look like the following:

image

What I want to do is to create a model where only specific people can see the PatientName column. Because we can't restrict access to specific columns in a single table, I created 2 views over the Patient table: vPatient, which has every column except PatientName, and vPatientSensitive, which has the PatientID and PatientName.

image
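In case the screenshot is not clear, the two views are along these lines (the exact demographic columns are illustrative; the key point is that PatientName only appears in the sensitive view):

CREATE VIEW dbo.vPatient AS
SELECT PatientID, State, Gender -- every column except PatientName
FROM dbo.Patient;
GO
CREATE VIEW dbo.vPatientSensitive AS
SELECT PatientID, PatientName
FROM dbo.Patient;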

At this point I then created a tabular model bringing in FactPatient, vPatient and vPatientSensitive.

If you create your relationships in the default manner you will end up with something like the following:

image

This works great for the role which has access to the sensitive information, but if you create a role which does not give access to any of the rows in vPatientSensitive, these users can't see any data.

The reason for this is that the Filter Context flows down through the chain of one-to-many relationships.

image

So if a role has no access to any rows in vPatientSensitive, this flows through the relationships to also filter vPatient and FactPatient resulting in this role not being able to see any data.

Because the relationship between vPatient and vPatientSensitive is actually a 1:1 we can reverse the direction of the relationship as per the following diagram:

image

Now we are getting closer. Our secured role works again, but we've now introduced a problem with our role that has full access. When they browse the data they see the following with the same amounts repeated for every patient name.

image

If you take another look at our relationships you'll notice that it now looks like a many-to-many relationship. And there is a well-established pattern for dealing with many-to-many relationships using CALCULATE( <expression>, <intermediate table> ).

So we could try something like CALCULATE( SUM( FactPatient[Amount] ), vPatientSensitive ) - however we can't just use this expression as-is, because if vPatientSensitive is restricted then we will be back to our original scenario where restricted people can't see any data. So we need to check if the current user has access to the sensitive data before applying this expression. We can do this with COUNTROWS( ALL( vPatientSensitive ) ).

Then our full expression for a measure over the FactPatient[Amount] column becomes:

Total Amount :=
IF (
    COUNTROWS ( ALL ( vPatientSensitive ) ) > 0,
    CALCULATE ( SUM ( FactPatient[Amount] ), vPatientSensitive ),
    SUM ( FactPatient[Amount] )
)

To test this design I setup a number of different roles.

The FullAccess role has no filters applied on any tables.

image

and can see all the patient data including the PatientName.

image

The NoSensitive role can see all the facts, but cannot see any columns from the vPatientSensitive table

image

So when they run the same query as the FullAccess role all they see is the following where the PatientName column from vPatientSensitive only shows blank values:

image

It's also possible to mix and match this approach with standard row-based security. So we could limit a role to only seeing data from a particular state and also give them access to the sensitive data:

image

image

Or we could restrict the role to a particular state and deny access to the sensitive information

image

image

If you want to have a play with this solution yourself I've uploaded both the tabular project and a T-SQL script which will build the source database to a folder on my OneDrive.

Posted On Tuesday, April 22, 2014 11:20 PM | Comments (13)