Saturday, June 4, 2016 #

DAX - Creating a measure for Top 5 and Others

Often, when you have a requirement to show a measure for the TopN of a given attribute, the users will also want to see a subtotal of all the other items.

With multi-dimensional cubes and MDX you can create calculated members in a dimension, but you can’t do this in a tabular model using DAX. You could create a calculated column, but then your TopN would be fixed and could not respond dynamically to changes in filters, as calculated columns are evaluated and stored at processing time.

So one way to make this work is to have a placeholder row (or rows) with "Others" in the table which you are doing the TopN over.

The first step in this technique is to create a Rank measure (this saves having to re-type the rank logic multiple times). In this example I’m doing a top 5 Brands.

Brand Rank:=RANKX ( ALL ( Brand[BrandName] ), [Sales Amount] )

Then we create a Top5 measure that returns the sales amount if the rank is less than or equal to 5, returns the total sales of all the brands ranked outside the top 5 if the current brand is "Others", and otherwise returns BLANK().

Top5 := 
IF ( 
    [Brand Rank] <= 5, 
    [Sales Amount], 
    IF ( 
        HASONEVALUE ( Brand[BrandName] ), 
        IF ( 
            VALUES ( Brand[BrandName] ) = "Others", 
            SUMX ( FILTER ( ALL ( Brand[BrandName] ), [Brand Rank] > 5 ), [Sales Amount] ) 
        ) 
    ) 
)
 

Then if we create a pivot table with this measure we get the following result. In the picture below I’m showing the raw Sales amount so you can see how the amounts get re-assigned in the Top5 calculation.

This also works as you change the pivot table. When we introduce the Segment on the rows we can show a figure for “Others” within each segment. (Note that for this to work I had to create 2 placeholder “Others” rows, one in each segment). There is a link at the bottom of this post to the file that these screenshots were taken from.

 

At this point, I’m pretty happy. It seems to be working well, although calculating ranks is rather expensive so I’m wondering how it will perform against larger data sets. I know that there are definitely ways to make this faster if you are using the current generation of DAX engines (Excel 2016, Power BI or SQL 2016), as we have access to things like variables and the INTERSECT function, which we could use in conjunction with TOPN (I’ve included a rough sketch of that approach at the end of this post). But I think there are also opportunities to make this faster with “v1” DAX.

Running this algorithm against the Adventure Works Product Name column results in a performance profile (captured using the Server Timings feature of DAX Studio) which shows 86% of the time being spent in the formula engine. I’m pretty sure that a fair bit of this time is spent calculating ranks greater than 5 for the “Others” calculation. So I thought it might be faster to use a TOPN that returns everything except the top 5 (taking the bottom “row count minus 5” rows in ascending order). The formula for that is the following:

Top5v2 := 
IF ( 
    [Brand Rank] <= 5, 
    [Sales Amount], 
    IF ( 
        HASONEVALUE ( Brand[BrandName] ), 
        IF ( 
            VALUES ( Brand[BrandName] ) = "Others", 
            SUMX ( 
                TOPN ( 
                    COUNTROWS ( ALL ( Brand[BrandName] ) ) - 5, 
                    ALL ( Brand[BrandName] ), 
                    [Sales Amount], 1 
                ), 
                [Sales Amount] 
            ) 
        ) 
    ) 
)
The modified version of the calculation has a much better performance profile, taking about 1/3 of the time and with an almost 50/50 split between the Formula Engine and the Storage Engine.
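If you are on one of the newer DAX engines mentioned earlier, a variable-based version is also worth experimenting with. The following is only a rough, untested sketch of that idea (using the same Brand[BrandName] and [Sales Amount] names and the same “Others” placeholder row as above), not something I have benchmarked against the measures in this post:

Top5 Vars :=
// Sketch only: needs a 2016-generation engine for VAR, INTERSECT and EXCEPT.
// Note that TOPN can return more than 5 rows when there are ties on [Sales Amount].
VAR Top5Brands =
    TOPN ( 5, ALL ( Brand[BrandName] ), [Sales Amount] )
RETURN
    IF (
        // the current brand is one of the top 5, so just return its sales
        NOT ISEMPTY ( INTERSECT ( VALUES ( Brand[BrandName] ), Top5Brands ) ),
        [Sales Amount],
        IF (
            HASONEVALUE ( Brand[BrandName] ),
            IF (
                // re-assign the sales of everything outside the top 5 to the placeholder row
                VALUES ( Brand[BrandName] ) = "Others",
                SUMX ( EXCEPT ( ALL ( Brand[BrandName] ), Top5Brands ), [Sales Amount] )
            )
        )
    )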

You can download a sample workbook with these formulas from the link below:

 https://onedrive.live.com/redir?resid=98546E1B65A78A74!14166&authkey=!AK3SDGu-_aLAXHo&ithint=file%2cxlsx

Posted On Saturday, June 4, 2016 3:38 PM | Comments (0)

Tuesday, November 3, 2015 #

Dax Studio 2.3.2 released

This latest release includes a lot of UI polish, and we also now have support for SQL 2016 and Excel 2016. That support is mainly an update to the installer, plus some changes to the Query Plan and Server Timings features to deal with changes to the xmlSql that comes back in the trace events.

Following the theory that a picture is worth a thousand words – below are screenshots of the changes in this release.

The File –> Open menu now includes a list of recently opened files.

[screenshot]

For performance testing you can now set the run button to always do a clear cache before executing a query. This mode is easily selectable using the new arrow menu on the run button.

[screenshot]

The modal dialogs all have updated styling and now include a shaded overlay so that the active portion of the screen is clearly visible.

[screenshot]

An options pane has been added to the File menu for setting global program options.

[screenshot]

A Query History pane has been added which records all of the queries run by Dax Studio. If you have the Server Timings feature enabled the Server duration and the FE / SE timing are also tracked. You can double click on an item in the query history to insert it back into the editor. This is great for performance tuning as you can easily see which variation of the query was fastest and returned the expected number of rows and then bring it back into the editor.

[screenshot]

The metadata pane now loads asynchronously. In earlier versions the loading of the metadata pane was a blocking operation and the user interface could go unresponsive for a short time while loading large models. Now the metadata is loaded on a background thread, so the interface remains responsive and the pane that is updating is greyed out to indicate that the load is still in progress.

[screenshot]

The new “Define Measure” feature, which is a right-click option on a calculated measure, is a great way to see how a measure was defined without opening up the whole model, or to use the definition as a starting point for testing variations on the logic. A rough example of the kind of statement it generates follows the screenshot below.

[screenshot]
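To give a rough idea of what the generated statement looks like (using a hypothetical Sales table and measure here, and with the EVALUATE part added by hand), the definition is pasted into the editor as a query-scoped measure that you can edit and run without touching the model:

DEFINE
    // query-scoped copy of the model measure - edit freely, the model is not changed
    MEASURE Sales[Total Sales] = SUM ( Sales[Sales Amount] )
EVALUATE
    ROW ( "Total Sales", [Total Sales] )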

There are also a number of small bug fixes and tweaks, as well as several issues raised on codeplex that have been fixed (we always tag closed issues with the release they were fixed in).

Posted On Tuesday, November 3, 2015 11:10 AM | Comments (1)

Friday, July 24, 2015 #

DAX Studio 2.2.6 released

Just released an update to DAX Studio, bundling up the following fixes and enhancements:

  • Support for the upcoming release of Power BI Desktop
  • Dynamic Ribbon resizing
  • Remembering of Window size and location
  • Fixed display of hierarchies in Excel 2013
  • Fixed quoting of keywords in table names
  • Fixed an error when duplicate column names are returned in a query
  • Fixed an issue when changing databases while connected to a Multi-Dim server
  • Fixed an issue when connecting as a non-admin

I’m particularly happy with the dynamic resizing that is part of the Fluent.Ribbon control. It took a bit of effort to figure out, but it provides a much better experience than having to scroll the ribbon when the window size is reduced.

So on a wide screen display you still get the full ribbon in all its glory.

[screenshot]

Then on an older 1024px wide screen, a few of the labels in the Edit and Find groups drop off.

[screenshot]

Things keep shrinking dynamically until the groups finally start to collapse into dropdown menus. So if you want to go split screen on a small display, everything still works even at 500px wide without needing to scroll the ribbon.

[screenshot]

Posted On Friday, July 24, 2015 7:27 AM | Comments (0)

Wednesday, June 24, 2015 #

Announcing DAX Studio 2.2.0

Today we released DAX Studio 2.2 which includes a number of big enhancements.

You can download the new release from http://daxstudio.codeplex.com/releases

Below is an outline of the new functionality you will find in this release.

Intellisense

DAX Studio now has intellisense support. It's a fairly basic implementation, but I think I've got it working fairly smoothly. Note that it’s based on the best practice of always prefixing columns with the table name and never prefixing a measure with a table name. So if you type ‘table name’[ you will get intellisense for all the columns in the ‘table name’ table, but if you just type [ you will be presented with a list of all the measures in the model.
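As a quick illustration of that convention (the table and measure names here are just made up), a column reference always carries its table name while a measure reference does not:

Total Sales := SUM ( Sales[Sales Amount] )       // column reference: always Table[Column]
Sales vs Budget := [Total Sales] - [Budget]      // measure references: just [Measure], no table prefix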

The intellisense brings up table, column and measure metadata information from the current model:

[screenshot]

As well as functions and keywords, including the descriptions for functions. The function metadata comes from the DISCOVER_FUNCTIONS DMV, so it should accurately reflect the capabilities of the current data source (so you will see some additional functions when connected to newer data sources such as the Power BI Designer):

[screenshot]

Tracing (Query Plans & Server Timings)

One of the most significant new features is support for the tracing features (Server Timings & Query Plans) against PowerPivot. This involved a significant amount of work re-building the internal code which handles tracing, but the end result is that the Query Plan and Server Timings buttons are now enabled when connected to a PowerPivot model.

[screenshot]

We've also altered the layout of the server timings information.

[screenshot]

I found when I was reading the results that I was always comparing Total Duration vs Total CPU and comparing the ratio of these two figures, so I've added a little "parallelism factor" under the Total CPU. I also found that I was looking to compare FE vs SE duration and the number of SE Queries to the number of SE Cache hits, so I've moved them onto the same lines.

Support for Multi-Dimensional SSAS Servers

Although it’s a little bit of a novelty, DAX Studio now properly supports connecting to multi-dimensional servers. This kind of worked previously if the default database for your multi-dimensional instance only had a single cube, but threw an error if it had multiple cubes. I’m now injecting a CUBE= parameter into the connection string when connecting to a multi-dimensional server, so this should no longer be an issue.

Support for Multiple Power BI Designer instances

We now have support for multiple instances of Power BI Designer, so if you have more than one copy of the designer open you can see the names of the .pbix files that you have open and choose which one to connect to.

[screenshot]

Highlighting of unnatural hierarchies

Unnatural hierarchies can have a negative impact on performance so we highlight those in the metadata pane.

[screenshot]

Bug Fixes

There have been a number of minor bugs that have been fixed along the way, including a number that were not officially reported, but were discovered during testing.

  • FIXED - Doubling up of backslash characters when using DAX Formatter
  • FIXED - Unable to connect to SSAS servers when not an administrator
  • FIXED - changing the connection for a query window would sometimes throw an error
  • FIXED - source database changing on metadata refresh when connected to a SSAS server
  • FIXED - cannot connect to a PowerPivot model where the file path contains single quotes

Known Issues

The following issue which was first identified in 2.1.1 still remains:

  • "Linked Table" output for the Excel add-in only works for PowerPivot models if there is at least one table in the model that was created by linking data from an Excel table. We have not been able to find a workaround for this yet; it looks like it might be due to a bug in the Excel object model.

Enjoy!

Posted On Wednesday, June 24, 2015 6:58 AM | Comments (1)

Wednesday, May 13, 2015 #

SSAS Tabular–per table LastProcessed dates

I saw a question yesterday on the MSDN forums asking how to go about exposing the LastProcessed datetime per table. Marco has a great post here (https://www.sqlbi.com/articles/last-process-date-in-ssas-tabular/) about a number of ways to get the last processed date at the database level. But none of these techniques work at the table level.

Having a look through the various DMVs that are available, none of them seemed to have this information apart from DISCOVER_XML_METADATA, which returns a large XML document that can be tricky to query. However, the information in this DMV is also used to populate the AMO object model, so it should be possible to use AMO to pull out the last processed datetime per table and then push it in using a variation of the processAdd push mode processing.

I started by creating a new table in my database called LastProcessed using the following query as a source

SELECT
CAST('' as varchar(256)) as TableName,
Getdate() as LastProcessed
WHERE 1=2

This generates an empty table with a string column and a datetime column which will be the target for our processAdd push processing.

I was not exactly sure how to construct the XMLA for the processAdd command, so I created a simple SSIS package using a Dimension Processing Destination in a DataFlow task and captured the XMLA it produced using SQL Profiler. The command that Profiler captured was missing the <parameters> section that you would use for pushing a batch of data, so I adapted one of Greg Galloway’s excellent XMLA processAdd examples and then wrapped that in a PowerShell script.

The PowerShell script I created can be found here: https://gist.github.com/dgosbell/8f4e2a6aa022a6fdd288

It can be run with a simple command like the following:

.\update-ssaslastupdated -serverName "localhost" -databaseName "advwrksLastProcessed" -tableName "LastProcessed"

Once that succeeds, you can either expose the whole table, or you can hide it and build measures that expose the dates for specific tables (a quick sketch of such a measure is below).
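For example, a measure along the lines of the following (just a sketch, using the table and column names created above) would surface the processing date for a single table:

Internet Sales Last Processed :=
CALCULATE (
    MAX ( LastProcessed[LastProcessed] ),
    LastProcessed[TableName] = "Internet Sales"
)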

Below is an example of querying this table in my adjusted Adventure Works database. You can see that ‘Internet Sales’ and ‘Reseller Sales’ have been processed at different times to the other tables in the database.

[screenshot]

Posted On Wednesday, May 13, 2015 9:52 PM | Comments (0)

Friday, April 24, 2015 #

BIDS Helper 1.7.0 released

A few days ago the BIDS Helper team released v1.7.0, which you can download from here: https://bidshelper.codeplex.com/releases

This version comes with a few bug fixes, and some new functionality, but the most significant enhancements are to the BIML Engine thanks to our friends at Varigence.

I’ve copied and pasted the release notes below. Note that there are some possible breaking changes in the new BIML Engine, so if you have existing BIML script please read this section carefully.

====== Release Notes =======

Fixed Issues:

  • Fixes to all features so they work when the project is inside a solution folder.
  • Fixed Biml project connections in Visual Studio 2013.
  • Reports launch in the proper version of the Report Viewer control in Visual Studio 2012 and 2013.

Improvements
  • Biml updates to bring it up-to-date with Mist 4.0
  • Biml updates to support emitting SSIS 2014 packages from Visual Studio 2013.
  • Biml brings the Output window to the front, displays a progress bar, and displays a more useful errors dialog
  • Enabling the ability to add a translation for the default language in Tabular Translations Editor to make frictionless renaming easier.
  • Improvements to Tabular Actions Editor:
    • Adding button to test all MDX expressions
    • ensuring the drillthrough column dropdowns only show columns from related tables
    • Adding a button to let you copy an existing action
    • Adding a button to populate the drillthrough action columns with all columns in related tables (since it's quicker to manually delete than manually add columns from this list)
    • Allowing MeasureGroupMeasures to apply to any action type
  • Printer Friendly Dimension Usage for Tabular

Biml Details

There are a huge number of changes to Biml in this release. Below are highlights of some of the more notable changes and those changes that were most requested by users.
BREAKING CHANGES
  • RootNode.PackageProjects has now moved entirely into RootNode.Projects, including all namespaces. This is done to better support other project types such as analysis services projects. The mitigation is to change all instances of the <PackageProjects> tag to the <Projects> tag.
    • MITIGATION: If backwards compatibility with older versions of Mist is required, the PackageProjects/Projects element name can be conditionalized based on the tools version.
  • Removed QuerySource and Destination. These were abstractions that wrapped OleDbSource/AdoNetSource and OleDbDestination/AdoNetDestination in a single interface that offered the intersection of their features and automatically chose the correct component type to emit based on the referenced connection type. These language elements were not particularly useful, were very narrowly used, and their removal will prevent unintentional use by new customers.
    • MITIGATION: Change any instances of QuerySource or Destination to the appropriate OleDbSource/AdoNetSource and OleDbDestination/AdoNetDestination
  • All instances of the word Hadron have been removed from the project. Most notably, namespaces that included “Hadron” have 1:1 replacements with “Biml”
    • MITIGATION: Empty classes were added in the removed namespaces to prevent code with legacy namespace imports from breaking scripts
  • AstDimensionNode and AstFactNode have been removed. SSAS metadata is now added underneath the AnalysisMetadata element of table objects.
    • This is unlikely to affect most BIDSHelper users, since SSAS emission is only supported in Mist.
  • AnalysisServices connection property OverriddenConnectionString is renamed to ConnectionString. The Provider property is removed as it is no longer needed
    • This is unlikely to affect most BIDSHelper users, since SSAS emission is only supported in Mist.

Biml Language Changes
  • Directives
    • Added code directive to reference C# or VB files with code directly from Biml files. This enables a variety of scenarios around creating more complex and reusable .NET code for use within BimlScripts
    • Nested include files now work
  • Other
    • Added ObjectTag property to all Biml API classes. This is essentially a .NET Dictionary object that enables storage of .NET objects in a similar way to annotations in Biml.
    • Added support for external script project files in the language
    • Added GetPropertyValue method to all Biml API objects. This provides another option to get object property values when doing dynamic code generation.
    • Added Parse methods to all Biml API objects. This enables easy parsing of Biml code snippets to create Biml API objects. It is the inverse of the GetBiml functionality.
    • ConstraintMode has been set to default to Parallel, so that it no longer needs to be specified for most package objects
    • Added BuildSettings object to AstRootNode to access build settings in a central location
    • Added FriendlyName convenience property to columns
    • Fixed Biml reference resolution code so that references are updated even when DisableAllChangedEventInfrastructure is set to true. This is essential to enable the use of utility methods like IsUsedInPrimaryKey within live scripts on tables also imported within the live scripts
    • Added an IsDatabaseCharSetUnicode override property to Oracle connections

Biml API Changes
  • Added timeout support to all of the database providers and external data access methods
  • Fixed issue where properties changed via API were not being reflected in GetBiml output
  • Added GetDataTable support for most major DbConnection types
  • Added AstRootNode convenience collections for FlatFileFormats, RawFileFormats, PackageProjects, and CubeProjects
  • Fixed issue where AstRootNode convenience collections did not support name-based indexing in Live mode
  • Fixed an issue where StructureEquals was producing incorrect results due to a missing negation operator in autogenerated code

Biml SSIS Emission Changes
  • SSIS 2014 Support
  • New Tasks and Components
    • CDC Control Task
    • CDC Source
    • CDC Splitter
    • DQS Cleansing
  • Added SSIS Path Annotations to dataflow and control flow paths
  • Added support to SSIS emitter for emitting external columns without the corresponding output columns. This is primarily used for database datatypes that are unsupported by SSIS.
  • Changed DbType to DataType in column specifications
  • Fixed issue where a variable defined in an event handler could not be referenced from a descendent of that event handler.
  • Fixed build crash when cache connections are used on lookups
  • Improved emission of connection managers so that the individual connection properties are supplied, rather than just the connection string property
  • Fixed code generation issue reported to support with the Fuzzy Lookup component
  • Added support for the new AutoPassThroughInputColumns property on Fuzzy Lookup to give it a bit more flexibility
  • Added ServerExecutionID to pre-defined variables list for packages
  • Added support for expressions on project connections
  • Added PackagePassword property to AstPackageNode for encryption support

Biml Errors/Warnings
  • Fixed an issue with ValidationReporter where it would occasionally report line numbers into our temp files. This usually happened with mismatched end braces
  • Improved validator for SSIS data flow columns to show the duplicate column name when the same column is mapped multiple times
  • Fixed an issue where transformer generated errors were not being reported correctly. They previously appeared as Null Reference Exceptions because of an issue in the reporting mechanism.
  • Fixed issue where wrong column name was being shown in Script Component column type mismatch errors
  • Clarified error message when “Build and Run” or “Build and Execute” was performed without the correct version of SSIS installed
  • Added an error for duplicate dataflow column nodes. This usually arises with "reserved" column names "ErrorCode" and "ErrorText"
  • Added a warning and an automatic patch of import statements that use the old Hadron namespaces

Posted On Friday, April 24, 2015 7:02 AM | Comments (0)

Friday, April 10, 2015 #

SSRS–adding a pipe delimited text export option in Sharepoint Integrated mode

We are currently looking at decommissioning an old reporting portal based on another technology stack and replacing it with SSRS running in Sharepoint integrated mode. However, one of the requirements was to support a “pipe delimited” CSV export format.

There are a number of posts you can find about changing reportserver.config when running SSRS in standalone mode, but I could not find much information about how to do this when running in Sharepoint Integrated mode.

The MSDN documentation lists the various cmdlets (https://msdn.microsoft.com/en-us/library/gg492249.aspx) but does not show examples of the parameter values, which is not much help as some of the parameters require XML fragments.

This blog post got me close (http://blog.kuppens-switsers.net/it/customize-ssrs-extension-settings-sharepoint-integrated-mode/), and when combined with the information from this one (http://www.mssqltips.com/sqlservertip/3379/modify-sql-server-reporting-services-rendering-format-in-sharepointintegrated-mode/) I was able to come up with the following PowerShell script, which does what I needed.

You’ll need to run this from the Sharepoint Management Shell (I logged on to the console of one of our app servers to do this) using an account that has farm admin rights.

$svrDirectives = @"
<OverrideNames>
    <Name Language='en-US'>CSV (Pipe Delimited)</Name>
</OverrideNames>
"@
$extConfig = @"
<DeviceInfo>
    <FieldDelimiter>|</FieldDelimiter>
    <NoHeader>false</NoHeader>
    <FileExtension>csv</FileExtension>
    <ExcelMode>False</ExcelMode>
    <Encoding>ASCII</Encoding>
</DeviceInfo>
"@
$typeName = "Microsoft.ReportingServices.Rendering.DataRenderer.CsvReport,Microsoft.ReportingServices.DataRendering"

$apps = get-sprsserviceapplication
foreach ($app in $apps)
{
  New-SPRSExtension -identity $app -ExtensionType "Render" -name "CSVPipeDelimited" -TypeName $typeName -ServerDirectives $svrDirectives -ExtensionConfiguration $extConfig
}

This code loops over all the service applications in the current farm so even if you have a multi tenancy type situation it *should* work. Although if you are in this situation you should do your own testing. Our farm only has the one SSRS service application.

Posted On Friday, April 10, 2015 8:37 PM | Comments (2)

Wednesday, March 18, 2015 #

DAX Studio 2.1 Released

Today I am pleased to announce the release of the latest update to DAX Studio – v2.1.0

You can get it from the releases page on codeplex: http://daxstudio.codeplex.com/releases

Below is an outline of what’s new in 2.1. A big thanks to Daniele Perilli for his assistance with the graphics and Marco Russo for his work on the Query Plans and Server Timings tabs plus his help with testing this release.

UI Refresh

Thanks to assistance from Daniele Perilli on the graphics side, we now have a lot more consistency in our ribbon. We’ve moved all the buttons onto the Home tab and moved the database dropdown to the metadata pane, which has enabled us to fit all the buttons you’d use regularly onto the Home ribbon.

[screenshot]

Metadata Search

We now support searching in the metadata pane. If you hover over the little magnifying glass in the top right of the metadata pane, the search control will fly out, and as you type the metadata will filter down to only show objects containing the characters you’ve typed.

The icon changes to a green colour when a search is active so that you know you are not looking at the full set of metadata.

[screenshot]

Search and Replace

We now support searching for text within a query document, including highlighting all matching text.

[screenshot]

And we support replacing text.

[screenshot]

Both search and replace include the ability to do case-sensitive searches, regular expressions, wildcards and full word matches. These dialogs use the same hotkeys as Visual Studio: Ctrl-F for Find and Ctrl-H for Replace.

Improved Server Timings Tab

We now show a much nicer view of the aggregate timing details as well as showing the detailed scan events with their timings. You also have the option of showing cache and internal events although these are hidden by default.

[screenshot]

Improved Query Plan tab

The QueryPlans are now pre-processed to make them easier to read, and the total number of records for each line is split out so that it can be clearly seen and so that you can sort by this column to find the operations which are traversing large numbers of rows.

[screenshot]

Save Query Results to a File

Thanks to the codeplex user mogular who submitted an initial code sample for this feature. You can now export query results to either a comma-separated (csv) file or a tab-delimited (txt) file by choosing the File output option.

[screenshot]

Using Semantic Versioning for releases

I’ve never really had a strict way of assigning version numbers, but I was not really happy with using the full 4 part version that is generated by .Net. So I’ve decided to start using a variation of Semantic Versioning for the releases. This uses a 3 part version number <major>.<minor>.<patch> where:

<major> – gets updated for significant or breaking changes

<minor> – gets updated for new features

<patch> – gets updated for fixes to existing features

and updates to a higher number reset the lower ones.

Fixes

There have been numerous small fixes but some of the notable ones are:

  • Excel Add-in Regional Support – there was a bug in v2.0 which caused the Excel add-in to fail to load correctly on PCs that had regional settings that did not use a period (.) as the decimal separator. This has now been fixed and the Excel add-in should work regardless of the regional settings on the PC
  • New Version notification – this was actually silently failing in v2.0
  • Updates to syntax highlighting definitions
  • Improvements to the Excel “Linked Table” output 

Known Issues

  • "Linked Table" output for the Excel add-in only works for PowerPivot models if there is at least one table in the model that was created by linking data from an Excel table. We have not been able to find a workaround for this yet; it looks like it might be due to a bug in the Excel object model.
  • I’ve temporarily removed the “Quick Access” buttons in the title bar as they are not working consistently at the moment.

Posted On Wednesday, March 18, 2015 7:17 AM | Comments (4)

Monday, February 2, 2015 #

The Care and Feeding of SSAS in Production - SQL Saturday 365 Melbourne

We are not far away from SQL Saturday #365, which is occurring on Feb 7th, 2015. If you live in or around Melbourne it’s worth considering making the effort to come along. We have a great line-up of 30 different sessions, with speakers from 7 different countries including 16 MVPs.

This year I’m doing a talk called “The Care and Feeding of Analysis Services in Production”. I was partly inspired by the “Accidental DBA” topics that I’ve seen people doing for the relational engine and figured it was time someone did something similar for SSAS.

I’m going to talk about both Tabular and Multi-Dimensional, so there should be something in there for everyone.

Below is the session abstract:

A lot of the information you'll find on Analysis Services is focused around the initial creation of databases and models, but once you have a solution deployed to production, then what? In this session we will look at what it takes to run an Analysis Services server in production. What are the basics that you need to know about how the server works, including things like threading, memory usage and locking? How can you monitor the health of your server? What tools can you use to find out what's happening on your server? We'll have a look at what you should be monitoring to make sure your system is running properly, and run through what to investigate when things don't run as smoothly as you'd like.

Posted On Monday, February 2, 2015 6:31 AM | Comments (1)

Monday, December 8, 2014 #

DAX Studio v2–Christmas comes early

Read the following in your best movie trailer voice….

It’s been a long time coming….

19 months in the making….

5,000 lines of XAML code….

7,000 lines of C# code…

With the ability to run both inside of Excel and as a Standalone program

Version 2 of DAX Studio is finally here.

Maybe that’s a bit melodramatic, but maybe you get a hint of how exciting it is to finally be able to share this with you all. Version 2 is pretty close to a full re-write of the user interface, and in the process there have been a number of difficult hurdles to overcome. So it’s taken a bit longer than anticipated, but it’s finally ready.

Below is a screenshot of the new user interface, which I think you’ll agree looks pretty slick.

[screenshot]

You can download the latest version from http://daxstudio.codeplex.com under the downloads tab. The documentation tab has also been updated to cover all the new features in v2.

Some of the exciting new features are:

  • An “Office 2013” ribbon window
  • True tabular metadata (modelled on the Power View metadata pane)
  • Integrated tracing support
  • Bracket matching
  • Version update notification
  • comment / uncomment support
  • a single universal installer
  • plus many more 

But probably more importantly the code has been re-architected in a more modular structure so it should be easier to extend and improve going forward. There are still plenty of features that I’d like to add and it should be possible to do some smaller releases now that the major re-structuring is complete.

enjoy :)

Posted On Monday, December 8, 2014 11:18 PM | Comments (6)

Monday, November 17, 2014 #

SQL Saturday #365–Melbourne Feb 7 2015

Things have been a little quiet around here recently; one of the reasons for that is that I’ve been hard at work as a committee member for SQL Saturday 365, which will be held in Melbourne, Australia on Feb 7th 2015.

Following on from Melbourne’s first highly successful event earlier this year, next year's event promises to be bigger and better.

Located again at the Caulfield Campus of Monash University, the event will have a mixture of local, interstate and international speakers. And for the very first time there will be full day pre-con sessions available for a very reasonable price on the Friday before the event.

You can find out more information about the event here:
https://www.sqlsaturday.com/365/eventhome.aspx

You can register to attend here:
https://www.sqlsaturday.com/365/register.aspx

If you would like to present a session at SQL Saturday you can submit a proposal here:
https://www.sqlsaturday.com/365/callforspeakers.aspx

Pre Cons
This year there will also be 3 Pre-Con Full Day Sessions on the Friday. These present amazing value for money, with a full day of training by top experts for only $315 ($265 for early-bird registrations). The session details and registration links are as follows:

BIML Bootcamp
with Reeves Smith, Peter Avenant, Warwick Rudd and Paul Schmidt

Mastering Execution Plan Analysis
with Paul White

Practical Power BI
with Peter ter Braake

Full details of the PreCon sessions and how to register are in the links.

Posted On Monday, November 17, 2014 9:37 PM | Comments (0)

Monday, July 28, 2014 #

The perils of calculating an Average of Averages

I've seen questions around issues calculating averages come up a few times in various forums, and it came up again last week, so I feel there is some benefit in walking through the details of this issue. For many of you the following will be nothing new, but I'm hoping that this may serve as a reference that you can point to when you get requests for this sort of calculation.

The core issue here is really a fundamental mathematical one. Personally I see it surfacing most often in DAX and MDX as those are languages that I spend a lot of time with, but also because of their multi-dimensional natures you need to be able to write generic calculations that will work regardless of how the end users slice and dice the data.

The discussions invariably start with a statement like the following:

"I have a calculated measure that an average, but my totals are calculating incorrectly"

There are 2 different issues I see relating to this.

The first one is trying to use the AVG() function in MDX. Basically, if you want an average calculation that works with all your different dimensions then avoid this function. The AVG function in MDX calculates the average over a fixed set. You may be able to use it in a static query, but to calculate an average in your MDX script simply create the two base measures - a sum and a count - then divide the sum by the count. This is not as much of an issue in DAX, as the built-in AVERAGE, AVERAGEA and AVERAGEX functions generally work as expected.
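The same sum-divided-by-count pattern is just as easy to spell out explicitly in DAX. Here is a minimal sketch, assuming a Sales table with an Amount column (DIVIDE simply handles the divide-by-zero case):

Total Amount := SUM ( Sales[Amount] )
Sale Count := COUNTROWS ( Sales )
Average Amount := DIVIDE ( [Total Amount], [Sale Count] )   // re-calculated correctly at every subtotal level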

The other sort of question that I see is related to how the totals are calculated and the question is usually something like the following:

"I have an average measure calculated by doing sum / count - which produces the correct average for each row, but the total is calculated as "sum of sums" / "sum of counts" and my user wants to see it as the average of all the averages."

And to put it bluntly this requirement is invalid. You should never "total" a series of averages by averaging them. The easiest way to explain why this is the case is to illustrate with some data. So let's have a look at a few scenarios.

The first problem you will see with the "average of averages" approach is that it gives too much weight to outlying amounts.

Category    Amount    Count    Average
Bikes        1,000        1      1,000
Helmets     10,000    1,000         10
TOTAL       11,000    1,001        ???

Given the data above, how should we calculate the total average? If we do the "average of averages" approach we have:

(1000 + 10) / 2 = 505

If we take the SUM(Amount) / SUM(Count) approach we get the following:

11000 / 1001 = 10.99

This is an extreme example to prove a point, but which do you think is correct? Should the 1 bike we sold for $1000 skew the average to $505 or should the fact that it was just one product out of 1001 mean that the average should only be $10.99?

Your business user might be happy seeing a higher average amount, but what if the situation was reversed and we had sold 1000 bikes and just one helmet? This would make the "average of averages" still equal 505 while recalculating the average at the total level would give us $999.01 - I know which calculation I think is giving a better indication of the total average sales.

It's possible that you may be thinking at this point that this is not so much of a big deal for you because you don't have that sort of variability in your data. However that is only the start of the issues. If you are still unsure about the evils of averaging averages then read on because it only gets worse.

To show the next nasty side effect we need to look at just a little bit more data. Take the following 4 records, for example, where we have data split between 2 cities and 2 product categories:

City        Category    Amount    Count
Melbourne   Bikes           18        3
Melbourne   Helmets         25        5
Seattle     Bikes           21        3
Seattle     Helmets         16        4

When we group the data by City we get the following results. The "Total" line is where the average is recalculated at the total level, whereas the "Avg of Averages" line is where I've taken the average of the 2 city averages.

City               Amount    Count    Average
Melbourne              43        8       5.38
Seattle                37        7       5.29
Avg of Averages                          5.34
Total                  80       15       5.33

Now let's have a look at what happens to the figures when we group the data by the product category. Notice that the Total line has remained unchanged, but the "Avg of Averages" is now different!

Category           Amount    Count    Average
Bikes                  39        6        6.5
Helmets                41        9       4.56
Avg of Averages                          5.53
Total                  80       15       5.33

This sort of behaviour - where the figures reported for total and sub-totals will vary depending on how the data is sliced and diced - will be the death of your BI project.

Trust - the most important "feature" of any BI project

I would argue that possibly the most important "feature" of any BI solution is trust. You can always add features and missing functionality, but it can be very difficult to win back the trust of your users once it's been lost. And nothing will erode the trust of your users faster than seeing inconsistent results.

It's not just straight Averages that are the issue

Anytime you are mixing calculations that do sums and divisions you need to be careful of the order of operations. Ratios, percentages and moving averages are just a few examples of other calculation types for which you need to take care of the order in which you add and divide things.

Posted On Monday, July 28, 2014 7:18 AM | Comments (8)

Thursday, July 3, 2014 #

The case of the vanishing KPIs

I was contacted today with an interesting issue: a tabular model that had some KPIs which were not showing up in Power View.

The first thing I checked was the version setting on the model. KPI support was not added to tabular models until SP1. If your model is set to a compatibility version of RTM (1100), Power View will detect this and will effectively not ask for metadata about the KPIs.

However in this case, when we checked the database properties from SSMS, the compatibility setting appeared to be correctly set to SP1 (1103).

[screenshot]

So the next thing I did was to open a profiler trace and look at the metadata queries that Power View executed as it started up. Excel treats SSAS Tabular models as if they were multi-dimensional models and queries the metadata using a number of different DISCOVER queries against different schema rowsets. When SSAS Tabular was developed a new schema rowset was introduced called DISCOVER_CSDL_METADATA which is what DAX clients like Power View use to populate their field browser windows.

Checking the command I could see that it was correctly requesting a version 2.0 recordset. If the model was set to a compatibility setting of RTM (1100) or if there was a problem detecting the compatibility setting of the model you may see a 1.0 in the version restriction. Version 1.0 CSDL will not include KPI information. This is so that client tools can specify the version of metadata which they know how to handle.

[screenshot]

At this point it looks like Power View and SSAS are correctly talking to each other, but we are still no closer to understanding why the KPIs are visible in Excel, but not in Power View.

The next thing to look at was the actual response returned by the metadata query, to see if there was anything strange in there. To do that I took the RestrictionList and PropertyList elements from the profiler trace and inserted them into the Restrictions and Properties elements in the query below. I also had to remove the LocaleIdentifier and DataSourceInfo elements from the PropertyList as these related to a different session. Below is an example of a DISCOVER_CSDL_METADATA command which can be run from an XMLA window in SSMS.

<Discover xmlns="urn:schemas-microsoft-com:xml-analysis">

    <RequestType>DISCOVER_CSDL_METADATA</RequestType>
   
    <Restrictions>
        <RestrictionList xmlns="urn:schemas-microsoft-com:xml-analysis">
            <CATALOG_NAME>Adventure Works Tabular</CATALOG_NAME>
            <VERSION>2.0</VERSION>
        </RestrictionList>
    </Restrictions>
    <Properties>
        <PropertyList xmlns="urn:schemas-microsoft-com:xml-analysis">
            <Catalog>Adventure Works Tabular</Catalog>
            <Content>SchemaData</Content>
            <Format>Tabular</Format>
        </PropertyList>
    </Properties>
</Discover>

(you can simply replace the 2 references to the catalog name in the XMLA above to run this against one of your models)

When I searched in the results for "KPI" I came across the following interesting piece of data.

[screenshot]

Notice the Hidden="true" attribute? It turns out that the original developer decided to hide the measures before creating the KPI which resulted in the KPI itself being hidden. Setting the Hidden property to false on the measure fixed this issue. Mystery solved.

So although the end solution turned out to be something simple, I thought it might be interesting to share the process.

A footnote

Note that we still have one minor issue: in Excel we can now see both the KPI and the measure, while in Power View we only see the KPI. My suspicion is that this may be a bug in the MDSCHEMA_MEASURES rowset which Excel uses to find out what measures a model has. My opinion is that, in order to be consistent with Power View, measures which are used for KPI values should not also be displayed as "normal" measures.

Posted On Thursday, July 3, 2014 7:33 AM | Comments (1)

Friday, May 23, 2014 #

BI Survey 14

It's BI Survey time again :)

If you haven't done this before here is a little background on it from the guys that run it:

The BI Survey, published by BARC, is the world's largest and most comprehensive annual survey of the real world experiences of business intelligence software users. Now in its fourteenth year, The BI Survey regularly attracts around 3000 responses from a global audience. It provides an invaluable resource to companies deciding which software to select and to vendors who want to understand the needs of the market.


The Survey is funded by its readers, not by the participant vendors. As with the previous thirteen editions, no vendors have been involved in any way with the formulation of The BI Survey. Unlike most other surveys, it is not commissioned, sponsored or influenced by vendors.

Here is a link to the survey:

https://digiumenterprise.com/answer/?link=1981-ZYQSEY8B

If you take the survey you will get access to a summary of the results. By helping to promote the survey here I'll get access to some more detailed results including some country specific analysis so it will be interesting to see the results.

Posted On Friday, May 23, 2014 6:45 AM | Comments (0)

Friday, May 16, 2014 #

Function Folding in #PowerQuery

Looking at a typical Power Query query you will notice that it's made up of a number of small steps. As an example take a look at the query I did in my previous post about joining a fact table to a slowly changing dimension. It was roughly built up of the following steps:

  1. Get all records from the fact table
  2. Get all records from the dimension table
  3. Do an outer join between these two tables on the business key (resulting in an increase in the row count as there are multiple records in the dimension table for each business key)
  4. Filter out the excess rows introduced in step 3
  5. Remove extra columns that are not required in the final result set.

If Power Query was to execute a query like this literally, following the same steps in the same order, it would not be overly efficient, particularly if your two source tables were quite large. However Power Query has a feature called function folding where it can take a number of these small steps and push them down to the data source. The degree of function folding that can be performed depends on the data source. As you might expect, relational data sources like SQL Server, Oracle and Teradata support folding, but so do some of the other sources like OData, Exchange and Active Directory.

To explore how this works I took the data from my previous post and loaded it into a SQL database. Then I converted my Power Query expression to source its data from that database. Below is the resulting Power Query, which I edited by hand so that the whole thing can be shown in a single expression:

let
    SqlSource = Sql.Database("localhost", "PowerQueryTest"),
    BU = SqlSource{[Schema="dbo",Item="BU"]}[Data],
    Fact = SqlSource{[Schema="dbo",Item="fact"]}[Data],
    Source = Table.NestedJoin(Fact,{"BU_Code"},BU,{"BU_Code"},"NewColumn"),
    LeftJoin = Table.ExpandTableColumn(Source, "NewColumn"
                                  , {"BU_Key", "StartDate", "EndDate"}
                                  , {"BU_Key", "StartDate", "EndDate"}),
    BetweenFilter = Table.SelectRows(LeftJoin, each (([Date] >= [StartDate]) and ([Date] <= [EndDate])) ),
    RemovedColumns = Table.RemoveColumns(BetweenFilter,{"StartDate", "EndDate"})
in
    RemovedColumns

If the above query was run step by step in a literal fashion you would expect it to run two queries against the SQL database doing "SELECT * …" from both tables. However a profiler trace shows just the following single SQL query:

select [_].[BU_Code],
    [_].[Date],
    [_].[Amount],
    [_].[BU_Key]
from
(
    select [$Outer].[BU_Code],
        [$Outer].[Date],
        [$Outer].[Amount],
        [$Inner].[BU_Key],
        [$Inner].[StartDate],
        [$Inner].[EndDate]
    from [dbo].[fact] as [$Outer]
    left outer join
    (
        select [_].[BU_Key] as [BU_Key],
            [_].[BU_Code] as [BU_Code2],
            [_].[BU_Name] as [BU_Name],
            [_].[StartDate] as [StartDate],
            [_].[EndDate] as [EndDate]
        from [dbo].[BU] as [_]
    ) as [$Inner] on ([$Outer].[BU_Code] = [$Inner].[BU_Code2] or [$Outer].[BU_Code] is null and [$Inner].[BU_Code2] is null)
) as [_]
where [_].[Date] >= [_].[StartDate] and [_].[Date] <= [_].[EndDate]

The resulting query is a little strange; you can probably tell that it was generated programmatically. But if you look closely you'll notice that every single part of the Power Query formula has been pushed down to SQL Server. Power Query itself ends up just constructing the query and passing the results back to Excel; it does not do any of the data transformation steps itself.

So now you can feel a bit more comfortable showing Power Query to your less technical colleagues, knowing that the tool will do its best to fold all the small steps in Power Query down to the most efficient query that it can against the source systems.

Posted On Friday, May 16, 2014 7:40 AM | Comments (0)