A client was interested in trying out some data cubes, so I spun up an instance of SSAS and gave them a demo of some of the awesome opportunities that cubes unlock – but I found an interesting puzzle while trying to cube-ify some of their data.
The client sells subscriptions for their services – and their subscriptions can be in different statuses at different times. Whether or not they knew it, their subscriptions were a slowly changing dimension. And various departments wanted to know how many of their subscriptions were in a given status at a given time.
With SQL (or MDX), it is fairly straightforward to write queries that return the rows that fall within a start and end date, but with a self-service-BI-style pivot table in Excel, that may not be so easy – so my first goal was to change from date ranges to effective dates.
I created a view of the data, where one row for a one-year subscription becomes 365 rows – one for every day of the year. I then created a measure that does a distinct count of Subscription IDs. This means that in Excel you can filter the data on a given day and know which subscriptions apply to that day.
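To make the range-to-effective-date idea concrete, here is a minimal sketch in Python of what the view does – expanding each subscription's start/end range into one row per day, then taking a distinct count for a given day. The sample data, column names, and function names are all hypothetical, not the client's actual schema:

```python
from datetime import date, timedelta

# Hypothetical sample data: (subscription_id, status, start_date, end_date)
subscriptions = [
    (1, "Active",    date(2015, 1, 1), date(2015, 1, 3)),
    (2, "Suspended", date(2015, 1, 2), date(2015, 1, 4)),
    (3, "Active",    date(2015, 1, 3), date(2015, 1, 3)),
]

def explode_to_daily_rows(subs):
    """Turn each (id, status, start, end) range into one row per effective
    day, mirroring what the SQL view does for the cube."""
    for sub_id, status, start, end in subs:
        day = start
        while day <= end:
            yield (sub_id, status, day)
            day += timedelta(days=1)

daily_rows = list(explode_to_daily_rows(subscriptions))

def distinct_subscriptions_on(rows, day):
    """Distinct count of Subscription IDs effective on the given day."""
    return len({sub_id for sub_id, _status, d in rows if d == day})

# On Jan 3 all three sample subscriptions overlap
print(distinct_subscriptions_on(daily_rows, date(2015, 1, 3)))  # → 3
```

This is exactly the trade the view makes: a simple equality filter on one date column, paid for with row-count explosion.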
However, with 200,000 active subscriptions on any given day and 365 days in a year, that means 73 million rows of data per year. Sadly, the distinct count on all that data was less than speedy. At that point, the suggestion was to change from a measure to a calculation.
Instead of a distinct count of the Subscription ID in the subscription fact table, a Subscription dimension was created and a calculated measure was added that does a distinct count on Subscription ID.
Where the distinct count was scanning through millions of rows in the fact table for an answer, the calculation can focus on only the thousands of rows that match the given query. The run time on more complex queries now shifted from minutes to seconds – not ideal, but definitely tolerable.
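The difference between the two approaches can be sketched in a few lines of Python. Below, the naive function scans every fact row on each query, while the indexed version pre-groups rows by the query slice so only matching rows are ever touched – loosely analogous to slicing by dimension members instead of scanning the fact table. The data and names are illustrative assumptions, not the client's actual model:

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily fact rows: (subscription_id, status, effective_date)
fact_rows = [
    (1, "Active",    date(2015, 1, 1)),
    (1, "Active",    date(2015, 1, 2)),
    (2, "Suspended", date(2015, 1, 1)),
    (2, "Active",    date(2015, 1, 2)),
    (3, "Active",    date(2015, 1, 2)),
]

def naive_distinct(rows, status, day):
    """Full scan of the fact table, filtering row by row."""
    return len({sid for sid, st, d in rows if st == status and d == day})

# Pre-group rows by (status, day) – only the matching slice is touched
index = defaultdict(set)
for sid, st, d in fact_rows:
    index[(st, d)].add(sid)

def indexed_distinct(status, day):
    """Same answer, but reads only the rows that match the query."""
    return len(index[(status, day)])

print(naive_distinct(fact_rows, "Active", date(2015, 1, 2)))  # → 3
print(indexed_distinct("Active", date(2015, 1, 2)))           # → 3
```

Both return the same counts; the win is purely in how much data each query has to read, which is why the cube's run time dropped from minutes to seconds rather than improving the answer itself.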