Archive for the ‘PowerPivot’ Category
A New Events-In-Progress DAX Pattern
I’ve been working on a very complex SSAS Tabular implementation recently, and as a result I’ve learned a few new DAX tricks. The one that I’m going to blog about today takes me back to my old favourite, the events-in-progress problem. I’ve blogged about it a lot of times, looking at solutions for MDX and DAX (see here and here), and for this project I had to do some performance tuning on a measure that uses a filter very much like this.
Using the Adventure Works Tabular model, the obvious way of finding the number of Orders on the Internet Sales table that are open on any given date (ie where the Date is between the dates given in the Order Date and the Ship Date column) is to write a query something like this:
EVALUATE
ADDCOLUMNS (
VALUES ( 'Date'[Date] ),
"OpenOrders",
CALCULATE (
COUNTROWS ( 'Internet Sales' ),
FILTER( 'Internet Sales', 'Internet Sales'[Ship Date] > 'Date'[Date] ),
FILTER( 'Internet Sales', 'Internet Sales'[Order Date] <= 'Date'[Date] )
)
)
ORDER BY 'Date'[Date]
On my laptop this executes in around 1.9 seconds on a cold cache. However, after a bit of experimentation, I found the following query was substantially faster:
EVALUATE
ADDCOLUMNS (
VALUES ( 'Date'[Date] ),
"OpenOrders",
COUNTROWS(
FILTER(
'Internet Sales',
CONTAINS(
DATESBETWEEN('Date'[Date]
, 'Internet Sales'[Order Date]
, DATEADD('Internet Sales'[Ship Date],-1, DAY))
, [Date]
, 'Date'[Date]
)
)
)
)
ORDER BY 'Date'[Date]
On a cold cache this version executes in just 0.2 seconds on my laptop. What’s different? In the first version of the calculation the FILTER() function is used to find the rows in Internet Sales where the Order Date is less than or equal to the Date on rows, and where the Ship Date is greater than the Date. This is the obvious way of solving the problem. In the new calculation the DATESBETWEEN() function is used to create a table of dates from the Order Date to the day before the Ship Date for each row on Internet Sales, and the CONTAINS() function is used to see if the Date we’re interested in appears in that table.
I’ll be honest and admit that I’m not sure why this version is so much faster, but if (as it seems) this is a generally applicable pattern then I think this is a very interesting discovery.
Thanks to Marco, Alberto and Marius for the discussion around this issue…
Comments And Descriptions In DAX
With my Technitrain hat on I’m sitting in on Marco’s Advanced DAX course in London today, and the question of comments in DAX came up – which reminded me that this is something I’ve been meaning to blog about. DAX as a language supports comments, but unfortunately it’s not possible to add comments inside a DAX measure or calculated column expression in either PowerPivot or SSAS Tabular right now (which is where they’re most needed – I hope this changes in the future). That said, there are some other things you can do to add textual explanations and descriptions to your DAX measure code.
Before we get onto the workarounds, a quick word about comments in DAX. These can only be used in DAX queries, and the types of comment supported are the same as in MDX: double-dashes and double-forward-slashes for single line comments, and forward-slash-asterisk to start a multi-line comment and asterisk-forward-slash to close a multi-line comment. Here’s an example:
--single line comment
//another single line comment
/*a multi-line
comment*/
evaluate table1
What can be done with measures though? After all, that’s where the most complex DAX is usually written.
First of all, you can add a description to a measure by right-clicking on it in the measure grid and selecting Description:
Unfortunately this description is not easily accessible to end users anywhere (it would be great if it appeared as a tooltip in a PivotTable, for example) but it can be seen in an Excel worksheet by running a DMV query. DMV queries can be run in Excel 2013 in the same way as DAX queries, using a query table as described here; the DMV query to use is:
select
measure_name as [Measure Name], [description], measure_is_visible
from $system.mdschema_measures
Unfortunately all hidden and implicit measures are returned, and even when the table is filtered so that only measure_is_visible=true there are still a lot of measures that probably shouldn’t be shown.
Similarly, descriptions can be added to any column (calculated or not) in your model, again by right-clicking on it and selecting Description.
This description can be displayed in the worksheet using the following DMV query:
select
hierarchy_name as [Column Name], [description] as [Description]
from $system.mdschema_hierarchies
where cube_name='model'
You can also write text direct to cells in the measure grid too. When I first saw a customer do this I was worried that it might not be supported, but I’ve been told that it is; so long as you don’t use the =: used for defining measures then you should be ok.
This is probably the best way to add comments to your code, if only because it’s the most visible to anyone looking at your PowerPivot/SSAS Tabular model. Of course, for it to be effective you’ll need to have a system for arranging your measures in the measure grid; in “SQL Server Analysis Services 2012: The BISM Tabular Model”, Marco, Alberto and I recommended that you arrange all your measures in the top-left hand corner of the measure grid and I think that’s still a good idea, but the use of text in cells to create headings for groups of measures as well as descriptions can help a lot too.
Accumulating Data In An Excel Table Using Data Explorer and PowerPivot
One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source, you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!
Here’s a super-simple example of how to accumulate data in a table then. Let’s start with a csv file that contains the following data:
Product,Sales
Apples,1
Oranges,2
It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source:
Here’s the code that Data Explorer generates:
let
Source = Csv.Document(File.Contents("C:\InputData.csv")),
FirstRowAsHeader = Table.PromoteHeaders(Source),
ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
{{"Product", type text}, {"Sales", type number}})
in
ChangedType
Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:
Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow())
Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve now is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It’s a recursive scenario not unlike the one I grappled with here. The solution to this problem is to first of all load the data into the PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane:
Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you’ve just loaded data into. Kasper shows how to do this here; there’s no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog:
At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it contains a new step that appends data from the table you’ve just created (ie the table getting the data from PowerPivot) onto the data from the csv file. This can be done with three new steps, first to get the data from the new Excel table:
Excel.CurrentWorkbook(){[Name="ExistingData"]}[Content]
Then to make sure the Load Date is treated as a DateTimeZone type:
Table.TransformColumnTypes(GetExistingData,{{"Load Date", type datetimezone}})
Then finally to combine the two tables:
Table.Combine({ChangedType1,InsertedCustom})
Now, whenever you Refresh your Data Explorer query, you will see the data from the csv file appended to the data that has already been loaded:
Here’s the complete code:
let
Source = Csv.Document(File.Contents("C:\InputData.csv")),
FirstRowAsHeader = Table.PromoteHeaders(Source),
ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
{{"Product", type text}, {"Sales", type number}}),
InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow()),
Custom1 = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content],
ChangedType1 = Table.TransformColumnTypes(Custom1,{{"Load Date", type datetimezone}}),
Custom2 = Table.Combine({ChangedType1,InsertedCustom})
in
Custom2
Now as I said, this is just a super-simple example and in the real world you’d need extra functionality to do things like delete rows you’ve already loaded and so on; but that’s all doable I think. It’s also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you’ll need to follow my instructions exactly.
You can download the sample workbook here, and the csv file here.
PowerPivot Workbook Size Optimizer
Browsing through my RSS feeds this morning, I saw a new download on the Microsoft site: an Excel addin (Excel 2013 only, I think) called the PowerPivot Workbook Size Optimizer. You can get it here:
http://www.microsoft.com/en-us/download/details.aspx?id=38793
Here’s the blurb from the site:
The Workbook Size optimizer for Excel can better compress data inside workbooks that use PowerPivot or PowerView if this data comes from external data sources. The best size compression can be achieved for workbooks based on SQL Server databases and there are a few tricks we can do for other SQL datasources as well. The optimizer will install as an add in to excel and will provide you with a nice wizard to better compress the size of your workbook. Using the optimizer you can often get more than 1,000,000 rows datasets in a workbook under 10 MB, share it in SharePointOnline and interact withit using the Excel Web App in any browser.
Here’s a screenshot:
Despite a testing a few models with data from Adventure Works I couldn’t get it to suggest any changes (it didn’t spot that I had imported a column containing binary data, hmmm) but I guess it needs more testing on larger/more diverse data sources. Maybe there’s a blog post coming from the PowerPivot team coming soon explaining how to use this?
UPDATE: after playing around with it a bit more, I was able to get it to suggest some changes to tables. Marco has some more details:
http://sqlblog.com/blogs/marco_russo/archive/2013/04/30/powerpivot-workbook-size-optimizer-powerpivot-tabular.aspx
And there’s a white paper on the rules that it uses:
http://office.microsoft.com/en-gb/excel-help/create-a-memory-efficient-data-model-using-excel-2013-and-the-powerpivot-add-in-HA103981538.aspx
LightSwitch and Self-Service BI
Visual Studio LightSwitch has been on my list of Things To Check Out When I Have Time for a while now; my upcoming session on the uses of OData feeds for BI at the PASS BA Conference (which will be a lot more exciting than it sounds – lots of cool demos – please come!) has forced me to sit down and take a proper look at it. I have to say I’ve been very impressed with it. It makes it very, very easy for people with limited coding skills like me to create data-driven line-of-business applications, the kind that are traditionally built with Access. Check out Beth Massi’s excellent series of blog posts for a good introduction to how it works.
How does LightSwitch relate to self-service BI though? The key thing here is that aside from its application-building functionality, LightSwitch 2012 automatically publishes all the data you pull into it as OData feeds; it also allows you to create parameterisable queries on that data, which are also automatically published as OData. Moreover, you can publish a LightSwitch app that does only this – it has no UI, it just acts as an OData service.
This is important for self-service BI in two ways:
- First of all, when you’re a developer building an app and need to provide some kind of reporting functionality, letting your end users connect direct to the underlying database can cause all kinds of problems. For example, if you have application level security, this will be bypassed if all reporting is done from the underlying database; it makes much more sense for the reporting data to come from the app itself, and LightSwitch of course does this out of the box with its OData feeds. I came across a great post by Paul van Bladel the other day that sums up these arguments much better than I ever could, so I suggest you check it out.
- Secondly, as a BI Pro setting up a self-service BI environment, you have to solve the problem of managing the supply of data to your end users. For example, you have a PowerPivot user that needs sales data aggregated to the day level, but only for the most recent week, plus a few other dimension tables to with it, but who can’t write the necessary SQL themselves. You could write the SQL for them but once that SQL is embedded in PowerPivot it becomes very difficult to maintain – you would want to keep as much of the complexity out of PowerPivot as possible. You could set up something in the source database – maybe a series of views – that acts as a data supply layer for your end users. But what if you don’t have sufficient permissions on the source database to go in and create the objects you need? What if your source data isn’t actually in a database, but consists of other data feeds (not very likely today, I concede, but it might be in the future)? What if you’re leaving the project and need to set up a data supply layer that can be administered by some only-slightly-more-technical-than-the-rest power user? LightSwitch has an important role to play here too I think: it makes it very extremely easy to create feeds for specific reporting scenarios, and to apply security to those feeds, without any specialist database, .NET coding or SQL knowledge.
These are just thoughts at this stage – as I said, I’m going to do some demos of this in my session at the PASS BA Conference, and I’ll turn these demos into blog posts after that. I haven’t used LightSwitch as a data provisioning layer in the real world, and if I ever do I’m sure that will spur me into writing about it too. In the meantime, I’d be interested in hearing your feedback on this…
UK/US Date Format Bug in PowerPivot and SSAS Tabular
I don’t usually blog about bugs, but this one has been irritating me no end for the last year – so I thought it deserved some publicity…
In Excel 2010 PowerPivot and and in SSAS 2012 Tabular models (but not the Excel 2013 Data Model interestingly), if you have an English locale that is not US English (eg UK or Australian English), you may find that date columns appear to be formatted correctly as dd/mm/yyyy inside the PowerPivot window or in SSDT, but when you get to Excel you see the dates formatted in the US mm/dd/yyyy format. So, for example, on my laptop if I import the DimDate table from Adventure Works into Excel 2010 then I see dates formatted as dd/mm/yyyy as I’ve specified in the Formatting section of the ribbon in the PowerPivot window:
However, in an Excel PivotTable, I see dates formatted as mm/dd/yyyy:
There is a workaround though, which I found on the PowerPivot forum (thank you Steve Johnson, if you’re reading) – you can get the dates to format correctly if you go to More Date Formats and choose dd/MM/yy or one of the other formats from the dropdown list that appears:
Here are the correctly formatted dates in a PivotTable:
It seems like there is already a Connect open on this issue here, so please vote to get it fixed!
Dynamic DAX Query Tables in Excel 2013
PivotTables are all well and good, but sometimes when you’re building reports you just want a plain old list of things. Excel tables are perfect for this, and in Excel 2013 you can bind a table to the results of a static DAX query against the Excel Data Model. Unfortunately it’s not possible to make this query dynamic without a bit of VBA – so in this post I’ll show you how to do it.
Before I start, though, you may be thinking “What’s the point of this?”. After all, if you have too much data for the native Excel table functionality to handle, you can always use the Excel Data Model and make a PivotTable look just like a table, and when you do that you can use filters, slicers and so on to control what gets displayed. This is certainly a valid approach but the big disadvantage of a PivotTable is that it doesn’t always give you the best possible performance because of the way it generates its MDX, and because DAX queries are anyway faster than MDX queries for this kind of detail-level reporting. For large tables with lots of columns then a hand-rolled DAX query might give you significantly better performance than a PivotTable, as well as more control over the filtering logic.
Let’s look at a worked example…
Step 1: Import some data into a table
For my example, I have imported the DimDate table from the Adventure Works DW database in SQL Server into a table in Excel.
The key thing to remember at this point is to make sure you check the box to add the data to the Excel Data Model:
Step 2: Define a DAX query for this table
Kasper shows here how to use a static DAX query to populate a table in Excel, so I won’t repeat what he says. All I’ve done in my example is to change the table to use the following DAX query:
evaluate DimDate
…which returns the whole contents of the DimDate table, so in fact at this point the table looks exactly the same as it did before I made this change.
Step 3: Add some UI to allow the user to filter the data
Now I want the user to be able to filter this table in two ways:
1. By using a slicer to control which days of the week are displayed
2. By entering a value into a cell, and filtering the table so only the rows where the day number of the month is greater than that value
Here’s what this looks like:
I’ve also added a ‘Run Report’ button onto the worksheet for the user to press when they want to refresh the data in the query
Step 4: Use VBA to dynamically generate the query used by the table
The challenge is now to take the selection in the slicer and the value entered for the day number of month filter and use that to construct a DAX query.
Here’s an example of what one of these DAX queries might look like:
evaluate
Filter(
DimDate
, DimDate[DayNumberOfMonth]>21
&& (DimDate[EnglishDayNameOfWeek]=”Monday” || DimDate[EnglishDayNameOfWeek]=”Saturday”))
order by DimDate[DateKey]
Here I’m filtering the DimDate table so that the only rows displayed are where day number of month is greater than 21, and day name of week is either Monday or Saturday. If you’re interested in learning more about writing DAX queries, check out the series of blog posts I wrote on this topic here.
Paul te Braak has a great post here on how to work out what has been selected in a slicer using VBA, and I need to acknowledge the fact I’ve borrowed some of his code! Here’s my VBA routine, called by the button on the worksheet, to build and run the query:
Sub RunReport()
Dim SC As SlicerCache Dim SI As SlicerItem
Dim SelectedList As String Dim DayNumberOfMonthFilter As String
Dim DAXQuery As String Dim DemoWorksheet As Worksheet
Dim DAXTable As TableObject
Set DemoWorksheet = Application.Worksheets("TableDemo")
'Find the value of the cell containing the Day Number Of Month filter value DayNumberOfMonthFilter = DemoWorksheet.Range("DayNumberOfMonthFilter").Value
'Find what is selected in the slicer Slicer_EnglishDayNameOfWeek
Set SC = ActiveWorkbook.SlicerCaches("Slicer_EnglishDayNameOfWeek") SelectedList = ""
'Loop through each item in the slicer and if it is selected
'add it to a string that will be used in the filter condition For Each SI In SC.SlicerCacheLevels(1).SlicerItems
If SI.Selected Then If SelectedList <> "" Then
SelectedList = SelectedList & " || " End If
SelectedList = SelectedList & "DimDate[EnglishDayNameOfWeek]=""" & SI.Caption & """" End If
Next
'Construct the DAX query DAXQuery = "evaluate Filter(DimDate, DimDate[DayNumberOfMonth]>" & DayNumberOfMonthFilter
DAXQuery = DAXQuery & " && (" & SelectedList & ")) order by DimDate[DateKey]"
'Bind the table to the DAX query Set DAXTable = DemoWorksheet.ListObjects("Table_DimDate").TableObject
With DAXTable.WorkbookConnection.OLEDBConnection
.CommandText = Array(DAXQuery)
.CommandType = xlCmdDAX
End With
'Run the query
ActiveWorkbook.Connections("ModelConnection_DimDate").Refresh
End Sub
And so there we go, a dynamic DAX table report in Excel 2013. If you’d like to download my example and check it out in detail, you can get hold of it here.
Building Relative Date Reports in PowerPivot
It’s a very common requirement when you’re building a report in PowerPivot (or indeed in any BI tool) for it to automatically show data for today’s date, the current week or month (plus maybe a set number of preceding days, weeks or months), without the user needing to change anything when they open the workbook. There are a number of ways of achieving this, but in this post I’m going to focus on one: building relative date columns in your Date dimension table. This stuff is by no means new and ground-breaking and I’ve seen this particular technique implemented many, many times, but it’s also something I get asked about fairly frequently and I can’t find any other blog posts detailing it so I thought I’d write it up.
To show how this works I’ve built a sample PowerPivot model in Excel. An important part of this sample model is a proper Date dimension table of course, and if you don’t have one in your data source there are plenty of ways of generating one automatically (Boyan Penev’s DateStream dataset in the Azure Marketplace, for instance, or this cool new Excel 2013 app I found today in the Office Store). Here’s the example I’ll be working with which has a Date dimension table and a Sales fact table with some values in it:
On the Date dimension table I’ve added four new columns, two to handle relative dates and two to handle relative months:
Relative Date Offset
=INT([Date] – TODAY())
Relative Month Offset
=((12 * YEAR([Date])) + MONTH([Date])) – ((12 * YEAR(TODAY())) + MONTH(TODAY()))
Relative Date
=IF([Relative Date Offset]=0
, "Today"
, "Today " & IF([Relative Date Offset]>0, "+", "") & [Relative Date Offset])
Relative Month
=IF([Relative Month Offset]=0
, "Current Month"
, "Current Month " & IF([Relative Month Offset]>0, "+", "") & [Relative Month Offset])
The first two of these columns contain integer values that are the number of days and months between today’s date and the date in the [Date] column on the dimension table. I’ve hidden these from client tools, and then then used them in the expressions for (and as the Sort Columns for) the next two columns which show the same values in a more human-readable form. Here’s what the results look like:
These new columns can be used in a variety of ways. For instance, I can now put my Sales measure in a PivotTable, put Relative Date in the Filter and select the ‘Today’ value, and then put Date on columns in the PivotTable and I’ll only see today’s date:
This is because, of course, selecting ‘Today’ on Relative Date automatically filters the [Date] column down to one value – today’s date (ie January 24 2013).
I can now also build reports that show data for the current month and previous month, without showing any dates at all:
There’s one final problem that needs to be solved though: the relative dates are calculated when the Date dimension is loaded and the calculated columns evaluated, but what happens tomorrow when the relative dates need recalculating? If I was building this solution in SSAS Tabular and reprocessing your model every night automatically then I wouldn’t have this issue; in PowerPivot I need to make sure I handle this. In Excel 2010 there’s no way to automate loading data into a table, alas, so the user would have to do the refresh manually alas. In Excel 2013 I can do this using VBA very easily, by putting the following code in the WorkBook_Open() event:
ActiveWorkbook.Model.ModelTables("Date").Refresh
Refreshing the Date table also automatically refreshes your PivotTables too, which is handy. This means that when I open the workbook tomorrow (ie January 25 2013), the relative dates will have shifted accordingly and my report will show data as of January 25 2013 and not January 24 2013.
You can download my Excel 2013 sample workbook here.
Parameterising PowerPivot Connection Strings in Excel 2013
One of the things I’ve always wanted to do with PowerPivot is to parameterise the connections used to import data. Despite PowerPivot’s ability to handle massive data volumes, most of the time you want your end users to import only the data they actually need – and for them to be able to specify filters on the data themselves somehow. The only way to do this in Excel 2010 was to have them go into the PowerPivot model and change the connection themselves, which is not very user-friendly, but now we have a proper PowerPivot object model in 2013 we can modify connection strings in VBA (so we can take filtering information direct from the worksheet) and this post shows how. I’d like to acknowledge the help I got from Kasper’s post here which covers very similar ground, but I came across a few interesting things while building the example here so I thought it was worth a post on its own.
Let’s say we want to build our own Bing search engine client in Excel 2013, where a user can enter a search term in a cell, click a button to run the search and then not only be able to see the search results but analyse them in PowerPivot and Power View. The first step is to sign up to the Bing search API in the Azure Marketplace and then import some search results with a hard-coded search term (this tutorial will help if you’re not sure how to do this); I used Bing image search to return some results with urls pointing to images on the web. This will create a connection in the Workbook which we can then modify programmatically. However I ran into a problem at this point: I found that only connections created on the Data tab on the ribbon can be modified in VBA, whereas connections created in the PowerPivot addin cannot. This means I had to click here:
..to create my connection, and NOT here:
When you open connections created in the PowerPivot window in the Connections dialog from the Data tab, you see the following message: “Some properties cannot be changed because the connection was modified using the PowerPivot Add-In”
Trying to edit connections created in PowerPivot using VBA just gave me an error.
Not much of a issue though. With the connection in place, here’s the VBA code that’s needed to alter the connection string and replace the search term with a value from a cell in the worksheet:
Sub RunImageSearch()
Dim mdl As ModelTable Dim wcon As WorkbookConnection
Dim cs As String Dim ss As String
Dim azurekey As String
azurekey = "Insert your Azure Marketplace account key here"
Set mdl = ActiveWorkbook.Model.ModelTables("Image")
Set wcon = mdl.SourceWorkbookConnection
cs = "DATAFEED;" & _ "Data Source=https://api.datamarket.azure.com/Bing/Search/v1/" & _
"Image?Query=%27ReplacePlaceholder%27;" & _ "Namespaces to Include=*;Max Received Message Size=4398046511104;Integrated Security=Basic;" & _
"User ID=AccountKey;Password=" & azurekey & _ ";Persist Security Info=false;" & _
"Base Url=https://api.datamarket.azure.com/Bing/Search/v1/Image?Query=%27ReplacePlaceholder%27"
ss = WorksheetFunction.EncodeURL(CStr(ActiveWorkbook.Sheets("Search Term").Cells(2, 3).Value))
wcon.DataFeedConnection.Connection = Replace(cs, "ReplacePlaceholder", ss)
mdl.SourceWorkbookConnection.Refresh
End Sub
Three points to note here:
- If you’re copying this code, you need to enter your own Azure Marketplace account key in the place specified
- The search term needs to be url encoded, and luckily there’s a new function to do this in 2013: EncodeURL()
- If you’re using a different data source then obviously the connection string will be different. Also, because I’m using data from the Azure Marketplace my SourceWorkbookConnection object has a connection of type DataFeedConnection – other data sources will have different connection types, so check the value returned by SourceWorkbookConnection.Type (the list of values in the XLConnectionType enumeration on the web is, at the time of writing, out of date but Object Explorer is up-to-date)
That’s all there is to it. The macro can be bound to a button on the worksheet like so:
And we can then do all kinds of clever things to analyse the search results. For example, with the image search results we can show thumbnails in a Power View report (see Jeremy Kashel’s post for details):
You can download the whole example workbook here, although again you’ll have to edit the VBA to enter your own Azure Marketplace account key if you want it to work.
Introduction to MDX for PowerPivot Users, Part 5: MDX Queries
In part 4 of this series (sorry for the long wait since then!) I finished off looking at what you can do with named sets. Now, before I go on to more important topics like Excel cube functions and calculated members I’d like to take a high-level look at what you can do with MDX queries running against PowerPivot – high level, because there’s much more to MDX queries than can be covered in a single post and, as I explain below, you probably won’t want to do this very often.
So why would I need to write whole queries against a PowerPivot model?
This is a very good question, given that in my opinion 99% of the time you can achieve what you want when building Excel reports using either PivotTables (either with or without named sets) or Excel cube functions. Having said that, the post I wrote a few years ago about binding a table in Excel to an MDX query has been one of the most popular I’ve ever written, so maybe I’m wrong about how frequently people need to do this…
I’d say that you would probably only want to write your own queries when you needed complete control over the MDX and didn’t mind that it made linking the query up to filters or slicers very difficult – for example, if you wanted a list of unpaid invoices, or a list of customers that met some specific criteria, in a dashboard.
Also, when you run MDX queries in Excel you’re going to use an Excel table to show the results rather than a PivotTable. This is actually the format you need to use to pass data to other Excel-based tools like like Excel Data Mining Addins (as well as PowerPivot), so writing your own MDX queries might actually save you having to convert to formulas, as Kasper does here, or cutting/pasting in cases like this.
Why use MDX instead of DAX?
From PowerPivot V2, PowerPivot models can be queried in either MDX or the DAX query language (if you want to learn about DAX queries take a look at the posts I wrote on this topic last year, starting here), and if you’ve already learned a lot of DAX for PowerPivot you’re probably going to be more comfortable using DAX queries. However, I know there are a lot of old SSAS-fans out there doing work with PowerPivot who prefer MDX, and there are still a few things that MDX can do that PowerPivot can’t, so choosing MDX over DAX is a legitimate choice. Examples would be when you want to pivot your resultset and put something other than measures on columns, or show a calculated member on rows, and I show how to do both of these things below.
How do I display the results of an MDX query in Excel?
As I said, when you display the results of an MDX query in Excel you’ll need to use an Excel table to do so. I blogged about a few ways to do this here but there’s actually a better way now: using DAX Studio. DAX Studio is a free Excel addin for people who want to write DAX queries against a PowerPivot model, but it can run MDX queries too. Unfortunately it doesn’t display any MDX metadata for you to use – only DAX metadata – but it’s still a much more convenient way of running MDX queries than doing a drillthrough and then editing the query property of a table.
The DAX Studio documentation gives you a good overview of how to use the tool and I won’t repeat that here, but to prove it does work here’s a screenshot of an MDX query run against a PowerPivot model:
OK, so get on with it and tell me how to write an MDX query…
The basic MDX query is quite simple. Books online has all the details:
http://msdn.microsoft.com/en-us/library/ms146002.aspx
http://msdn.microsoft.com/en-us/library/ms144785.aspx
…but really all you need to know is this:
Each MDX query needs a SELECT clause. Inside the SELECT clause you need to define one or two axes, either just a columns axis or a columns axis or a rows axis, and the way you define what appears on an axis is using a set, an object we’ve seen a lot of in the last few posts in this series. Each MDX query also needs a FROM clause, with the name of the cube that is to be queried; for PowerPivot the name of the ‘cube’ is always [Model].
Here’s an example of a simple MDX query on a PowerPivot model built on Adventure Works DW that returns a measure on columns and three years on rows:
SELECT
{[Measures].[Sum of SalesAmount]}
ON COLUMNS,
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]}
ON ROWS
FROM [Model]
Everything you do on columns, you can do on rows, and vice versa, so:
SELECT
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]}
ON COLUMNS,
{[Measures].[Sum of SalesAmount]}
ON ROWS
FROM [Model]
Returns this:
Using a set of tuples on rows and/or columns gives a crosstabbed effect:
SELECT
{[Measures].[Sum of SalesAmount]}
*
{[DimProductCategory].[EnglishProductCategoryName].[EnglishProductCategoryName].&[Bikes]
, [DimProductCategory].[EnglishProductCategoryName].[EnglishProductCategoryName].&[Clothing]}
ON COLUMNS,
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]} *
{[DimProduct].[Color].&[Black]
, [DimProduct].[Color].&[Red]}
ON ROWS
FROM [Model]
After the FROM clause, you can add a WHERE clause to slice the resultset. Do not confuse the MDX WHERE clause with the SQL WHERE clause: it does something similar but it doesn’t directly affect what appears on rows or columns, it filters the values returned inside the query. For example:
SELECT
{[Measures].[Sum of SalesAmount]}
ON COLUMNS,
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]}
ON ROWS
FROM [Model]
WHERE(
[DimProductCategory].[EnglishProductCategoryName].[EnglishProductCategoryName].&[Bikes]
, [DimProduct].[Color].&[Black])
…returns sales for Black Bikes for the years 2005 to 2007:
Notice that the Colour Black and the Product Category Bikes don’t appear anywhere on rows or columns, but the values that are shown are for Black Bikes nonetheless.
The WITH clause
You can define your own calculated members (which I’ll talk about in a future post) and named sets inside a query if you add a WITH clause before your SELECT clause. Here’s an example of this:
WITH
SET [MY YEARS] AS
{[DimDate].[CalendarYear].&[2005]
, [DimDate].[CalendarYear].&[2006]
, [DimDate].[CalendarYear].&[2007]}
MEMBER [DimDate].[CalendarYear].[Total 2005-7] AS
AGGREGATE([MY YEARS])
MEMBER [Measures].[Percent of Total] AS
([Measures].[Sum of SalesAmount])
/
([Measures].[Sum of SalesAmount]
, [DimDate].[CalendarYear].[Total 2005-7])
, FORMAT_STRING=’PERCENT’
SELECT
{[Measures].[Sum of SalesAmount]
,[Measures].[Percent of Total]}
ON COLUMNS,
{[MY YEARS], [DimDate].[CalendarYear].[Total 2005-7]}
ON ROWS
FROM [Model]
Here I’ve defined a named set called [MY YEARS] which I’ve then used to define what goes on the rows axis, and two calculated members, [Total 2005-7] which returns the subtotal of the years 2005 to 2007, and a new measure [Percent of Total] that shows the percentage that each row makes up of this subtotal. Incidentally, even though DAX can do this kind of subtotalling, it’s only in MDX that you can define any calculation you want on any axis in your query.
Flattened Rowsets
You might be wondering, looking at the examples above, why the column headers are all in human-unfriendly MDX and why the [Percent of Total] measure hasn’t had any formatting applied. You will also notice in this query how the name of the All Member on the [CalendarYear] hierarchy doesn’t get returned, and you get a blank row name instead:
SELECT
{[Measures].[Sum of SalesAmount]}
ON COLUMNS,
{[DimDate].[CalendarYear].[All]
,[DimDate].[CalendarYear].&[2005]}
ON ROWS
FROM [Model]
This is because, when you run queries that get bound to an Excel table they are returned as flattened rowsets and not cellsets (which is how most SSAS client tools and SQL Server Management Studio returns MDX queries). Basically, this means your nice, multidimensional resultset gets squashed into something tabular – and when this happens, a lot of useful stuff gets lost along the way. Here’s the official documentation on how flattened rowsets are generated:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms716948(v=vs.85).aspx
This is a pain, but there’s no way around it unless you want to write your own code to render a cellset in Excel unfortunately.
Conclusion
Writing your own MDX queries against a PowerPivot model isn’t exactly something you’ll need to do every day, but it’s a useful addition to your PowerPivot toolbox and I wanted to mention it in this series for the sake of completeness. In my next post I’ll be taking a look at MDX calculated members.
