Archive for the ‘PowerPivot’ Category
Microsoft has published documentation for the new DAX functions in the Excel 2016 preview here:
There’s a lot of detail, including examples (although the ConcatenateX() page isn’t live at the time of writing – but I’ve blogged about that already), so it’s well worth reading through.
Continuing my series on new DAX functions in Excel 2016, here are two more: NaturalInnerJoin() and NaturalLeftOuterJoin(). Both do pretty much what you’d expect.
Consider the following two tables in an Excel worksheet, called ColourFruit and FruitPrice:
With these tables loaded into the Excel Data Model as linked tables, the next step is to create a relationship between the tables on the Fruit column:
Both functions only work with two tables that have an active relationship between them, and both take two tables from the Excel Data Model as parameters. Once you’ve done that you can use these functions in a DAX query.
…both perform an inner join between the two tables on the Fruit column and both return the same table:
For NaturalLeftOuterJoin() the table given in the first parameter is on the left-hand side of the left outer join, so all rows from it are returned, whereas the table in the second parameter is on the right-hand side of the join so only the matching rows are returned.
Following on from my recent post on Power BI and Excel 2016 news, here are some more details about the new BI-related features in the Excel 2016 Preview. Remember that more BI-related features may appear before the release of Excel 2016, and that with Office 365 click-to-run significant new features can appear in between releases, so this is not a definitive list of what Excel 2016 will be able to do at RTM but a snapshot of functionality available as of March 2015 as outlined in this document and which I’ve found from my own investigations. When I find out more, or when new functionality appears, I’ll either update this post or write a new one.
Yesterday, in the original version of my post, I mistakenly said that Power Query was a native add-in in Excel 2016: that’s not true, it’s not an add-in at all, it’s native Excel functionality. Indeed you can see that there is no separate Power Query tab any more, and instead there is a Power Query section on the Data tab instead:
Obviously I’m a massive fan of Power Query so I’m biased, but I think this is a great move because it makes all the great Power Query functionality a lot easier to discover. There’s nothing to enable – it’s there by default – although I am a bit worried that users will be confused by having the older Data tab features next to their Power Query equivalents.
There are no new features for Power Query here compared to the latest version for Excel 2013, but that’s what I expected.
Excel Forecasting Functions
I don’t pretend to know anything about forecasting, but I had a brief play with the new Forecast.ETS function and got some reasonable results out of it as seen in the screenshot below:
There’s a new hammer icon on a slicer, which, when you click it, changes the way selection works. The default behaviour is the same as Excel 2013: every time you click on an item, that item is selected and any previous selection is lost (unless you were holding control or shift to multiselect). However with the hammer icon selected each new click adds the item to the previously selected items. This is meant to make slicers easier to use with a touch-screen.
Time Grouping in PivotTables
Quite a neat feature this, I think. If you have a table in the Excel Data Model that has a column of type date in it, you can add extra calculated columns to that table from within a PivotTable to group by things like Year and Month. For example, here’s a PivotTable I built on a table that contains just dates:
Right-clicking on the field containing the dates and clicking Group brings up the following dialog:
Choosing Years, Quarters and Months creates three extra fields in the PivotTable:
And these fields are implemented as calculated columns in the original table in the Excel Data Model, with DAX definitions as seen here:
Power View on SSAS Multidimensional
At-bloody-last. I haven’t installed SSAS on the VM I’m using for testing Excel 2016, but I assume it just works. Nothing new in Power View yet, by the way.
Power Map data cards
Not sure why this is listed as new in Excel 2016 when it seems to be the same feature that appeared in Excel 2013 Power Map recently:
There isn’t any obvious new functionality in the Power Pivot window, but it’s clear that the UI in general and the DAX formula editor experience in particular has been improved.
When you use fields from two Excel Data Model tables that have no relationship between them in a PivotTable, you get a prompt to either create new relationships yourself or let Excel detect the relationships:
Renaming Tables and Fields in the Power Pivot window
In Excel 2013 when you renamed tables or fields in the Excel Data Model, any PivotTables that used those objects had them deleted. Now, in Excel 2016, the PivotTable retains the reference to table or field and just displays the new name. What’s even better is that when you create a measure or a calculated column that refers to a table or column, the DAX definition of the measure or calculated column gets updated after a rename too.
There are lots of new DAX functions in this build. With the help of the mdschema_functions schema rowset and Power Query I was able to compare the list of DAX functions available in 2016 with those in 2013 and create the following list of new DAX functions and descriptions:
FUNCTION NAME DESCRIPTION DATEDIFF Returns the number of units (unit specified in Interval) between the input two dates CONCATENATEX Evaluates expression for each row on the table, then return the concatenation of those values in a single string result, separated by the specified delimiter KEYWORDMATCH Returns TRUE if there is a match between the MatchExpression and Text. ADDMISSINGITEMS Add the rows with empty measure values back. CALENDAR Returns a table with one column of all dates between StartDate and EndDate CALENDARAUTO Returns a table with one column of dates calculated from the model automatically CROSSFILTER Specifies cross filtering direction to be used in the evaluation of a DAX expression. The relationship is defined by naming, as arguments, the two columns that serve as endpoints CURRENTGROUP Access to the (sub)table representing current group in GroupBy function. Can be used only inside GroupBy function. GROUPBY Creates a summary the input table grouped by the specified columns IGNORE Tags a measure expression specified in the call to SUMMARIZECOLUMNS function to be ignored when determining the non-blank rows. ISONORAFTER The IsOnOrAfter function is a boolean function that emulates the behavior of Start At clause and returns true for a row that meets all the conditions mentioned as parameters in this function. NATURALINNERJOIN Joins the Left table with right table using the Inner Join semantics NATURALLEFTOUTERJOIN Joins the Left table with right table using the Left Outer Join semantics ROLLUPADDISSUBTOTAL Identifies a subset of columns specified in the call to SUMMARIZECOLUMNS function that should be used to calculate groups of subtotals ROLLUPISSUBTOTAL Pairs up the rollup groups with the column added by ROLLUPADDISSUBTOTAL SELECTCOLUMNS Returns a table with selected columns from the table and new columns specified by the DAX expressions SUBSTITUTEWITHINDEX Returns a table which represents the semijoin of two tables supplied and for which the common set of columns are replaced by a 0-based index column. The index is based on the rows of the second table sorted by specified order expressions. SUMMARIZECOLUMNS Create a summary table for the requested totals over set of groups. GEOMEAN Returns geometric mean of given column reference. GEOMEANX Returns geometric mean of an expression values in a table. MEDIANX Returns the 50th percentile of an expression values in a table. PERCENTILE.EXC Returns the k-th (exclusive) percentile of values in a column. PERCENTILE.INC Returns the k-th (inclusive) percentile of values in a column. PERCENTILEX.EXC Returns the k-th (exclusive) percentile of an expression values in a table. PERCENTILEX.INC Returns the k-th (inclusive) percentile of an expression values in a table. PRODUCT Returns the product of given column reference. PRODUCTX Returns the product of an expression values in a table. XIRR Returns the internal rate of return for a schedule of cash flows that is not necessarily periodic XNPV Returns the net present value for a schedule of cash flows
Plenty of material for future blog posts there, I think – there are lots of functions here that will be very useful. I bet Marco and Alberto are excited…
It looks like we have support for the Excel Data Model (aka Power Pivot) in VBA at last. I need to do some research here, but I get the distinct feeling that the only things that are possible through VBA are the things you can do in the Excel ribbon, such as creating connections, tables and relationships. I can’t see any support for creating measures, calculated columns or hierarchies…? I can’t see anything relating to Power Query either. Maybe I’m not looking in the right place; maybe something will come in a later build?
UPDATE: I’m an idiot – there is one minor change to the VBA support for the Excel Data Model, but actually almost everything that I see in 2016 is also present in 2013. Sorry…
Anyone who has tried to do any serious work with Power Pivot and Power Query will know about this problem: you use Power Query to load some tables into the Data Model in Excel 2013; you make some changes in the Power Pivot window; you then go back to Power Query, make some changes there and you get the dreaded error
We couldn’t refresh the table ‘xyz’ from the connection ‘Power Query – xyz’. Here’s the error message we got:
COM Error: Microsoft.Mashup.OleDbProvider; The query ‘xyz’ or one of its inputs was modified in Power Query after this connection was added. Please disable and re-enable loading to the Data Model for this query..
This post has a solution for the same problem in Excel 2010, but it doesn’t work for Excel 2013 unfortunately. There is a lot of helpful information out there on the web about this issue if you look around, though, and that’s why I thought it would be useful to bring it all together into one blog post and also pass on some hints and tips about how to recover from this error if you get it. This is the single biggest source of frustration among the Power Query users I speak to; a fix for it is being worked on, and I hope it gets released soon.
Why does this problem occur? Let’s take a simple repro.
- Import the data from a table in SQL Server using Power Query. Load it into the Excel Data Model.
- Open the PowerPivot window in Excel, then create measures/calculated fields, calculated columns, relationships with other tables as usual.
- Go back to the worksheet and build a PivotTable from data in this table, using whatever measures or calculated columns you have created.
- Go back to the PowerPivot window and rename one of the columns there. The column name change will be reflected in the PivotTable and everything will continue to work.
- Re-open the Power Query query editor, and then rename any of the columns in the table (not necessarily the one you changed in the previous step). Close the query editor window and when the query refreshes, bang! you see the error above. The table in the Excel Data Model is unaffected, however, and your PivotTable continues to work – it’s just that now you can’t refresh the data any more…
- Do what the error message suggests and change the Load To option on the Power Query query, unchecking the option to load to the Data Model. When you do this, on the very latest build of Power Query, you’ll see a “Possible Data Loss” warning dialog telling you that you’ll lose any customisations you made. Click Continue, and the query will be disabled. The destination table will be deleted from your Excel Data Model and your PivotTable, while it will still show data, will be frozen.
- Change the Load To option on the query to load the data into the Excel Data Model again. When you do this, and refresh the data, the table will be recreated in the Excel Data Model. However, your measures, calculated columns and relationships will all be gone. What’s more, although your PivotTable will now work again, any measures or calculated columns you were using in it will also have gone.
- Swear loudly at your computer and add all the measures, calculated columns and relationships to your Data Model all over again.
So what exactly happened here? The important step is step 4. As Miguel Llopis of the Power Query team explains here and here, when you make certain changes to a table in the Power Pivot window the connection from your Power Query query to the Excel Data Model goes into ‘read-only’ mode. This then stops Power Query from making any subsequent changes to the structure of the table.
What changes put the connection to the Excel Data Model in ‘read-only’ mode?
Here’s a list of changes (taken from Miguel’s posts that I linked to above) that you can make in the PowerPivot window that put the connection from your query to the Data Model into ‘read-only’ mode:
- Edit Table Properties
- Column-level changes: Rename, Data type change, Delete
- Table-level changes: Rename, Delete
- Import more tables using Power Pivot Import Wizard
- Upgrade existing workbook
How can you tell whether my connection is in ‘read-only’ mode?
To find out whether your connection is in ‘read-only’ mode, go to the Data tab in Excel and click on the Connections button. Then, in the Workbook Connections dialog you’ll see the connection from Power Query to the Data Model listed – it will be called something like ‘Power Query – Query1’ and the description will be ‘Connection to the Query1 query in the Data Model’. Select this connection and click on the Properties button. When the Connection Properties dialog opens, go to the Definition tab. If the connection is in read-only mode the properties will be greyed out, and you’ll see the message ‘Some properties cannot be changed because this connection was modified using the PowerPivot Add-In’. If you do see this message, you’re already in trouble!
How to avoid this problem
Avoiding this problem is pretty straightforward: if you’re using Power Query to load data into the Excel Data Model, don’t make any of the changes listed above in the PowerPivot window! Make them in Power Query instead.
How to recover from this problem
But what if your connection is already in ‘read-only’ mode? There is no magic solution, unfortunately, you are going to have to rebuild your model. However there are two things you can do to reduce the amount of pain you have to go through to recreate your model.
First, you can use the DISCOVER_CALC_DEPENDENCY DMV to list out all of your measure and calculated column definitions to a table in Excel. Here’s some more information about the DMV:
To use this, all you need to do is to create a DAX query table in the way Kasper shows at the end of this post, and use the query:
select * from $system.discover_calc_dependency
Secondly, before you disable and re-enable your Power Query query (as in step 6 above), install the OLAP PivotTable Extensions add-in (if you don’t already have it) and use its option to disable auto-refresh on all of your PivotTables, as described here:
Doing this prevents the PivotTables from auto-refreshing when the table is deleted from the Data Model when you disable the Power Query query. This means that they remember all of their references to your measures and calculated columns, so when you have recreated them in your Data Model (assuming that all of the names are still the same) and you re-enable auto-refresh the PivotTables will not have changed at all and will continue to work as before.
[After writing this post, I realised that Barbara Raney covered pretty much the same material in this post: http://www.girlswithpowertools.com/2014/06/power-query-refresh-fails/ . I probably read that post when it was published and then forgot about it. I usually don’t blog about things that other people have already blogged about, but since I’d already done the hard work and the tip on using OLAP PivotTable Extensions is new, I thought I’d post anyway. Apologies…]
You probably know that, when you are importing data from multiple tables in SQL Server into the Excel Data Model in Excel 2013 using Power Query, Power Query will automatically create relationships between those tables in the Data Model. But did you know that you can get Power Query to do this for other data sources too?
Now wait – don’t get excited. I’ve known about this for a while but not blogged about it because I don’t think it works all that well. You have to follow some very precise steps to make it happen and even then there are some problems. That said, I think we’re stuck with the current behaviour (at least for the time being) so I thought I might as well document it.
Consider the following Excel worksheet with two tables in it, called Dimension and Fact:
If you were to load these two tables into the Excel Data Model, you would probably want to create a relationship between the two tables based on the FruitID column. Here are the steps to use Power Query to create the relationship automatically:
- Click inside the Dimension table and then, on the Power Query tab in the Excel ribbon, click the From Table button to create a new query.
- When the Query Editor window opens, right click on the FruitID column and select Remove Duplicates.
Why are we doing this when there clearly aren’t any duplicate values in this column? The new step contains the expression
…and one of the side-effects of using Table.Distinct() is that it adds a primary key to the table. Yes, tables in Power Query can have primary keys – the Table.AddKey() function is another way of doing this. There’s a bit more information on this subject in my Power Query book, which I hope you have all bought!
- Click the Close & Load to.. button to close the Query Editor, and then choose the Only Create Connection option to make sure the output of the query is not loaded anywhere and the query is disabled, then click the Load button. (Am I the only person that doesn’t like this new dialog? I thought the old checkboxes were much simpler, although I do appreciate the new flexibility on where to put your Excel table output)
- Click inside the Fact table in the worksheet, click the From Table button again and this time do load it into the Data Model.
- Next, in the Power Query tab in the Excel ribbon, click the Merge button. In the Merge dialog select Dimension as the first table, Fact as the second, and in both select the FruitID column to join on.
- Click OK and the Query Editor window opens again. Click the Close & Load to.. button again, and load this new table into the Data Model.
- Open the Power Pivot window and you will see that not only have your two tables been loaded into the Data Model, but a relationship has been created between the two:
What are the problems I talked about then? Well, for a start, if you don’t follow these instructions exactly then you won’t get the relationship created – it is much harder than I would like. There may be other ways to make sure the relationships are created but I haven’t found them yet (if you do know of an easier way, please leave a comment!). Secondly if you delete the two tables from the Data Model and delete the two Power Query queries, and then follow these steps again, you will find the relationship is not created. That can’t be right. Thirdly, I don’t like having to create a third query with the Merge, and would prefer it if I could just create two queries and define the relationship somewhere separately. With all of these issues I don’t think there’s any practical use for this functionality right now.
I guess the reason I think the ability to create relationships automatically is so important is because the one thing that the Excel Data Model/Power Pivot/SSAS Tabular sorely lacks is a simple way to script the structure of a model. Could Power Query and M one day be the modelling language that Marco asks for here? To be fair to the Power Query team this is not and should not be their core focus right now: Power Query is all about data acquisition, and this is data modelling. If this problem was solved properly it would take a lot of thought and a lot of effort. I would love to see it solved one day though.
You can download the sample workbook for this post here.
I had an interesting challenge from a customer yesterday – one of those problems that I’d known about for a long time but never got round to working out the solution for…
Consider the following PivotTable, based on a PowerPivot model using Adventure Works data, in Excel 2010:
It shows the top 10 products by the measure Sum of Sales. There are two slicers, and the top 10 shown in the PivotTable reflects the selections made in the slicers. All of this works fine. But what if you want to use Excel cube functions to do the same thing? You can write the MDX for the top 10 products quite easily and use it in the CubeSet() function in your worksheet, but how can you get your MDX set expression to respect the selection made in the slicers?
The solution to this problem is very similar to the trick I showed here – finding the selected items in a slicer is not easy! Here are the steps I followed to do it:
- Add the slicers for EnglishOccupation and CalendarYear to a new worksheet
- Go to Slicer Settings and uncheck the box for “Visually indicate items with no data”
- Add two new PivotTables to the worksheet. Connect one to the EnglishOccupation slicer and put EnglishOccupation on rows; connect the other to the CalendarYear slicer and put CalendarYear on rows.
- Use the OLAPPivotTableExtensions add-in (which you can download here) to add new MDX calculated measures to each PivotTable. For the EnglishOccupation PivotTable call the measure SelectedOccupations and use the following MDX:
This expression does the following: it uses the Axis() function to find the set of members selected on what Excel thinks of as the rows axis in the PivotTable (actually the MDX columns axis), then uses Except() to remove the All Member from the hierarchy (which Excel uses for the Grand Totals) and then uses SetToStr() to take that set and return the string representation of it. Do the same thing for the PivotTable showing CalendarYear too, calling the calculated measure SelectedYears; the MDX in this case is:
This is what the EnglishOccupation PivotTable should look like:
- Next, to make things easy, use Excel formulas to get the values from the top cell inside each PivotTable into cells elsewhere in the worksheet, and give these cells the names SelectedOccupations and SelectedYears.
- Then enter a CubeSet() function into a new cell using the following formula:
Sum(" & SelectedOccupations & " * " & SelectedYears & ",[Measures].[Sum of Sales])
"Top 10 Set")
What this does is use the TopCount() function to find the top 10 Products, and in the third parameter of this function which is the numeric expression to find the top 10 by, it crossjoins the two sets of selected occupations and selected years and then sums the output of the crossjoin by the measure [Sum of Sales].
- Last of all, build your report using the Excel cube functions as normal, using the CubeRankedMember() function to get each item from the top 10 set created in the previous step.
You can download my sample workbook here.
The bad news about this technique is that it doesn’t work in Excel 2013 and Power Pivot. It’s no longer possible to create MDX calculated measures on Power Pivot models in Excel 2013, alas. It will work if you’re using any version of Excel from 2007 on against Analysis Services and, as I show here, Excel 2010 and PowerPivot. If you are using Power Pivot and Excel 2013 it might be possible to create a DAX measure to do the same as the MDX I’ve used here (I’m wondering if the technique Jason describes here will work). It would certainly be possible to use CubeRankedMember() to find each item selected in the slicer, as Erik Svensen shows here, and then use Excel formulas to find the MDX unique name for each selected member and concatenate these unique names to create the set expression that my calculated measures return, but that’s a topic for another post. This really should be a lot easier than it is…
Before I start, I have to state that the technique shown in this post isn’t mine but was developed by my colleague Andrew Simmans, who has very kindly allowed me to blog about it.
Over the last few months I’ve been working on an SSAS Tabular project that has not only presented some interesting modelling challenges, but has shown how DAX can offer some new and interesting solutions to these challenges. Consider the following scenario: a supermarket sells products, and we have a fact table showing sales of products by day. Here’s some sample data:
To complicate matters, each product has one product manager but product managers for particular products change from time to time. Normally this might be solved by adding the product manager name to the Product dimension table and implementing a Type 2 Slowly Changing Dimension. In this case, though, we want something slightly different: instead of seeing sales attributed to the product manager who was in charge of the product at the time of the sale, and therefore seeing sales for the same product attributed to different product managers on different dates, we want to attribute all sales for a product to a single product manager but be able to use a second date dimension to be able to determine the point in time, and therefore the product manager in charge of each product at that point in time, that we want to report as of. To put it another way, we want to be able to find the state of a dimension on any given date and use that version of the dimension to do our analysis.
For example, we have the following table showing which product manager was in charge of each product at any given point in time:
Between January 1st 2013 and January 3rd 2013 Jim was the product manager for Orange, but from January 4th 2013 onwards Rob took over as product manager for Oranges; Fred was the product manager for Apples the whole time. We want a PivotTable that looks like this when we choose to report as of January 2nd 2013:
Notice how Jim is shown as the product manager for Oranges. If we wanted to report using the managers as of January 5th 2013, we would want to see Rob shown as the product manager for Oranges like so:
The solution to this problem involves a combination of two DAX techniques that have already been blogged about quite extensively and which I’d encourage you to read up on:
- Many-to-many relationships, in this case the solution developed by Gerhard Brueckl, described on his blog here.
- ‘Between’ date filters, which I wrote about recently but which Alberto has recently improved on in his must-read white paper here.
Here are the table relationships I’ve used for the sample scenario:
I’ve added a second date table called ReportingDate which contains the same rows as the Date table shown above; note that it has no relationship with any other table.
This problem is very similar to a many-to-many relationship in that a product can have many managers across time, and a manager can have many products. Indeed we could model this as a classic many-to-many relationship by creating a bridge table with one row for each valid combination of product and manager for each possible reporting date; on my project, however, this was not a viable solution because it would have resulted in a bridge table with billions of rows in it. Therefore, instead of joining the ReportingDate table directly to the ProductManager table, we can instead filter ProductManager using the between date filter technique.
Here’s the DAX of the Sum of Sales measure used in the PivotTables show above:
This is not necessarily the best way to write the code from a performance point of view but it’s the most readable – if you need better performance I recommend you read Alberto’s white paper. What I’m doing is this:
- Only return a value if a single reporting date is selected
- Filter the ProductManager table so only the rows where the selected reporting date is between the start date and the end date are returned, ie we only get the rows where a manager was in charge of a product on the reporting date
- Use the filtered ProductManager table to filter the main fact table using the Calculate() function, in exactly the same way that you would with a many-to-many relationship
You can download my sample workbook here.