Archive for the ‘Excel’ Category
This is the first of many posts on the new DAX functions that have appeared in Excel 2016 (for a full list see this post). Today: the ConcatenateX() function.
The mdschema_functions schema rowset gives the following description of this function:
Evaluates expression for each row on the table, then return the concatenation of those values in a single string result, separated by the specified delimiter
Its signature is:
CONCATENATEX(Table, Expression, [Delimiter])
It’s easier to understand what it does using a simple example though. Consider the following table on a worksheet in Excel 2016:
When you add this table to the Excel Data Model (I called the table Sales) you can add the following measure:
Purchasing Customers:= CONCATENATEX( VALUES(Sales[Customer]), Sales[Customer], "," )
If you then use this measure in a PivotTable, you see the following:
As you can see, the measure returns a comma-delimited list of all of the customers who have bought each product. Very useful…
Following on from my recent post on Power BI and Excel 2016 news, here are some more details about the new BI-related features in the Excel 2016 Preview. Remember that more BI-related features may appear before the release of Excel 2016, and that with Office 365 click-to-run significant new features can appear in between releases, so this is not a definitive list of what Excel 2016 will be able to do at RTM but a snapshot of functionality available as of March 2015 as outlined in this document and which I’ve found from my own investigations. When I find out more, or when new functionality appears, I’ll either update this post or write a new one.
Yesterday, in the original version of my post, I mistakenly said that Power Query was a native add-in in Excel 2016: that’s not true, it’s not an add-in at all, it’s native Excel functionality. Indeed you can see that there is no separate Power Query tab any more, and instead there is a Power Query section on the Data tab instead:
Obviously I’m a massive fan of Power Query so I’m biased, but I think this is a great move because it makes all the great Power Query functionality a lot easier to discover. There’s nothing to enable – it’s there by default – although I am a bit worried that users will be confused by having the older Data tab features next to their Power Query equivalents.
There are no new features for Power Query here compared to the latest version for Excel 2013, but that’s what I expected.
Excel Forecasting Functions
I don’t pretend to know anything about forecasting, but I had a brief play with the new Forecast.ETS function and got some reasonable results out of it as seen in the screenshot below:
There’s a new hammer icon on a slicer, which, when you click it, changes the way selection works. The default behaviour is the same as Excel 2013: every time you click on an item, that item is selected and any previous selection is lost (unless you were holding control or shift to multiselect). However with the hammer icon selected each new click adds the item to the previously selected items. This is meant to make slicers easier to use with a touch-screen.
Time Grouping in PivotTables
Quite a neat feature this, I think. If you have a table in the Excel Data Model that has a column of type date in it, you can add extra calculated columns to that table from within a PivotTable to group by things like Year and Month. For example, here’s a PivotTable I built on a table that contains just dates:
Right-clicking on the field containing the dates and clicking Group brings up the following dialog:
Choosing Years, Quarters and Months creates three extra fields in the PivotTable:
And these fields are implemented as calculated columns in the original table in the Excel Data Model, with DAX definitions as seen here:
Power View on SSAS Multidimensional
At-bloody-last. I haven’t installed SSAS on the VM I’m using for testing Excel 2016, but I assume it just works. Nothing new in Power View yet, by the way.
Power Map data cards
Not sure why this is listed as new in Excel 2016 when it seems to be the same feature that appeared in Excel 2013 Power Map recently:
There isn’t any obvious new functionality in the Power Pivot window, but it’s clear that the UI in general and the DAX formula editor experience in particular has been improved.
When you use fields from two Excel Data Model tables that have no relationship between them in a PivotTable, you get a prompt to either create new relationships yourself or let Excel detect the relationships:
Renaming Tables and Fields in the Power Pivot window
In Excel 2013 when you renamed tables or fields in the Excel Data Model, any PivotTables that used those objects had them deleted. Now, in Excel 2016, the PivotTable retains the reference to table or field and just displays the new name. What’s even better is that when you create a measure or a calculated column that refers to a table or column, the DAX definition of the measure or calculated column gets updated after a rename too.
There are lots of new DAX functions in this build. With the help of the mdschema_functions schema rowset and Power Query I was able to compare the list of DAX functions available in 2016 with those in 2013 and create the following list of new DAX functions and descriptions:
FUNCTION NAME DESCRIPTION DATEDIFF Returns the number of units (unit specified in Interval) between the input two dates CONCATENATEX Evaluates expression for each row on the table, then return the concatenation of those values in a single string result, separated by the specified delimiter KEYWORDMATCH Returns TRUE if there is a match between the MatchExpression and Text. ADDMISSINGITEMS Add the rows with empty measure values back. CALENDAR Returns a table with one column of all dates between StartDate and EndDate CALENDARAUTO Returns a table with one column of dates calculated from the model automatically CROSSFILTER Specifies cross filtering direction to be used in the evaluation of a DAX expression. The relationship is defined by naming, as arguments, the two columns that serve as endpoints CURRENTGROUP Access to the (sub)table representing current group in GroupBy function. Can be used only inside GroupBy function. GROUPBY Creates a summary the input table grouped by the specified columns IGNORE Tags a measure expression specified in the call to SUMMARIZECOLUMNS function to be ignored when determining the non-blank rows. ISONORAFTER The IsOnOrAfter function is a boolean function that emulates the behavior of Start At clause and returns true for a row that meets all the conditions mentioned as parameters in this function. NATURALINNERJOIN Joins the Left table with right table using the Inner Join semantics NATURALLEFTOUTERJOIN Joins the Left table with right table using the Left Outer Join semantics ROLLUPADDISSUBTOTAL Identifies a subset of columns specified in the call to SUMMARIZECOLUMNS function that should be used to calculate groups of subtotals ROLLUPISSUBTOTAL Pairs up the rollup groups with the column added by ROLLUPADDISSUBTOTAL SELECTCOLUMNS Returns a table with selected columns from the table and new columns specified by the DAX expressions SUBSTITUTEWITHINDEX Returns a table which represents the semijoin of two tables supplied and for which the common set of columns are replaced by a 0-based index column. The index is based on the rows of the second table sorted by specified order expressions. SUMMARIZECOLUMNS Create a summary table for the requested totals over set of groups. GEOMEAN Returns geometric mean of given column reference. GEOMEANX Returns geometric mean of an expression values in a table. MEDIANX Returns the 50th percentile of an expression values in a table. PERCENTILE.EXC Returns the k-th (exclusive) percentile of values in a column. PERCENTILE.INC Returns the k-th (inclusive) percentile of values in a column. PERCENTILEX.EXC Returns the k-th (exclusive) percentile of an expression values in a table. PERCENTILEX.INC Returns the k-th (inclusive) percentile of an expression values in a table. PRODUCT Returns the product of given column reference. PRODUCTX Returns the product of an expression values in a table. XIRR Returns the internal rate of return for a schedule of cash flows that is not necessarily periodic XNPV Returns the net present value for a schedule of cash flows
Plenty of material for future blog posts there, I think – there are lots of functions here that will be very useful. I bet Marco and Alberto are excited…
It looks like we have support for the Excel Data Model (aka Power Pivot) in VBA at last. I need to do some research here, but I get the distinct feeling that the only things that are possible through VBA are the things you can do in the Excel ribbon, such as creating connections, tables and relationships. I can’t see any support for creating measures, calculated columns or hierarchies…? I can’t see anything relating to Power Query either. Maybe I’m not looking in the right place; maybe something will come in a later build?
UPDATE: I’m an idiot – there is one minor change to the VBA support for the Excel Data Model, but actually almost everything that I see in 2016 is also present in 2013. Sorry…
There have been quite a few Power BI and Office BI-related announcements over the last few weeks, and while I’ve tweeted about them (I’m @Technitrain if you’re not following me already) I though it would be a good idea to summarise them all in one post.
Power BI Announcements at Convergence and SQLBits
You’ve probably already seen the announcement today on the Power BI blog that Power BI is FINALLY available to those of us outside the USA:
At last! I’m sure MS had very good reasons why they couldn’t make the Power BI Preview available worldwide back in December, but this decision caused a lot of frustration in the MS BI community and I hope it’s not something that happens again. I can also confirm that the Power BI iPhone app is now available in the UK as well. The new data sources for Power BI that are coming soon – especially Google Analytics – will be very popular I think.
While I’m on the topic of Power BI, a few interesting nuggets about upcoming functionality emerged at SQLBits last week. Kasper mentioned that there will be some new DAX functions appearing in Power BI soon: Median, Percentile, DateDiff and XPNV. Presumably they will appear when we get the ability to create DAX measures and calculated columns in the Power BI Dashboard Designer. Also, following on from the bidirectional relationships functionality I blogged about earlier this year, there was the news that Power BI will also understand 1:1 relationships as well as 1:many, many:1 and many:many.
Office 2016 Preview BI Features
The Office 2016 preview went public today too:
There’s a great overview of what’s new for BI in Office 2016 here:
The main points are:
- Power Query is now a native feature of Excel 2016.
- Power View works on SSAS Multidimensional (this is only going to work on the versions of SSAS Multidimensional that support DAX queries, ie SSAS 2014 or SSAS 2012 SP2)
- New Excel forecasting functions
- Time grouping functionality in PivotTables
I’ll be writing a more detailed blog on all of this at some point soon, once I know what’s officially public and what isn’t.
The Power Query announcement is interesting because, as things stand at the moment, we’ll be able to use full Power Query, Power Pivot and Power View functionality for free in the Power BI Dashboard Designer, but in Excel the same functionality is restricted to users of the Professional Plus SKUs. This is crazy, and I hope Microsoft makes the Power add-ins available for every SKU of Excel 2016. Have you signed the petition for this yet?
Last week the Power Map team released a new video showcasing functionality from an upcoming release:
Although there are no details about what is shown in the video, it certainly looks like the ability to use custom shapes (the main missing feature in Power Map up to now) will be coming soon.
I wonder if I can justify buying one for demo purposes?
It’s a general rule of the internet that, whenever you have a cool idea, a few minutes spent on your favourite search engine reveals that someone else has had the idea before you. In my case, when I first saw the functionality in Power Query for working with binary files I wondered whether it was possible to read the contents of a file containing an image and render each pixel as a cell in a worksheet – and of course, it has already been done and done better than I could ever manage. However, it hasn’t been done in Power Query… until now.
First of all, I have to acknowledge the help of Matt Masson whose blog post on working with binary data in Power Query provided a number of useful examples. I also found this article on the bmp file format invaluable.
Second, what I’ve done only works with monochrome bmp files. I could have spent a few more hours coming up with the code to work with other file types but, frankly, I’m too lazy. I have to do real work too, you know.
So let’s see how this works. Here’s a picture of Fountains Abbey that I took on my phone while on holiday last summer:
I opened it in Paint and saved it as a monochrome bmp file:
Here’s the code for the Power Query query that opens the bmp file and renders the contents in Excel:
let //The picture to load SourceFilePath="C:\Users\Chris\Pictures\FountainsAbbey.bmp", //Or get the path from the output of a query called FileName //SourceFilePath=FileName, //Load the picture SourceFile=File.Contents(SourceFilePath), //First divide the file contents into two chunks: //the header of the file, always 62 bytes //and the rest, which contains the pixels //Define the format as a record OverallFormat=BinaryFormat.Record([ Header = BinaryFormat.Binary(62), Pixels = BinaryFormat.Binary() ]), //Load the data into that format Overall = OverallFormat(SourceFile), //Get the header data HeaderData = Overall[Header], //Extract the total file size and //width and height of the image HeaderFormat = BinaryFormat.Record([ Junk1 = BinaryFormat.Binary(2), FileSize = BinaryFormat.ByteOrder( BinaryFormat.SignedInteger32, ByteOrder.LittleEndian), Junk2 = BinaryFormat.Binary(12), Width = BinaryFormat.ByteOrder( BinaryFormat.SignedInteger32, ByteOrder.LittleEndian), Height = BinaryFormat.ByteOrder( BinaryFormat.SignedInteger32, ByteOrder.LittleEndian), Junk3 = BinaryFormat.Binary() ]), HeaderValues = HeaderFormat(HeaderData), FileSize = HeaderValues[FileSize], ImageWidth = HeaderValues[Width], ImageHeight = HeaderValues[Height], //Each pixel is represented as a bit //And each line is made up of groups of four bytes BytesPerLine = Number.RoundUp(ImageWidth/32)*4, //Read the pixel data into a list PixelListFormat = BinaryFormat.List( BinaryFormat.ByteOrder( BinaryFormat.Binary(BytesPerLine), ByteOrder.LittleEndian)), PixelList = PixelListFormat(Overall[Pixels]), //Convert each byte to a number PixelListNumbers = List.Transform(PixelList, each Binary.ToList(_)), //A function to convert a number into binary //and return a list containing the bits GetBinaryNumber = (ValueToConvert as number) as list => let BitList = List.Generate( ()=>[Counter=1, Value=ValueToConvert], each [Counter]<9, each [Counter=[Counter]+1, Value=Number.IntegerDivide([Value],2)], each Number.Mod([Value],2)), BitListReversed = List.Reverse(BitList) in BitListReversed, //A function to get all the bits for a single line //in the image GetAllBitsOnLine = (NumberList as list) => List.FirstN( List.Combine( List.Transform(NumberList, each GetBinaryNumber(_) ) ), ImageWidth), //Reverse the list - the file contains the pixels //from the bottom up PixelBits = List.Reverse( List.Transform(PixelListNumbers, each GetAllBitsOnLine(_))), //Output all the pixels in a table OutputTable = #table(null, PixelBits) in OutputTable
The output of this query is a table containing ones and zeroes and this must be loaded to the worksheet. The final thing to do is to make the table look like a photo by:
- Hiding the column headers on the table
- Using the ‘None’ table style so that there is no formatting on the table itself
- Hiding the values in the table by using the ;;; format (see here for more details)
- Zooming out as far as you can on the worksheet
- Resizing the row heights and column widths so the image doesn’t look too squashed
- Using Excel conditional formatting to make the cells containing 0 black and the cells containing 1 white:
Here’s the photo rendered as cells in the workbook:
And here it is again, zoomed in a bit so you can see the individual cells a bit better:
You can download the workbook (which I’ve modified so you can enter the filename of your bmp file in a cell in the worksheet, so you don’t have to edit the query – but you will have to turn Fast Combine on as a result) from here. Have fun!
Anyone who has tried to do any serious work with Power Pivot and Power Query will know about this problem: you use Power Query to load some tables into the Data Model in Excel 2013; you make some changes in the Power Pivot window; you then go back to Power Query, make some changes there and you get the dreaded error
We couldn’t refresh the table ‘xyz’ from the connection ‘Power Query – xyz’. Here’s the error message we got:
COM Error: Microsoft.Mashup.OleDbProvider; The query ‘xyz’ or one of its inputs was modified in Power Query after this connection was added. Please disable and re-enable loading to the Data Model for this query..
This post has a solution for the same problem in Excel 2010, but it doesn’t work for Excel 2013 unfortunately. There is a lot of helpful information out there on the web about this issue if you look around, though, and that’s why I thought it would be useful to bring it all together into one blog post and also pass on some hints and tips about how to recover from this error if you get it. This is the single biggest source of frustration among the Power Query users I speak to; a fix for it is being worked on, and I hope it gets released soon.
Why does this problem occur? Let’s take a simple repro.
- Import the data from a table in SQL Server using Power Query. Load it into the Excel Data Model.
- Open the PowerPivot window in Excel, then create measures/calculated fields, calculated columns, relationships with other tables as usual.
- Go back to the worksheet and build a PivotTable from data in this table, using whatever measures or calculated columns you have created.
- Go back to the PowerPivot window and rename one of the columns there. The column name change will be reflected in the PivotTable and everything will continue to work.
- Re-open the Power Query query editor, and then rename any of the columns in the table (not necessarily the one you changed in the previous step). Close the query editor window and when the query refreshes, bang! you see the error above. The table in the Excel Data Model is unaffected, however, and your PivotTable continues to work – it’s just that now you can’t refresh the data any more…
- Do what the error message suggests and change the Load To option on the Power Query query, unchecking the option to load to the Data Model. When you do this, on the very latest build of Power Query, you’ll see a “Possible Data Loss” warning dialog telling you that you’ll lose any customisations you made. Click Continue, and the query will be disabled. The destination table will be deleted from your Excel Data Model and your PivotTable, while it will still show data, will be frozen.
- Change the Load To option on the query to load the data into the Excel Data Model again. When you do this, and refresh the data, the table will be recreated in the Excel Data Model. However, your measures, calculated columns and relationships will all be gone. What’s more, although your PivotTable will now work again, any measures or calculated columns you were using in it will also have gone.
- Swear loudly at your computer and add all the measures, calculated columns and relationships to your Data Model all over again.
So what exactly happened here? The important step is step 4. As Miguel Llopis of the Power Query team explains here and here, when you make certain changes to a table in the Power Pivot window the connection from your Power Query query to the Excel Data Model goes into ‘read-only’ mode. This then stops Power Query from making any subsequent changes to the structure of the table.
What changes put the connection to the Excel Data Model in ‘read-only’ mode?
Here’s a list of changes (taken from Miguel’s posts that I linked to above) that you can make in the PowerPivot window that put the connection from your query to the Data Model into ‘read-only’ mode:
- Edit Table Properties
- Column-level changes: Rename, Data type change, Delete
- Table-level changes: Rename, Delete
- Import more tables using Power Pivot Import Wizard
- Upgrade existing workbook
How can you tell whether my connection is in ‘read-only’ mode?
To find out whether your connection is in ‘read-only’ mode, go to the Data tab in Excel and click on the Connections button. Then, in the Workbook Connections dialog you’ll see the connection from Power Query to the Data Model listed – it will be called something like ‘Power Query – Query1’ and the description will be ‘Connection to the Query1 query in the Data Model’. Select this connection and click on the Properties button. When the Connection Properties dialog opens, go to the Definition tab. If the connection is in read-only mode the properties will be greyed out, and you’ll see the message ‘Some properties cannot be changed because this connection was modified using the PowerPivot Add-In’. If you do see this message, you’re already in trouble!
How to avoid this problem
Avoiding this problem is pretty straightforward: if you’re using Power Query to load data into the Excel Data Model, don’t make any of the changes listed above in the PowerPivot window! Make them in Power Query instead.
How to recover from this problem
But what if your connection is already in ‘read-only’ mode? There is no magic solution, unfortunately, you are going to have to rebuild your model. However there are two things you can do to reduce the amount of pain you have to go through to recreate your model.
First, you can use the DISCOVER_CALC_DEPENDENCY DMV to list out all of your measure and calculated column definitions to a table in Excel. Here’s some more information about the DMV:
To use this, all you need to do is to create a DAX query table in the way Kasper shows at the end of this post, and use the query:
select * from $system.discover_calc_dependency
Secondly, before you disable and re-enable your Power Query query (as in step 6 above), install the OLAP PivotTable Extensions add-in (if you don’t already have it) and use its option to disable auto-refresh on all of your PivotTables, as described here:
Doing this prevents the PivotTables from auto-refreshing when the table is deleted from the Data Model when you disable the Power Query query. This means that they remember all of their references to your measures and calculated columns, so when you have recreated them in your Data Model (assuming that all of the names are still the same) and you re-enable auto-refresh the PivotTables will not have changed at all and will continue to work as before.
[After writing this post, I realised that Barbara Raney covered pretty much the same material in this post: http://www.girlswithpowertools.com/2014/06/power-query-refresh-fails/ . I probably read that post when it was published and then forgot about it. I usually don’t blog about things that other people have already blogged about, but since I’d already done the hard work and the tip on using OLAP PivotTable Extensions is new, I thought I’d post anyway. Apologies…]
One of the more recent additions to Power Query is the ability to access data from named ranges in the Excel worksheet rather than an Excel table. I’ve got used to formatting data as tables in Excel because that’s what Power Pivot needs to import data directly from the worksheet, but if you are working with Power Query and a pre-existing workbook then trying to reformat data as tables can be a pain. Also, if you just want to import a single value, for example as a parameter to a query, a table seems like overkill.
(Incidentally, if you’re wondering what a named range is in Excel, there are tons of good introductions to the subject on the internet like this one. You can do loads of cool stuff with them.)
Consider the following Excel worksheet:
There are three named ranges here: FirstRange, SecondRange, ThirdRange, and the values in the cells show which range the cells are in. FirstRange consists of two cells in two columns; SecondRange consists of three cells in a single row; and ThirdRange is consists of three, non-contiguous cells. (You can also use this trick to display the names of all contiguous ranges in an Excel workbook, but alas it does seem to work for non-contiguous ranges).
At the moment, the Power Query ribbon doesn’t make it obvious that you can use named ranges as data sources. However you can see all the tables and cells in a worksheet, and even return that list from a query, by creating a blank query and using the expression
Once you’ve done this you can see all the ranges (and also any tables) in the workbook, and click on the table link next to the name to see the data. For example, clicking on FirstRange shows the following table in a new step in the query editor:
The expression to get at this table in a single step is:
The range SecondRange in my example is equally straightforward to reference, and you can see its contents by using the expression
Unfortunately ThirdRange, which is not contiguous, is a problem: I can only get the first cell in the range. So the expression
Returns just this table:
It would be nice if we could get a list containing the cell values, rather than a table, for ranges like this…
Last thing to mention is that if you do want the value in a cell, rather than a table, you just need to right-click inside the cell in the Query Editor and select Drill Down:
This returns the value (in this case the text “Third Range Cell 1”) in the cell you clicked on:
This is a much more useful value to return than a table containing a single row/column, if you intend to use a value from a single cell in a named range as a parameter to another query.
You can download the sample workbook for this post here.
I had an interesting challenge from a customer yesterday – one of those problems that I’d known about for a long time but never got round to working out the solution for…
Consider the following PivotTable, based on a PowerPivot model using Adventure Works data, in Excel 2010:
It shows the top 10 products by the measure Sum of Sales. There are two slicers, and the top 10 shown in the PivotTable reflects the selections made in the slicers. All of this works fine. But what if you want to use Excel cube functions to do the same thing? You can write the MDX for the top 10 products quite easily and use it in the CubeSet() function in your worksheet, but how can you get your MDX set expression to respect the selection made in the slicers?
The solution to this problem is very similar to the trick I showed here – finding the selected items in a slicer is not easy! Here are the steps I followed to do it:
- Add the slicers for EnglishOccupation and CalendarYear to a new worksheet
- Go to Slicer Settings and uncheck the box for “Visually indicate items with no data”
- Add two new PivotTables to the worksheet. Connect one to the EnglishOccupation slicer and put EnglishOccupation on rows; connect the other to the CalendarYear slicer and put CalendarYear on rows.
- Use the OLAPPivotTableExtensions add-in (which you can download here) to add new MDX calculated measures to each PivotTable. For the EnglishOccupation PivotTable call the measure SelectedOccupations and use the following MDX:
This expression does the following: it uses the Axis() function to find the set of members selected on what Excel thinks of as the rows axis in the PivotTable (actually the MDX columns axis), then uses Except() to remove the All Member from the hierarchy (which Excel uses for the Grand Totals) and then uses SetToStr() to take that set and return the string representation of it. Do the same thing for the PivotTable showing CalendarYear too, calling the calculated measure SelectedYears; the MDX in this case is:
This is what the EnglishOccupation PivotTable should look like:
- Next, to make things easy, use Excel formulas to get the values from the top cell inside each PivotTable into cells elsewhere in the worksheet, and give these cells the names SelectedOccupations and SelectedYears.
- Then enter a CubeSet() function into a new cell using the following formula:
Sum(" & SelectedOccupations & " * " & SelectedYears & ",[Measures].[Sum of Sales])
"Top 10 Set")
What this does is use the TopCount() function to find the top 10 Products, and in the third parameter of this function which is the numeric expression to find the top 10 by, it crossjoins the two sets of selected occupations and selected years and then sums the output of the crossjoin by the measure [Sum of Sales].
- Last of all, build your report using the Excel cube functions as normal, using the CubeRankedMember() function to get each item from the top 10 set created in the previous step.
You can download my sample workbook here.
The bad news about this technique is that it doesn’t work in Excel 2013 and Power Pivot. It’s no longer possible to create MDX calculated measures on Power Pivot models in Excel 2013, alas. It will work if you’re using any version of Excel from 2007 on against Analysis Services and, as I show here, Excel 2010 and PowerPivot. If you are using Power Pivot and Excel 2013 it might be possible to create a DAX measure to do the same as the MDX I’ve used here (I’m wondering if the technique Jason describes here will work). It would certainly be possible to use CubeRankedMember() to find each item selected in the slicer, as Erik Svensen shows here, and then use Excel formulas to find the MDX unique name for each selected member and concatenate these unique names to create the set expression that my calculated measures return, but that’s a topic for another post. This really should be a lot easier than it is…