Archive for the ‘Power BI’ Category
When I first saw the announcement about Office Sway last week, I thought – well, you can probably guess what I thought. Does it have any potential for BI? After all, the Sway team are clearly targeting business users (as well as hipster designers and schoolchildren): look at the Northwest Aquarium and Smith Fashion Expansion samples, and notice that they contain tables, charts and infographics. What’s more, data storytelling is currently a very hot concept and Sway is clearly all about telling stories. Wouldn’t it be cool if you could have interactive PivotTables, PivotCharts and Power View reports from your Power BI site embedded in a Sway? It would be a much more engaging way of presenting data than yet another PowerPoint deck.
I have no idea whether any integration between Sway and Power BI is actually planned (I have learned not to get my hopes up about this type of thing), but even if it isn’t, maybe someone at Microsoft will read this post and think about the possibilities… And isn’t this kind of collaboration between different teams supposedly one of the advantages Microsoft has over its competitors in the BI space?
PS I want a pink octopus costume just like the one that girl in the video has
A few months ago someone at a conference asked me what the Power Query Table.Partition() function could be used for, and I had to admit I had no idea. However, when I thought about it, I realised one obvious use: for creating histograms! Now I know there are lots of other good ways to create histograms in Excel but here’s one more, and hopefully it will satisfy the curiosity of anyone else who is wondering about Table.Partition().
Let’s start with a table in Excel (called “Customers”) containing a list of names and ages:
Here’s the M code for the query to find the buckets:
And here’s the output in Excel, with a bar chart:
How does this work?
- After loading the data from the Excel table in the Source step, the first problem is to determine how many buckets we’ll need. This is fairly straightforward: I use Table.Column() to get a list containing all of the values in the Age column, then use List.Max() to find the maximum age, then divide this number by ten and round up to the nearest integer.
- Now for Table.Partition(). The first thing to understand about this function is what it returns: it takes a table and returns a list of tables (a list is something like an array), so you start with one table and end up with multiple tables. Each row from the original table will end up in one of the output tables.
- One of the parameters that the Table.Partition() function needs is a hash function that determines which bucket table each row from the original table goes into. The BucketHashFunction step serves this purpose here: it takes a value, divides it by ten and rounds the result down; for example pass in the age 88 and you get the value 8 back.
- The CreateBuckets step calls Table.Partition() with the four parameters it needs: the table to partition, the column to partition by, the number of buckets to create and the hash function. For each row in the original table, the customer’s age is passed to the hash function. The number that the hash function returns (strictly speaking, that number modulo the number of buckets) is the index of the table in the list that Table.Partition() returns. In the example above nine buckets are created, so Table.Partition() returns a list containing nine tables; for the age 8, the hash function returns 0 so the row is put in the table at index 0 in the list; for the age 88 the hash function returns 8, so the row is put in the table at index 8 in the list. The output of this step, the list of tables, looks like this:
- The next thing to do is to convert the list itself to a table, then add a custom column to show the names for each bucket. This is achieved by adding a zero-based index column and then using that index value to generate the required text in the step #"Added Custom".
- Next, find the number of customers in each bucket. Remember that at this point the query still includes a column (called “Column1”) that contains a value of type table, so all that is needed is to create another custom column that calls Table.RowCount() for each bucket table, as seen in the step #"Added Custom1".
- Finally I remove the columns that aren’t needed for the output table.
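The steps described above can be sketched in M roughly as follows. Treat this as a reconstruction from the description rather than the exact query in the sample workbook: the step names Source, BucketHashFunction, CreateBuckets, #"Added Custom" and #"Added Custom1" and the column names Age and Column1 come from the walkthrough, while every other step name, the Index, Bucket and Count column names and the format of the bucket labels are my assumptions.

```powerquery
let
    // Load the Excel table called "Customers" (assumed to have Name and Age columns)
    Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content],
    // Get a list containing all of the values in the Age column
    Ages = Table.Column(Source, "Age"),
    // Find the maximum age
    MaxAge = List.Max(Ages),
    // Number of buckets: maximum age divided by ten, rounded up
    NumberOfBuckets = Number.RoundUp(MaxAge / 10),
    // Hash function: divide by ten and round down, so 88 -> 8 and 8 -> 0
    BucketHashFunction = (age) => Number.RoundDown(age / 10),
    // Split the table into a list of NumberOfBuckets tables
    CreateBuckets = Table.Partition(Source, "Age", NumberOfBuckets, BucketHashFunction),
    // Convert the list of tables into a table with a single column, Column1
    #"Table from List" = Table.FromList(CreateBuckets, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    // Add a zero-based index column (the column name "Index" is an assumption)
    #"Added Index" = Table.AddIndexColumn(#"Table from List", "Index", 0, 1),
    // Build a label for each bucket from the index (the label format is an assumption)
    #"Added Custom" = Table.AddColumn(#"Added Index", "Bucket",
        each Number.ToText([Index] * 10) & " to " & Number.ToText([Index] * 10 + 9)),
    // Count the rows in each bucket table, ie the number of customers
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Count", each Table.RowCount([Column1])),
    // Remove the columns that are not needed in the output
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1", {"Column1", "Index"})
in
    #"Removed Columns"
```

One edge case to watch for in a sketch like this: Table.Partition() places each row using the hash value modulo the number of buckets, so if the oldest customer's age is an exact multiple of ten (say 90) the hash function returns 9, there are only nine buckets, and the row wraps around into the table at index 0. Calculating the number of buckets as Number.RoundDown(MaxAge / 10) + 1 would be one way of avoiding that.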
I’m not convinced this is the most efficient solution for large data sets (I bet query folding stops very early on if you try this on a SQL Server data source) but it’s a good example of how Table.Partition() works. What other uses for it can you think of?
You can download the sample workbook here.
Anyone who has tried to do any serious work with Power Pivot and Power Query will know about this problem: you use Power Query to load some tables into the Data Model in Excel 2013; you make some changes in the Power Pivot window; you then go back to Power Query, make some changes there and you get the dreaded error:
We couldn’t refresh the table ‘xyz’ from the connection ‘Power Query – xyz’. Here’s the error message we got:
COM Error: Microsoft.Mashup.OleDbProvider; The query ‘xyz’ or one of its inputs was modified in Power Query after this connection was added. Please disable and re-enable loading to the Data Model for this query..
This post has a solution for the same problem in Excel 2010, but it doesn’t work for Excel 2013 unfortunately. There is a lot of helpful information out there on the web about this issue if you look around, though, and that’s why I thought it would be useful to bring it all together into one blog post and also pass on some hints and tips about how to recover from this error if you get it. This is the single biggest source of frustration among the Power Query users I speak to; a fix for it is being worked on, and I hope it gets released soon.
Why does this problem occur? Let’s take a simple repro.
1. Import the data from a table in SQL Server using Power Query. Load it into the Excel Data Model.
2. Open the PowerPivot window in Excel, then create measures/calculated fields, calculated columns, relationships with other tables as usual.
3. Go back to the worksheet and build a PivotTable from data in this table, using whatever measures or calculated columns you have created.
4. Go back to the PowerPivot window and rename one of the columns there. The column name change will be reflected in the PivotTable and everything will continue to work.
5. Re-open the Power Query query editor, and then rename any of the columns in the table (not necessarily the one you changed in the previous step). Close the query editor window and when the query refreshes, bang! you see the error above. The table in the Excel Data Model is unaffected, however, and your PivotTable continues to work – it’s just that now you can’t refresh the data any more…
6. Do what the error message suggests and change the Load To option on the Power Query query, unchecking the option to load to the Data Model. When you do this, on the very latest build of Power Query, you’ll see a “Possible Data Loss” warning dialog telling you that you’ll lose any customisations you made. Click Continue, and the query will be disabled. The destination table will be deleted from your Excel Data Model and your PivotTable, while it will still show data, will be frozen.
7. Change the Load To option on the query to load the data into the Excel Data Model again. When you do this, and refresh the data, the table will be recreated in the Excel Data Model. However, your measures, calculated columns and relationships will all be gone. What’s more, although your PivotTable will now work again, any measures or calculated columns you were using in it will also have gone.
8. Swear loudly at your computer and add all the measures, calculated columns and relationships to your Data Model all over again.
So what exactly happened here? The important step is step 4. As Miguel Llopis of the Power Query team explains here and here, when you make certain changes to a table in the Power Pivot window the connection from your Power Query query to the Excel Data Model goes into ‘read-only’ mode. This then stops Power Query from making any subsequent changes to the structure of the table.
What changes put the connection to the Excel Data Model in ‘read-only’ mode?
Here’s a list of changes (taken from Miguel’s posts that I linked to above) that you can make in the PowerPivot window that put the connection from your query to the Data Model into ‘read-only’ mode:
- Edit Table Properties
- Column-level changes: Rename, Data type change, Delete
- Table-level changes: Rename, Delete
- Import more tables using Power Pivot Import Wizard
- Upgrade existing workbook
How can you tell whether your connection is in ‘read-only’ mode?
To find out whether your connection is in ‘read-only’ mode, go to the Data tab in Excel and click on the Connections button. Then, in the Workbook Connections dialog you’ll see the connection from Power Query to the Data Model listed – it will be called something like ‘Power Query – Query1’ and the description will be ‘Connection to the Query1 query in the Data Model’. Select this connection and click on the Properties button. When the Connection Properties dialog opens, go to the Definition tab. If the connection is in read-only mode the properties will be greyed out, and you’ll see the message ‘Some properties cannot be changed because this connection was modified using the PowerPivot Add-In’. If you do see this message, you’re already in trouble!
How to avoid this problem
Avoiding this problem is pretty straightforward: if you’re using Power Query to load data into the Excel Data Model, don’t make any of the changes listed above in the PowerPivot window! Make them in Power Query instead.
How to recover from this problem
But what if your connection is already in ‘read-only’ mode? There is no magic solution, unfortunately: you are going to have to rebuild your model. However, there are two things you can do to reduce the amount of pain you have to go through to recreate your model.
First, you can use the DISCOVER_CALC_DEPENDENCY DMV to list out all of your measure and calculated column definitions to a table in Excel. Here’s some more information about the DMV:
To use this, all you need to do is to create a DAX query table in the way Kasper shows at the end of this post, and use the query:
select * from $system.discover_calc_dependency
Secondly, before you disable and re-enable your Power Query query (as in step 6 above), install the OLAP PivotTable Extensions add-in (if you don’t already have it) and use its option to disable auto-refresh on all of your PivotTables, as described here:
Doing this stops the PivotTables from auto-refreshing when disabling the Power Query query deletes the table from the Data Model. This means that they remember all of their references to your measures and calculated columns. Once you have recreated these in your Data Model (assuming that all of the names are still the same) and re-enabled auto-refresh, the PivotTables will not have changed at all and will continue to work as before.
[After writing this post, I realised that Barbara Raney covered pretty much the same material in this post: http://www.girlswithpowertools.com/2014/06/power-query-refresh-fails/ . I probably read that post when it was published and then forgot about it. I usually don't blog about things that other people have already blogged about, but since I'd already done the hard work and the tip on using OLAP PivotTable Extensions is new, I thought I'd post anyway. Apologies...]
You probably know that, when you are importing data from multiple tables in SQL Server into the Excel Data Model in Excel 2013 using Power Query, Power Query will automatically create relationships between those tables in the Data Model. But did you know that you can get Power Query to do this for other data sources too?
Now wait – don’t get excited. I’ve known about this for a while but not blogged about it because I don’t think it works all that well. You have to follow some very precise steps to make it happen and even then there are some problems. That said, I think we’re stuck with the current behaviour (at least for the time being) so I thought I might as well document it.
Consider the following Excel worksheet with two tables in it, called Dimension and Fact:
If you were to load these two tables into the Excel Data Model, you would probably want to create a relationship between the two tables based on the FruitID column. Here are the steps to use Power Query to create the relationship automatically:
- Click inside the Dimension table and then, on the Power Query tab in the Excel ribbon, click the From Table button to create a new query.
- When the Query Editor window opens, right click on the FruitID column and select Remove Duplicates.
Why are we doing this when there clearly aren’t any duplicate values in this column? The new step contains the expression
…and one of the side-effects of using Table.Distinct() is that it adds a primary key to the table. Yes, tables in Power Query can have primary keys – the Table.AddKey() function is another way of doing this. There’s a bit more information on this subject in my Power Query book, which I hope you have all bought!
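As a rough illustration of the two approaches just mentioned, here is a minimal sketch, assuming the Dimension table is loaded with Excel.CurrentWorkbook() and that the UI names the new step #"Removed Duplicates"; the step names WithKey and Keys are made up for this example. Table.Keys() lists the keys defined on a table, so it is a handy way of checking what each approach has done.

```powerquery
let
    // Load the Excel table called "Dimension"
    Source = Excel.CurrentWorkbook(){[Name="Dimension"]}[Content],
    // What Remove Duplicates on the FruitID column does: a call to Table.Distinct()
    #"Removed Duplicates" = Table.Distinct(Source, {"FruitID"}),
    // The alternative: declare the key explicitly (third argument true = primary key)
    WithKey = Table.AddKey(Source, {"FruitID"}, true),
    // Inspect the keys on the de-duplicated table; swap in WithKey to compare
    Keys = Table.Keys(#"Removed Duplicates")
in
    Keys
```

To use either version in the walkthrough, the query would need to return #"Removed Duplicates" or WithKey rather than Keys. I have only described the Table.Distinct() route in the steps here, so whether Table.AddKey() triggers the same relationship detection is something to test rather than assume.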
- Click the Close & Load to.. button to close the Query Editor, and then choose the Only Create Connection option to make sure the output of the query is not loaded anywhere and the query is disabled, then click the Load button. (Am I the only person that doesn’t like this new dialog? I thought the old checkboxes were much simpler, although I do appreciate the new flexibility on where to put your Excel table output)
- Click inside the Fact table in the worksheet, click the From Table button again and this time do load it into the Data Model.
- Next, in the Power Query tab in the Excel ribbon, click the Merge button. In the Merge dialog select Dimension as the first table, Fact as the second, and in both select the FruitID column to join on.
- Click OK and the Query Editor window opens again. Click the Close & Load to.. button again, and load this new table into the Data Model.
- Open the Power Pivot window and you will see that not only have your two tables been loaded into the Data Model, but a relationship has been created between the two:
What are the problems I talked about then? Well, for a start, if you don’t follow these instructions exactly then you won’t get the relationship created – it is much harder than I would like. There may be other ways to make sure the relationships are created but I haven’t found them yet (if you do know of an easier way, please leave a comment!). Secondly if you delete the two tables from the Data Model and delete the two Power Query queries, and then follow these steps again, you will find the relationship is not created. That can’t be right. Thirdly, I don’t like having to create a third query with the Merge, and would prefer it if I could just create two queries and define the relationship somewhere separately. With all of these issues I don’t think there’s any practical use for this functionality right now.
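For reference, the third query created by the Merge step in the walkthrough above should contain little more than a single call to Table.NestedJoin(). A minimal sketch, assuming the two earlier queries are named Dimension and Fact after their tables and that the nested table column is called NewColumn (both assumptions on my part):

```powerquery
let
    // Join Dimension to Fact on FruitID; the matching Fact rows for each
    // Dimension row are returned as a nested table in the column NewColumn
    Source = Table.NestedJoin(Dimension, {"FruitID"}, Fact, {"FruitID"}, "NewColumn")
in
    Source
```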
The reason I think the ability to create relationships automatically is so important is that the one thing the Excel Data Model/Power Pivot/SSAS Tabular sorely lacks is a simple way to script the structure of a model. Could Power Query and M one day be the modelling language that Marco asks for here? To be fair to the Power Query team this is not and should not be their core focus right now: Power Query is all about data acquisition, and this is data modelling. Solving this problem properly would take a lot of thought and a lot of effort. I would love to see it solved one day though.
You can download the sample workbook for this post here.
Seems like another new bit of Power BI functionality got released today: the ability to optimize your data model for Q&A in the browser. Here’s the link to the docs:
Previously, the ability to add synonyms to your model to improve the results you got from Q&A was only available in Excel on the desktop, inside the Power Pivot window. Now you can do this, as well as new stuff like adding phrasings (described here) and viewing usage reports, in your Power BI site.
I won’t repeat what the docs say about the actual functionality, but this seems to be yet more evidence that Excel on the desktop is no longer the central hub for Power BI. If this is the case, this is a massive strategic change, and I can understand why it has happened: the need for the ‘right’ version of Excel on the desktop is a massive roadblock for Power BI adoption, especially in enterprise accounts (see also Jen Underwood’s comments on this from yesterday). Maybe now it’s BI in the browser instead?
OK, so I’m not at WPC this year but I have just watched this video of Scott Guthrie’s session “The Cloud for Modern Business”. If you’re interested in seeing some new Power BI features take a look at the demo by James Phillips, general manager for Power BI, starting at 21:20:
Some of the new things I noticed:
- 21:40 – a nice shot of one of the new Power BI dashboards first announced at the PASS BA Conference earlier this year. You can see several new types of visualisation such as treemaps, radar charts and gauges (gauges? GAUGES? Shhh, don’t tell Stephen Few).
- 22:33 – a list of out-of-the-box data sources is shown from which new models can be created. They include: Salesforce, MS Dynamics, Facebook, Google Analytics, Twitter, and Upload Excel.
- 22:50 – data is imported from Salesforce in the browser. This isn’t happening in Excel on the desktop, folks, it’s in the browser. This is significant!
- 23:10 – another new visualisation shown, a doughnut chart (if that’s the right term). I see names of people from the Power Query team in the data.
- 24:50 – a Q&A analysis is pinned to the dashboard
- 25:50 – much is made of the fact that the dashboard is touch-enabled
- 25:55 – “Partner Solution Packs” are announced. This sounds important! It seems to be referring to the Salesforce demo earlier, and these solution packs are said to include: data, connectivity to the data sources, visualisation and interactive reports. So it sounds like Microsoft are going to encourage data vendors (or other sources of data) to build these solution packs on top of Power BI as pre-packaged analytical apps. Probably a good idea.
- 26:15 – editing a dashboard in the browser and swapping one visualisation for another. Again, this is the HTML5 browser-based editing experience – we haven’t seen Excel once in this demo.
- 27:55 – “If there was ever a partner opportunity, this is it”. Again much emphasis here. Seems like these new Power BI features, especially the solution packs, are aimed at giving partners incentives to sell and customise Power BI (something which they have not had up to now, to be honest).
Oh, and you probably already heard that Azure Machine Learning is now in public preview. Check out the docs and samples here. I wouldn’t be surprised if there was some integration between this and Power BI to come too.
Looking for some summer holiday (or winter holiday, depending on which hemisphere you live in) reading? If so, may I suggest my new Power Query book? “Power Query for Power BI and Excel” is available now from the Apress site, Amazon.com, Amazon.co.uk and all good bookstores.
It’s an introductory level book. It covers all of the stuff you can do in the UI, it has a chapter on M, and it goes into a reasonable amount of detail on more advanced topics; it is not a 500-page exhaustive guide to the product. I’ve focused on readability and teaching the fundamentals of Power Query rather than looking at every obscure M function, but at the same time if you’ve already used Power Query I think there’ll be plenty of material in there you’ll find interesting.
Now for the bad news: the book is out-of-date already, although not by much. One of the best things about Power Query is the monthly release cycle; unfortunately that makes writing a book on it a bit of a nightmare. I started off writing in January and had to deal with lots of added functionality and changes to the UI over the next few months; I had to retake pretty much all of the screenshots as a result. The published version of the book is based on the version of Power Query that was released in early June rather than the current version. Hopefully you can forgive this – the differences are minor – but it’s a good reason to buy the book as soon as you can! I want to do a second edition in a year’s time once (if?) the release cycle slows down.
I’ve been teased a bit for blogging and teaching so much about Power Query recently, so the final thing I want to say here is why an old corporate BI/SSAS guy like me is getting so excited about a self-service ETL tool. Well, the main reason is that Power Query is a great piece of software. It does what it does very well; it does useful things rather than what the marketing guys/analysts/journalists think is hot in BI; it is easy to use but at the same time is flexible enough for the advanced user to do really complex stuff; it is updated regularly based on feedback from its users. I only wish all Microsoft software was this good… Honestly, I wouldn’t be able to motivate myself to blog and write about Power Query if I didn’t think it was cool, and even though it hasn’t been hyped in the same way as other parts of the Power BI stack it is nonetheless the part that people get excited about when I show them Power BI. It’s not just me either – every day I see positive comments like Greg Low’s here. I think it is as important as, if not more important than, Power Pivot and I think it will be a massive success.
Oh, and did I mention that I’m also teaching a Power Query course in London later this year….?