Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Comparers, Combiners, Replacers and Splitters in Power Query

with 8 comments

At the end of the Power Query Formula Library Specification (which can be downloaded here) are sections on Comparer, Combiner, Replacer and Splitter functions. These functions are most often used in conjunction with other functions like Table.CombineColumns() and Table.SplitColumn, but what you may not realise from the documentation (which also has a few minor but nonetheless confusing bugs in it) is what these functions do: they are functions that return functions, and the functions that they return can be used independently just like any other function.

Take Splitter.SplitTextByDelimiter() as an example. It returns a function that splits a piece of text by a delimiter, and returns a list containing the resulting pieces. The following M code calls this function to return a function that splits comma delimited text:


    demo = Splitter.SplitTextByDelimiter(",")




As noted here, once you have a query that returns a function you can see that function’s signature and invoke it from the Query Editor window. Here’s what the query above shows in the Query Editor window:


If you click the Invoke button and enter the text


As follows:


So that the code for the query becomes:


    demo = Splitter.SplitTextByDelimiter(","),

    Invokeddemo = demo("one,two,three,four")



What is returned is the list {“one”, “two”, “three”, “four”} which looks like this in the Query Editor window:


There are various other Splitter functions that can be used to return functions that split text in different ways. Similarly, the Combiner functions return functions that can be used to combine a list of text into a single piece of text. For example:


    demo = Combiner.CombineTextByDelimiter("--"),

    Invokeddemo = demo({"one","two","three","four"})



Returns the text


The Replacer functions return functions for replacing values in text , while the Comparer functions return functions that can be used for comparing text using specific cultures and case sensitivities.

Written by Chris Webb

August 11, 2014 at 9:30 am

Posted in Power Query

The Ethics Of Big Data

with 4 comments

Some time ago I received a review copy of a book called “Ethics Of Big Data” from O’Reilly; I didn’t get round to writing a review of it here for a number of reasons but, despite its flaws (for example its brevity and limited scope), it’s worth reading. It deals with the ethics of data collection and data analysis from a purely corporate point of view: if organisations do not think carefully about what they are doing then

“Damage to your brand and customer relationships, privacy violations, running afoul of emerging legislation, and the possibility of unintentionally damaging reputations are all potential risks”

All of which is true, although I think what irked me about the book when I read it was that it did not tackle the wider and (to my mind) more important question of the social impact of new data technologies and their application. After all, this is what you and I do for a living – and I know that I haven’t spent nearly enough time thinking these issues through.

What prompted me to think about this again was a post by Adam Curtis which argues that the way that governments and corporations are using data is stifling us on a number of levels from the personal to the political:

“What Amazon and many other companies began to do in the late 1990s was build up a giant world of the past on their computer servers. A historical universe that is constantly mined to find new ways of giving back to you today what you liked yesterday – with variations.

Interestingly, one of the first people to criticise these kind of “recommender systems” for their unintended effect on society was Patti Maes who had invented RINGO. She said that the inevitable effect is to narrow and simplify your experience – leading people to get stuck in a static, ever-narrowing version of themselves.

Stuck in the endless you-loop.”

Once our tastes and opinions have been reduced to those of the cluster the k-means algorithm has placed us in we have become homogenised and easier to sell to, a slave to our past behaviour. Worse, the things we have in common with the people in other clusters become harder to see. Maybe all of this is inevitable, but if there is going to be an informed debate on this then shouldn’t we, as the people who actually implement these systems, take part in it?

Written by Chris Webb

August 4, 2014 at 9:30 am

Posted in Random Thoughts

Calculating The Value Of Overdue Invoices Using Many-To-Many Relationships in SSAS Multidimensional

with 11 comments

Recently I had to solve the same, rather complex, problem for three different customers within the space of two weeks, a variation of the ‘events-in-progress’ problem I’ve blogged about a lot in the past. It’s this one: how can you calculate the value of your overdue invoices on any given date? It’s important to stress that we are not talking about the invoices that are overdue today – we want to be able to pick any date in the past and find out the value of invoices that were overdue at that point in time.

Let’s imagine you have a fact table containing payments against invoices: each row has an invoice number, the date of the payment, the payment amount, and the date that the invoice is due. A positive value in the Amount column indicates that this is the opening amount of the invoice; negative values in the Amount column are payments against the invoice. Multiple payments can be made against an invoice before it is fully closed. Here’s some example data:


Looking at the rows highlighted for invoice 5, you can see in the upper box that there is a positive amount of £35 shown on January 1st 2014 – this is the date that the invoice was opened, and £35 is the full value of the invoice. The invoice is due on January 10th 2014. In the lower box you can see there were four separate payments of £5, £5, £10 and £15 before the invoice was fully paid off on January 12th 2014.

Now, if you were to do a total-to-date in MDX (similar to what I describe here) it would be pretty easy to calculate the outstanding amount on all invoices on any given date. However the requirement here is not only to do that, but to break the value down so that you can see what the total value for overdue invoices and non-overdue (which I’m going to call backlog in this post) invoices is on any given date. This is tricky to do because we have to take the Due Date of each invoice into account as well as the Payment Date. The desired output for invoice 5 is this:


As you can see in this PivotTable, for invoice 5 £25 was still outstanding on January 9th 2014, but this is shown as backlog because this is before the due date of January 10th. On January 11th, one day after the due date, the remaining outstanding amount of £15 is shown as overdue. Of course, we also want to be able to calculate the correct values for all invoices:


One way of solving this problem would be to calculate the overdue and backlog values for each date that each invoice is open in your ETL, and store these values in a snapshot fact table. This works, and will give you the best possible query performance, but it has two major drawbacks: first, it makes your ETL much slower, and second it makes your fact table much larger. This post describes how you can calculate the overdue and non-overdue values on any given date using many-to-many relationships instead, without having to blow out the fact table.

To go along with the fact table (called FactInvoice) shown above, I have a date dimension table called DimDate (I have deliberately reduced the number of rows here to the dates I have data for, for reasons that will become clear soon):


I have a dimension table containing all of the days that an invoice can be overdue or not overdue for (which has one less than double the number of rows as the date dimension table), plus a second column classifying each row as ‘Backlog’ or ‘Overdue’:


… and an invoice dimension table that just contains the distinct invoice numbers called DimInvoice.

Now, let me explain how to build the SSAS cube.

Step 1

  • Build dimensions from all of the tables shown above, so you have dimensions called Date, Overdue Days and Invoice.
  • Build a cube with one measure group, based on FactInvoice, and create one measure with AggregateFunction Sum based on the Amount column in that table.
  • Add the Invoice dimension to the cube with a regular relationship. Add the Date dimension to the cube twice, as Payment Date and Due Date, with regular relationships on the PaymentDate and DueDate columns. The Dimension Usage tab should look like this:


Step 2

  • Create the following view in SQL Server:
  • CREATE VIEW [dbo].[FactDateToPaymentDate]
    SELECT        a.DateKey, b.DateKey AS PaymentDateKey
    FROM            dbo.DimDate AS a INNER JOIN
                             dbo.DimDate AS b ON a.DateKey >= b.DateKey

    This view returns all of the combinations of a given date and all dates up to and including the date.

  • Add this view to the DSV and create a new measure group from it; you’ll need to create a measure here, but it can just be a Count measure.
  • Add the Date dimension to the cube once again, this time leaving the name as Date (you will now have three role-playing copies of the Date dimension in the cube).
  • Set up regular relationships between the Date and Payment Date dimensions and the new measure group, then a many-to-many relationship between Date and the Invoice measure group. This means that when you query the cube by the Date dimension, the many-to-many relationship will mean you see the sum of all Amounts whose payment date is up to and including the date selected. The Dimension Usage tab will look like this:


Step 4

  • Create another view in SQL Server with this definition:
  • CREATE VIEW [dbo].[FactDueDateToAsOfDate]
    SELECT        a.DateKey AS DueDateKey, b.DateKey AS AsOfDateKey, DATEDIFF(dd, a.FullDate, b.FullDate) AS OverDueDays
    FROM            dbo.DimDate AS a CROSS JOIN
                             dbo.DimDate AS b

    Yes, I am cross joining the DimDate table with itself and yes, this could return a lot of rows. However you should find that the view is very fast to execute.

  • Add this view to the DSV and build another measure group from it, again with a single Count measure on it.
  • Set up a regular relationship between this new measure group and the Due Date dimension.
  • Add the Overdue Days dimension to the cube, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group.
  • Add the Date dimension to the cube yet again, creating a new role-playing dimension called As Of Date, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group. The Dimension Usage tab will now look like this:


The purpose of this measure group is this: if you select a date on the As Of Date dimension, you will be able to select ‘Overdue’ on the Overdue Days dimension and this will give you all of the dates on Due Date that were overdue on that date.

Step 5

  • You only want to have to select one date in your PivotTable, so create the following MDX calculated measures that take your selection on the Date dimension and applies it to the As Of Date dimension too:

LINKMEMBER([Date].[Date].CURRENTMEMBER, [As Of Date].[Date])

IIF(MEASURES.[Open Amount Hidden]=0, NULL, MEASURES.[Open Amount Hidden]);

  • Hide all measures apart from the calculated measures you’ve just created
  • Hide the Payment Date, Due Date and As Of Date dimensions


This is a very complex pattern, I know, and this is after I’ve simplified it a lot (if you need currency conversion as well then things get even worse) but I also know it’s extremely useful from a business point of view. Query performance is also reasonably good, at least in the places where I have implemented this.

You can download my sample SQL Server database and VS 2012 project here.

Written by Chris Webb

July 28, 2014 at 10:00 am

Working With Excel Named Ranges In Power Query

with 24 comments

One of the more recent additions to Power Query is the ability to access data from named ranges in the Excel worksheet rather than an Excel table. I’ve got used to formatting data as tables in Excel because that’s what Power Pivot needs to import data directly from the worksheet, but if you are working with Power Query and a pre-existing workbook then trying to reformat data as tables can be a pain. Also, if you just want to import a single value, for example as a parameter to a query, a table seems like overkill.

(Incidentally, if you’re wondering what a named range is in Excel, there are tons of good introductions to the subject on the internet like this one. You can do loads of cool stuff with them.)

Consider the following Excel worksheet:


There are three named ranges here: FirstRange, SecondRange, ThirdRange, and the values in the cells show which range the cells are in. FirstRange consists of two cells in two columns; SecondRange consists of three cells in a single row; and ThirdRange is consists of three, non-contiguous cells. (You can also use this trick to display the names of all contiguous ranges in an Excel workbook, but alas it does seem to work for non-contiguous ranges).

At the moment, the Power Query ribbon doesn’t make it obvious that you can use named ranges as data sources. However you can see all the tables and cells in a worksheet, and even return that list from a query, by creating a blank query and using the expression

= Excel.CurrentWorkbook()


Once you’ve done this you can see all the ranges (and also any tables) in the workbook, and click on the table link next to the name to see the data. For example, clicking on FirstRange shows the following table in a new step in the query editor:


The expression to get at this table in a single step is:

= Excel.CurrentWorkbook(){[Name="FirstRange"]}[Content]

The range SecondRange in my example is equally straightforward to reference, and you can see its contents by using the expression

= Excel.CurrentWorkbook(){[Name="SecondRange"]}[Content]


Unfortunately ThirdRange, which is not contiguous, is a problem: I can only get the first cell in the range. So the expression

= Excel.CurrentWorkbook(){[Name="ThirdRange"]}[Content]

Returns just this table:


It would be nice if we could get a list containing the cell values, rather than a table, for ranges like this…

Last thing to mention is that if you do want the value in a cell, rather than a table, you just need to right-click inside the cell in the Query Editor and select Drill Down:


This returns the value (in this case the text “Third Range Cell 1”) in the cell you clicked on:


This is a much more useful value to return than a table containing a single row/column, if you intend to use a value from a single cell in a named range as a parameter to another query.

You can download the sample workbook for this post here.

Written by Chris Webb

July 22, 2014 at 9:30 am

Posted in Excel, Power Query

New Power BI Q&A Functionality Released: Optimisation In The Browser

with one comment

Seems like another new bit of Power BI functionality got released today: the ability to optimize your data model for Q&A in the browser. Here’s the link to the docs:

Previously, the ability to add synonyms to your model to improve the results you got from Q&A was only available in Excel on the desktop, inside the Power Pivot window. Now you can do this, as well as new stuff like add phrasings (described here) and view usage reports, in your Power BI site.

I won’t repeat what the docs say about the actual functionality, but this seems to be yet more evidence that Excel on the desktop is no longer the central hub for Power BI. If this is the case, this is a massive strategic change, and I can understand why it has happened: the need for the ‘right’ version of Excel on the desktop is a massive roadblock for Power BI adoption, especially in enterprise accounts (see also Jen Underwood’s comments on this from yesterday). Maybe now it’s BI in the browser instead?

Written by Chris Webb

July 15, 2014 at 8:44 pm

Posted in Power BI, Q&A

New Power BI Features Shown At WPC

with 9 comments

OK, so I’m not at WPC this year but I have just watched this video of Scott Guthrie’s session “The Cloud for Modern Business”. If you’re interested in seeing some new Power BI features take a look at the demo by James Phillips, general manager for Power BI, starting at 21:20:

Some of the new things I noticed:

  • 21:40 – a nice shot of one of the new Power BI dashboards first announced at the PASS BA Conference earlier this year. You can see several new types of visualisation such as treemaps, radar charts and gauges (gauges? GAUGES? Shhh, don’t tell Stephen Few).
  • 22:33 – a list of out-of-the box data sources is shown from which new models can be created. They include: Salesforce, MS Dynamics, Facebook, Google Analytics, Twitter, and Upload Excel.
  • 22:50 – data is imported from Salesforce in the browser. This isn’t happening in Excel on the desktop, folks, it’s in the browser. This is significant!
  • 23:10 – another new visualisation shown, a doughnut chart (if that’s the right term). I see names of people from the Power Query team in the data.
  • 24:50 – a Q&A analysis is pinned to the dashboard
  • 25:50 – much is made of the fact that the dashboard is touch-enabled
  • 25:55 – “Partner Solution Packs” are announced. This sounds important! It seems to be referring to the Salesforce demo earlier, and these solution packs are said to include: data, connectivity to the data sources, visualisation and interactive reports. So it sounds like Microsoft are going to encourage data vendors (or other sources of data) to build these solution packs on top of Power BI as pre-packaged analytical apps. Probably a good idea.
  • 26:15 – editing a dashboard in the browser and swapping one visualisation for another. Again, the HTML 5 browser based editing experience – we haven’t seen Excel once in this demo.
  • 27:55 – “If there was ever a partner opportunity, this is it”. Again much emphasis here. Seems like these new Power BI features, especially the solution packs, are aimed at giving partners incentives to sell and customise Power BI (something which they have not had up to now, to be honest).

Oh, and you probably already heard that Azure Machine Learning is now in public preview. Check out the docs and samples here. I wouldn’t be surprised if there was some integration between this and Power BI to come too.

Written by Chris Webb

July 14, 2014 at 11:53 pm

Posted in Power BI

Power Query Book Published!

with 12 comments

Looking for some summer holiday (or winter holiday, depending on which hemisphere you live in) reading? If so, may I suggest my new Power Query book? “Power Query for Power BI and Excel” is available now from the Apress site,, and all good bookstores.

Power Query for Power BI and Excel Cover Image

It’s an introductory level book. It covers all of the stuff you can do in the UI, it has a chapter on M, and it goes into a reasonable amount of detail on more advanced topics; it is not a 500-page exhaustive guide to the product. I’ve focused on readability and teaching the fundamentals of Power Query rather than every looking at every obscure M function, but at the same time if you’ve already used Power Query I think there’ll be plenty of material in there you’ll find interesting.

Now for the bad news: the book is out-of-date already, although not by much. One of the best things about Power Query is the monthly release cycle; unfortunately that makes writing a book on it a bit of a nightmare. I started off writing in January and had to deal with lots of added functionality and changes to the UI over the next few months; I had to retake pretty much all of the screenshots as a result. The published version of the book is based on the version of Power Query that was released in early June rather than the current version. Hopefully you can forgive this – the differences are minor – but it’s a good reason to buy the book as soon as you can! I want to do a second edition in a year’s time once (if?) the release cycle slows down.

I’ve been teased a bit for blogging and teaching so much about Power Query recently, so the final thing I want to say here is why an old corporate BI/SSAS guy like me is getting so excited about a self-service ETL tool. Well, the main reason is that Power Query is a great piece of software. It does what it does very well; it does useful things rather than what the marketing guys/analysts/journalists think is hot in BI; it is easy to use but at the same time is flexible enough for the advanced user to do really complex stuff; it is updated regularly based on feedback from its users. I only wish all Microsoft software was this good… Honestly, I wouldn’t be able to motivate myself to blog and write about Power Query if I didn’t think it was cool, and even though it hasn’t been hyped in the same way as other parts of the Power BI stack it is nonetheless the part that people get excited about when I show them Power BI. It’s not just me either – every day I see positive comments like Greg Low’s here. I think it is as important, if not more important, than Power Pivot and I think it will be a massive success.

Oh, and did I mention that I’m also teaching a Power Query course in London later this year….?

Written by Chris Webb

July 12, 2014 at 3:09 pm

Posted in Books, Power BI, Power Query


Get every new post delivered to your Inbox.

Join 3,202 other followers