Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Data Explorer’ Category

Some Thoughts About Power BI

with 51 comments

By now you’ll probably have seen the Power BI announcement from Microsoft. It’s an important one, and if you haven’t seen it I suggest you take a look at the official announcements and website here:

http://blogs.office.com/b/office-news/archive/2013/07/08/announcing-power-bi-for-office-365.aspx
http://blogs.office.com/b/office365tech/archive/2013/07/07/what-powers-power-bi-in-office-365.aspx
http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/08/introducing-power-bi-for-office-365.aspx?WT.mc_id=Social_TW_OutgoingEvents_20130708_25941_SQLServer
http://office.microsoft.com/en-us/excel/power-bi-FX104080667.aspx
http://blogs.msdn.com/b/powerbi/archive/2013/07/07/getting-started-with-pq-and-pm.aspx

Andrew Brust also has a good summary of the news here:
http://www.zdnet.com/microsoft-announces-power-bi-for-office-365-7000017746/

There’s no point me repeating what’s already been said, so I thought I’d post my initial reaction to it:

  • Proper Mobile BI! HTML 5 Power View! It works on iPads too! Hurray! At-bloody-last! The mobile BI solution will allow you to find, interact with and share “Excel and Power View content”, so I guess that includes Excel worksheets (with PivotTables, slicers etc) as well as Power View reports (a close look at the screenshots in the blog posts above back this up).
  • Making Power BI available only via Office 365 is going to be a very controversial strategy in the MS BI partner community. To be clear: as far as I understand it, Data Explorer (now Power Query), GeoFlow (now Power Map), mobile BI and all of the cool stuff that’s just been announced is only going to be available through Power BI, and therefore Office 365. Unfortunately the biggest companies, with the biggest BI budgets, are often the ones who are slowest to upgrade to the latest versions of Office and a lot of cases this won’t change just because someone wants to see their reports on an iPad. Where IT department inertia, worries about data privacy and company politics mean that Office 365 is not an option, Microsoft will lose out to the pure-play BI vendors who offer standalone solutions.
  • If you look at this from a different point of view, though, some of the things that I (as a BI Pro) feel least comfortable about in Microsoft’s BI strategy are also its greatest strengths. The way I see it, MS is not treating self-service BI as a solution in its own right, but selling self-service BI as a feature of Office. This makes a lot of sense from Microsoft’s point of view – it’s building on the fact that Excel is the tool of choice for data analysis for 99% of users. What I think is going to happen as a result of this is that, rather than partners selling BI as a standalone solution as in the past, we’re going to be talking to people who have already got Office 2013 or Office 365 and are looking to make the most of the BI features that come as part of that. The MS BI partner community is going to have to adjust to this because I doubt MS is going to change this strategy soon.
  • The same can be said of the Office 365-only strategy. If Microsoft is going to be successful with its cloud-first strategy then it’s going to have to prioritise cloud functionality over on-prem functionality. I think MS is doing the right thing with its cloud-first strategy, so therefore, even though I’m going to find it painful when I have to deal with customers that won’t or can’t move to Office 365, I can understand why MS is making Power BI Office 365 only. Beyond the hype (MS says that 1 in 4 of its customers is already on Office 365), it does seem like the uptake of Office 365 is quite strong, especially in SMBs, so hopefully there will be a large potential customer base.
  • I’ve been presenting sessions on the Excel 2013 BI stack at various user groups and conferences over the last few months and it’s gone down very well indeed. A lot of people have come up to me after seeing my session and said that they had been looking at QlikView and Tableau for BI, and would now consider Office 2013 as another alternative. The BI functionality on its own is pretty good, and good enough for a lot of customers even if it isn’t as mature as some of the competing offerings; the fact that the BI functionality comes baked into Office is the killer. While it may be expensive to upgrade to Office 2013/Office 365 this is a cost that many businesses will be considering anyway regardless of their BI requirements; you also have to compare this with the cost of QlikView and Tableau licenses and remember that not every user will need the most expensive SKUs of Office 2013.
  • The ability to refresh data in Excel workbooks deployed to Sharepoint Online, even when the data sources are on-prem, is a key feature and one that I’ve been waiting for. I wonder what the performance will be like?
  • For anyone of a certain age, the first reaction to the news of the natural language querying capability is two words: English Query. I haven’t played with it so I can’t pass judgement, but it’s going to have to be pretty impressive if anyone is going to use it for more than just sales demos. We shall see…
  • I am quite curious about the enterprise data search capabilities. Leaving aside the ability to query them in natural language, the ability to search for data across the enterprise will be useful. I think this is what happened to Project Barcelona.
  • Similarly, it seems as though this search capability is going to be significantly expanded on the public internet. At the moment, in Data Explorer Power Query we’ve seen the ability to query Wikipedia for data. Being able to query many other public data in the same way will be very powerful. There are a number of sites like Quandl that already make public datasets very easy to find and download, and the new search and query capabilities could leapfrog them.
  • No announcement on pricing has been made as yet. Please, please be ridiculously cheap!
  • We don’t have a date for the preview yet, but if you sign up here you’ll be notified when it’s available. It’s meant to be coming “later this summer”.

UPDATE: Some more things to add…

  • Something I didn’t pick up on at the time, but which emerged on Twitter later, is that PowerPivot has had a name change: it’s now Power Pivot with a space, to make it consistent with PowerV View and so on. This might seem minor, but for those of us who write books and have to sweat these details, it’s quite important!
  • I sense I’ve hurt a few feelings at MS with my comments on the natural language query. Let me be clear about my position here: I’ve not played with it, so I can’t pass judgement yet. I can imagine that natural language search for data will work well, but I will be very, very impressed if natural language query works well enough to be used on real data by real users. Real data is dirty and complex and user expectations will be very high and easily dashed. My guess is that the main issue will be that users can’t distinguish between what is a query and what is a calculation, and while the product can probably do queries well (eg “Show me the sales of widgets this year”) it may struggle with calculations (eg “Show me the customer churn by month this year”). But as I said, we shall see.

Finally, and as always, I reread this post this morning and worried that I sound too negative when in fact I’m very positive overall. To summarize:

  • The fact this is is all in Office and specifically Excel = the killer feature.
  • Mobile BI = good, even if it’s very late.
  • Power Query = very, very, very good indeed. I love it already and I think it’s going to be as big, if not bigger than Power Pivot. Power BI is worth buying for this alone.
  • Office 365 requirement = a problem for some customers, maybe, but understandable from Microsoft’s point of view.
  • Cloud requirement = again, a problem for some, but understandable and a big advantage for SMEs who can’t afford the cost of hardware and time to configure Sharepoint on-prem. The ability to refresh in the cloud from on-prem data sources is the key feature here.
  • Power Map = OK. Useful for customers who need geographic analysis, but it’s main use is that it’s great for demos (and this should not be underestimated – all products need some wow).
  • Power View goes HTML5 = relief. The Silverlight dependency undermined its credibility no end.
  • Natural language search for data sources = potentially very useful.
  • Natural language query of those data sources = see above. I remain to be convinced.
  • Data stewardship features = I haven’t seen enough of these to be able to comment.

More blog posts worth reading on this subject:
http://www.jenunderwood.com/blog.htm#O365PowerBI
http://www.jenstirrup.com/2013/07/power-business-intelligence-for.html

UPDATE #2: even more things to add…!

You can see the video of Amir’s demo from WPC here: http://www.youtube.com/watch?v=Jsa-5LGx_IY&feature=youtu.be

Some more details (which should be accurate) from a friend of mine at the conference:

  • The Power View standalone app is still Silverlight based, it’s only the mobile app that uses HTML5
  • Power Query queries can be shared between users via BI Sites, but the execution of these queries always takes place in desktop Excel. Queries will appear in searches when they’re published to a BI Site.
  • Excel workbooks connected to on-prem SSAS (Tabular and Multidimensional) will also be refreshable from a BI Site
  • Natural Language queries (what I think is being called “Q&A”) will also work against SSAS Tabular models. The Categorization property that’s in the Advanced tab in the PowerPivot window, and also in SSDT, is partly used to help Q&A do this.
  • External users can be given access to reports in BI Sites.
  • There will be a “Data Steward Portal” which will help you monitor who is doing what on your BI Site.
  • No comment on when this will arrive in Sharepoint on-prem. My feeling is that MS have no plans to do this at all, or maybe will only do it if a lot of people complain…?

Written by Chris Webb

July 8, 2013 at 5:10 pm

Flattening A Parent/Child Relationship In Data Explorer (Power Query)

with 4 comments

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

I was teaching my SSAS cube design and performance tuning course this week (which I’ll be teaching in Sydney and Melbourne next month, along with some MDX – places still available!) and demonstrating BIDS Helper’s excellent functionality for flattening parent/child relationships, and it got me thinking – can I do the same thing in Data Explorer? Not that I need to do this in Data Explorer, you know, but it’s the kind of challenge I like to set myself. Of course you can do it, and quite elegantly, and since I learned yet more interesting stuff about Data Explorer and M while I was cracking this problem I thought I’d blog about it.

Here’s what I want to do. Consider the parent/child hierarchy in the DimEmployees table in the Adventure Works DW database:

image

Each row represents an employee, EmployeeKey is the primary key and ParentEmployeeKey is the key of the employee’s boss. Therefore, by joining the table to itself, we can recreate the org chart of the Adventure Works company (ie who reports to who). The problem though is that we need to join the table to itself multiple times to do this, and the number of times we need to do the join depends on the data itself. If you flatten a parent/child hierarchy by doing this, the end result should have a series of columns representing each level in the hierarchy, and look something like this:

image

This problem can be solved in SQL reasonably easily, even if the SQL you end up writing might look a little scary (see the views that BIDS Helper generates for an example of this). What about Data Explorer?

At the heart of my approach was a recursive function. I’ve blogged about creating functions in Data Explorer already, so you might want to read that post for some background. Here’s my function declaration:

let

    Source = (FromTable, KeyColumn, ParentKeyColumn, ToTable, optional Depth) =>

let

    GetDepth = if (Depth=null) then 1 else Depth,

    GetKeyColumn = if (Depth=null) then KeyColumn

        else Number.ToText(GetDepth-1) & "." & KeyColumn,

    GetParentKeyColumn = Number.ToText(GetDepth) & "." & ParentKeyColumn,

    JoinTables = Table.Join(FromTable,{GetKeyColumn},

        Table.PrefixColumns(ToTable , Number.ToText(GetDepth)),

            {GetParentKeyColumn}, JoinKind.LeftOuter),

    FinalResult = if

        List.MatchesAll(Table.Column(JoinTables, GetParentKeyColumn), each _=null)

        then FromTable

        else RecursiveJoin(JoinTables, KeyColumn, ParentKeyColumn, ToTable, GetDepth+1)

in

    FinalResult

in

    Source

A few interesting things to point out:

  • I’ve used a LET statement inside my function declaration, so I can have multiple statements inside it
  • I’ve used Table.Join to do the left outer join between the two tables I’m expecting
  • The parameters I’m using are:
    • FromTable – the table on the left hand side of the join. When the function is first called, this should be a table that contains the Employees who have no parents (ie where ParentEmployeeKey is null); when the function calls itself, this will be the result of the join.
    • ToTable – the table on the right hand side of the join. This is always a table that contains the Employees who do have parents.
    • KeyColumn – the name of the Employee’s key column
    • ParentKeyColumn – the name of the Employee’s parent key column
  • I’ve used Table.PrefixColumn to rename all the columns in the table on the right hand side of the join, prefixing them with the depth of the call stack, so I get distinct column names.
  • The function calls itself until it finds it has done a join where the last ParentKeyColumn contains only null values. I’ve used List.MatchesAll to check this.

Here’s the call to this function – you only need to include one step in the Data Explorer query to do this – to return the flattened structure:

= RecursiveJoin(

    Table.SelectRows(Employees, each [ParentEmployeeKey]=null),

    "EmployeeKey",

    "ParentEmployeeKey",

    Table.SelectRows(Employees, each [ParentEmployeeKey]<>null)

    )

And here’s the output:

image

In this case the output isn’t exactly the same as what BIDS Helper might produce, because BIDS Helper has some special requirements for SSAS user hierarchies. Also, since I’m still learning Data Explorer and M, I’m not sure my code in the most efficient, elegant way. But I still think it’s an interesting example and I hope it’s useful to other Data Explorer enthusiasts out there – we’re a small but growing band!

You can download my demo workbook here.

Written by Chris Webb

June 22, 2013 at 3:13 pm

New Treemap, Histogram and Streamgraph Apps for Excel 2013

with 5 comments

I blogged about the new app model for Office 2013 and what it means for BI last year, but since then there hasn’t exactly been a massive flood of new data visualisation apps. However… yesterday, I was interested to see that some new apps had been published by the Visualization and Interaction for Business and Entertainment team at Microsoft Research. You can read all the details in this blog post:
http://blogs.technet.com/b/inside_microsoft_research/archive/2013/05/23/new-ways-to-visualize-your-data.aspx

The new apps (which are all free) are:

To test the Treemap out, I used Data Explorer to get the overall size on disk of the contents of the folders I use to store my presentation materials; I won’t go into detail about how I did it, but Erik Svenson has a great post on how to do this here. I ended up with a the following treemap:

image

It’s worth pointing out one cool thing about these apps: they still work when your worksheet is deployed to Sharepoint and viewed in a browser with the Excel Web App, even in Office 365!

Written by Chris Webb

May 24, 2013 at 3:14 pm

Posted in Data Explorer, Excel

Accumulating Data In An Excel Table Using Data Explorer (Power Query) and PowerPivot

with 5 comments

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source, you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!

Here’s a super-simple example of how to accumulate data in a table then. Let’s start with a csv file that contains the following data:

Product,Sales
Apples,1
Oranges,2

It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source:

image

 

Here’s the code that Data Explorer generates:

let

    Source = Csv.Document(File.Contents("C:\InputData.csv")),

    FirstRowAsHeader = Table.PromoteHeaders(Source),

    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,

                              {{"Product", type text}, {"Sales", type number}})

in

    ChangedType

 

Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:

Table.AddColumn(ChangedType, “Load Date”, each DateTimeZone.UtcNow())

Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve now is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It’s a recursive scenario not unlike the one I grappled with here. The solution to this problem is to first of all load the data into the PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane:

image

Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you’ve just loaded data into. Kasper shows how to do this here; there’s no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog:

image

image

At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it contains a new step that appends data from the table you’ve just created (ie the table getting the data from PowerPivot) onto the data from the csv file. This can be done with three new steps, first to get the data from the new Excel table:

Excel.CurrentWorkbook(){[Name="ExistingData"]}[Content]

Then to make sure the Load Date is treated as a DateTimeZone type:

Table.TransformColumnTypes(GetExistingData,{{“Load Date”, type datetimezone}})

Then finally to combine the two tables:

Table.Combine({ChangedType1,InsertedCustom})

Now, whenever you Refresh your Data Explorer query, you will see the data from the csv file appended to the data that has already been loaded:

image

image

Here’s the complete code:

let

    Source = Csv.Document(File.Contents("C:\InputData.csv")),

    FirstRowAsHeader = Table.PromoteHeaders(Source),

    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,

                  {{"Product", type text}, {"Sales", type number}}),

    InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow()),

    Custom1 = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content],

    ChangedType1 = Table.TransformColumnTypes(Custom1,{{"Load Date", type datetimezone}}),

    Custom2 = Table.Combine({ChangedType1,InsertedCustom})

in

    Custom2

Now as I said, this is just a super-simple example and in the real world you’d need extra functionality to do things like delete rows you’ve already loaded and so on; but that’s all doable I think. It’s also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you’ll need to follow my instructions exactly.

You can download the sample workbook here, and the csv file here.

Written by Chris Webb

May 13, 2013 at 12:40 pm

A Partly Successful Attempt To Create Life With Data Explorer (Power Query)

with 5 comments

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

I’ll apologise for the title right away: this post isn’t about a Frankenstein-like attempt at creating a living being in Excel, I’m afraid. Instead, it’s about my attempt to implement Jon Conway’s famous game ‘Life’ using Data Explorer, how it didn’t fully succeed and some of the interesting things I learned along the way…

When I’m learning a new technology I like to set myself mini-projects that are more fun than practically useful, and for some reason a few weeks ago I remembered ‘Life’ (which I’m sure almost anyone who has learned programming has had to write a version of at some stage), so I began to wonder if I could write a version of it in Data Explorer. This wasn’t because I thought Data Explorer was an appropriate tool to do this – there are certainly better ways to implement Life in Excel – but I thought doing this would help me in my attempts to learn Data Explorer’s formula language and might also result in an interesting blog post.

Here’s the code I came up with eventually:

let

    Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],

    GetAction = Excel.CurrentWorkbook(){[Name="Source"]}[Content]{0}[Source],

    GetLastOutput = try Excel.CurrentWorkbook(){[Name="Output"]}[Content],

    ChooseInput = if GetAction="Input" or GetLastOutput[HasError]

        then Source else GetLastOutput[Value],

    Unpivot = Table.Unpivot(ChooseInput,{"1", "2", "3", "4", "5", "6"

        , "7", "8", "9", "10"},"Column","Value"),

    InsertedIndex = Table.AddIndexColumn(Unpivot,"Index"),

    PivotedRows = Table.RowCount(InsertedIndex),

    OriginalColumnCount = Table.ColumnCount(ChooseInput),

    GetValue = (i) =>; if i<0 or i>=PivotedRows then 0 

        else InsertedIndex[Value]{i},

    GetNeighbours = (i) =>; GetValue(i-1) + GetValue(i+1) 

        + GetValue(i-OriginalColumnCount) + GetValue(i+OriginalColumnCount)

        + GetValue(i-OriginalColumnCount-1) + GetValue(i-OriginalColumnCount+1)

        + GetValue(i+OriginalColumnCount-1) + GetValue(i+OriginalColumnCount+1),

    NeighbourCount = Table.AddColumn(InsertedIndex, "Neighbours",

        each GetNeighbours([Index])),

    Custom1 = Table.AddColumn(NeighbourCount , "NewValue",

        each if [Value]=1 and ([Neighbours]=2 or [Neighbours]=3)

        then 1 else if [Value]=0 and [Neighbours]=3 then 1 else 0),

    HiddenColumns = Table.RemoveColumns(Custom1,{"Value", "Index", "Neighbours"}),

    Custom2 = Table.Group(HiddenColumns, {"Column"},

        {{"Count", each Table.Transpose(Table.RemoveColumns(_,"Column")), type table}}),

    #"Expand Count" = Table.ExpandTableColumn(Custom2, "Count", {"Column1", "Column2"

        , "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9"

        , "Column10"}, {"1", "2", "3", "4", "5", "6", "7", "8", "9","10"}),

    HiddenColumns1 = Table.RemoveColumns(#"Expand Count",{"Column"})

in

    HiddenColumns1

You can download the associated Excel 2013 workbook here.

Some explanation of the code:

  • The first problem that I faced was that in Life you need to start with a grid of cells, apply the rules of the game, and then output the results to the same grid. This is a problem for Data Explorer because, unless you have already run a query, the output table doesn’t exist, and even if you have you can’t update the values in that table without them being overwritten. So I decided to create a table where you can enter the initial state of your grid as a series of 1s and 0s. It’s on the Input sheet and its called Input:

    image

  • Whether the Input table is used as the starting point for the query depends on the contents of yet another table, called Source, on the sheet called Output in the workbook. It can hold one of two text values: “Input” or “Output” (I used Excel’s Data Validation functionality to lock this down), and the ChooseInput step then determines whether the Input table or the table called Output is used as the input for the Data Explorer query. Here’s what the Output worksheet looks like:image

    The workflow is therefore as follows: enter the starting point on the Input table, make sure the Source table shows “Input”, refresh the query, change the Source table to “Output” and then refresh the query to show each subsequent iteration.

  • The Output table simply displays the same values as the table that shows the results of the Data Explorer query. I created a separate table for two reasons: one, I wanted to use custom formatting to show the results; and two, to try to work around the big problem that I eventually found I couldn’t work around completely, which is that Data Explorer doesn’t actually support this type of recursive query (ie scenarios where the output of a query is also used as the input). This thread on the Data Explorer MSDN Forum has some details on the problem. I found I got errors after just about every other step when using the Data Explorer-created table as the input, whereas the errors were much less frequent if I duplicated the values in a separate table. It now works most of the time, but it still errors far too much for my liking. Hohum.
  • I did also use Data Explorer’s error-handling functionality, using the try statement in the GetLastOutput step, to solve this problem but it made no difference. It was good to find out how a try statement works: basically when you use it, the return value is a record containing two values, one which indicates whether an error occurred in the try, and the other the value that was returned if there was no error. You can see me checking these values in the ChooseInput step, with the calls to GetLastOutput[HasError] (which returns true if there was an error in the try) and GetLastValue[Value] (which returns the value tested in the try if there was no error).
  • Once the query worked out which input to use, the next step is to calculate the values in the grid for the next iteration. I decided that the easiest way to do this was to unpivot the resultset using the the new UnPivot() function; with all the data in a single column it was then relatively easy to declare the functions GetValue() and GetNeighbours() to apply the rules of Life, output the results in a new column in the Custom1 step.
  • The final problem to solve was that I needed to re-pivot the data to get it back into original table format. Although Data Explorer has an UnPivot() function it doesn’t have a Pivot() function; luckily, Peter Qian showed me how to do this on the forum in this thread using Table.Group() and my code is in the Custom2 step.

So despite the errors (and Data Explorer is still in beta, so some errors are only to be expected), many lessons learned. I hope you find this useful too.

Written by Chris Webb

April 27, 2013 at 9:44 pm

Applying a Function to Every Cell in a Table in Data Explorer (Power Query)

with 2 comments

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

Now that the PASS Business Analytics Conference is over, I can get back to playing around with Data Explorer and blogging about it. I’ve been working on a fun demo that I’ll try to blog about later this week, but in the course of creating this demo I came across a technique that I didn’t end up using but which I thought deserved a post on its own: how to apply a function to every cell in a table, rather than just every cell in a column.

For example, let’s imagine that you have a table that looks like this:

image

…and you want to add one to every single cell in the table, so you get:

image

It’s possible in the UI by creating lots of custom columns and then deleting the original columns, for sure, but I found a more elegant solution. Here’s the full DE code:

let

    Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],

    FunctionToApply = (x) =>; x + 1,

    GetColumnNames = Table.ColumnNames(Source),

    TransformList = List.Transform(GetColumnNames, each {_ , FunctionToApply}),

    Output = Table.TransformColumns(Source, TransformList)

in

    Output

Here’s what each step does:

  • Source: gets the data from the Excel table named Inputimage
  • FunctionToApply: defines a new function that takes a single parameter, x, and returns the value x+1. See this post for more details on using function in Data Explorer.image
  • GetColumnNames: returns a list object which contains the names of all of the columns in the table returned in the Source step.image
  • TransformList: this is the interesting step! It creates a new list based on GetColumnNames, but whereas GetColumnNames contains just one record per item in the list (the column name), this returns a list of lists, each with two items: the column name and a reference to the function FunctionToApply. It looks like this in the UI:imageIf you click on the first of the list links shown (and this isn’t part of the process, I’m only doing this to show what’s there) you see the following:

    image

  • Output: the list of lists created in the previous step can now be passed to the Table.TransformColumns() function to apply the function FunctionToApply() to every cell in every column in the table.image

I’m sure this is going to be useful to me at some point in the future… You can download the sample workbook here.

Written by Chris Webb

April 16, 2013 at 1:58 pm

Creating And Using Functions in Data Explorer (Power Query)

with 14 comments

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

I’ve been reading through the Data Explorer Formula Language and Library specifications (which can be downloaded here) over the last few weeks, learning about what’s possible with it when you go beyond the UI. This will translate into a lot of Data Explorer blog posts in the coming weeks…! Here’s the first of those posts.

Today I’m going to look at how to create and use functions. Page 86 onwards in the Formula Language specification has a lot of detail about how to define a function in the M language that Data Explorer uses, but it doesn’t include any practical examples of how functions can be created and used in Data Explorer and Excel unfortunately. I’ve done some experimentation, though, and this post aims to fill that gap.

First of all, take a look at this Excel worksheet with two tables in it, called InputNumbers1 and InputNumbers2:

image

Let’s say that you want to create two Data Explorer queries that each use one of these tables as a source, and in both cases you want to add a third column that displays the product of the two existing columns. So, for example, for the first table you want a new column with the values 2, 12, 30 and 56 in.

Now this is a very simple calculation and certainly one that Data Explorer can handle easily, but let’s say that you want to perform this operation many times in many different queries and for obvious reasons you don’t want to have to duplicate the same logic in each query. You want to create a function!

To do this, you need to use the Write Query option from the From Other Sources button on the Data Explorer tab in the ribbon:

image

Then change the name of the new query to MultiplicationFunction by double-clicking on the query name in large letters at the top of the Edit Query dialog:

image

Then enter the following code as the only step:

= (x,y) => x * y

This defines a query with two parameters, x and y, and returns the product of x and y. Click Done to save the query. The query will show the following result in the worksheet:

image

You can ignore this new table, and you can disable the data load onto the worksheet, but it seems like you always need to have the table there and you certainly can’t delete it.

To use this new function create a new query from the table InputNumbers1 (shown above) using the From Table button. Then in the Edit Query dialog create a new step by selecting Insert Column/Custom:

image

And then use the new function to calculate each row in the new column, passing the two existing columns as parameters:

image

This results in the following step:

= Table.AddColumn(Source, “Custom”, each MultiplicationFunction([FirstNumber],[SecondNumber]))

image

And so you have your new column showing the product of the [FirstNumber] and [SecondNumber] columns. Then do the same thing for the InputNumbers2 table and you’ll have two Data Explorer queries now with custom columns in:

image

image

Finally, to make sure that the function is indeed being used, go and alter the definition of the function to be

= (x,y) => x / y

Refresh both of the other queries, and you’ll see the contents of the custom column have changed for both of them:

image

image

I can imagine that the ability to break functionality out into functions will reduce the complexity, and improvement the maintainability, of many ETL solutions that get built with Data Explorer in the future.

You can download my sample Excel workbook with this demo in here.

Written by Chris Webb

March 25, 2013 at 5:22 pm

Follow

Get every new post delivered to your Inbox.

Join 3,144 other followers