Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Power Query’ Category

Benford’s Law And Power Query

with 2 comments

Probably my favourite session at SQLBits the other week was Professor Mark Whitehorn on exploiting exotic patterns in data. One of the things he talked about was Benford’s Law, something I first heard about several years ago (in fact I’m sure I wrote a blog post on implementing Benford’s Law in MDX but I can’t find it), about the frequency distribution of digits in data. I won’t try to explain it myself but there are plenty of places you can read up on it, for example: http://en.wikipedia.org/wiki/Benford%27s_law . I promise, it’s a lot more interesting than it sounds!

Anyway, it struck me that it would be quite useful to have a Power Query function that could be used to find the distribution of the first digits in any list of numbers, for example for fraud detection purposes. The first thing I did was write a simple query that returned the expected distributions for the digits 1 to 9 according to Benford’s Law:

let
    //function to find the expected distribution of any given digit
    Benford = (digit as number) as number => Number.Log10(1 + (1/digit)),
    //get a list of values between 1 and 9
    Digits = {1..9},
    // get a list containing these digits and their expected distribution
    DigitsAndDist = List.Transform(Digits, each {_, Benford(_)}),
    //turn that into a table
    Output = #table({"Digit", "Distribution"}, DigitsAndDist)
in
    Output

 

image

Next I wrote the function itself:

//take a single list of numbers as a parameter
(NumbersToCheck as list) as table=>
let
    //remove any non-numeric values
    RemoveNonNumeric = List.Select(NumbersToCheck, 
                        each Value.Is(_, type number)),
    //remove any values that are less than or equal to 0
    GreaterThanZero = List.Select(RemoveNonNumeric, each _>0),
    //turn that list into a table
    ToTable = Table.FromList(GreaterThanZero, 
                        Splitter.SplitByNothing(), null, null, 
                        ExtraValues.Error),
    RenameColumn = Table.RenameColumns(ToTable,{{"Column1", "Number"}}),
    //function to get the first digit of a number, minus one
    //(Table.Partition expects a hash value in the range 0 to 8)
    FirstDigit = (InputNumber as number) as number => 
                    Number.FromText(Text.Start(Number.ToText(InputNumber),1))-1,
    //get the distributions of each digit
    GetDistributions = Table.Partition(RenameColumn, 
                    "Number", 9, FirstDigit),
    //turn that into a table
    DistributionTable = Table.FromList(GetDistributions, 
                    Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    //add column giving the digit
    AddIndex = Table.AddIndexColumn(DistributionTable, "Digit", 1, 1),
    //show how many times each first digit occurred
    CountOfDigits = Table.AddColumn(AddIndex, 
                    "Count", each Table.RowCount([Column1])),
    RemoveColumn = Table.RemoveColumns(CountOfDigits ,{"Column1"}),
    //merge with table showing expected distributions
    Merge = Table.NestedJoin(RemoveColumn,{"Digit"},
                             Benford,{"Digit"},"NewColumn",JoinKind.Inner),
    ExpandNewColumn = Table.ExpandTableColumn(Merge, "NewColumn", 
                            {"Distribution"}, {"Distribution"}),
    RenamedDistColumn = Table.RenameColumns(ExpandNewColumn,
                            {{"Distribution", "Expected Distribution"}}),
    //calculate actual % distribution of first digits
    SumOfCounts = List.Sum(Table.Column(RenamedDistColumn, "Count")),
    AddActualDistribution = Table.AddColumn(RenamedDistColumn, 
                            "Actual Distribution", each [Count]/SumOfCounts)
in
    AddActualDistribution

There’s not much to say about this code, apart from the fact that it’s a nice practical use case for the Table.Partition() function I blogged about here. It also references the first query shown above, called Benford, so that the expected and actual distributions can be compared.

Since this is a function that takes a list as a parameter, it’s very easy to pass it any column from any other Power Query query in the same workbook (as I showed here) for analysis. For example, I created a Power Query query on this dataset in the Azure Marketplace showing the number of minutes that each flight in the US was delayed in January 2012. I then invoked the function above and pointed it at the column containing the delay values, like so:

image
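By the way, invoking the function in M directly is a one-liner. Here’s a sketch, assuming the function above is called BenfordCheck and that the query containing the delay data is called FlightDelays with a column called DelayMinutes (both made-up names):

let
    //Table.Column() turns a column from another query into a list,
    //which is what the function expects as its parameter
    Source = BenfordCheck(Table.Column(FlightDelays, "DelayMinutes"))
in
    Source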

The output is a table (to which I added a column chart) which shows that this data follows the expected distribution very closely:

image

You can download my sample workbook containing all the code from here.

Written by Chris Webb

March 23, 2015 at 12:30 pm

Posted in Power Query

What’s New In The Excel 2016 Preview For BI?

with 23 comments

Following on from my recent post on Power BI and Excel 2016 news, here are some more details about the new BI-related features in the Excel 2016 Preview. Remember that more BI-related features may appear before the release of Excel 2016, and that with Office 365 click-to-run significant new features can appear in between releases, so this is not a definitive list of what Excel 2016 will be able to do at RTM but a snapshot of functionality available as of March 2015 as outlined in this document and which I’ve found from my own investigations. When I find out more, or when new functionality appears, I’ll either update this post or write a new one.

Power Query

Yesterday, in the original version of my post, I mistakenly said that Power Query was a native add-in in Excel 2016: that’s not true, it’s not an add-in at all, it’s native Excel functionality. Indeed you can see that there is no separate Power Query tab any more; instead there is a Power Query section on the Data tab:

DataTab

Obviously I’m a massive fan of Power Query so I’m biased, but I think this is a great move because it makes all the great Power Query functionality a lot easier to discover. There’s nothing to enable – it’s there by default – although I am a bit worried that users will be confused by having the older Data tab features next to their Power Query equivalents.

There are no new features for Power Query here compared to the latest version for Excel 2013, but that’s what I expected.

Excel Forecasting Functions

I don’t pretend to know anything about forecasting, but I had a brief play with the new Forecast.ETS function and got some reasonable results out of it as seen in the screenshot below:

image

Slicer Multiselect

There’s a new hammer icon on a slicer, which, when you click it, changes the way selection works. The default behaviour is the same as Excel 2013: every time you click on an item, that item is selected and any previous selection is lost (unless you were holding control or shift to multiselect). However with the hammer icon selected each new click adds the item to the previously selected items. This is meant to make slicers easier to use with a touch-screen.

Slicer

Time Grouping in PivotTables

Quite a neat feature this, I think. If you have a table in the Excel Data Model that has a column of type date in it, you can add extra calculated columns to that table from within a PivotTable to group by things like Year and Month. For example, here’s a PivotTable I built on a table that contains just dates:

Group1

Right-clicking on the field containing the dates and clicking Group brings up the following dialog:

Group2

Choosing Years, Quarters and Months creates three extra fields in the PivotTable:

Group3

And these fields are implemented as calculated columns in the original table in the Excel Data Model, with DAX definitions as seen here:

Group4

Power View on SSAS Multidimensional

At-bloody-last. I haven’t installed SSAS on the VM I’m using for testing Excel 2016, but I assume it just works. Nothing new in Power View yet, by the way.

Power Map data cards

Not sure why this is listed as new in Excel 2016 when it seems to be the same feature that appeared in Excel 2013 Power Map recently:

https://support.office.com/en-za/article/Customize-a-data-card-in-Power-Map-797ab684-82e0-4705-a97f-407e4a576c6e

Power Pivot

There isn’t any obvious new functionality in the Power Pivot window, but it’s clear that the UI in general and the DAX formula editor experience in particular has been improved.

image

Suggested Relationships

When you use fields from two Excel Data Model tables that have no relationship between them in a PivotTable, you get a prompt to either create new relationships yourself or let Excel detect the relationships:

image

Renaming Tables and Fields in the Power Pivot window

In Excel 2013, when you renamed tables or fields in the Excel Data Model, any PivotTables that used those objects had them deleted. Now, in Excel 2016, the PivotTable retains the reference to the table or field and just displays the new name. What’s even better is that when you have a measure or a calculated column that refers to a table or column, the DAX definition of the measure or calculated column gets updated after a rename too.

DAX

There are lots of new DAX functions in this build. With the help of the mdschema_functions schema rowset and Power Query I was able to compare the list of DAX functions available in 2016 with those in 2013 and create the following list of new DAX functions and descriptions:

DATEDIFF: Returns the number of units (unit specified in Interval) between the two input dates.
CONCATENATEX: Evaluates an expression for each row of the table, then returns the concatenation of those values in a single string, separated by the specified delimiter.
KEYWORDMATCH: Returns TRUE if there is a match between the MatchExpression and Text.
ADDMISSINGITEMS: Adds the rows with empty measure values back.
CALENDAR: Returns a table with one column of all dates between StartDate and EndDate.
CALENDARAUTO: Returns a table with one column of dates calculated from the model automatically.
CROSSFILTER: Specifies the cross-filtering direction to be used in the evaluation of a DAX expression. The relationship is defined by naming, as arguments, the two columns that serve as endpoints.
CURRENTGROUP: Accesses the (sub)table representing the current group in the GROUPBY function. Can only be used inside the GROUPBY function.
GROUPBY: Creates a summary of the input table grouped by the specified columns.
IGNORE: Tags a measure expression specified in the call to the SUMMARIZECOLUMNS function to be ignored when determining the non-blank rows.
ISONORAFTER: A boolean function that emulates the behavior of a Start At clause and returns true for a row that meets all the conditions mentioned as parameters in this function.
NATURALINNERJOIN: Joins the left table with the right table using inner join semantics.
NATURALLEFTOUTERJOIN: Joins the left table with the right table using left outer join semantics.
ROLLUPADDISSUBTOTAL: Identifies a subset of columns specified in the call to the SUMMARIZECOLUMNS function that should be used to calculate groups of subtotals.
ROLLUPISSUBTOTAL: Pairs up the rollup groups with the column added by ROLLUPADDISSUBTOTAL.
SELECTCOLUMNS: Returns a table with selected columns from the table and new columns specified by the DAX expressions.
SUBSTITUTEWITHINDEX: Returns a table which represents the semijoin of the two tables supplied, and for which the common set of columns is replaced by a 0-based index column. The index is based on the rows of the second table sorted by the specified order expressions.
SUMMARIZECOLUMNS: Creates a summary table for the requested totals over a set of groups.
GEOMEAN: Returns the geometric mean of a given column reference.
GEOMEANX: Returns the geometric mean of an expression evaluated for each row in a table.
MEDIANX: Returns the 50th percentile of an expression evaluated for each row in a table.
PERCENTILE.EXC: Returns the k-th (exclusive) percentile of values in a column.
PERCENTILE.INC: Returns the k-th (inclusive) percentile of values in a column.
PERCENTILEX.EXC: Returns the k-th (exclusive) percentile of an expression evaluated for each row in a table.
PERCENTILEX.INC: Returns the k-th (inclusive) percentile of an expression evaluated for each row in a table.
PRODUCT: Returns the product of a given column reference.
PRODUCTX: Returns the product of an expression evaluated for each row in a table.
XIRR: Returns the internal rate of return for a schedule of cash flows that is not necessarily periodic.
XNPV: Returns the net present value for a schedule of cash flows.

Plenty of material for future blog posts there, I think – there are lots of functions here that will be very useful. I bet Marco and Alberto are excited…
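In case you’re curious how that comparison can be done in M, here’s a minimal sketch: it queries the mdschema_functions rowset on two instances via the MSOLAP OLE DB provider and diffs the two lists of names (the instance names are made up, and this is just one way of doing it):

let
    //query the mdschema_functions schema rowset on a given instance
    //and return the function names as a list
    GetFunctionNames = (server as text) as list =>
        Table.Column(
            OleDb.Query(
                "Provider=MSOLAP;Data Source=" & server,
                "select FUNCTION_NAME from $system.mdschema_functions"),
            "FUNCTION_NAME"),
    Functions2016 = GetFunctionNames("localhost\tabular2016"),
    Functions2013 = GetFunctionNames("localhost\tabular2013"),
    //functions present in 2016 but not in 2013
    NewFunctions = List.Difference(Functions2016, Functions2013)
in
    NewFunctions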

VBA

It looks like we have support for the Excel Data Model (aka Power Pivot) in VBA at last.

VBAModel

I need to do some research here, but I get the distinct feeling that the only things that are possible through VBA are the things you can do in the Excel ribbon, such as creating connections, tables and relationships. I can’t see any support for creating measures, calculated columns or hierarchies…? I can’t see anything relating to Power Query either. Maybe I’m not looking in the right place; maybe something will come in a later build?

UPDATE: I’m an idiot – there is one minor change to the VBA support for the Excel Data Model, but actually almost everything that I see in 2016 is also present in 2013. Sorry…

Written by Chris Webb

March 17, 2015 at 11:14 pm

Using Excel Slicers To Pass Parameters To Power Query Queries

with 8 comments

Power Query is great for filtering data before it gets loaded into Excel, and when you do that you often need to provide a friendly way for end users to choose what data gets loaded exactly. I showed a number of different techniques for doing this last week at SQLBits but here’s my favourite: using Excel slicers.

Using the Adventure Works DW database in SQL Server as an example, imagine you wanted to load only rows for a particular date or set of dates from the FactInternetSales table. The first step to doing this is to create a query that gets all of the data from the DimDate table (the date dimension you want to use for the filtering). Here’s the code for that query – there’s nothing interesting happening here, all I’m doing is removing unnecessary columns and renaming those that are left:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(dbo_DimDate,
        {"DateKey", "FullDateAlternateKey", "EnglishDayNameOfWeek", 
        "EnglishMonthName", "CalendarYear"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{
        {"FullDateAlternateKey", "Date"}, {"EnglishDayNameOfWeek", "Day"}, 
        {"EnglishMonthName", "Month"}, {"CalendarYear", "Year"}})
in
    #"Renamed Columns"

 

Here’s what the output looks like:

image

Call this query Date and then load it to a table on a worksheet. Once you’ve done that you can create Excel slicers on that table (slicers can be created on tables as well as PivotTables in Excel 2013 but not in Excel 2010) by clicking inside it and then clicking the Slicer button on the Insert tab of the Excel ribbon:

image

Creating three slicers on the Day, Month and Year columns allows you to filter the table like so:

image

The idea here is to use the filtered rows from this table as parameters to control what is loaded from the FactInternetSales table. However, if you try to use Power Query to load data from an Excel table that has any kind of filter applied to it, you’ll find that you get all of the rows from that table. Luckily there is a way to determine whether a row in a table is visible or not and I found it in this article written by Excel MVP Charley Kyd:

http://www.exceluser.com/formulas/visible-column-in-excel-tables.htm

You have to create a new calculated column on the table in the worksheet with the following formula:

=(AGGREGATE(3,5,[@DateKey])>0)+0

image

This calculated column returns 1 on a row when it is visible and 0 when it is hidden by a filter. You can then load the table back into Power Query and filter it in your new query so that it only returns the rows where the Visible column contains 1 – that’s to say, the rows that are visible in Excel. Here’s the code for this second query, called SelectedDates:

let
    Source = Excel.CurrentWorkbook(){[Name="Date"]}[Content],
    #"Filtered Rows" = Table.SelectRows(Source, each ([Visible] = 1)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Visible"})
in
    #"Removed Columns"

 

image

This query should not be loaded to the Excel Data Model or to the worksheet.

Next, you must use this table to filter the data from the FactInternetSales table. Here’s the code for a query that does that:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_FactInternetSales = Source{[Schema="dbo",Item="FactInternetSales"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(dbo_FactInternetSales,
        {"ProductKey", "OrderDateKey", "CustomerKey", "SalesOrderNumber", 
        "SalesOrderLineNumber", "SalesAmount", "TaxAmt"}),
    Merge = Table.NestedJoin(#"Removed Other Columns",{"OrderDateKey"},
        SelectedDates,{"DateKey"},"NewColumn",JoinKind.Inner),
    #"Removed Columns" = Table.RemoveColumns(Merge,
        {"ProductKey", "OrderDateKey", "CustomerKey"}),
    #"Expand NewColumn" = Table.ExpandTableColumn(#"Removed Columns", 
        "NewColumn", {"Date"}, {"Date"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Expand NewColumn",
        {"Date", "SalesOrderNumber", "SalesOrderLineNumber", 
        "SalesAmount", "TaxAmt"}),
    #"Renamed Columns" = Table.RenameColumns(#"Reordered Columns",{
        {"SalesOrderNumber", "Sales Order Number"}, 
        {"SalesOrderLineNumber", "Sales Order Line Number"}, 
        {"SalesAmount", "Sales Amount"}, 
        {"TaxAmt", "Tax Amount"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",
        {{"Date", type date}})
in
    #"Changed Type"

 

Again, most of what this query does is fairly straightforward: removing and renaming columns. The important step where the filtering takes place is called Merge, and here the data from FactInternetSales is joined to the table returned by the SelectedDates query using an inline merge (see here for more details on how to do this):

image

The output of this query is a table containing rows filtered by the dates selected by the user in the slicers, which can then be loaded to a worksheet:

image

The last thing to do is to cut the slicers from the worksheet containing the Date table and paste them onto the worksheet containing the Internet Sales table:

image

You now have a query that displays rows from the FactInternetSales table that are filtered according to the selection made in the slicers. It would be nice if Power Query supported using slicers as a data source directly, without this workaround, and you can vote for that to be implemented here.

You can download the sample workbook for this post here.

Written by Chris Webb

March 9, 2015 at 1:30 pm

Posted in Power Query

Handling Added Or Missing Columns In Power Query

with 3 comments

A recent conversation in the comments of this blog post brought up the subject of how to handle columns that have either been removed from or added to a data source in Power Query. Anyone who has worked with csv files knows that they have a nasty habit of changing format even when they aren’t supposed to, and added or removed columns can cause all kinds of problems downstream.

Ken Puls (whose excellent blog you are probably already reading if you’re interested in Power Query) pointed out that it’s very easy to protect yourself against new columns in your data source. When creating a query, select all the columns that you want and then right-click and select Remove Other Columns:

image

This means that if any new columns are added to your data source in the future, they won’t appear in the output of your query. In the M code the Table.SelectColumns() function is used to do this.
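Incidentally, Table.SelectColumns() also takes an optional third parameter that controls what happens when one of the selected columns doesn’t exist, which gives you another line of defence. A quick sketch (the file path and column names are made up):

let
    //connect to a csv file and promote the first row to headers
    Source = Csv.Document(
                      File.Contents("C:\Demo\Sales.csv"),
                      null,",",null,1252),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    //keep only the expected columns: any new columns are dropped, and
    //MissingField.UseNull fills a missing column with nulls instead of
    //raising an error (the default behaviour is MissingField.Error)
    SelectExpected = Table.SelectColumns(
                      FirstRowAsHeader,
                      {"Month", "Product", "Sales"},
                      MissingField.UseNull)
in
    SelectExpected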

Dealing with missing columns is a little bit more complicated. In order to find out whether a column is missing, first of all you’ll need a list of the columns that should be present in your query. You can of course store these column names in a table in Excel and enter them manually, or you can generate the list in M fairly easily by creating a query that connects to your data source and uses the Table.ColumnNames() function, something like this:

let
    //Connect to CSV file
    Source = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
                      ),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    //Get a list of column names
    GetColumns = Table.ColumnNames(FirstRowAsHeader),
    //Turn this list into a table
    MakeATable = Table.FromList(
                              GetColumns, 
                              Splitter.SplitByNothing(), 
                              null, 
                              null, 
                              ExtraValues.Error),
    //Rename this table's sole column
    RenamedColumns = Table.RenameColumns(
                                         MakeATable ,
                                         {{"Column1", "ColumnName"}})
in
    RenamedColumns 

Given a csv file that looks like this:

image

…the query above returns the following table of column names:

image

You can then store the output of this query in an Excel table for future reference – just remember not to refresh the query!

Having done that, you can then look at the columns returned by your data source and compare them with the columns you are expecting by using the techniques shown in this post. For example, here’s a query that reads a list of column names from an Excel table and compares them with the columns returned from a csv file:

let
    //Connect to Excel table containing expected column names
    ExcelSource = Excel.CurrentWorkbook(){[Name="GetColumnNames"]}[Content],
    //Get list of expected columns
    ExpectedColumns = Table.Column(ExcelSource, "ColumnName"),
    //Connect to CSV file
    CSVSource = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
                      ),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(CSVSource),
    //Get a list of column names in csv
    CSVColumns = Table.ColumnNames(FirstRowAsHeader),
    //Find missing columns
    MissingColumns = List.Difference(ExpectedColumns, CSVColumns),
    //Find added columns
    AddedColumns = List.Difference(CSVColumns, ExpectedColumns),
    //Report what has changed
    OutputMissing = if List.Count(MissingColumns)=0 then
                     "No columns missing" else
                     "Missing columns: " & Text.Combine(MissingColumns, ","),
    OutputAdded = if List.Count(AddedColumns)=0 then
                     "No columns added" else
                     "Added columns: " & Text.Combine(AddedColumns, ","),
    Output = OutputMissing & "   " & OutputAdded
in
    Output

Given a csv file that looks like this:

image

…and an Excel table like the one above containing the three column names Month, Product and Sales, the output of this query is:

image

It would be very easy to convert this query to a function that you could use to check the columns expected by multiple queries, and to adapt the output to your own needs. In certain scenarios (such as when you’re importing data from SQL Server) you might also want to check the data types used by the columns; I’ll leave that for another blog post though. In any case, data types aren’t so much of an issue with csv files because it’s Power Query that imposes the types on the columns within a query, and any type conversion issues can be dealt with by Power Query’s error handling functionality (see Gerhard Brueckl’s post on this topic, for example).
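Here’s a sketch of what the function version might look like – it takes the list of expected column names and the table to check as parameters and returns the same report text:

//sketch of a reusable function: takes a list of expected column names
//and a table to check, and returns a text description of any differences
(ExpectedColumns as list, TableToCheck as table) as text =>
let
    //get the column names actually present in the table
    ActualColumns = Table.ColumnNames(TableToCheck),
    //find missing and added columns
    MissingColumns = List.Difference(ExpectedColumns, ActualColumns),
    AddedColumns = List.Difference(ActualColumns, ExpectedColumns),
    //report what has changed
    OutputMissing = if List.Count(MissingColumns)=0 then
                     "No columns missing" else
                     "Missing columns: " & Text.Combine(MissingColumns, ","),
    OutputAdded = if List.Count(AddedColumns)=0 then
                     "No columns added" else
                     "Added columns: " & Text.Combine(AddedColumns, ","),
    Output = OutputMissing & "   " & OutputAdded
in
    Output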

You can download a workbook containing the two queries from this post here.

Written by Chris Webb

February 26, 2015 at 12:37 pm

Posted in Power Query

Expression.Evaluate() In Power Query/M

with 2 comments

A year ago I wrote a post on loading the M code for a Power Query query from a text file using the Expression.Evaluate() function, but I admit that at the time I didn’t understand how it worked properly. I’ve now had a chance to look at this function in more detail and thought it might be a good idea to post a few more examples of how it works to add to what’s in the Library spec.

The docs are clear about what Expression.Evaluate() does: it takes some text containing an M expression, evaluates that expression and returns the result. The important thing to remember here is that an M expression can be more than just a single line of code – a Power Query query is in fact a single expression, and that’s why I was able to execute one using Expression.Evaluate() in the blog post referenced above.

Here’s a simple example of Expression.Evaluate():

let
    Source = Expression.Evaluate("""Hello World""")
in
    Source

 

It returns, of course, the text “Hello World”:

image

Here’s another example of an expression that returns the number 10:

let
    Source = Expression.Evaluate("6+4")
in
    Source

 

image

Remember that I said a Power Query query is itself a single M expression? The way Power Query implements multiple steps in each query is using a let expression, which is a single M expression that contains multiple sub-expressions. Therefore the following example still shows a single expression (consisting of a let expression) being evaluated to return the value 12:

let
    Source = Expression.Evaluate("let a=3, b=4, c=a*b in c")
in
    Source

 

image

 

OK, so far not very interesting. What we really want to do is evaluate more complex M expressions.

Consider the following query, which uses the Text.Upper() function and returns the text ABC:

let
    Source = Text.Upper("abc")
in
    Source

 

If you run the following query you’ll get the error “The name ‘Text.Upper’ does not exist in the current context.”:

let
    Source = Expression.Evaluate("Text.Upper(""abc"")")
in
    Source

image

To get a full understanding of what’s going on here, I recommend you read section “3.3 Environments and Variables” of the language specification document, which is available here. The short explanation is that all M expressions are evaluated in an ‘environment’, where other variables and functions exist and can be referenced. The reason we’re getting an error in the query above is that the expression is being evaluated in an environment all of its own, where the functions of the global library (of which Text.Upper() is a part) aren’t available. We can fix this quite easily by specifying the global environment in the second parameter of Expression.Evaluate() using the #shared intrinsic variable, like so:

let
    Source = Expression.Evaluate("Text.Upper(""abc"")", #shared)
in
    Source

image

#shared returns a record containing the names and values of all of the variables in scope for the global environment. As such it can be used on its own in a query to return all of the variables available, including all of the functions in the global library and from all other queries in the current workbook:

let
    Source = #shared
in
    Source

 

Here’s what that query returns on the workbook that I’m using to write this post, which contains various queries apart from the one above:

image

Reza Rad has a blog post devoted to this which is worth checking out.

Using #shared will allow you to evaluate expressions that use global library functions, but it’s not a magic wand that makes all errors go away. The following query declares a list and then attempts to use Expression.Evaluate() to get the second item in the list:

let
    MyList = {1,2,3,4,5},
    GetSecondNumber = Expression.Evaluate("MyList{1}", #shared)
in
    GetSecondNumber

image

Despite the use of #shared in the second parameter we still get the context error we saw before because the variable MyList is still not available. What you need to do here is to define a record in the second parameter of Expression.Evaluate() so that the environment that the expression evaluates in knows what the variable MyList refers to:

let
    MyList = {1,2,3,4,5},
    GetSecondNumber = Expression.Evaluate("MyList{1}", [MyList=MyList])
in
    GetSecondNumber

image

This slightly more complex example, which gets the nth item from a list, might make what’s going on here a little clearer:

let
    MyList_Outer = {1,2,3,4,5},
    NumberToGet_Outer = 3,
    GetNthNumber = Expression.Evaluate("MyList_Inner{NumberToGet_Inner}", 
       [MyList_Inner=MyList_Outer, NumberToGet_Inner=NumberToGet_Outer ])
in
    GetNthNumber

 

image

In this example you can see that the two variable names present in the text passed to the first parameter of Expression.Evaluate() are present in the record used in the second parameter, where they are paired up with the two variables from the main query whose values they use.

Finally, how can you pass your own variable names and use functions from the global library? You need to construct a single record containing all the names in #shared plus any others that you need, and you can do that using Record.Combine() to merge a manually created record with the one returned by #shared as shown here:

let
    MyList_Outer = {1,2,3,4,5},
    NumberToGet_Outer = 1,
    RecordOfVariables = 
     [MyList_Inner=MyList_Outer, NumberToGet_Inner=NumberToGet_Outer ],
    RecordOfVariablesAndGlobals = Record.Combine({RecordOfVariables, #shared}),
    GetNthNumber = Expression.Evaluate(
     "List.Reverse(MyList_Inner){NumberToGet_Inner}", 
     RecordOfVariablesAndGlobals )
in
    GetNthNumber

 

image
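Putting all this together, the scenario from the post I referenced at the start – executing M code stored in a text file – comes down to something like this sketch (the file path is made up):

let
    //read the M expression stored in a text file
    SourceText = Text.FromBinary(File.Contents("C:\Demo\MyQuery.m")),
    //evaluate it with the global library in scope so that the
    //expression can use functions like Text.Upper()
    Result = Expression.Evaluate(SourceText, #shared)
in
    Result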

Written by Chris Webb

February 6, 2015 at 2:14 pm

Posted in Power Query

Rendering Images In An Excel Worksheet With Power Query Using Cells As Pixels

with 4 comments

It’s a general rule of the internet that, whenever you have a cool idea, a few minutes spent on your favourite search engine reveals that someone else has had the idea before you. In my case, when I first saw the functionality in Power Query for working with binary files I wondered whether it was possible to read the contents of a file containing an image and render each pixel as a cell in a worksheet – and of course, it has already been done and done better than I could ever manage. However, it hasn’t been done in Power Query… until now.

First of all, I have to acknowledge the help of Matt Masson whose blog post on working with binary data in Power Query provided a number of useful examples. I also found this article on the bmp file format invaluable.

Second, what I’ve done only works with monochrome bmp files. I could have spent a few more hours coming up with the code to work with other file types but, frankly, I’m too lazy. I have to do real work too, you know.

So let’s see how this works. Here’s a picture of Fountains Abbey that I took on my phone while on holiday last summer:

FountainsAbbey

I opened it in Paint and saved it as a monochrome bmp file:

FountainsAbbey

Here’s the code for the Power Query query that opens the bmp file and renders the contents in Excel:

let
   //The picture to load
   SourceFilePath="C:\Users\Chris\Pictures\FountainsAbbey.bmp",
   //Or get the path from the output of a query called FileName
   //SourceFilePath=FileName,
   //Load the picture
   SourceFile=File.Contents(SourceFilePath),

   //First divide the file contents into two chunks:
   //the header of the file, always 62 bytes
   //and the rest, which contains the pixels

   //Define the format as a record
   OverallFormat=BinaryFormat.Record([
    Header = BinaryFormat.Binary(62),
    Pixels = BinaryFormat.Binary()
    ]),
   //Load the data into that format
   Overall = OverallFormat(SourceFile),
   //Get the header data
   HeaderData = Overall[Header],

   //Extract the total file size and
   //width and height of the image
   HeaderFormat = BinaryFormat.Record([
    Junk1 = BinaryFormat.Binary(2),
    FileSize = BinaryFormat.ByteOrder(
     BinaryFormat.SignedInteger32,
     ByteOrder.LittleEndian),
    Junk2 = BinaryFormat.Binary(12),
    Width = BinaryFormat.ByteOrder(
     BinaryFormat.SignedInteger32,
     ByteOrder.LittleEndian),
    Height = BinaryFormat.ByteOrder(
     BinaryFormat.SignedInteger32,
     ByteOrder.LittleEndian),
    Junk3 = BinaryFormat.Binary()
    ]),
   HeaderValues = HeaderFormat(HeaderData),
   FileSize = HeaderValues[FileSize],
   ImageWidth = HeaderValues[Width],
   ImageHeight = HeaderValues[Height],
   
   //Each pixel is represented as a single bit, and
   //each line is padded to a multiple of four bytes
   BytesPerLine = Number.RoundUp(ImageWidth/32)*4,
   //Read the pixel data into a list
   PixelListFormat = BinaryFormat.List(
    BinaryFormat.ByteOrder(
     BinaryFormat.Binary(BytesPerLine),
     ByteOrder.LittleEndian)),
   PixelList = PixelListFormat(Overall[Pixels]),
   //Convert each byte to a number
   PixelListNumbers = List.Transform(PixelList, each Binary.ToList(_)),

   //A function to convert a number into binary
   //and return a list containing the bits
   GetBinaryNumber = (ValueToConvert as number) as list =>
    let
     BitList = List.Generate(
      ()=>[Counter=1, Value=ValueToConvert], 
      each [Counter]<9, 
      each [Counter=[Counter]+1, 
      Value=Number.IntegerDivide([Value],2)], 
      each Number.Mod([Value],2)),
     BitListReversed = List.Reverse(BitList)
    in
     BitListReversed,

   //A function to get all the bits for a single line
   //in the image
   GetAllBitsOnLine = (NumberList as list) => 
    List.FirstN(
     List.Combine(
      List.Transform(NumberList, each GetBinaryNumber(_)
     )
    ), ImageWidth),

   //Reverse the list - the file contains the pixels
   //from the bottom up
   PixelBits = List.Reverse(
    List.Transform(PixelListNumbers, 
    each GetAllBitsOnLine(_))),

   //Output all the pixels in a table
   OutputTable = #table(null, PixelBits)
in
    OutputTable
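To see what the bit-conversion logic does in isolation, here’s the GetBinaryNumber function on its own; evaluating it for the byte value 200 (11001000 in binary) returns the list {1,1,0,0,1,0,0,0}:

let
    //repeatedly divide by two, collecting the remainders as bits,
    //then reverse the list so the most significant bit comes first
    GetBinaryNumber = (ValueToConvert as number) as list =>
     let
      BitList = List.Generate(
       ()=>[Counter=1, Value=ValueToConvert], 
       each [Counter]<9, 
       each [Counter=[Counter]+1, 
       Value=Number.IntegerDivide([Value],2)], 
       each Number.Mod([Value],2)),
      BitListReversed = List.Reverse(BitList)
     in
      BitListReversed,
    Output = GetBinaryNumber(200)
in
    Output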

 

The output of this query is a table containing ones and zeroes and this must be loaded to the worksheet. The final thing to do is to make the table look like a photo by:

  • Hiding the column headers on the table
  • Using the ‘None’ table style so that there is no formatting on the table itself
  • Hiding the values in the table by using the ;;; format (see here for more details)
  • Zooming out as far as you can on the worksheet
  • Resizing the row heights and column widths so the image doesn’t look too squashed
  • Using Excel conditional formatting to make the cells containing 0 black and the cells containing 1 white:

image

 

Here’s the photo rendered as cells in the workbook:

image

And here it is again, zoomed in a bit so you can see the individual cells a bit better:

image

You can download the workbook (which I’ve modified so you can enter the filename of your bmp file in a cell in the worksheet, so you don’t have to edit the query – but you will have to turn Fast Combine on as a result) from here. Have fun!

Written by Chris Webb

January 20, 2015 at 5:11 pm

Posted in Excel, Power Query

A Closer Look At Power Query/SSAS Integration

with 3 comments

In the November release of Power Query the most exciting new feature was the ability to connect to SSAS. I blogged about it at the time, but having used it for a month or so now I thought it was worth writing a more technical post showing how it works in more detail (since some things are not immediately obvious), as well as looking at the MDX it generates.

This post was written using Power Query version 2.18.3874.242, released January 2015; some of the bugs and issues mentioned here will probably be fixed in later versions.

Connecting to SSAS

Power Query officially supports connecting to all versions of SSAS from 2008 onwards, although I’ve heard from a lot of people that they have had problems getting the connection working. Certainly when I installed the version of Power Query with SSAS support on my laptop, which has a full install of SQL Server 2014, it insisted I install the 2012 version of ADOMD.Net before it would work (I also needed to reboot). My guess is that if you’re having problems connecting you should try doing that too; ADOMD.Net 2012 is available to download in the SQL Server 2012 Feature Pack.

After clicking From Database/From SQL Server Analysis Services the following dialog appears, asking you to enter the name of the server you want to connect to.

image

If this is the first time you’re connecting to SSAS the following dialog will appear, asking you to confirm that you want to use Windows credentials to connect.

image

Unfortunately, if you’re connecting via http and need to enter a username and password you won’t be able to proceed any further. I expect this problem will be fixed soon.

Initial Selection

Once you’ve connected the Navigator pane appears on the right-hand side of the screen. Here you see all of the databases on the server you’ve connected to; expand a database and you see the cubes, and within each cube you see all of the measure groups, measures, dimensions and hierarchies.

image

The previous build of Power Query did not display any calculated measures that weren’t associated with measure groups (using the associated_measure_group property); this has been fixed in version 2.18.3874.242.

When you start to select measures and hierarchies, the names of the cubes you have chosen items from will appear in the Selected items box. If you hover over the name of a cube, the peek pane will appear and you’ll see a preview of the results of the query.

image

At this point you can either click the Load button to load the data to the worksheet or the Excel Data Model, or click the Edit button to edit the query further.

You cannot specify your own MDX query to use for the query as yet.

The Query Editor

Once the Power Query Query Editor opens you’ll see the output of the query as it stands, and also on the Cube tab in the ribbon two new buttons: Add Items and Collapse Columns.

image

Here’s the MDX generated for the query in the screenshot above (captured from Profiler):

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
[Date].[Calendar Year].[Calendar Year].allmembers
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,50)
properties member_caption,member_unique_name 
on 1 
from [Adventure Works]

 

The MDX Subset() function is used here to ensure that the query doesn’t return more than 50 rows.

Adding Items

Clicking on the Add Items button allows you to add extra hierarchies and measures to the query. When you click the button the following dialog appears where you can choose what you want to add:

image

In this case I’ve added the Day Name hierarchy to the query, and this hierarchy appears as a new column on the right-hand edge of the query after the measures:

image

You can easily drag the column to wherever you want it though.

Here’s the MDX again:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
crossjoin(
[Date].[Calendar Year].[Calendar Year].allmembers,
[Date].[Day Name].[Day Name].allmembers)
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,50)
properties member_caption,member_unique_name 
on 1 
from [Adventure Works]

 

Collapsing Columns

Selecting the Day Name column and then clicking the Collapse Columns button simply rolls back to the previous state of the query. However, there’s more to this button than meets the eye. If you filter the Day Name column (for example, by selecting Saturday and Sunday as in the screenshot below) and then click Collapse and Remove, the filter will still be applied to the query even though the Day Name column is no longer visible.

image

Here’s what the Query Editor shows after the filter and after the Day Name column has been collapsed:

image

Compare the measure values with those shown in the original query – it’s now showing values only for Saturdays and Sundays, although that’s not really clear from the UI. Here’s the MDX generated to prove it – note the use of the subselect to do the filtering:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
[Date].[Calendar Year].[Calendar Year].allmembers
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,1000)
properties member_caption,member_unique_name 
on 1 
from(
select
({[Date].[Day Name].&[7],[Date].[Day Name].&[1]})
on 0 
from 
[Adventure Works])

 

From studying the MDX generated I can tell that certain other operations such as sorting and filtering the top n rows are folded back to SSAS.
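For example, appending steps like the ones below to a cube query (a sketch based on the query shown in the M Functions section later in this post, cut down to a single measure) results in the sort and the row-count restriction appearing in the generated MDX rather than being evaluated locally:

let
    Source = AnalysisServices.Databases("localhost"),
    #"Adventure Works DW 2008" = Source{[Name="Adventure Works DW 2008"]}[Data],
    #"Adventure Works1" = #"Adventure Works DW 2008"{[Id="Adventure Works"]}[Data],
    #"Adventure Works2" = #"Adventure Works1"{[Id="Adventure Works"]}[Data],
    #"Added Items" = Cube.Transform(#"Adventure Works2", {
             {Cube.AddAndExpandDimensionColumn, 
             "[Date]", {"[Date].[Calendar Year].[Calendar Year]"}, {"Date.Calendar Year"}}, 
             {Cube.AddMeasureColumn, "Internet Sales Amount", 
             "[Measures].[Internet Sales Amount]"}}),
    //both of these steps get folded back to SSAS
    #"Sorted Rows" = Table.Sort(#"Added Items", 
             {{"Internet Sales Amount", Order.Descending}}),
    #"Kept First Rows" = Table.FirstN(#"Sorted Rows", 5)
in
    #"Kept First Rows"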

It’s also important to realise that using the Remove option to remove a column from the query does not have the same effect as collapsing the column:

image

Using Remove just hides the column; the number of rows returned by the query remains the same.

image

User Hierarchies

In the examples above I’ve only used attribute hierarchies. User hierarchies aren’t much different – you can select either an individual level or the entire hierarchy (which is the same as selecting all of the levels of the hierarchy).

image

image

Parent-Child Hierarchies

Parent-child hierarchies work very much like user hierarchies, except that you will see some null values in columns to accommodate leaf members at different levels:

image

M Functions

There are a lot of M functions relating to cube functionality, although the documentation in the Library Specification document is fairly basic and all mention of them disappeared from the online help a month or so ago for some reason. Here’s the code for the query in the Collapsing Columns section above:

let
    Source = AnalysisServices.Databases("localhost"),
    #"Adventure Works DW 2008" = Source{[Name="Adventure Works DW 2008"]}[Data],
    #"Adventure Works1" = #"Adventure Works DW 2008"{[Id="Adventure Works"]}[Data],
    #"Adventure Works2" = #"Adventure Works1"{[Id="Adventure Works"]}[Data],
    #"Added Items" = Cube.Transform(#"Adventure Works2", {
             {Cube.AddAndExpandDimensionColumn, 
             "[Date]", {"[Date].[Calendar Year].[Calendar Year]"}, {"Date.Calendar Year"}}, 
             {Cube.AddMeasureColumn, "Internet Sales Amount", 
             "[Measures].[Internet Sales Amount]"}, 
             {Cube.AddMeasureColumn, "Internet Order Quantity", 
             "[Measures].[Internet Order Quantity]"}}),
    #"Added Items1" = Cube.Transform(#"Added Items", {
              {Cube.AddAndExpandDimensionColumn, "[Date]", 
             {"[Date].[Day Name].[Day Name]"}, {"Date.Day Name"}}}),
    #"Filtered Rows" = Table.SelectRows(#"Added Items1", each (
             Cube.AttributeMemberId([Date.Day Name]) = "[Date].[Day Name].&[7]" 
             meta [DisplayName = "Saturday"] 
             or 
             Cube.AttributeMemberId([Date.Day Name]) = "[Date].[Day Name].&[1]" 
             meta [DisplayName = "Sunday"])),
    #"Collapsed and Removed Columns" = Cube.CollapseAndRemoveColumns(
             #"Filtered Rows", 
             {"Date.Day Name"})
in
    #"Collapsed and Removed Columns"

It’s comprehensible but not exactly simple – yet another example of how difficult it is to shoe-horn multidimensional concepts into a tool that expects to work with relational data (see also SSRS). I doubt I’ll be writing any M code that uses these functions manually.

Written by Chris Webb

January 13, 2015 at 11:21 pm
