Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘Power Query’ Category

Handling Added Or Missing Columns In Power Query


A recent conversation in the comments of this blog post brought up the subject of how to handle columns that have either been removed from or added to a data source in Power Query. Anyone who has worked with csv files knows that they have a nasty habit of changing format even when they aren’t supposed to, and added or removed columns can cause all kinds of problems downstream.

Ken Puls (whose excellent blog you are probably already reading if you’re interested in Power Query) pointed out that it’s very easy to protect yourself against new columns in your data source. When creating a query, select all the columns that you want and then right-click and select Remove Other Columns:

image

This means that if any new columns are added to your data source in the future, they won’t appear in the output of your query. In the M code the Table.SelectColumns() function is used to do this.
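If you open the Advanced Editor after doing this you’ll see something along these lines – a sketch based on the sample csv file used later in this post, so your step names and column names will differ:

let
    //Connect to CSV file
    Source = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
                      ),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    //Keep only the expected columns; any columns added
    //to the csv file later will simply be ignored
    RemovedOtherColumns = Table.SelectColumns(
                      FirstRowAsHeader,
                      {"Month", "Product", "Sales"})
in
    RemovedOtherColumns

Incidentally, Table.SelectColumns() also accepts an optional third parameter: passing MissingField.Ignore makes it silently skip any listed column that no longer exists in the source, rather than raising an error.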

Dealing with missing columns is a little bit more complicated. In order to find out whether a column is missing, first of all you’ll need a list of the columns that should be present in your query. You can of course store these names in a table in Excel and enter them manually, or you can do this in M fairly easily by creating a query that connects to your data source and uses the Table.ColumnNames() function, something like this:

let
    //Connect to CSV file
    Source = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
                      ),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    //Get a list of column names
    GetColumns = Table.ColumnNames(FirstRowAsHeader),
    //Turn this list into a table
    MakeATable = Table.FromList(
                              GetColumns, 
                              Splitter.SplitByNothing(), 
                              null, 
                              null, 
                              ExtraValues.Error),
    //Rename this table's sole column
    RenamedColumns = Table.RenameColumns(
                                         MakeATable,
                                         {{"Column1", "ColumnName"}})
in
    RenamedColumns 

Given a csv file that looks like this:

image

…the query above returns the following table of column names:

image

You can then store the output of this query in an Excel table for future reference – just remember not to refresh the query!

Having done that, you can then look at the columns returned by your data source and compare them with the columns you are expecting by using the techniques shown in this post. For example, here’s a query that reads a list of column names from an Excel table and compares them with the columns returned from a csv file:

let
    //Connect to Excel table containing expected column names
    ExcelSource = Excel.CurrentWorkbook(){[Name="GetColumnNames"]}[Content],
    //Get list of expected columns
    ExpectedColumns = Table.Column(ExcelSource, "ColumnName"),
    //Connect to CSV file
    CSVSource = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
                      ),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(CSVSource),
    //Get a list of column names in csv
    CSVColumns = Table.ColumnNames(FirstRowAsHeader),
    //Find missing columns
    MissingColumns = List.Difference(ExpectedColumns, CSVColumns),
    //Find added columns
    AddedColumns = List.Difference(CSVColumns, ExpectedColumns),
    //Report what has changed
    OutputMissing = if List.Count(MissingColumns)=0 then
                     "No columns missing" else
                     "Missing columns: " & Text.Combine(MissingColumns, ","),
    OutputAdded = if List.Count(AddedColumns)=0 then
                     "No columns added" else
                     "Added columns: " & Text.Combine(AddedColumns, ","),
    Output = OutputMissing & "   " & OutputAdded
in
    Output

Given a csv file that looks like this:

image

…and an Excel table like the one above containing the three column names Month, Product and Sales, the output of this query is:

image

It would be very easy to convert this query to a function that you could use to check the columns expected by multiple queries, and also to adapt the output to your own needs. In certain scenarios (such as when you’re importing data from SQL Server) you might also want to check the data types used by the columns; I’ll leave that for another blog post though. In any case, data types aren’t so much of an issue with CSV files because it’s Power Query that imposes the types on the columns within a query, and any type conversion issues can be dealt with by Power Query’s error handling functionality (see Gerhard Brueckl’s post on this topic, for example).
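Going back to the function idea, here’s a sketch of how it might look – the name CheckColumns and its parameters are my own invention rather than anything official:

(FilePath as text, ExpectedColumns as list) as text =>
let
    //Connect to CSV file
    CSVSource = Csv.Document(File.Contents(FilePath),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(CSVSource),
    //Get a list of column names in csv
    CSVColumns = Table.ColumnNames(FirstRowAsHeader),
    //Find missing and added columns
    MissingColumns = List.Difference(ExpectedColumns, CSVColumns),
    AddedColumns = List.Difference(CSVColumns, ExpectedColumns),
    //Report what has changed
    OutputMissing = if List.Count(MissingColumns)=0 then
                     "No columns missing" else
                     "Missing columns: " & Text.Combine(MissingColumns, ","),
    OutputAdded = if List.Count(AddedColumns)=0 then
                     "No columns added" else
                     "Added columns: " & Text.Combine(AddedColumns, ",")
in
    OutputMissing & "   " & OutputAdded

You could then invoke it once per data source, for example:

CheckColumns(
 "C:\Users\Chris\Documents\Power Query demos\SampleData.csv",
 {"Month", "Product", "Sales"})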

You can download a workbook containing the two queries from this post here.

Written by Chris Webb

February 26, 2015 at 12:37 pm

Posted in Power Query

Expression.Evaluate() In Power Query/M


A year ago I wrote a post on loading the M code for a Power Query query from a text file using the Expression.Evaluate() function, but I admit that at the time I didn’t understand how it worked properly. I’ve now had a chance to look at this function in more detail and thought it might be a good idea to post a few more examples of how it works to add to what’s in the Library spec.

The docs are clear about what Expression.Evaluate() does: it takes some text containing an M expression and evaluates that expression, returning the result. The important thing to remember here is that an M expression can be more than just a single line of code – a Power Query query is in fact a single expression, and that’s why I was able to execute one using Expression.Evaluate() in the blog post referenced above.

Here’s a simple example of Expression.Evaluate():

let
    Source = Expression.Evaluate("""Hello World""")
in
    Source


It returns, of course, the text “Hello World”:

image

Here’s another example of an expression that returns the number 10:

let
    Source = Expression.Evaluate("6+4")
in
    Source


image

Remember that I said that a Power Query query is itself a single M expression? The way Power Query implements multiple steps in each query is using a let expression, which is a single M expression that contains multiple sub-expressions. Therefore the following example still shows a single expression (consisting of a let expression) being evaluated to return the value 12:

let
    Source = Expression.Evaluate("let a=3, b=4, c=a*b in c")
in
    Source


image


OK, so far not very interesting. What we really want to do is evaluate more complex M expressions.

Consider the following query, which uses the Text.Upper() function and returns the text ABC:

let
    Source = Text.Upper("abc")
in
    Source


If you run the following query you’ll get the error “The name ‘Text.Upper’ does not exist in the current context.”:

let
    Source = Expression.Evaluate("Text.Upper(""abc"")")
in
    Source

image

To get a full understanding of what’s going on here, I recommend you read section “3.3 Environments and Variables” of the language specification document, which is available here. The short explanation is that all M expressions are evaluated in an ‘environment’, where other variables and functions exist and can be referenced. The reason we’re getting an error in the query above is that it’s trying to execute the expression in an environment all of its own, where the global library functions aren’t available. We can fix this quite easily, though, by specifying the global environment – where the global library functions, including Text.Upper(), are available – in the second parameter of Expression.Evaluate(), using the #shared intrinsic variable, like so:

let
    Source = Expression.Evaluate("Text.Upper(""abc"")", #shared)
in
    Source

image

#shared returns a record containing the names and values of all of the variables in scope for the global environment. As such, it can be used on its own in a query to list everything that is available, including all of the functions in the global library and all of the other queries in the current workbook:

let
    Source = #shared
in
    Source


Here’s what that query returns in the workbook that I’m using to write this post, which contains various queries apart from the one above:

image

Reza Rad has a blog post devoted to this which is worth checking out.
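Since #shared returns an ordinary record, the standard record functions work on it too. For example – just a sketch using documented library functions – you can list all of the names in the global environment, or pull a single function out of the record and call it:

let
    //Get a list of all of the names in the global environment
    AllNames = Record.FieldNames(#shared),
    //Get the Text.Upper function out of the record...
    UpperFunction = Record.Field(#shared, "Text.Upper"),
    //...and call it like any other function value
    Output = UpperFunction("abc")
in
    Output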

Using #shared will allow you to evaluate expressions that use global library functions, but it’s not a magic wand that makes all errors go away. The following query declares a list and then attempts to use Expression.Evaluate() to get the second item in the list:

let
    MyList = {1,2,3,4,5},
    GetSecondNumber = Expression.Evaluate("MyList{1}", #shared)
in
    GetSecondNumber

image

Despite the use of #shared in the second parameter we still get the context error we saw before because the variable MyList is still not available. What you need to do here is to define a record in the second parameter of Expression.Evaluate() so that the environment that the expression evaluates in knows what the variable MyList refers to:

let
    MyList = {1,2,3,4,5},
    GetSecondNumber = Expression.Evaluate("MyList{1}", [MyList=MyList])
in
    GetSecondNumber

image

This slightly more complex example, which gets the nth item from a list, might make what’s going on here a little clearer:

let
    MyList_Outer = {1,2,3,4,5},
    NumberToGet_Outer = 3,
    GetNthNumber = Expression.Evaluate("MyList_Inner{NumberToGet_Inner}", 
       [MyList_Inner=MyList_Outer, NumberToGet_Inner=NumberToGet_Outer ])
in
    GetNthNumber


image

In this example you can see that the two variable names present in the text passed to the first parameter of Expression.Evaluate() are present in the record used in the second parameter, where they are paired up with the two variables from the main query whose values they use.

Finally, how can you pass your own variable names and use functions from the global library? You need to construct a single record containing all the names in #shared plus any others that you need, and you can do that using Record.Combine() to merge a manually created record with the one returned by #shared as shown here:

let
    MyList_Outer = {1,2,3,4,5},
    NumberToGet_Outer = 1,
    RecordOfVariables = 
     [MyList_Inner=MyList_Outer, NumberToGet_Inner=NumberToGet_Outer ],
    RecordOfVariablesAndGlobals = Record.Combine({RecordOfVariables, #shared}),
    GetNthNumber = Expression.Evaluate(
     "List.Reverse(MyList_Inner){NumberToGet_Inner}", 
     RecordOfVariablesAndGlobals )
in
    GetNthNumber


image
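If you find yourself using this pattern a lot, it could be wrapped up in a function of your own. Here’s a sketch of one way of doing that – the name EvaluateWithGlobals is mine, not something from the library:

let
    //Evaluate an expression in an environment containing
    //the global library plus any extra variables passed in
    EvaluateWithGlobals = (ExpressionText as text, Variables as record) =>
        Expression.Evaluate(
            ExpressionText,
            Record.Combine({Variables, #shared})),
    //For example, this returns 5
    Output = EvaluateWithGlobals(
        "List.Reverse(MyList){0}",
        [MyList = {1,2,3,4,5}])
in
    Output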

Written by Chris Webb

February 6, 2015 at 2:14 pm

Posted in Power Query

Rendering Images In An Excel Worksheet With Power Query Using Cells As Pixels


It’s a general rule of the internet that, whenever you have a cool idea, a few minutes spent on your favourite search engine reveals that someone else has had the idea before you. In my case, when I first saw the functionality in Power Query for working with binary files I wondered whether it was possible to read the contents of a file containing an image and render each pixel as a cell in a worksheet – and of course, it has already been done and done better than I could ever manage. However, it hasn’t been done in Power Query… until now.

First of all, I have to acknowledge the help of Matt Masson whose blog post on working with binary data in Power Query provided a number of useful examples. I also found this article on the bmp file format invaluable.

Second, what I’ve done only works with monochrome bmp files. I could have spent a few more hours coming up with the code to work with other file types but, frankly, I’m too lazy. I have to do real work too, you know.

So let’s see how this works. Here’s a picture of Fountains Abbey that I took on my phone while on holiday last summer:

FountainsAbbey

I opened it in Paint and saved it as a monochrome bmp file:

FountainsAbbey

Here’s the code for the Power Query query that opens the bmp file and renders the contents in Excel:

let
   //The picture to load
   SourceFilePath="C:\Users\Chris\Pictures\FountainsAbbey.bmp",
   //Or get the path from the output of a query called FileName
   //SourceFilePath=FileName,
   //Load the picture
   SourceFile=File.Contents(SourceFilePath),

   //First divide the file contents into two chunks:
   //the header of the file, always 62 bytes
   //and the rest, which contains the pixels

   //Define the format as a record
   OverallFormat=BinaryFormat.Record([
    Header = BinaryFormat.Binary(62),
    Pixels = BinaryFormat.Binary()
    ]),
   //Load the data into that format
   Overall = OverallFormat(SourceFile),
   //Get the header data
   HeaderData = Overall[Header],

   //Extract the total file size and
   //width and height of the image
   HeaderFormat = BinaryFormat.Record([
    Junk1 = BinaryFormat.Binary(2),
    FileSize = BinaryFormat.ByteOrder(
     BinaryFormat.SignedInteger32,
     ByteOrder.LittleEndian),
    Junk2 = BinaryFormat.Binary(12),
    Width = BinaryFormat.ByteOrder(
     BinaryFormat.SignedInteger32,
     ByteOrder.LittleEndian),
    Height = BinaryFormat.ByteOrder(
     BinaryFormat.SignedInteger32,
     ByteOrder.LittleEndian),
    Junk3 = BinaryFormat.Binary()
    ]),
   HeaderValues = HeaderFormat(HeaderData),
   FileSize = HeaderValues[FileSize],
   ImageWidth = HeaderValues[Width],
   ImageHeight = HeaderValues[Height],
   
   //Each pixel is represented as a bit
   //And each line is made up of groups of four bytes
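   //e.g. a 200 pixel wide image needs Number.RoundUp(200/32)*4 = 28 bytes per line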
   BytesPerLine = Number.RoundUp(ImageWidth/32)*4,
   //Read the pixel data into a list
   PixelListFormat = BinaryFormat.List(
    BinaryFormat.ByteOrder(
     BinaryFormat.Binary(BytesPerLine),
     ByteOrder.LittleEndian)),
   PixelList = PixelListFormat(Overall[Pixels]),
   //Convert each byte to a number
   PixelListNumbers = List.Transform(PixelList, each Binary.ToList(_)),

   //A function to convert a number into binary
   //and return a list containing the bits
   GetBinaryNumber = (ValueToConvert as number) as list =>
    let
     BitList = List.Generate(
      ()=>[Counter=1, Value=ValueToConvert], 
      each [Counter]<9, 
      each [Counter=[Counter]+1, 
      Value=Number.IntegerDivide([Value],2)], 
      each Number.Mod([Value],2)),
     BitListReversed = List.Reverse(BitList)
    in
     BitListReversed,

   //A function to get all the bits for a single line
   //in the image
   GetAllBitsOnLine = (NumberList as list) => 
    List.FirstN(
     List.Combine(
      List.Transform(NumberList, each GetBinaryNumber(_)
     )
    ), ImageWidth),

   //Reverse the list - the file contains the pixels
   //from the bottom up
   PixelBits = List.Reverse(
    List.Transform(PixelListNumbers, 
    each GetAllBitsOnLine(_))),

   //Output all the pixels in a table
   OutputTable = #table(null, PixelBits)
in
    OutputTable


The output of this query is a table containing ones and zeroes, and this must be loaded to the worksheet. The final thing to do is to make the table look like a photo by:

  • Hiding the column headers on the table
  • Using the ‘None’ table style so that there is no formatting on the table itself
  • Hiding the values in the table by using the ;;; format (see here for more details)
  • Zooming out as far as you can on the worksheet
  • Resizing the row heights and column widths so the image doesn’t look too squashed
  • Using Excel conditional formatting to make the cells containing 0 black and the cells containing 1 white:

image


Here’s the photo rendered as cells in the workbook:

image

And here it is again, zoomed in so you can see the individual cells a bit better:

image

You can download the workbook (which I’ve modified so you can enter the filename of your bmp file in a cell in the worksheet, so you don’t have to edit the query – but you will have to turn Fast Combine on as a result) from here. Have fun!

Written by Chris Webb

January 20, 2015 at 5:11 pm

Posted in Excel, Power Query

A Closer Look At Power Query/SSAS Integration


In the November release of Power Query the most exciting new feature was the ability to connect to SSAS. I blogged about it at the time, but having used it for a month or so now I thought it was worth writing a more technical post showing how it works in more detail (since some things are not immediately obvious), as well as looking at what the MDX it generates looks like.

This post was written using Power Query version 2.18.3874.242, released January 2015; some of the bugs and issues mentioned here will probably be fixed in later versions.

Connecting to SSAS

Power Query officially supports connecting to all versions of SSAS from 2008 onwards, although I’ve heard from a lot of people that they have had problems getting the connection working. Certainly when I installed the version of Power Query with SSAS support on my laptop, which has a full install of SQL Server 2014, it insisted I install the 2012 version of ADOMD.Net before it would work (I also needed to reboot). My guess is that if you’re having problems connecting you should try doing that too; ADOMD.Net 2012 is available to download in the SQL Server 2012 Feature Pack.

After clicking From Database/From SQL Server Analysis Services the following dialog appears, asking you to enter the name of the server you want to connect to.

image

If this is the first time you’re connecting to SSAS the following dialog will appear, asking you to confirm that you want to use Windows credentials to connect.

image

Unfortunately, if you’re connecting via http and need to enter a username and password you won’t be able to proceed any further. I expect this problem will be fixed soon.

Initial Selection

Once you’ve connected the Navigator pane appears on the right-hand side of the screen. Here you see all of the databases on the server you’ve connected to; expand a database and you see the cubes, and within each cube you see all of the measure groups, measures, dimensions and hierarchies.

image

The previous build of Power Query did not display calculated measures that aren’t associated with measure groups (using the associated_measure_group property); this has been fixed in version 2.18.3874.242.

When you start to select measures and hierarchies, the names of the cubes you have chosen items from will appear in the Selected items box. If you hover over the name of a cube, the peek pane will appear and you’ll see a preview of the results of the query.

image

At this point you can click the Load button to load the data either to the worksheet or to the Excel Data Model, or click the Edit button to edit the query further.

You cannot specify your own MDX query to use for the query as yet.

The Query Editor

Once the Power Query Query Editor opens you’ll see the output of the query as it stands, and also, on the Cube tab in the ribbon, two new buttons: Add Items and Collapse Columns.

image

Here’s the MDX generated for the query in the screenshot above (captured from Profiler):

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
[Date].[Calendar Year].[Calendar Year].allmembers
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,50)
properties member_caption,member_unique_name 
on 1 
from [Adventure Works]


The MDX Subset() function is used here to ensure that the query doesn’t return more than 50 rows.

Adding Items

Clicking on the Add Items button allows you to add extra hierarchies and measures to the query. When you click the button the following dialog appears where you can choose what you want to add:

image

In this case I’ve added the Day Name hierarchy to the query, and this hierarchy appears as a new column on the right-hand edge of the query after the measures:

image

You can easily drag the column to wherever you want it though.

Here’s the MDX again:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
crossjoin(
[Date].[Calendar Year].[Calendar Year].allmembers,
[Date].[Day Name].[Day Name].allmembers)
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,50)
properties member_caption,member_unique_name 
on 1 
from [Adventure Works]


Collapsing Columns

Selecting the Day Name column and then clicking the Collapse Columns button simply rolls back to the previous state of the query. However, there’s more to this button than meets the eye. If you filter the Day Name column (for example, by selecting Saturday and Sunday as in the screenshot below) and then click Collapse and Remove, the filter will still be applied to the query even though the Day Name column is no longer visible.

image

Here’s what the Query Editor shows after the filter and after the Day Name column has been collapsed:

image

Compare the measure values with those shown in the original query – it’s now showing values only for Saturdays and Sundays, although that’s not really clear from the UI. Here’s the MDX generated to prove it – note the use of the subselect to do the filtering:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
[Date].[Calendar Year].[Calendar Year].allmembers
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,1000)
properties member_caption,member_unique_name 
on 1 
from(
select
({[Date].[Day Name].&[7],[Date].[Day Name].&[1]})
on 0 
from 
[Adventure Works])


From studying the MDX generated I can tell that certain other operations such as sorting and filtering the top n rows are folded back to SSAS.

It’s also important to realise that using the Remove option to remove a column from the query does not have the same effect as collapsing the column:

image

Using Remove just hides the column; the number of rows returned by the query remains the same.

image

User Hierarchies

In the examples above I’ve only used attribute hierarchies. User hierarchies aren’t much different – you can select either an individual level or the entire hierarchy (which is the same as selecting all of the levels of the hierarchy).

image

image

Parent-Child Hierarchies

Parent-child hierarchies work very much like user hierarchies, except that you will see some null values in columns to accommodate leaf members at different levels:

image

M Functions

There are a lot of M functions relating to cube functionality, although the documentation in the Library Specification document is fairly basic and all mention of them disappeared from the online help a month or so ago for some reason. Here’s the code for the query in the Collapsing Columns section above:

let
    Source = AnalysisServices.Databases("localhost"),
    #"Adventure Works DW 2008" = Source{[Name="Adventure Works DW 2008"]}[Data],
    #"Adventure Works1" = #"Adventure Works DW 2008"{[Id="Adventure Works"]}[Data],
    #"Adventure Works2" = #"Adventure Works1"{[Id="Adventure Works"]}[Data],
    #"Added Items" = Cube.Transform(#"Adventure Works2", {
             {Cube.AddAndExpandDimensionColumn, 
             "[Date]", {"[Date].[Calendar Year].[Calendar Year]"}, {"Date.Calendar Year"}}, 
             {Cube.AddMeasureColumn, "Internet Sales Amount", 
             "[Measures].[Internet Sales Amount]"}, 
             {Cube.AddMeasureColumn, "Internet Order Quantity", 
             "[Measures].[Internet Order Quantity]"}}),
    #"Added Items1" = Cube.Transform(#"Added Items", {
              {Cube.AddAndExpandDimensionColumn, "[Date]", 
             {"[Date].[Day Name].[Day Name]"}, {"Date.Day Name"}}}),
    #"Filtered Rows" = Table.SelectRows(#"Added Items1", each (
             Cube.AttributeMemberId([Date.Day Name]) = "[Date].[Day Name].&[7]" 
             meta [DisplayName = "Saturday"] 
             or 
             Cube.AttributeMemberId([Date.Day Name]) = "[Date].[Day Name].&[1]" 
             meta [DisplayName = "Sunday"])),
    #"Collapsed and Removed Columns" = Cube.CollapseAndRemoveColumns(
             #"Filtered Rows", 
             {"Date.Day Name"})
in
    #"Collapsed and Removed Columns"

It’s comprehensible but not exactly simple – yet another example of how difficult it is to shoe-horn multidimensional concepts into a tool that expects to work with relational data (see also SSRS). I doubt I’ll be writing any M code that uses these functions manually.

Written by Chris Webb

January 13, 2015 at 11:21 pm

Multiselect, Filtering And Functions In Power Query


If you’re a Power Query enthusiast you’re probably already comfortable with creating functions and passing values to them. However, in some scenarios you don’t want to pass just a single value to a parameter – you want to pass multiple values, for example when you are filtering a table by multiple criteria. What’s the best way of handling this in Power Query?

Imagine that you wanted to import data from the DimDate table in the SQL Server Adventure Works DW database. It’s a pretty straightforward date dimension table as you can see:

image

Imagine also that you didn’t want to import all the rows from the table but just those for certain days of the week that the user selects (filtering on the EnglishDayNameOfWeek column).

The first problem is, then, how do you allow the user to make this selection in a friendly way? I’ve already blogged about how the function parameter dialog can be made to show ‘allowed’ selections (here and here) but this only allows selection of single values. One solution I’ve used is to create an Excel table – sourced from a Power Query query of course – and then let users select from there.

In this case, the following query can be used to get all the distinct day names:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(dbo_DimDate,
                                    {"DayNumberOfWeek", "EnglishDayNameOfWeek"}),
    #"Sorted Rows" = Table.Sort(#"Removed Other Columns",
                                    {{"DayNumberOfWeek", Order.Ascending}}),
    #"Removed Duplicates" = Table.Distinct(#"Sorted Rows"),
    #"Removed Columns" = Table.RemoveColumns(#"Removed Duplicates",
                                    {"DayNumberOfWeek"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Selected", each "No")
in
    #"Added Custom"

There’s nothing much interesting to say about this query except that it was created entirely in the UI, it shows the day names in the correct order, and it adds an extra column called Selected that always contains the value “No”. The output table in Excel looks like this:

image

The Selected column is going to allow the end user to choose which days of the week they want to filter the main table by. Since “Yes” and “No” are going to be the only valid values in this column, you can use Excel’s Data Validation functionality to show a dropdown box in each cell in this column that allows the user to select one of those two values and nothing else.

image

image

Once the user has selected “Yes” against all of the day names they want to filter by in the Excel table, the next step is to use this table as the source for another Power Query query. To be clear, we’ve used Power Query to load a table containing day names into an Excel table, where the user can select which days they want to filter by, and we then load this data back into Power Query. This second query (called SelectedDays in this example) then just needs to filter the table so it only returns the rows where Selected is “Yes” and then removes the Selected column once it has done that:

let
    Source = Excel.CurrentWorkbook(){[Name="DistinctDates"]}[Content],
    #"Filtered Rows" = Table.SelectRows(Source, each ([Selected] = "Yes")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Selected"})
in
    #"Removed Columns"

image

This query doesn’t need to be loaded anywhere – but it will be referenced later.

With that done, you need to create a function to filter the DimDate table. Here’s the M code:

(SelectedDays as list) =>
let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Filtered Rows" = Table.SelectRows(dbo_DimDate, 
                             each List.Contains(SelectedDays,[EnglishDayNameOfWeek]) ) 
in
    #"Filtered Rows"

The thing to notice here is the condition used in the Table.SelectRows() function, where List.Contains() is used to check whether the day name of the current row is present in the list passed in through the SelectedDays parameter.
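If you haven’t used it before, List.Contains() simply returns true when the value given in its second parameter appears somewhere in the list given in its first parameter. A quick sketch:

let
    SelectedDays = {"Monday", "Wednesday", "Friday"},
    //Returns true
    Test1 = List.Contains(SelectedDays, "Friday"),
    //Returns false
    Test2 = List.Contains(SelectedDays, "Sunday")
in
    Test1 and not Test2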

The final step is to invoke this function and pass the column from the query containing the selected days to it. There is a bit of UI sugar when you invoke a function with a parameter of type list, which I blogged about recently. In this case, when you invoke the function, you just have to pass it the EnglishDayNameOfWeek column from the SelectedDays query.

image

Here’s what the code for the query that invokes the function looks like:

let
    Source = DimDate(SelectedDays[EnglishDayNameOfWeek])
in
    Source

And of course, when you run the query and output the results to a table, you get the DimDate table filtered by all of the days of the week you have selected:

image

To change the output the user just needs to change the selected days and then refresh this last query.

In case you’re wondering, this query does get folded back to SQL Server too. Here’s the SQL generated by Power Query:

select [$Ordered].[DateKey],
    [$Ordered].[FullDateAlternateKey],
    [$Ordered].[DayNumberOfWeek],
    [$Ordered].[EnglishDayNameOfWeek],
    [$Ordered].[SpanishDayNameOfWeek],
    [$Ordered].[FrenchDayNameOfWeek],
    [$Ordered].[DayNumberOfMonth],
    [$Ordered].[DayNumberOfYear],
    [$Ordered].[WeekNumberOfYear],
    [$Ordered].[EnglishMonthName],
    [$Ordered].[SpanishMonthName],
    [$Ordered].[FrenchMonthName],
    [$Ordered].[MonthNumberOfYear],
    [$Ordered].[CalendarQuarter],
    [$Ordered].[CalendarYear],
    [$Ordered].[CalendarSemester],
    [$Ordered].[FiscalQuarter],
    [$Ordered].[FiscalYear],
    [$Ordered].[FiscalSemester]
from 
(
    select [_].[DateKey],
        [_].[FullDateAlternateKey],
        [_].[DayNumberOfWeek],
        [_].[EnglishDayNameOfWeek],
        [_].[SpanishDayNameOfWeek],
        [_].[FrenchDayNameOfWeek],
        [_].[DayNumberOfMonth],
        [_].[DayNumberOfYear],
        [_].[WeekNumberOfYear],
        [_].[EnglishMonthName],
        [_].[SpanishMonthName],
        [_].[FrenchMonthName],
        [_].[MonthNumberOfYear],
        [_].[CalendarQuarter],
        [_].[CalendarYear],
        [_].[CalendarSemester],
        [_].[FiscalQuarter],
        [_].[FiscalYear],
        [_].[FiscalSemester]
    from [dbo].[DimDate] as [_]
    where [_].[EnglishDayNameOfWeek] in ('Monday', 'Wednesday', 'Friday')
) as [$Ordered]
order by [$Ordered].[DateKey]

Notice that the Where clause contains an IN condition with all of the selected days.

You can download the example workbook for this post here.

Written by Chris Webb

January 8, 2015 at 9:53 am

Posted in Power Query

Viewing Error Messages For All Rows In Power Query


One of the great features of Power Query is the way you can view any rows that contain error values when you load data. However, even if you can see the rows that have errors, you can’t easily see the error messages themselves without writing a little bit of M code – which I’ll show you in this post.

Imagine you have the following table of data:

image

…and you load it into Power Query using the following query, which sets the data type for the Sales column to be Whole Number:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
                                      Source,
                                     {{"Sales", Int64.Type}, 
                                      {"Product", type text}})
in
    #"Changed Type"


As you’d expect, the last two rows contain error values as a result of this type conversion:

image

You can also see the number of rows that contain errors when you load the query:

image

Clicking on the “2 errors” link in the screenshot above creates a new query that only contains the rows with errors:

image

You can click on the Error link in any cell that contains one to see the error message:

image

But what you really want is to see the error message for each row. To do this add a new custom column with the following definition:

try [Sales]

image

You will then see a new column called Custom containing a value of type Record. You can then click the Expand icon in the column header (highlighted) and then OK:

image

You’ll then see another column called Custom.Error with an Expand icon; click on it and then click OK again.

image

image

And at last you’ll have two columns that show the error messages for each row:

image
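For reference, here’s a sketch of all of those steps combined into a single query. I’ve used Table.SelectRowsWithErrors() to reproduce what clicking the “2 errors” link does; your generated step and column names may well differ:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
                                      Source,
                                     {{"Sales", Int64.Type}, 
                                      {"Product", type text}}),
    //Keep only the rows where the Sales column contains an error
    ErrorRows = Table.SelectRowsWithErrors(#"Changed Type", {"Sales"}),
    //try returns a record describing what happened
    //when the Sales value was evaluated
    AddedCustom = Table.AddColumn(ErrorRows, "Custom", each try [Sales]),
    //Expand the record to get at the nested error record...
    ExpandedCustom = Table.ExpandRecordColumn(AddedCustom, "Custom",
                                      {"HasError", "Error"},
                                      {"HasError", "Custom.Error"}),
    //...and expand that to see the error messages for each row
    ExpandedError = Table.ExpandRecordColumn(ExpandedCustom, "Custom.Error",
                                      {"Reason", "Message"},
                                      {"Error.Reason", "Error.Message"})
in
    ExpandedError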

Written by Chris Webb

December 22, 2014 at 3:00 pm

Posted in Power Query

Reading The Power Query Trace File–With Power Query

with 3 comments

In the September 2014 update of Power Query a new piece of functionality was added: the ability to turn on tracing. There’s no documentation about what the trace files actually contain anywhere (it’s clearly intended to be used only by Microsoft to help them diagnose problems) but I couldn’t help but be curious about what’s in there. And of course, when faced with a strange text file to make sense of, I turned to Power Query!

First of all, to turn on tracing, click on the Options button on the Power Query tab in the Excel ribbon, then check the ‘Enable Power Query tracing’ box:

image

Clicking on the ‘Open traces folder’ link will take you to the directory where the trace files are stored, which in my case is:

C:\Users\Chris\AppData\Local\Microsoft\Power Query\Traces

You can then run some queries and you’ll see trace log files appear in that folder:

image

[This is where the speculation starts]

As far as I can see, every time you run a Power Query query two files are created: one with a filename beginning “excel”, the other with a filename beginning “Microsoft.Mashup.Container”. All of the interesting things I found were in the “Microsoft.Mashup.Container” files, so I’ll ignore the “excel” files from now on.

The format of these files is pretty clear from this screenshot:

image

Each line in the file starts with “DataMashup.Trace.Information”, then there’s a number (which seems to be the same in all cases) and then there’s a JSON fragment. There are two things to point out before you attack this file in Power Query, though:

  • The obvious way to get rid of everything before the JSON is to split the column twice using a colon as the delimiter. However, if you do this using the default UI settings you’ll find that the JSON is mysteriously broken – all the double quotes have disappeared. This is in fact a side-effect of the way the UI uses the Splitter.SplitTextByEachDelimiter() function: it passes null in the second parameter, which translates to the default value of QuoteStyle.Csv, and to stop the JSON breaking you need to change this to QuoteStyle.None (there’s a short sketch illustrating this after the next screenshot).
  • When you have got rid of everything but the JSON, you just need to click Parse/JSON and you can explore the data in there:

image
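Here’s a quick sketch that shows the difference the quote style makes, using a made-up sample line of my own rather than a real trace entry:

let
    SampleText = "DataMashup.Trace.Information: 24579 : {""Duration"":""00.001""}",
    //QuoteStyle.Csv treats double quotes as CSV-style quoting,
    //which is what strips them out and breaks the JSON
    WithCsv = Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, false)(SampleText),
    //QuoteStyle.None leaves the double quotes intact
    WithNone = Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.None, false)(SampleText)
in
    [Csv = WithCsv, None = WithNone]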

Here’s an example of a query I generated to read a log file. I don’t want you to think this is a query that will read every log file though: the format may vary depending on the query you run or the version of Power Query you have. I also encountered a few bugs and strange error messages in Power Query while experimenting (I think some were caused by trying to read from files while tracing was turned on, or while the files were open in Notepad) so I can’t guarantee you’ll be able to read the files you’re interested in.

let
    Source = Table.FromColumns({
              Lines.FromBinary(
              File.Contents(
              "Insert full path and file name of log file here!")
              ,null,null,1252)}),
    #"Split Column by Delimiter" = Table.SplitColumn(
                                    Source,
                                    "Column1",
                                    Splitter.SplitTextByEachDelimiter(
                                     {":"}, 
                                     QuoteStyle.None, 
                                     false),
                                     {"Column1.1", "Column1.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",
                       {{"Column1.1", type text}, {"Column1.2", type text}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(
                                     #"Changed Type",
                                     "Column1.2",
                                     Splitter.SplitTextByEachDelimiter(
                                      {":"}, 
                                      QuoteStyle.None, 
                                      false),
                                      {"Column1.2.1", "Column1.2.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",
                        {{"Column1.2.1", Int64.Type}, {"Column1.2.2", type text}}),
    #"Parsed JSON" = Table.TransformColumns(#"Changed Type1",
                                     {{"Column1.2.2", Json.Document}}),
    #"Removed Columns" = Table.RemoveColumns(#"Parsed JSON",
                                              {"Column1.1", "Column1.2.1"}),
    #"Expand Column1.2.2" = Table.ExpandRecordColumn(
                                           #"Removed Columns", 
                                           "Column1.2.2", 
                                           {"Start", "Action", "Duration", "Exception", 
                                           "CommandText", "ResponseFieldCount"}, 
                                           {"Start", "Action", "Duration", "Exception", 
                                           "CommandText", "ResponseFieldCount"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Expand Column1.2.2",
                        {{"Start", type datetime}, {"Duration", type duration}})
in
    #"Changed Type2"


At this point there’s plenty of useful information available: a list of events, a start time for each event, a duration for each event, and other columns containing error messages, SQL queries and the number of fields returned by each SQL query. All very useful information when you’re trying to work out why your Power Query query is slow or why it’s not working properly. It’s a shame there isn’t any documentation on what’s in the trace file but, as I said, it’s not really for our benefit so I can understand why.

image

At this point you’re on your own, I’m afraid. Happy exploring!

Written by Chris Webb

December 11, 2014 at 9:29 pm

Posted in Power Query
