Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Building A Reporting Solution Using Power Query

with one comment

The video of my SQLBits conference session “Building a reporting solution using Power Query” is now available to view (for free) on the SQLBits website:

http://sqlbits.com/Sessions/Event14/Building_A_Reporting_Solution_Using_Power_Query

It’s not your normal Power Query session about self-service ETL – instead it’s about using Power Query to create an SSRS-like reporting solution inside Excel. This is a topic I’ve been thinking about for a while, and while I have blogged about some of the tricks I show in the session (like this one about using slicers to pass parameters to Power Query) there’s a lot of new material in there too that should interest all you Power Query fans.
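
Speaking of that slicer trick, here’s a minimal sketch of the general idea, with hypothetical names throughout (SlicerSelection, SalesData and the Product column are mine, not from the session): the slicer’s current selection is captured in a one-column, one-row Excel table – for example with a CubeRankedMember formula – and a query reads that value and uses it to filter another query.

//a minimal sketch, assuming a one-column, one-row Excel table called
//SlicerSelection (column name Column1) that holds the slicer's
//selected value, and another query called SalesData with a [Product]
//column - all of these names are hypothetical
let
    //read the selected value from the table in the current workbook
    SelectedValue = Excel.CurrentWorkbook(){[Name="SlicerSelection"]}[Content]{0}[Column1],
    //filter the other query using that value
    FilteredRows = Table.SelectRows(SalesData, each [Product] = SelectedValue)
in
    FilteredRows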

Of course there are literally hundreds of other great videos to watch for free at http://sqlbits.com/content/ including many others on Power BI, Power Pivot and Power Query. Alas my “Amazing Things You Can Do With Power BI” session video hasn’t been posted yet though…

[Don’t forget I’m running public Power BI and Power Query training courses in London next month! Full details at http://technitrain.com/courses.php]

Written by Chris Webb

April 17, 2015 at 9:25 am

Posted in Power Query

Analysing SSAS Extended Event Data With Power Query: Part 1

with one comment

The other day, while I was reading this post by Melissa Coates, I was reminded of the existence of extended events in SSAS. I say ‘reminded’ because although this is a subject I’ve blogged about before, I have never done anything serious with extended events because you can get the same data from Profiler much more easily, so I had pretty much forgotten about them. But… while Profiler is good, it’s a long way from perfect and there’s a lot of information that you can get from a trace that is still very hard to analyse. I started thinking: what if there was a tool we could use to analyse the data captured by extended events easily? [Lightbulb moment] Of course, Power Query!

I’m not going to go over how to use Extended Events in SSAS because the following blog posts do a great job already:
http://byobi.com/blog/2013/06/extended-events-for-analysis-services/
http://markvsql.com/2014/02/introduction-to-analysis-services-extended-events/
https://francescodechirico.wordpress.com/2012/08/03/identify-storage-engine-and-formula-engine-bottlenecks-with-new-ssas-xevents-5/

You may also want to check out these (old, but still relevant) articles on performance tuning SSAS taken from the book I co-wrote with Marco and Alberto, “Expert Cube Development”:

http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part1
http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part2

What I want to concentrate on in this series of posts is how to make sense of this data using Power BI in general and Power Query in particular. The first step is to be able to load data from the .xel file using Power Query, and that’s what this post will cover. In the future I want to explore how to get at and use specific pieces of text data such as that given by the Query Subcube Verbose, Calculation Evaluation and Resource Usage events, and to show how this data can be used to solve difficult performance problems. I’m only going to talk about SSAS Multidimensional, but of course a lot of what I show will be applicable to (or easily adapted for) Tabular; I guess you could also do something similar for SQL Server Extended Events too. I’m also going to focus on ad hoc analysis of this data, rather than building a more generic performance monitoring solution; the latter is a perfectly valid thing to want to build, but why build one yourself when companies like SQL Sentry have great tools for this purpose that you can buy off the shelf?

Anyway, let’s get on. Here’s a Power Query function that can be used to get data from one or more .xel files generated by SSAS:

(servername as text, 
initialcatalog as text, 
filename as text) 
as table =>
let
    //Query the xel data
    Source = Sql.Database(servername, 
                          initialcatalog, 
                          [Query="SELECT 
                          object_name, event_data, file_name 
                          FROM sys.fn_xe_file_target_read_file ( '" 
                          & filename & "', null, null, null )"]),
    //Treat the contents of the event_data column
    //as XML
    ParseXML = Table.TransformColumns(Source,
                            {{"event_data", Xml.Tables}}),
    //Expand that column
    Expandevent_data = Table.ExpandTableColumn(ParseXML, 
                            "event_data", 
                            {"Attribute:timestamp", "data"}, 
                            {"event_data.Attribute:timestamp", 
                            "event_data.data"}),
    //A function to transpose the data held in the
    //event_data.data column
    GetAttributeData = (AttributeTable as table) as table =>
        let
            RemoveTextColumn = Table.RemoveColumns(AttributeTable,
                            {"text"}),
            SetTypes = Table.TransformColumnTypes(RemoveTextColumn,
                            {{"value", type text}, {"Attribute:name", type text}}),
            TransposeTable = Table.Transpose(SetTypes),
            ReverseRows = Table.ReverseRows(TransposeTable),
            PromoteHeaders = Table.PromoteHeaders(ReverseRows)
        in
            PromoteHeaders,
    //Use the function above
    ParseAttributeData = Table.TransformColumns(Expandevent_data, 
                            {"event_data.data", GetAttributeData})
in
    ParseAttributeData


This function can be thought of as the starting point for everything else: it allows you to load the raw data necessary for any SSAS performance tuning work. Its output can then, in turn, be filtered and transformed to solve particular problems.

The function takes three parameters:

  • The name of a SQL Server relational database instance – this is because I’m using sys.fn_xe_file_target_read_file to actually read the data from the .xel file. I guess I could try to parse the binary data in the .xel file, but why make things difficult?
  • The name of a database on that SQL Server instance
  • The file name (including the full path) or pattern for the .xel files

The only other thing to mention here is that the event_data column contains XML data, which of course Power Query can handle quite nicely, but even then the data in the XML needs to be cleaned and transposed before you can get a useful table of data. The GetAttributeData function in the code above does this cleaning and transposing but, when invoked, the function still returns an unexpanded column called event_data.data as seen in the following screenshot:

[Screenshot: the query output, with the event_data.data column still unexpanded]

There are two reasons why the function does not expand this column for you:

  1. You probably don’t want to see every column returned by every event
  2. Expanding all the columns in a nested table, when you don’t know what the names of these columns are, is not trivial (although this post shows how to do it – a sketch of one approach follows below)
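
That said, if you do want to expand every column dynamically, here’s a minimal sketch of one way to do it, assuming the function above has been saved as a query called GetXelData (the name I use in the example below):

let
    //invoke the function
    Source = GetXelData(
                        "localhost", 
                        "adventure works dW", 
                        "C:\SSAS_Monitoring*.xel"),
    //build a distinct list of every column name that appears in the
    //nested tables in the event_data.data column
    NestedColumnNames = List.Distinct(
                        List.Combine(
                        List.Transform(Source[event_data.data],
                        Table.ColumnNames))),
    //expand the nested table column using that dynamic list of names
    ExpandAll = Table.ExpandTableColumn(Source,
                        "event_data.data", NestedColumnNames)
in
    ExpandAll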

Here’s an example of how the function can be used:

let
    //Invoke the GetXelData function
    Source = GetXelData(
                        "localhost", 
                        "adventure works dW", 
                        "C:\SSAS_Monitoring*.xel"),
    //Only return Query End events
    #"Filtered Rows" = Table.SelectRows(Source, 
                        each ([object_name] = "QueryEnd")),
    //Expand Duration and TextData columns
    #"Expand event_data.data" = Table.ExpandTableColumn(
                        #"Filtered Rows", "event_data.data", 
                        {"Duration", "TextData"}, 
                        {"event_data.data.Duration", 
                        "event_data.data.TextData"}),
    //Set some data types
    #"Changed Type" = Table.TransformColumnTypes(
                        #"Expand event_data.data",
                        {{"event_data.Attribute:timestamp", type datetime}, 
                        {"event_data.data.Duration", Int64.Type}}),
    //Sort by timestamp
    #"Sorted Rows" = Table.Sort(#"Changed Type",
                        {{"event_data.Attribute:timestamp", Order.Ascending}}),
    //Add an index column to identify each query
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Query Number", 1, 1),
    //Remove unwanted columns
    #"Removed Columns" = Table.RemoveColumns(#"Added Index",
                        {"object_name", "file_name"})
in
    #"Removed Columns"


All that’s happening here is that the function is being called in the first step, Source, and then I’m filtering by the Query End event, expanding some of the columns in event_data.data and setting column data types. You won’t need to copy all this code yourself though – you just need to invoke the function and then expand the event_data.data column to reveal whatever columns you are interested in. When you run a query that calls this function for the first time, you may need to give Power Query permission to connect to SQL Server and also to run a native database query.

Here’s an example PivotChart showing query durations built from this data after it has been loaded to the Excel Data Model:

[Screenshot: a PivotChart showing the duration of each query]

Not very useful, for sure, but in the next post you’ll see a more practical use for this function.

You can download the sample workbook for this post here.

Written by Chris Webb

April 13, 2015 at 1:30 pm

Documentation For New Excel 2016 DAX Functions

with 4 comments

Microsoft has published documentation for the new DAX functions in the Excel 2016 preview here:

https://support.office.com/en-us/article/New-DAX-functions-for-Excel-2016-Preview-8192a787-aa91-4d7f-9a82-4e2c607e629a

There’s a lot of detail, including examples (although the ConcatenateX() page isn’t live at the time of writing – I’ve blogged about that already), so it’s well worth reading through.

Written by Chris Webb

April 6, 2015 at 9:09 pm

Posted in DAX, Excel, PowerPivot

BI Survey 15

with 2 comments

It’s BI Survey time again! BI Survey is the largest annual survey of BI users in the world, so if you want to share your feelings on Microsoft BI tools or whatever else you’re using then this is the opportunity to do it. As in the past, in return for promoting the survey I get access to the results when they appear later in the year, and they always make for interesting reading and a good blog post. This year I’m curious to find out what people are saying about Power BI…

Anyway, if you do want to take part (it should only take 20 minutes and you’ll also be entered in a draw for some Amazon vouchers) then here’s the link:

https://digiumenterprise.com/answer/?link=2419-3RFFUGEB

Written by Chris Webb

April 1, 2015 at 12:30 pm

Posted in Uncategorized

NaturalInnerJoin And NaturalLeftOuterJoin DAX Functions In Excel 2016

with 3 comments

Continuing my series on new DAX functions in Excel 2016, here are two more: NaturalInnerJoin() and NaturalLeftOuterJoin(). Both do pretty much what you’d expect.

Consider the following two tables in an Excel worksheet, called ColourFruit and FruitPrice:

[Screenshot: the ColourFruit and FruitPrice tables in the worksheet]

With these tables loaded into the Excel Data Model as linked tables, the next step is to create a relationship between the tables on the Fruit column:

[Screenshot: the relationship between the two tables on the Fruit column]

Both functions take two tables from the Excel Data Model as parameters, and both only work when there is an active relationship between those two tables. Once the relationship has been created you can use the functions in a DAX query.

The queries

evaluate naturalinnerjoin(ColourFruit,FruitPrice)

and

evaluate naturalinnerjoin(FruitPrice,ColourFruit)

…both perform an inner join between the two tables on the Fruit column and both return the same table:

[Screenshot: the inner join result]

The query

evaluate naturalleftouterjoin(ColourFruit,FruitPrice)

…returns

[Screenshot: the left outer join result, with all rows from ColourFruit]

The query

evaluate naturalleftouterjoin(FruitPrice,ColourFruit)

…returns:

[Screenshot: the left outer join result, with all rows from FruitPrice]

For NaturalLeftOuterJoin() the table given in the first parameter is on the left-hand side of the left outer join, so all rows from it are returned, whereas the table in the second parameter is on the right-hand side of the join so only the matching rows are returned. For example, if ColourFruit contained a fruit with no matching row in FruitPrice, naturalleftouterjoin(ColourFruit,FruitPrice) would still return that row (with a blank price) but naturalleftouterjoin(FruitPrice,ColourFruit) would not return it at all.

Written by Chris Webb

March 30, 2015 at 2:00 pm

Posted in DAX, Excel, PowerPivot

Benford’s Law And Power Query

with 2 comments

Probably my favourite session at SQLBits the other week was Professor Mark Whitehorn’s on exploiting exotic patterns in data. One of the things he talked about was Benford’s Law, something I first heard about several years ago (in fact I’m sure I wrote a blog post on implementing Benford’s Law in MDX but I can’t find it), about the frequency distribution of digits in data. I won’t try to explain it myself but there are plenty of places you can read up on it, for example: http://en.wikipedia.org/wiki/Benford%27s_law . In short, in many real-world datasets the first digit d of each number occurs with frequency log10(1 + 1/d), so a leading 1 turns up around 30.1% of the time while a leading 9 turns up only around 4.6% of the time. I promise, it’s a lot more interesting than it sounds!

Anyway, it struck me that it would be quite useful to have a Power Query function that could be used to find the distribution of the first digits in any list of numbers, for example for fraud detection purposes. The first thing I did was write a simple query that returned the expected distributions for the digits 1 to 9 according to Benford’s Law:

let
    //function to find the expected distribution of any given digit
    Benford = (digit as number) as number => Number.Log10(1 + (1/digit)),
    //get a list of values between 1 and 9
    Digits = {1..9},
    // get a list containing these digits and their expected distribution
    DigitsAndDist = List.Transform(Digits, each {_, Benford(_)}),
    //turn that into a table
    Output = #table({"Digit", "Distribution"}, DigitsAndDist)
in
    Output


[Screenshot: the table of digits 1 to 9 and their expected distributions]

Next I wrote the function itself:

//take a single list of numbers as a parameter
(NumbersToCheck as list) as table =>
let
    //remove any non-numeric values
    RemoveNonNumeric = List.Select(NumbersToCheck, 
                        each Value.Is(_, type number)),
    //remove any values that are less than or equal to 0
    GreaterThanZero = List.Select(RemoveNonNumeric, each _>0),
    //turn that list into a table
    ToTable = Table.FromList(GreaterThanZero, 
                        Splitter.SplitByNothing(), null, null, 
                        ExtraValues.Error),
    RenameColumn = Table.RenameColumns(ToTable,{{"Column1", "Number"}}),
    //function returning the first digit of a number minus one,
    //for use as a zero-based partition index
    FirstDigit = (InputNumber as number) as number =>
                    Number.FromText(Text.Start(Number.ToText(InputNumber),1))-1,
    //partition the table into nine groups, one per first digit
    GetDistributions = Table.Partition(RenameColumn, 
                    "Number", 9, FirstDigit),
    //turn that into a table
    DistributionTable = Table.FromList(GetDistributions, 
                    Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    //add column giving the digit
    AddIndex = Table.AddIndexColumn(DistributionTable, "Digit", 1, 1),
    //show how many times each first digit occurred
    CountOfDigits = Table.AddColumn(AddIndex, 
                    "Count", each Table.RowCount([Column1])),
    RemoveColumn = Table.RemoveColumns(CountOfDigits, {"Column1"}),
    //merge with table showing expected distributions
    Merge = Table.NestedJoin(RemoveColumn,{"Digit"},
                             Benford,{"Digit"},"NewColumn",JoinKind.Inner),
    ExpandNewColumn = Table.ExpandTableColumn(Merge, "NewColumn", 
                            {"Distribution"}, {"Distribution"}),
    RenamedDistColumn = Table.RenameColumns(ExpandNewColumn,
                            {{"Distribution", "Expected Distribution"}}),
    //calculate actual % distribution of first digits
    SumOfCounts = List.Sum(Table.Column(RenamedDistColumn, "Count")),
    AddActualDistribution = Table.AddColumn(RenamedDistColumn, 
                            "Actual Distribution", each [Count]/SumOfCounts)
in
    AddActualDistribution

There’s not much to say about this code, apart from the fact that it’s a nice practical use case for the Table.Partition() function I blogged about here. It also references the first query shown above, called Benford, so that the expected and actual distributions can be compared.
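
If you haven’t come across Table.Partition() before, here’s a minimal standalone sketch (using made-up data) of what it does: it splits a table into a list of tables, with a hash function applied to one column deciding which group each row goes to.

let
    //a small made-up table of numbers
    Source = #table({"Number"}, {{1}, {12}, {19}, {23}, {31}, {95}}),
    //split it into nine groups using the same first-digit-minus-one
    //logic as the function above; for example, 1, 12 and 19 all land
    //in the first group (index 0)
    Partitioned = Table.Partition(Source, "Number", 9,
                    each Number.FromText(Text.Start(Number.ToText(_), 1)) - 1)
in
    Partitioned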

Since this is a function that takes a list as a parameter, it’s very easy to pass it any column from any other Power Query query that’s in the same workbook (as I showed here) for analysis. For example, I created a Power Query query on this dataset in the Azure Marketplace showing the number of minutes that each flight in the US was delayed in January 2012. I then invoked the function above, and pointed it at the column containing the delay values like so:

[Screenshot: invoking the function on the delay column]
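
In code, that invocation looks something like this – a sketch that assumes the function has been saved as a query called BenfordFrequency and that the flight data query is called FlightDelays with a column called ArrDelayMinutes (all three names are hypothetical):

let
    //pass the delay column from the other query to the function as a list
    Source = BenfordFrequency(FlightDelays[ArrDelayMinutes])
in
    Source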

The output is a table (to which I added a column chart) which shows that this data follows the expected distribution very closely:

[Screenshot: a table and column chart comparing the expected and actual distributions]

You can download my sample workbook containing all the code from here.

Written by Chris Webb

March 23, 2015 at 12:30 pm

Posted in Power Query

ConcatenateX() DAX Function In Excel 2016

with 15 comments

This is the first of many posts on the new DAX functions that have appeared in Excel 2016 (for a full list see this post). Today: the ConcatenateX() function.

The mdschema_functions schema rowset gives the following description of this function:

Evaluates expression for each row on the table, then return the concatenation of those values in a single string result, separated by the specified delimiter

Its signature is:

CONCATENATEX(Table, Expression, [Delimiter])

It’s easier to understand what it does using a simple example though. Consider the following table on a worksheet in Excel 2016:

[Screenshot: the source table in the worksheet]

When you add this table to the Excel Data Model (I called the table Sales) you can add the following measure:

Purchasing Customers:=
CONCATENATEX(
	VALUES(Sales[Customer]), 
	Sales[Customer], 
	","
)

If you then use this measure in a PivotTable, you see the following:

[Screenshot: a PivotTable using the Purchasing Customers measure]

As you can see, the measure returns a comma-delimited list of all of the customers who have bought each product. Very useful…
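
The measure works outside a PivotTable too. As a minimal sketch, a DAX query like the following – which assumes the Sales table also has a Product column, as the screenshot suggests – returns one row per product along with the list of its customers:

evaluate
summarize(
	Sales,
	Sales[Product],
	"Purchasing Customers", [Purchasing Customers]
)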

Written by Chris Webb

March 19, 2015 at 10:24 pm

Posted in DAX, Excel
