IF, SWITCH And The Effect Of DAX Variables On Strict/Eager Evaluation

A lot of DAX performance problems relate to the use of strict and eager evaluation with the IF or SWITCH functions. It’s an incredibly complex, “it depends”, black-box type of topic and naturally Marco and Alberto have a great article on it here. Rather than describe any general rules – which would be pretty much impossible – in this blog post I want to show a specific scenario that illustrates how the use of DAX variables can influence whether strict or eager evaluation takes place.

Consider a Power BI semantic model that has the following three tables:

TableA and TableB are fact tables with numeric measure columns; Choice is a disconnected table containing text values that is intended for use in a slicer, so a report user can select an item in that slicer and that in turn influences what values a measure returns:

Here’s the definition of one such measure:

With Variables =
VAR a =
    SUM ( TableA[A] )
VAR b =
    SUM ( TableB[B] )
RETURN
    IF ( SELECTEDVALUE ( Choice[Choice] ) = "TableA", a, b )

And here’s my test report that includes a slicer and a card visual displaying the output of this measure:

Let’s look at the DAX query generated by the card visual containing the measure when “TableA” is selected in the slicer:

DEFINE
    VAR __DS0FilterTable =
        TREATAS ( { "TableA" }, 'Choice'[Choice] )

EVALUATE
SUMMARIZECOLUMNS (
    __DS0FilterTable,
    "With_Variables", IGNORE ( 'Choice'[With Variables] )
)

…and in particular what DAX Studio’s Server Timings feature shows us:

Even though only TableA is selected in the slicer and the query only returns the sum of the values in the A column of TableA, we can see that the Storage Engine is also querying TableB and getting the sum of the B column. It’s a great example of eager evaluation: both branches of the IF are being evaluated. Is this a bad thing? For this particular report it may be if the Storage Engine query for TableB is expensive.

How can you force strict evaluation to take place? You can force eager evaluation using the IF.EAGER function, but there is no equivalent function to force strict evaluation. However, you may be able to rewrite the measure so that strict evaluation takes place.
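For completeness, here’s what forcing eager evaluation looks like – a minimal sketch using the same tables as the measure above, with a measure name of my own choosing; IF.EAGER takes the same arguments as IF:

Always Eager =
IF.EAGER (
    SELECTEDVALUE ( Choice[Choice] ) = "TableA",
    SUM ( TableA[A] ),
    SUM ( TableB[B] )
)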

The key factor in this case is the use of variables in the measure definition. If you rewrite the measure to not use variables, like so:

No Variables =
IF (
    SELECTEDVALUE ( Choice[Choice] ) = "TableA",
    SUM ( TableA[A] ),
    SUM ( TableB[B] )
)

…then the query for the card visual behaves differently:

DEFINE
    VAR __DS0FilterTable =
        TREATAS ( { "TableA" }, 'Choice'[Choice] )

EVALUATE
SUMMARIZECOLUMNS (
    __DS0FilterTable,
    "No_Variables", IGNORE ( 'Choice'[No Variables] )
)

Notice that there are now only two Storage Engine queries and that TableB is no longer being queried, which will probably result in better performance. This is strict evaluation.

Why does the use of variables result in eager evaluation here? Because it does, and it’s complicated. I need to stress that DAX uses lazy evaluation for variables, which means that variables are not evaluated if they are not used – in the first measure above the IF is deliberately evaluating both branches. There are certainly other optimisations that may kick in and result in strict evaluation even when variables are used in IF/SWITCH. Indeed, the fact that the two branches in this example sum columns from different fact tables is extremely important: if they had used the same fact table then the behaviour would have been different and strict evaluation would have been used. In summary, though, if you are using variables and you want to try to force strict evaluation with IF/SWITCH then it’s worth rewriting your code to remove those variables to see if it makes a difference.

I also need to stress that these two measures will perform better or worse depending on how and where they are used. Consider these two table visuals that use the values from the Choice table on rows along with the two measures above:

Running the DAX query for the table on the left, which uses the measure with no variables and strict evaluation, gives the following in DAX Studio’s Server Timings:

Note that there are two Storage Engine queries for TableB, which is not a good thing.

On the other hand, the table on the right, which uses the [With Variables] measure, gives the following:

Note that there is only one Storage Engine query for TableB now, so in this case the tables have turned and the [With Variables] measure is likely to perform better.

When you use variables in the branches of IF/SWITCH, as in the [With Variables] measure, the variables are evaluated in the filter context at the place where they are defined; if you don’t use variables in this way, as in the [No Variables] measure, the two SUM expressions used in the two branches are evaluated in the context of each branch, which adds a hidden filter corresponding to the branch’s condition. This has two consequences which may hurt query performance:

  1. During evaluation, the hidden filter may be materialised into a large in-memory table
  2. Fusion cannot happen across common subexpressions in different branches because they have different contexts

In contrast the use of variables means the expressions used in them are evaluated in a context without the hidden filter corresponding to the branch, which can encourage fusion and discourage the materialisation of large in-memory tables.

[Thanks to Marius Dumitru and Jeffrey Wang for much of the information in this post]

Analyse Power BI Data In Excel With Python

In the Power BI/Fabric community everyone is excited about the recent release of Semantic Link: the ability to analyse Power BI data easily using Python in Fabric notebooks. Sandeep Pawar has an excellent blog post here explaining what this is and why it’s so cool. Meanwhile in the Excel community, everyone is excited about the new integration of Python into Excel. But can you analyse Power BI data in Excel using Python? Yes you can – so as my teenage daughter would say, it’s time for a crossover episode.

Let’s see a simple example. The main problem to solve is how to get data from Power BI into Excel in a format that Python in Excel can consume easily, ie a table rather than a PivotTable. The easiest way to do this is using the new Excel Connected Table feature, described here, which creates a live connection back to Power BI so when the data in the dataset changes the data in Excel is updated too. I have a Power BI dataset published to the Service that contains data from the UK government’s Land Registry Price Paid data, which has details of all the property transactions in England and Wales; I found this dataset in Excel and clicked Insert Table:

I then defined a query that found the number of property transactions and average price paid broken down by county:

This gave me a table, connected to the dataset in the Power BI Service using a DAX query, in my worksheet:

I renamed this table to “Sales”:

Unfortunately you can’t change the ugly column names without rewriting the DAX query behind the table, which makes life more difficult later on.

Then, on a new worksheet, I created a Python code cell using the PY function and entered the following Python code:

df=xl("Sales[#All]", headers=True)
s = plt.scatter(df.iloc[:,1], df.iloc[:,2], marker=11)
s.axes.xaxis.set_label_text("Count Of Sales")
s.axes.yaxis.set_label_text("Average Price Paid")

[I’ve only just started learning Python so please excuse any newbie bad practices/mistakes here! The main point is to visualise the data returned from Power BI]

This gave me a scatterplot with each county as a marker, the count of sales measure on the x axis and the average price paid measure on the y axis:

A few comments:

  • The xl function allows you to reference Excel cells, ranges and tables in your code; the reference to Sales[#All] gets the whole of the Sales table, including headers; adding headers=True means the table headers are recognised as such
  • Dealing with those ugly column names in Python is such a pain that I copped out and referenced the columns by position
  • After entering the code and committing it, you also need to tell Excel to treat the output as an Excel Value rather than a Python object to see the scatterplot, and to resize the cell so the chart is visible

A second way of getting data into Excel from Power BI is to export the data from a published Power BI report. If you’re going to do that, you should export as a connected table so again the data stays connected to the source Power BI dataset.

There’s also a third, slightly different way of getting data from Power BI into Excel that is possible if you have Premium and which is a bit more complex but also more flexible: you can use Power Query, but maybe not in the way you would expect. The xl function can reference the output of a Power Query query even if that query is not loaded to a worksheet or the Excel Data Model – which I think is a nice touch and important if you’re working with larger data volumes.

To get data from Power BI into Excel using Power Query you need to use Power Query’s Analysis Services connector to connect to your workspace’s XMLA Endpoint. Go to the Data tab in Excel, click the Get Data button then From Database/From SQL Server Analysis Services Database (Import):

On the connection dialog the XMLA Endpoint goes into the Server box, the name of the dataset goes into the Database box and you can paste a DAX query into the MDX or DAX query box:
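To give you an idea, the DAX query you paste here can be as simple as the sketch below – the table and measure names are placeholders rather than the real names in my model:

EVALUATE
SUMMARIZECOLUMNS (
    'Property Transactions'[County],
    "Count Of Sales", [Count Of Sales],
    "Average Price Paid", [Average Price Paid]
)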

There are several benefits to using this approach:

  • You can use your own DAX query rather than have one generated for you
  • You can easily edit the DAX query after you have created the Power Query query
  • You can rename the query as well as all those ugly column names, making them easier to work with in Python – I named my query SalesByCounty and renamed my columns to County, CountOfSales and AveragePricePaid

I then closed the Power Query Editor without loading the output of the query anywhere.

You can read more about how to use Power Query queries in Python in Excel here.

Finally, here’s the modified version of the Python code to create the scatterplot shown above:

df=xl("SalesByCounty")
s = plt.scatter(df.CountOfSales, df.AveragePricePaid, marker=11)
s.axes.xaxis.set_label_text("Count Of Sales")
s.axes.yaxis.set_label_text("Average Price Paid")

Note how, in the first line, I can reference the Power Query query by name in the xl function and how, in the second line, renaming the columns in Power Query makes writing the Python code much easier.

Is this actually going to be useful to anyone? Well if Python in Excel is going to be used, it will be used by data analysts who love both Excel and Python – and who are also likely to use Power BI too. As Sandeep argues in the blog post about Semantic Link referenced above there are several reasons why these data analysts should use a Power BI dataset as a data source for their work rather than going back to the raw data: for example they can be sure they are using exactly the same data that is being used in their reports and they can use measures defined in the dataset rather than have to recreate the same calculations in their own code. While Semantic Link in Fabric is much more powerful than anything you can do in Excel with Power BI data, it’s only available in Fabric notebooks and this needs a Fabric or Premium capacity; this technique is available to anyone who has Python in Excel and works with Power BI Pro as well as Premium. So yes, in conclusion, I think there are some practical uses for this.

The “Visual Has Exceeded The Available Resources” Error In Power BI Desktop

One of my most popular blog posts of the last few years is this one on the “Visual has exceeded available resources” error in the Power BI Service:

https://blog.crossjoin.co.uk/2020/01/20/visual-has-exceeded-the-available-resources-error-power-bi/

This error only used to appear in the Power BI Service, but the good news is – and trust me, this is good news – it may now appear in Power BI Desktop too following the May 2023 release.

First, a quick recap of what causes this error. The Power BI Service has finite resources so we at Microsoft don’t allow you to run queries there that take hours to complete or consume vast amounts of memory; we impose limits on query duration and memory usage, and you’ll see this error in the Power BI Service when you hit these limits. The problem has been that, up until recently, these limits were not imposed in Power BI Desktop so it was easy to develop inefficient reports and datasets that worked ok (if a little slowly) on the developer’s PC but then caused errors after they were published. What has changed is that these limits are now imposed in Power BI Desktop too, and they are also configurable there.

The announcement of this change is here:

https://powerbi.microsoft.com/en-us/blog/power-bi-may-2023-feature-summary/#post-23312-_Toc134425717

How do you know if you are running into these limits? You’ll see an error in your visual in Power BI Desktop that looks like this:

The error message is:

Visual has exceeded the available resources

If you click on the “See details” link you’ll see the following dialog:

Resources Exceeded
This visual has exceeded the available resources. Try filtering to decrease the amount of data displayed.

What should you do if you encounter this error? The most important thing is to understand why it’s happening. There are two possible causes:

  1. Your query is taking too long to run. For the Power BI Service, the default query timeout is 225 seconds although it is possible for an admin to reduce this; unless you’ve set a custom limit or you’re not using a Power BI dataset as a source, it’s likely that 225 seconds is the longest that a query will run for in Power BI Desktop.
  2. Your query is using too much memory. This is probably because you are doing an inefficient DAX calculation on a very large table (filtering on an entire table rather than on a single column is a classic anti-pattern, for example – see the sketch below).
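To illustrate that anti-pattern with some hypothetical table, column and measure names – the first version filters (and may have to materialise) the whole expanded fact table, while the second only filters a single column:

Big Sales Slow =
CALCULATE (
    [Sales Amount],
    FILTER ( 'Sales', 'Sales'[SalesAmount] > 100 )
)

Big Sales Faster =
CALCULATE (
    [Sales Amount],
    'Sales'[SalesAmount] > 100
)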

As a result you need to do some tuning. “But Chris!”, you say, “my query has to run for longer than 225 seconds! It’s too difficult to tune!” To which I say – tough, you don’t have a choice. Slow, inefficient queries result in unhappy users so if you don’t fix these issues you have an even bigger problem. This video of Marco’s is a good place to start if you want to learn about performance tuning.

In order to do that tuning though (or if you just want to get stuff done before you do any tuning, or if you think the wrong limits are being imposed) you’ll need to turn off the limits so you can capture the DAX query for the offending visual using Performance Analyzer. To do this, go to File/Options and settings/Options to open the Options dialog, go to the Report settings pane and scroll to the bottom to see the Query limit simulations section:

You can either use the “Custom limits” option, as shown in the screenshot above, to set your own limits (enter 0 in one of these boxes for no limit to be imposed) or use the “No query limits” option to remove the limits entirely. You should only use these options temporarily, remember, otherwise you run the risk of getting this error in the Power BI Service later on!

It’s worth mentioning that the limits imposed in Power BI Desktop are only imposed on queries generated by Power BI Desktop itself, which means that they won’t affect external tools like DAX Studio that can also be used to query a dataset in Power BI Desktop. You can see how the limits are imposed by running a Profiler trace on Power BI Desktop, finding the Query Begin event for each query and looking for the Timeout and DbpropMsmdRequestMemoryLimit properties in the Property List shown under the query text:

Also, these settings are saved on a per-file basis, so if you create a new .pbix file it will have the default settings and not the settings you made in any other .pbix file.

DirectQuery Parallelisation In Power BI – Some Examples

Recently we announced an important new optimisation for DirectQuery datasets: the ability to run (some of) the queries generated by a single DAX query in parallel. You can read the blog post here:

https://powerbi.microsoft.com/en-za/blog/query-parallelization-helps-to-boost-power-bi-dataset-performance-in-directquery-mode/

A few of us on the Power BI CAT team have tested this out with customers and seen some great results, so I thought I’d write a post illustrating the effect this optimisation can have and explaining when it can and can’t help.

For the purposes of this blog post I built a DirectQuery dataset (the source and the design of the dataset are irrelevant) using the New York Taxi data. I then created a table with five measures on columns; the precise definitions of the measures don’t matter much either, except for the fact that they each generate a single Storage Engine request and horizontal fusion cannot combine these requests in any way.

Next, I published the dataset to the Power BI Service, connected to it from Tabular Editor, set the compatibility level property to 1569 and saved the change:

Having done this the Max Parallelism Per Query property became visible, and I set it to 1:

1 is not the default value for this property but it does give the same behaviour as the default at the time of writing – it ensures that the Storage Engine requests for a single DAX query are always executed one at a time.

Next I connected DAX Studio up to the XMLA Endpoint for the workspace and ran the DAX query generated by the table visual above with Server Timings enabled. The new timeline view in DAX Studio 3.0.6 (see Marco’s video for more details) does a great job of showing when the Storage Engine queries – in this case the SQL queries sent to the data source – are made:

As you can see, in this case it’s clear that the six Storage Engine queries/SQL queries generated by this one DAX query are all executed in sequence. Also notice that the duration of the DAX query was 3 seconds.

I then changed the Max Parallelism Per Query property to 6 and reran the same query from DAX Studio. Here’s what the timeline looked like:

The Storage Engine queries/SQL queries now all run in parallel and the overall duration of the DAX query is just 0.9 seconds!

This is great, but as always there are a few things that need to be highlighted. First, Power BI may not be able to run all the Storage Engine queries in parallel if there are certain types of dependency in your DAX. In the example above there were no dependencies between the measures – none of them referenced any other measures, and they all used fairly simple expressions – but in the real world that might not be the case. To illustrate this I created a new measure with the following definition:

WithDependencies =
IF (
    CALCULATE ( [Distinct Medallions], ALL ( 'GEOGRAPHY'[STATE] ) ) > 0,
    [Trip Count],
    [Monday Trips]
)

I then used it in a new table visual and captured the DAX query generated:

Here’s what the DAX Studio Server Timings showed for this query with Max Parallelism Per Query set to 6:

This time there is some parallelism but the first Storage Engine query/SQL query has to complete before the last two can be executed. I wrote this measure specifically to get this behaviour so you may be able to rewrite your DAX to get better parallelism, but it’s something to be aware of.

One last thing to mention: increasing parallelism here may lead to worse query performance overall if you end up running into bottlenecks elsewhere in your architecture (see here for an example). You may need to increase the “maximum number of connections per data source” property on your dataset to allow more queries to run in parallel, and if that results in additional load on the data source then you may need to scale up or scale out there too. Remember also that the DAX queries for a report page are, and always have been, run in parallel, so the Storage Engine queries for different DAX queries will also run in parallel; you’ll need to watch out for report pages with a large number of visuals on them.

Text Search Performance In Power BI

In the blog post for the October 2022 release of Power BI Desktop there’s a brief description of a new optimisation in Power BI for filters that perform searches in text columns. In this blog post I’m going to share more details about this optimisation and how you can make sure your reports can benefit from it.

First of all, a brief description of the problem. Let’s say you have a Power BI dataset containing just the following table:

Let’s also say you want to build a report where a user can search for certain terms in the Description field and display a filtered table in a report:

Note how the Description field has been dragged into the Filter pane, the filter type is set to “Advanced filtering”, the “Show items when the value” dropdown has “contains” selected and the search term is “citrus”, so the table on the left only shows the fruit where the description includes the text “citrus”. Some custom visuals such as the Text Filter visual have very similar functionality.

Here’s the DAX query generated for the table visual in this screenshot:

DEFINE
	VAR __DS0FilterTable = 
		FILTER(
			KEEPFILTERS(VALUES('Fruit'[Description])),
			SEARCH("citrus", 'Fruit'[Description], 1, 0) >= 1
		)
	VAR __DS0Core = 
		CALCULATETABLE(
			SUMMARIZE('Fruit', 'Fruit'[Fruit Name], 'Fruit'[Description]),
			KEEPFILTERS(__DS0FilterTable)
		)
	VAR __DS0PrimaryWindowed = 
		TOPN(501, __DS0Core, 'Fruit'[Fruit Name], 1, 'Fruit'[Description], 1)
EVALUATE
	__DS0PrimaryWindowed
ORDER BY
	'Fruit'[Fruit Name], 'Fruit'[Description]

As you can see, the filter in this query is accomplished using the DAX Search() function. This is a great example of the type of query that the optimisation I’m talking about can speed up.
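Measures you write yourself that use Search() or ContainsString() on the same column can trigger and use this optimisation too, as described in the list below; for example, a hypothetical measure like this (the name and logic are mine, purely for illustration) should be able to benefit from the index once it has been built:

Citrus Fruit Count =
CALCULATE (
    COUNTROWS ( 'Fruit' ),
    CONTAINSSTRING ( 'Fruit'[Description], "citrus" )
)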

Here are some more details about how this optimisation works:

  • This optimisation is now enabled and works in the same way in both Power BI Desktop and the Power BI Service.
  • The first time that any query or measure that uses either the Search() or ContainsString() DAX functions on a text column is evaluated, Power BI starts to build a special text index just for that column.
  • This index build will only be successful if two conditions are true:
    • The text column must only contain characters from the classic 128 character ASCII set.
    • The index build must take less than 25 seconds. If 25 seconds elapse then the build will timeout and Power BI will continue to run the query without the index being present.
  • If the index build succeeds for that column then the index can be used by all subsequent queries by all users, but it will be dropped when:
    • Power BI Desktop is restarted, if you’re in Power BI Desktop.
    • The dataset is refreshed, either in Power BI Desktop or the Power BI Service.
    • The dataset is evicted from memory in the Power BI Service or when the dataset is under memory pressure.
  • DAX queries that use the index will be a lot faster than queries that do not, although the difference will only be noticeable when you are searching on a table with thousands of rows and in a column with reasonably long text values.
  • There is no way for you to know whether an index has been built or whether the build has failed, or if a DAX query uses an index, unfortunately. However if you look at the duration of the DAX queries that do this kind of search (for example in Log Analytics or by running a Profiler trace) and you see the first query after a refresh is relatively slow and subsequent queries are almost instant then it’s likely that the index has been built successfully; on the other hand if your queries are consistently slow then it’s likely the index has not been built successfully. Before you all leave comments complaining, I know this is not ideal and I hope we’ll be able to make further improvements in the future.

How can you ensure that the index is built successfully? The only way to ensure that you stay under the 25 second timeout limit is to reduce the amount of text that needs to be indexed, either by reducing the number of rows in the table or by reducing the amount of text in the column. Reducing the amount of text in the column is probably the only feasible option: for example you may be able to remove words like “and” and “the” from your text since users are less likely to need to search for them. Making sure your text column only contains ASCII characters is much harder because you can’t just remove all non-ASCII characters (such as characters with diacritics) without making your text unsearchable; I have a good-enough solution here, in the form of an M custom function which removes diacritics from characters in text and ensures that only ASCII characters are left.

[Thanks to Jeffrey Wang for the information in this post]

Why DAX Window Functions Are Important For Performance In Power BI DirectQuery Mode

The new DAX window functions (announced here, more details on Jeffrey Wang’s blog here and here) have generated a lot of excitement already – they are extremely powerful. However one important benefit of using them has not been mentioned so far: they can give you much better performance in DirectQuery mode because they make it more likely that aggregations are used. After all, the fastest DirectQuery datasets are the ones that can use aggregations (ideally Import mode aggregations) as much as possible.

To illustrate this, here’s a very simple dataset with a fact table in DirectQuery mode and a Date dimension table in Dual mode built on the SQL Server AdventureWorksDW2017 sample database:

Let’s start off with a simple measure that sums up the values in the SalesAmount column:

Sales Amount = SUM('Internet Sales'[SalesAmount])

When you use it in a table visual with the CalendarYear column from the Date table like so:

…Power BI can get the data it needs with a single SQL query. I won’t show the whole query here, but it’s a simple Group By and returns exactly what you’d expect if you run it in SQL Server Management Studio:

Now let’s say you want to do a year-on-year growth calculation. To do this, you’ll need to be able to find the Sales Amount for the previous year. One way to do this in DAX would be to use the SamePeriodLastYear function like so:

LY Sales Amount V1 =
CALCULATE (
    [Sales Amount],
    SAMEPERIODLASTYEAR ( 'Date'[FullDateAlternateKey] )
)

Used in a table it gives the correct result:

However the SQL query generated for this visual is now a lot more, ahem, verbose and, because the DAX time intelligence functions are all resolved at date granularity, it now returns Sales Amount summed up by date rather than by year:

If you write the same previous year sales measure using the new Offset function like so:

LY Sales Amount V2 =
CALCULATE (
    [Sales Amount],
    OFFSET (
        -1,
        ALLSELECTED ( 'Date'[CalendarYear] ),
        ORDERBY ( 'Date'[CalendarYear], ASC )
    )
)

…you get the same result as before:

…but now the SQL query is much simpler and returns data at the year granularity, as you’d want:

I’m not a SQL expert so I won’t comment on the SQL generated – besides, it may change in the future – but the most important implication of this is that the version of the measure that uses Offset is more likely to be able to use aggregations.

For example, if you add an extra import-mode table to the dataset with the data from the fact table aggregated to year granularity:

…and set it up as an aggregation table:

…then any queries at the year granularity should use it. As you would expect, the query for the visual shown above with the measure using SamePeriodLastYear misses the aggregation and goes to the DirectQuery fact table:

The query for the measure using Offset, though, can use the aggregation and there is no DirectQuery activity at all:

This is just one example. I’m almost certain there are other ways to write this calculation without using Offset which will also hit the aggregation but they won’t be as elegant as the Offset version. What’s more, as your measures get more and more complex it gets harder and harder to write DAX that results in simple, efficient SQL in DirectQuery mode and the new window functions are a massive help here.
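For instance, here’s a rough, untested sketch of a rewrite that avoids both Offset and the time intelligence functions; because it only filters the CalendarYear column it should also stay at year granularity, although it’s certainly less elegant than the Offset version:

LY Sales Amount V3 =
VAR CurrentYear =
    SELECTEDVALUE ( 'Date'[CalendarYear] )
RETURN
    CALCULATE (
        [Sales Amount],
        REMOVEFILTERS ( 'Date' ),
        'Date'[CalendarYear] = CurrentYear - 1
    )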

Performance Tuning Table Visuals With Filters Applied In Power BI

Do you have a Power BI report with a table or a matrix on it, where there is a filter on the rows of the table? It’s a very common scenario indeed. Is the table or matrix slow to render? If so, this post is for you!

Consider the following table visual:

There are four columns: Date, Town and two measures. One measure called [Fast Measure] is, as the name suggests, very quick to execute; the other measure, called [Slow Measure], is very complex and slow. The definitions are irrelevant here. Notice that there is a filter on this table visual so only the rows where [Fast Measure] is greater than 1 are shown.

If I measure the amount of time to render this table in Performance Analyzer, it takes around 17.5 seconds to run. However, if I remove the filter on [Fast Measure], the table only takes 8 seconds to run. Why? The filter is on the fast measure and surely more rows are returned without the filter, so wouldn’t the slow measure be evaluated more?

It turns out that the DAX generated for tables with filters applied isn’t as well-optimised as it could be. Yes, I work on the Power BI team and yes this is something that should be addressed, but before that happens I thought it would be good to make as many people aware of this as possible so they can tune their reports. I have worked with a lot of customers recently who are running into this problem without realising it.

Here’s the DAX query for the table without the filter:

DEFINE
  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      'Date'[Date],
      'Property Transactions'[Town],
      "Fast_Measure", 'Property Transactions'[Fast Measure],
      "Slow_Measure", 'Property Transactions'[Slow Measure]
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(501, __DS0Core, 'Date'[Date], 1, 'Property Transactions'[Town], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  'Date'[Date], 'Property Transactions'[Town]

Here’s the DAX query for the table with the filter:

DEFINE
  VAR __ValueFilterDM0 = 
    FILTER(
      KEEPFILTERS(
        SUMMARIZECOLUMNS(
          'Date'[Date],
          'Property Transactions'[Town],
          "Fast_Measure", 'Property Transactions'[Fast Measure],
          "Slow_Measure", 'Property Transactions'[Slow Measure]
        )
      ),
      [Fast_Measure] > 1
    )

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      'Date'[Date],
      'Property Transactions'[Town],
      __ValueFilterDM0,
      "Fast_Measure", 'Property Transactions'[Fast Measure],
      "Slow_Measure", 'Property Transactions'[Slow Measure]
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(501, __DS0Core, 'Date'[Date], 1, 'Property Transactions'[Town], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  'Date'[Date], 'Property Transactions'[Town]

Notice how, in this second query, the filter is applied in an additional variable called __ValueFilterDM0. The side effect of implementing the filter in this way is that all the measures in the table, including [Slow Measure], are evaluated twice. This explains why, in this example, the filtered table is twice as slow as the unfiltered table. The performance of your table may be different: it depends on the number of measures, their definition, the nature of the filter and many other factors. Measuring the performance of your table with and without the filter applied will tell you how much of an impact this issue is having in your report.

What can you do about this? One solution is to apply the filter inside the measures rather than on the visual. In this case, creating two new measures with the following definitions:

Fast Measure Filtered =
VAR f = [Fast Measure] 
RETURN IF ( f > 1, f )

Slow Measure Filtered =
IF ( [Fast Measure] > 1, [Slow Measure] )

…and then using these measures on columns instead of the original measures, removing the visual-level filter, like so:

In my example this results in the table rendering in 8 seconds again, though your mileage may vary. What’s more, I’ve turned off totals in the table and therefore avoided the problem of making these measures return the same totals as in the original filtered table. There may be other solutions (I suspect calculation groups would be worth investigating) that work better in your dataset and report.
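If you do investigate calculation groups, I imagine a single calculation item with an expression along these lines could apply the same filter to every measure in the visual without you having to create filtered copies of each measure – although this is just an untested sketch:

IF ( [Fast Measure] > 1, SELECTEDMEASURE() )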

Are you affected by this problem? How much impact does it have? Do you have a different solution? Let me know in the comments…

Diagnosing Calculation Group-Related Performance Problems In Power BI Using EvaluateAndLog

A few weeks ago I wrote a post showing how you can use the new EvaluateAndLog DAX function to diagnose performance problems relating to the use of the Switch function. Did you know that calculation groups can experience similar performance problems though? In some scenarios limited evaluation takes place for all calculation items, not just the one you expect; luckily you can use EvaluateAndLog to diagnose this too. In this post I’ll show you a simple example.

Let’s say you have a dataset containing the following table, called Sales:

There’s also a calculation group with three calculation items:

The Measure Value calculation item returns the value of SelectedMeasure but also uses EvaluateAndLog:

EVALUATEANDLOG( SELECTEDMEASURE() )

The Measure Value * 2 and Measure Value * 3 calculation items do the same thing but multiply the value returned by SelectedMeasure by two and three respectively, for example:

EVALUATEANDLOG( SELECTEDMEASURE() * 2 )

There’s another table in the dataset, with no relationship to the Sales table, called Calc Item Names containing a single column listing the three calculation items’ names:

Last of all, there is a measure called Sales Amount:

Sales Amount = SUM(Sales[Sales])

…and a measure called Dynamic Sales Amount Not Optimised that takes the selection made on the Calc Item Names column of the Calc Item Names table and uses it to dynamically select the calculation item with that name and apply it to the Sales Amount measure:

Dynamic Sales Amount Not Optimised =
VAR SelectedCalcItem =
    SELECTEDVALUE ( 'Calc Item Names'[Calc Item Names] )
RETURN
    CALCULATE ( [Sales Amount], 'My Calc Group'[Name] = SelectedCalcItem )

All this allows you to build the following report where a slicer controls the calculation item applied in the Dynamic Sales Amount Not Optimised measure:

As the name suggests, though, there’s a problem with the Dynamic Sales Amount Not Optimised measure that EvaluateAndLog can help uncover. Using Jeffrey Wang’s DAXDebugOutput tool to capture what happens when Measure Value is selected in the slicer, you can see three DAXEvaluationLog events. The first is the one you’d expect: since Measure Value is selected in the slicer, the Measure Value calculation item is evaluated:

The other two events are for the two unselected calculation items, though – and, similar to the Switch problem, some evaluation taking place for all calculation items can cause performance problems:

How do you avoid this problem? It turns out that if you use TreatAs to apply the calculation item inside Calculate, like so:

Dynamic Sales Amount Optimised =
CALCULATE (
    [Sales Amount],
    TREATAS ( VALUES ( 'Calc Item Names'[Calc Item Names] ), 'My Calc Group'[Name] )
)

…you only get one DAXEvaluationLog event when the query runs, and only the calculation item you selected is evaluated:

[Thanks to Krystian Sakowski for originally diagnosing the problem in this post and coming up with the fix]

Finding The Tables, Columns And Measures Used By A DAX Query In Power BI

If you’re monitoring activity in Power BI, for example using Log Analytics, you’ll know that you can capture the DAX queries generated by your published Power BI reports. How do you make sense of them though? You may want to know which tables, columns or measures are being used by a query – maybe so you can work out which ones aren’t being used and can be deleted. I always thought the only way to do this would be to parse the DAX query, which would be incredibly difficult to do. Yesterday, though, Igor Cotruta pointed out on Twitter that there’s a really easy way to do this that I didn’t know about, using the DISCOVER_CALC_DEPENDENCY DMV. I’ve used this DMV before and blogged about it here, but what was new to me is that you can pass a DAX query into the QUERY restriction (it is all documented here). To see how it works let’s look at a simple example.

Consider the following dataset consisting of a single table with a calculated column and a few measures:

Now consider a table visual built on this dataset with a slicer linked to it:

You can get the DAX query generated by this visual using Performance Analyzer’s Copy Query feature (described in this article). You can then pass this DAX query into the WHERE clause of a query against the DISCOVER_CALC_DEPENDENCY DMV like so:

SELECT 
REFERENCED_OBJECT_TYPE, 
REFERENCED_TABLE,
REFERENCED_OBJECT,
REFERENCED_EXPRESSION
FROM 
$SYSTEM.DISCOVER_CALC_DEPENDENCY 
WHERE 
QUERY = 
'
DEFINE
  VAR __DS0FilterTable = 
    TREATAS({"Apples"}, ''Sales''[Product])

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      ROLLUPADDISSUBTOTAL(''Sales''[Country], "IsGrandTotalRowTotal"),
      __DS0FilterTable,
      "Sales_Amount", ''Sales''[Sales Amount],
      "Effective_Tax_Rate", ''Sales''[Effective Tax Rate]
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, ''Sales''[Country], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  [IsGrandTotalRowTotal] DESC, ''Sales''[Country]

'

[Note that the DAX query I’m passing in contains single quotes that have to be escaped, which explains why all the table names are surrounded by two single quotes]

The query can be run from DAX Studio connected to either Power BI Desktop or a dataset published to Power BI Premium via the XMLA Endpoint. Here’s the output:

As you can see, this gives you the table, columns (including the hidden RowNumber column), measures and calculated column directly referenced by the query as well as the DAX definitions of the measures and calculated column. It does not mention the [Tax Amount] measure, which the [Effective Tax Rate] measure references; you can get the full list of measures and calculated columns in the dataset and the dependencies between them by querying the DMV without the WHERE clause though. Neither does the output tell you that the Product column is being sliced by the value “Apples”, which would also be useful. Overall, though, I think this is a really powerful feature.

Diagnosing Switch-Related Performance Problems In Power BI DAX Using EvaluateAndLog

The Switch function is often a cause of performance problems in DAX. Last year a number of optimisations were made that fixed the most common issues (and as a result you should ignore older blog posts that you may find on this subject) but some scenarios still remain where performance can be bad; Marco and Alberto wrote a great post recently outlining some of them that I encourage you to read. How do you even know whether your measure is slow because of your use of the Switch function though? Trying to read query plans, as Marco and Alberto do in their post, isn’t usually practical and commenting out branches in a Switch to see if there is a performance change can be very time-consuming. Luckily, the new EvaluateAndLog DAX function can help you diagnose Switch-related performance issues.

To illustrate this I built a very simple Power BI dataset with a fact table called Sales and a table called DisconnectedTable used for choosing the measure to display in a report (I know you can use field parameters for this but this is just a demo).

The Sales table looks like this:

…and the DisconnectedTable table looks like this:

There are two basic measures that sum up the values in the Sales and Tax columns:

Sales Amount = SUM('Sales'[Sales])

Tax Amount = SUM('Sales'[Tax])

There are also two measures that display either Sales Amount or Tax Amount based on the selection made on DisconnectedTable. One is called Dynamic Measure Optimised:

Dynamic Measure Optimised =
SWITCH (
    SELECTEDVALUE ( DisconnectedTable[Measure] ),
    "Sales Amount", [Sales Amount],
    "Tax Amount", [Tax Amount],
    BLANK ()
)

…and one is called Dynamic Measure Unoptimised:

Dynamic Measure Unoptimised =
SWITCH (
    SELECTEDVALUE ( DisconnectedTable[Key] ),
    1, [Sales Amount],
    2, [Tax Amount],
    BLANK ()
)

Now let’s assume you have a report with a slicer linked to the Measure column on DisconnectedTable and a table showing the Product column and either the Dynamic Measure Optimised or Dynamic Measure Unoptimised measure:

Both measures return the same value, but as Marco and Alberto show in their post since Dynamic Measure Optimised looks for the selected value on the column that is used in the slicer – the column called Measure on DisconnectedTable – it will perform better than Dynamic Measure Unoptimised, which looks for the selected value on the Key column of DisconnectedTable, which is not used in the slicer.

How can we prove this though? If you wrap the expressions used in each branch of Switch with EvaluateAndLog, like so:

Dynamic Measure Optimised =
SWITCH (
    SELECTEDVALUE ( DisconnectedTable[Measure] ),
    "Sales Amount", EVALUATEANDLOG ( [Sales Amount] ),
    "Tax Amount", EVALUATEANDLOG ( [Tax Amount] ),
    BLANK ()
)

Dynamic Measure Unoptimised =
SWITCH (
    SELECTEDVALUE ( DisconnectedTable[Key] ),
    1, EVALUATEANDLOG ( [Sales Amount] ),
    2, EVALUATEANDLOG ( [Tax Amount] ),
    BLANK ()
)

…and then run Jeffrey Wang’s DAXDebugOutput tool and connect it up to your Power BI Desktop file, you can see the difference between the two measures for the DAX query that populates the table visual in the report.

When using Dynamic Measure Optimised and with Sales Amount selected in the slicer, the DAX query for the table visual generates two DAXEvaluationLog events, one for the body of the table, one for the grand total row:

This is what you would expect and this is the optimal pattern: both DAXEvaluationLog events only mention the Sales Amount measure, the measure selected in the slicer. However, if you use Dynamic Measure Unoptimised with Sales Amount selected in the slicer, you’ll see two extra DAXEvaluationLog events being generated, where Power BI pays the cost of analysing and executing the unselected Tax Amount branch just to produce an empty, unused result in the end:

It’s these evaluations of unused Switch branches that can cause query performance problems. So my advice is: if you’ve got a lot of complex measures that use Switch, you should study Jeffrey’s blog carefully and use EvaluateAndLog to see if any optimisation is necessary.

[Thanks to Jeffrey Wang for his help writing this post]