## Introduction to MDX for PowerPivot Users, Part 4: Filtering

The combination of the filtering functionality built into PivotTables, and the ability to delete and reorder tuples in a set without needing to edit the set expression itself that the Excel named set functionality gives you, means that you can usually implement the filters you need in PowerPivot without needing to resort to MDX. However there are some scenarios where knowing the MDX functions that allow you to filter a set are useful and in this post I’ll show a few of them.

#### FILTER()

The Filter() function is the Swiss-Army penknife of filtering in MDX: it can do pretty much anything you want, but isn’t always the most elegant method. It’s quite simple in that it takes two parameters, the set that is to be filtered and a boolean expression that is evaluated for every item in the set and which determines whether that item passes through the filter or not.

Here’s a simple example. Consider a simple PivotTable (using my example model, described here) with FullDateAlternateKey on rows and the Sum of SalesAmount measure on columns:

The set of members on the FullDateAlternateKey level of the FullDateAlternateKey hierarchy can be obtained by using the .Members() function as I showed earlier in this series:

[DimDate].[FullDateAlternateKey].[FullDateAlternateKey].MEMBERS

This set can then be filtered by passing it to the Filter function, which itself returns a set, so it too can be used to create a named set. Let’s say we only wanted the set of dates where Sum of SalesAmount was greater than £10000; we could get it using the following expression:

Filter(

[DimDate].[FullDateAlternateKey].[FullDateAlternateKey].MEMBERS

, ([Measures].[Sum of SalesAmount])>10000)

What I’m doing is passing the set of all dates into the first parameter of Filter() and then, in the second parameter, testing to see if the value of the tuple ([Measures].[Sum of SalesAmount]) is greater than 10000 for each item in that set.

Here’s the result:

As I’ve mentioned before, it’s very important that you remember to check the ‘Recalculate set with every update’ button if you want the filter to be re-evaluated every time you change a slicer, which you almost always want to do.

Where is the filter function actually useful though? Here’s the same PivotTable but with Color on columns and with only Red and Black showing:

It’s not possible to filter this PivotTable to show only the dates where sales for Black products are greater than Sales of Red products using native functionality, but it is using MDX. Here’s the set expression:

Filter(

[DimDate].[FullDateAlternateKey].[FullDateAlternateKey].MEMBERS

, ([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Black])

>

([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Red])

)

Here I’m doing something similar to what I did in the first example, but now comparing two tuple values for each date: the tuple that returns the value of Sum of SalesAmount for Black products, which is

([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Black])

and the tuple that returns the value of Sum of SalesAmount for Red products, which is

([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Red])

#### NONEMPTY()

The NonEmpty() function also does filtering, but it’s much more specialised than the Filter() function – it filters items from a set that have empty values for one or more tuples. As with the Filter() function its first parameter is the set to be filtered, but its second parameter is another set, each of whose items are evaluated for each item in the first set. If one item in the second set evaluates to a non empty value for an item in the first set then that item passes through the filter.

That explanation is, I know, quite hard to digest so let’s look at an example. Here’s our PivotTable with no filter applied on rows and all Colors displayed on columns:

If you wanted to see only the rows where there were sales for Black products, you could use the following expression:

NonEmpty(

[DimDate].[FullDateAlternateKey].[FullDateAlternateKey].MEMBERS

, {([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Black])})

If you wanted to see only the rows where there were sales for Black **OR** Silver products, you could use this expression:

NonEmpty(

[DimDate].[FullDateAlternateKey].[FullDateAlternateKey].MEMBERS

, {([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Black])

,([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Silver])})

If you wanted to see only the rows where there were sales for Black AND Silver products you’d need to use two, nested NonEmpty functions:

NonEmpty(

NonEmpty(

[DimDate].[FullDateAlternateKey].[FullDateAlternateKey].MEMBERS

, {([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Black])})

, {([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Silver])})

#### TOPCOUNT(), BOTTOMCOUNT(), TOPPERCENT(), BOTTOMPERCENT()

The TopCount() and related functions are, as you’ve probably guessed from their names, useful for doing top N style filters. If you wanted to see the top 10 dates for sales of Black products you could use the following expression:

TopCount(

[DimDate].[FullDateAlternateKey].[FullDateAlternateKey].MEMBERS

, 10

, ([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Black])

)

Notice here how the dates are displayed in descending order for the Black column, but no other – that’s how you can tell that the TopCount() function is doing what you want.

To get the top N dates that provide at least 5% of the total sales across all time for Black products, you can use the following expression:

TopPercent(

[DimDate].[FullDateAlternateKey].[FullDateAlternateKey].MEMBERS

, 5

, ([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Black])

)

The BottomCount() and BottomPercent() function (I’m always reminded of one of the old, old jokes here when I use the BottomCount() function…) do the opposite and return the bottom items in a set, but you need to be careful using them because the bottom items in a set often have no values at all which is not very useful. So, for example, if you wanted to find the bottom 10 dates that have sales for Black products you need to use the NonEmpty() function as well as the BottomCount() function as follows:

BottomCount(

NonEmpty(

[DimDate].[FullDateAlternateKey].[FullDateAlternateKey].MEMBERS

, {([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Black])})

, 10

, ([Measures].[Sum of SalesAmount], [DimProduct].[Color].&[Black])

)

Here I’m taking the set of all members on the FullDateAlternateKey level of the FullDateAlternateKey hierarchy, passing that to the NonEmpty() function to return only the dates that have values for Black products and Sum of SalesAmount, and then getting the bottom 10 of those dates.

In part 5, I take a look at running MDX queries against a PowerPivot model.

[...] part 4 of this series (sorry for the long wait since then!) I finished off looking at what you can do with [...]

Introduction to MDX for PowerPivot Users, Part 5: MDX Queries « Chris Webb's BI BlogDecember 18, 2012 at 3:41 pm

[...] part 4, I’ll take a closer look at how to filter [...]

Introduction to MDX for PowerPivot Users, Part 3: The Members() and Crossjoin() functions « Chris Webb's BI BlogDecember 18, 2012 at 3:46 pm