Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Advanced Options For Loading Data From SQL Server With Power Query


Loading data from SQL Server using Power Query is fairly straightforward, and the basics are well-documented (see here for example). However, the Sql.Database function has a few advanced options. Not all of them currently appear in the online help (although they are in the latest version of the Library Specification document and are visible inside the Power Query window’s own help), and not all of them are documented in much detail.

Here’s what the Power Query window shows for the Sql.Database function when you show help:

[screenshot: help for the Sql.Database function in the Power Query window]

Here are all of the options available:

MaxDegreeOfParallelism does what you would expect, setting the MAXDOP query hint for the SQL query that Power Query generates. For example, the Power Query query:

let
    Source = Sql.Database("localhost", "adventure works dw", [MaxDegreeOfParallelism=2]),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate

 

Generates the query:

select [$Ordered].[DateKey],
    [$Ordered].[FullDateAlternateKey],
    [$Ordered].[DayNumberOfWeek],
    [$Ordered].[EnglishDayNameOfWeek],
    [$Ordered].[SpanishDayNameOfWeek],
    [$Ordered].[FrenchDayNameOfWeek],
    [$Ordered].[DayNumberOfMonth],
    [$Ordered].[DayNumberOfYear],
    [$Ordered].[WeekNumberOfYear],
    [$Ordered].[EnglishMonthName],
    [$Ordered].[SpanishMonthName],
    [$Ordered].[FrenchMonthName],
    [$Ordered].[MonthNumberOfYear],
    [$Ordered].[CalendarQuarter],
    [$Ordered].[CalendarYear],
    [$Ordered].[CalendarSemester],
    [$Ordered].[FiscalQuarter],
    [$Ordered].[FiscalYear],
    [$Ordered].[FiscalSemester]
from [dbo].[DimDate] as [$Ordered]
order by [$Ordered].[DateKey]
option(maxdop 2)

[as an aside – yes, the SQL query has an Order By clause in it. Power Query likes all of its tables ordered. It would be nice to have an option to turn off the Order By clause though, I think]

CreateNavigationProperties controls the creation of the navigation properties that allow you to browse from table to table in the Power Query Query Editor. For example, when you connect to a table in SQL Server and Power Query can see foreign key relationships between that table and other tables, you’ll see extra columns that allow you to follow these relationships:

[screenshot: navigation columns shown in the Query Editor]

The problem is that these columns will appear as useless text columns when you load the data into a table on the worksheet or the Excel Data Model, although of course you can delete them manually using the Remove Columns functionality in the Power Query Query Editor:

[screenshot: removing the navigation columns with Remove Columns]
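
If you’d rather remove the navigation columns as part of the query itself rather than by clicking in the UI, a Table.RemoveColumns() step will do it. Here’s a rough sketch – the navigation column names used here are just for illustration (they depend on the foreign keys Power Query finds), so substitute the ones you actually see in the Query Editor; passing MissingField.Ignore means the step won’t fail if a name doesn’t exist:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    //remove the navigation columns by name
    //(these names are illustrative - use the ones shown in the Query Editor)
    RemoveNavigation = Table.RemoveColumns(dbo_DimDate,
      {"FactInternetSales", "FactResellerSales"}, MissingField.Ignore)
in
    RemoveNavigation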

Setting CreateNavigationProperties=false will stop these extra columns being created, for example:

let
    Source = Sql.Database("localhost", "adventure works dw",[CreateNavigationProperties=false]),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate

 

I believe it should also prevent the automatic creation of relationships between tables in the Excel Data Model, but I can’t work out how to test this.
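
Incidentally, all of these options go into the same options record, so you can combine them – for example (just a sketch putting together two of the options already covered):

let
    Source = Sql.Database("localhost", "adventure works dw",
      [CreateNavigationProperties=false, MaxDegreeOfParallelism=2]),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate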

NavigationPropertyNameGenerator controls how the names for these columns are generated. There’s no public documentation about this at all, and I’m rather pleased with myself for working out how to use it. It’s quite useful because I find the default names that get generated aren’t as clear as they could be in all cases. Here’s an example of how to build more detailed, albeit more verbose, names:

let
    //declare a function that combines a list of text using commas
    ConcatByComma = Combiner.CombineTextByDelimiter(","),
    //declare a name function
    MyNameFunction = (p, a) =>
      List.Transform(a, each
        "Navigate from " & _[SourceTableName] & " to " & _[TargetTableName] &
        " via " & ConcatByComma(_[SourceKeys]) & " and " & ConcatByComma(_[TargetKeys])),
    //use this name function in Sql.Database
    Source = Sql.Database("localhost", "adventure works dw",
      [NavigationPropertyNameGenerator=MyNameFunction]),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate

[screenshot: the renamed navigation columns in the Query Editor]

The NavigationPropertyNameGenerator option takes a function with two arguments:

  • Pattern, which appears to be a list containing all of the names of the columns on the current table
  • Arguments, which is a list of records containing information about each of the navigation columns. Each record contains the following fields
    • SourceTableName – the name of the source table, ie the table you’re currently on
    • TargetTableName – the name of the table to navigate to
    • SourceKeys – a list containing the names of the columns on the source table involved in the relationship
    • TargetKeys – a list containing the names of the columns on the target table involved in the relationship
    • SingleTarget – not quite sure about this, but it appears to be a logical (ie true or false) value indicating whether there is just one target table involved in all relationships

In my example above, I’ve created two functions. The first, ConcatByComma, is a simple combiner that concatenates all of the text in a list using commas (see here for more details on how to do this). The second, MyNameFunction, is the important one – it uses List.Transform to iterate over the list passed to Arguments (called a in my function) and generate the text for each column header.
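
If you don’t need anything this verbose, the same pattern works with a much simpler function. Here’s a sketch that just uses the target table name for each navigation column – this is my own simplification, and I haven’t checked what happens if two navigation columns end up with the same name:

let
    //use the target table name as the navigation column name
    SimpleNameFunction = (p, a) => List.Transform(a, each _[TargetTableName]),
    Source = Sql.Database("localhost", "adventure works dw",
      [NavigationPropertyNameGenerator=SimpleNameFunction]),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate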

Query allows you to specify your own SQL query for Power Query to use. This is very useful, but there’s not much to say about it that isn’t already covered in the official help. The one important thing to note is that using this option prevents query folding for all subsequent steps in your query, which could lead to performance problems.

Here’s an example query:

let
    Source = Sql.Database("localhost", "adventure works dw",
     [Query="select distinct CalendarYear from DimDate"])
in
    Source
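
Since Query just takes text, you can also build the SQL statement dynamically in M before passing it in. A quick sketch – the year value is hard-coded here purely for illustration:

let
    YearFilter = 2003,
    SQLText = "select distinct CalendarYear from DimDate where CalendarYear >= "
      & Number.ToText(YearFilter),
    Source = Sql.Database("localhost", "adventure works dw", [Query=SQLText])
in
    Source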

 

CommandTimeout allows you to specify a query timeout as a value of type duration. The default timeout is ten minutes. Here’s an example of how to set a new timeout of one day, two hours, three minutes and four seconds using the #duration() intrinsic function:

let
    Source = Sql.Database("localhost", "adventure works dw",
     [CommandTimeout=#duration(1,2,3,4)]),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate
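
The four arguments to #duration() are days, hours, minutes and seconds, so for a more realistic scenario – say a thirty-minute timeout – the option would just be (same table assumed):

let
    Source = Sql.Database("localhost", "adventure works dw",
     [CommandTimeout=#duration(0,0,30,0)]),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate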

Written by Chris Webb

September 29, 2014 at 9:30 am

Posted in Power Query

Vote James Rowland-Jones For The PASS Board Of Directors


I apologise for two posts in a row on PASS politics, but this will only be a short one and normal service will be resumed soon. If you’re a member of PASS then it’s likely you’ll have received an email allowing you to vote in the PASS Board elections. I’ve just voted for my friend James Rowland-Jones (aka JRJ) and I humbly submit that you should consider doing so too. Here’s why:

  • Although I resigned from SQLBits earlier this year, for a long time I worked alongside him on the SQLBits committee. I know first-hand that he’s intelligent, organised and an experienced manager. His skilful handling of the SQLBits sponsorship portfolio is one of the main reasons why its finances are so healthy and therefore why the conference is so successful.
  • He’s the only European candidate standing and is committed to expanding PASS internationally.
  • He understands the needs of the BI community within PASS.
  • Most importantly he has the diplomatic skills that the job requires. You only have to look at how he has conducted himself in the recent mudslinging over the PASS name-change and the BA conference to know this.

You can read JRJ’s full manifesto here.

Written by Chris Webb

September 24, 2014 at 9:46 pm

Posted in PASS

Why I Support The PASS Name Change And Its Efforts To Expand Into The BA Community


Unless you take a close interest in the politics of PASS you may have missed the kerfuffle around it changing its name from the “Professional Association for SQL Server” to just PASS a month or so ago. Blog posts from Andy Warren and K. Brian Kelley seem to sum up a lot of people’s feelings. There was a similar outcry on Twitter following an announcement about the way speakers will be selected for next year’s PASS BA Conference. In both cases I read the news when it came out and didn’t give much thought to it, because I agreed with what PASS were doing and I could understand why they were doing it. The strength of feeling these moves stirred up surprised me, and it made me realise that what made perfect sense to me was not so clear-cut to other people in the SQL Server community and needed to be defended – hence this blog post.

Let me start with the name change. I don’t think PASS will ever be anything but a SQL Server-centric organisation, whatever its name. However, the definition of what is “SQL Server” and what isn’t has blurred in recent years. You could argue that PASS should confine itself to products that have “SQL Server” in their names, but if you were to apply that rule strictly you’d exclude Azure SQL Database. Aha, you say, that would be silly because the technology behind Azure SQL Database is so closely related to on-prem SQL Server and people who use the former will almost certainly use the latter. But in the same way Power Pivot is SQL Server Analysis Services running inside Excel – it’s more or less the same code base, the development experience is the same, the development team is the same, and the DAX language is identical in both SSAS Tabular and Power Pivot. The SSRS team gave birth to Power View and a lot of MS people closely associated with SSIS now work on Power Query. The worlds of corporate BI and self-service BI are drawing ever closer and soon it will be difficult to say where one starts and the other ends. A lot of SQL Server BI consultants and companies (like me) are now using and building Power BI solutions in addition to the SQL Server BI stack. This is why PASS is right to concern itself with Power BI and self-service BI: it’s a big deal for a lot of PASS community members.

What’s more, Microsoft is working on a number of new products that may have no technological link to SQL Server but will be used and championed by members of the PASS community. Azure Document DB and Azure Machine Learning are two good examples. To me it’s right that PASS takes an interest in these products even if other technical communities do so as well. Anyone with a background in SQL Server – anyone with a background in managing and analysing data with Microsoft’s traditional data platform – will have a natural interest in Microsoft’s new tools for managing and analysing data.

The argument against what’s happening seems to be that it means PASS is no longer focused solely on SQL Server, that it will end up spreading itself too thin. The people making this argument, as far as I can see, are from the DBA side of the SQL Family who equate “SQL Server” with the SQL Server relational database. That’s an easy assumption to make for those who only work with the relational engine. It’s true that “SQL Server” is the name of the relational engine, but “SQL Server” has also meant more than the relational engine for a long time: SQL Server is a suite of tools, and the relational engine is just one part of it. I’ve been a SQL Server MVP for nine years and have almost nothing to do with the relational engine. I started working with OLAP Services with SQL Server 7 and spoke at my first PASS event at (I think) the 2005 PASS European Conference in Munich. I haven’t been involved in the SQL Server community as long as some people, I know, but I feel like I’ve been part of it for long enough for my definition of what “SQL Server” is to carry some weight. As I’ve just argued, SQL Server Analysis Services, the tool that I have worked with for almost all of my professional career, has found a new expression in Power Pivot (even while traditional SSAS still exists) and so I think it is right for PASS to follow me as my career changes. This isn’t PASS losing its focus; this is PASS keeping its focus on what SQL Server has become and what its members are working with. I’m not leaving the world of SQL Server behind. SQL Server itself is changing and I’m changing with it.

Nor do I see any evidence of this stopping PASS from providing support and education for DBAs and those who work exclusively with the relational engine. As the PASS community has grown, as the PASS Summit adds yet more tracks, as other events like SQLBits and SQL Saturdays have taken off, I see that the economies of scale that come from the wider definition of “SQL Server” to include the relational engine and the BI tools have provided even more opportunities for those who are focused on the relational engine alone to learn and grow. Each sub-community may have a smaller slice of the pie, but the pie itself is much larger than it would be if we all went our separate ways.

This brings me on to the topic of the BA Conference. Andy Warren’s assertion, in the blog post I linked to above, that the BA Conference should have been the BI Conference is an opinion I hear repeated a lot and one I disagree with strongly. Before I do get onto that topic, though, I want to be clear that I’m not picking on Andy in particular – I’m engaging with him because he puts his side of the argument in a clear, thoughtful way which I respect (this more recent post is a great example of why I admire him and his thoughts on PASS). Unfortunately this particular discussion risks being turned into a very unpleasant argument, and I hope the PASS Board takes action to stop this happening. Twitter is where this is most obvious: it’s easy to be angry in 140 characters, less easy to have a nuanced debate. The tradition of heckling by certain members of the DBA community on Twitter during PASS Summit keynotes while BI features are announced is something that particularly p*sses me off: I feel like it’s rude and disrespectful to the large number of people in the audience who are interested in these features, and to the people on stage announcing them. Let’s stop this kind of thing, it can only be divisive.

OK, back to the topic of the BA Conference. When it was first announced I wrote a post entitled ”The PASS Business Analytics Conference is not the PASS Business Intelligence Conference” but I think my arguments here need to be clearer. First of all, I do not agree that BI content should be split off from the PASS Summit into its own conference. Some DBAs are not interested in BI; some BI pros like me are not interested in relational engine topics. However we are in the minority. I know from my friends and customers that a lot of people who work in the SQL Server community now deal with both the relational engine and some or all of the BI tools, and having the PASS Summit cover both BI and DBA topics at the same conference makes it appeal to a much wider audience than it would  if it concentrated on just one or the other. Indeed, part of the reason why I felt disappointed with this year’s PASS BA Conference, and why I felt it was a bit of a failure, was because it had fallen into the trap of being a PASS BI Conference.

So why then should PASS have a BA Conference, and why is BA<>BI? Let me give you an example. Last week I ran a private SSAS Tabular/DAX training course for a large company in London. The people on the course were traditional corporate BI developers, smart guys, the kind of people I have worked with all my career. This week I’m going back to the same company to deliver training on Power Pivot, DAX and Power BI to the business analytics team of the same company. A lot of technical topics I’ll be covering will be the same but the people attending will be very different. They are equally smart guys who work outside the IT department, but in close association with the IT department, analysing data to answer business questions. They don’t have the kind of technical background that someone in the IT department might have but they do use many of the same tools and technologies (SSAS as a user; Power Pivot and Power BI; Excel). They ask the business questions and they then work together with the IT guys to find the answer. This is the BA community.

A BA community has always existed within the Microsoft ecosystem, originally around Excel and Access, but with the advent of Power BI and the wider self-service BI and Big Data movements it has grown in size and confidence recently. Evidence of this growth includes the “New Excel” user groups that Rob Collie is involved in organising, and the London Business Analytics User Group that Mark Wilcock organises. Should PASS reach out to this community? Yes, I think so. There is a clear overlap between the technologies that the BI side of the PASS community uses and those that the Microsoft BA community is using. Also, I think the BA community can benefit from the passion, experience and infrastructure built up by PASS. As far as I can see, the BA community is also happy to collaborate and engage with the PASS BI community too. In the past few years I’ve made a lot of new friends in the Excel world, for example Excel MVPs like Bob Phillips, who have this attitude. BI pros and BA pros are now working side by side in the office, so it makes sense that we come together as a professional association.

The BA community is not, however, interested in being swallowed up by an organisation that identifies itself purely with the IT department. It is interested in joining an organisation that has IT focused people as a major component but which also recognises the importance of collaboration between IT and the business to work with data. This is why the PASS name change, and especially a separate BA conference, is important. DBAs will only ever want to attend the PASS Summit. SQL Server generalists, and BI pros, will also want to attend the PASS Summit. Some BI pros will want to attend the BA Conference as well as or maybe instead of the PASS Summit. BA pros will only want to attend the BA Conference. BI pros like me will be presenting on topics like Power BI alongside Excel pros at the BA Conference; this is where the overlap between the existing BI community inside PASS and the new BA community is obvious. There are also a lot of topics such as machine learning, R (which is a big part of Azure Machine Learning as well as a massive topic in its own right), Hadoop and pure Excel that do belong at the BA Conference and do not belong at the PASS Summit.

As I already said, I felt like this year’s second BA Conference was a bit of a step back from the first because it had too much pure BI content. Part of the reason for this was, I suspect, because PASS relied too heavily on its existing speaker community and did not make enough effort to reach out to the BA community’s speakers. I believe the PASS Board has come to the same conclusion as me and this is why the BA conference next year will not have an open call for speakers. Brent Ozar did a valiant job of trying to imagine why this has taken place here; he falls a little wide of the mark with his explanation in my opinion, even if he’s dead right that the PASS Board need to make a clear and positive case for why they are doing this. Here’s my take. A large Microsoft BA community undoubtedly exists. There are a number of great speakers out there too, but they are not yet convinced that PASS is serious about engaging with the BA community so they are cautious about investing the time and money to speak at the BA conference. What’s more, the economics of the BA community are very different from the economics of the traditional PASS community. In the world of SQL Server there is a virtuous circle that allows someone like me (or Brent, and any number of other well-known speakers and trainers and consultants) to invest our own time and money in attending the PASS Summit, SQLBits and SQL Saturdays and so on because we know that it strengthens the community and therefore benefits our own businesses, which are based on high-rate consultancy and training. I’m not saying that we do this cynically – we are genuinely motivated by a love of the community, it’s just that what benefits the wider community benefits us as well. Unfortunately this virtuous circle does not exist (yet) in the BA community. Potential speakers in the BA community are more likely to be full-time employees, or academics, or people like Mr Excel who make a living from selling their content to a mass market, rather than from a small number of customers who pay high rates. They may not be able to afford the time and money to come and speak, and giving away their content for free at the BA Conference might be detrimental to their business. If PASS is going to make the BA Conference work then it will have to approach these potential speakers individually, convince them that the conference is worth their while to attend, and probably pay travel expenses and maybe also pay them a rate. Let’s not forget that this is normal practice in many other technical communities, and I know that top speakers have in the past been paid quite handsomely to speak at TechEd.

The last thing I want to say is that it could be argued that PASS is changing its name and focus because Microsoft wants it to, and indeed that it depends on Microsoft so much that it has to do Microsoft’s bidding. I don’t accept this argument though. PASS should support the interests of its members regardless of the whims of the marketing folks at Microsoft. My point is that I believe that the changes that PASS has made have been for the right reasons, because they are in the interests of a large number of people like me from the BI pro community who have been part of PASS for a long time. The forces that Microsoft is responding to in the wider world of data are the ones that PASS itself must respond to.

I think this post has gone on for quite long enough, and if you’ve read this far then I’m very grateful. I expect that I’ll provoke yet more debate by what I’ve said here, but I also think it needed to be said because the arguments so far have been one-sided. I don’t pretend to speak for the PASS Board or have any more inside information than anyone else already involved in this discussion. However I do believe that the more we talk about these issues, the more likely we are to understand each other’s points of view and come to some kind of agreement that we are all happy with.

Written by Chris Webb

September 22, 2014 at 11:06 am

Posted in PASS

Handling Data Source Errors In Power Query


Recently I was asked an interesting question by Ondra Plánička in the comments of a blog post: how can you handle errors caused by unavailable or missing data sources in Power Query?

Let’s imagine you are loading a csv file like this one into Excel using Power Query:

[screenshot: the sample csv file]

The M query generated by Power Query will be as follows:

let
    Source = Csv.Document(File.Contents("C:\Users\Chris\Documents\SampleData.csv")
                ,null,",",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(Source),
    #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header"
                ,{{"Month", type text}, {"Product", type text}, {"Sales", Int64.Type}})
in
    #"Changed Type"

 

If you load into the Excel Data Model you’ll see the following in the Power Pivot window:

[screenshot: the table loaded into the Power Pivot window]

So far so good. But what happens if you try to refresh the query and the csv file is not there any more? The query refreshes but you will see the following in the Power Pivot window:

[screenshot: the single error column shown in the Power Pivot window]

The structure of the table that has been loaded has changed: instead of three columns you get just one, containing the error message. This wipes any selections in Excel PivotTables that are based on this table; they will need to be recreated when the source file is available once again. Similarly, any relationships between this table and other tables in the Excel Data Model get deleted and have to be added again manually when the source file is there again. Not good.

Here’s how to alter the query so that it handles the error more gracefully:

let
    //This is the original code generated by Power Query
    Source =
      Csv.Document(File.Contents("C:\Users\Chris\Documents\SampleData.csv"),null,",",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(Source),
    #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",
      {{"Month", type text}, {"Product", type text}, {"Sales", Int64.Type}}),
    //End of original code
    //Define the alternative table to return in case of error
    AlternativeOutput=#table(type table [Month=text,Product=text,Sales=Int64.Type],
      {{"Error", "Error", 0}}),
    //Does the Source step return an error?
    TestForError= try Source,
    //If Source returns an error then return the alternative table output
    //else return the value of the #"Changed Type" step
    Output = if TestForError[HasError] then AlternativeOutput else #"Changed Type"
in
    Output

 

While the code from the original query remains intact, the following extra steps have been added:

  • The AlternativeOutput step returns a table (defined using #table) that has exactly the same columns as the csv file. This table has one row containing the text “Error” in the two text columns and 0 in the Sales column.
  • The TestForError step uses a try to see whether the Source step returns an error (for example because the file is missing)
  • The Output step checks to see whether TestForError found an error – if it does, it returns the table defined in the AlternativeOutput step, otherwise it returns the contents of the csv file as returned by the #"Changed Type" step.
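
As an aside, M also has a try … otherwise form that rolls the test and the fallback into a single expression. Here’s a sketch of the same query written that way – note that, unlike the version above which only tests the Source step, this falls back to the alternative table if any of the steps raises an error:

let
    Source =
      Csv.Document(File.Contents("C:\Users\Chris\Documents\SampleData.csv"),null,",",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(Source),
    #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",
      {{"Month", type text}, {"Product", type text}, {"Sales", Int64.Type}}),
    AlternativeOutput = #table(type table [Month=text,Product=text,Sales=Int64.Type],
      {{"Error", "Error", 0}}),
    //return the csv data, or the alternative table if anything errors
    Output = try #"Changed Type" otherwise AlternativeOutput
in
    Output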

Now when you run the query and the csv file is missing, then you see the following in the Power Pivot window:

[screenshot: the alternative one-row error table in the Power Pivot window]

Because this table has the same structure as the one the query returns when the csv file is present, any PivotTables connected to this table will still retain their selections and the relationships in the Excel Data Model are left intact. This means that when the csv file is back in its proper place everything works again with no extra work required.
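
If you have several csv files to protect in this way you could wrap the whole pattern up in a function and invoke it from each query. This is only a rough sketch – the column names and types are hard-coded to match this example, and in practice you’d probably want to parameterise those too:

let
    LoadCsvSafely = (path as text) =>
      let
          Source = Csv.Document(File.Contents(path),null,",",null,1252),
          Promoted = Table.PromoteHeaders(Source),
          Typed = Table.TransformColumnTypes(Promoted,
            {{"Month", type text}, {"Product", type text}, {"Sales", Int64.Type}}),
          //the table to return if anything goes wrong
          AlternativeOutput = #table(type table [Month=text,Product=text,Sales=Int64.Type],
            {{"Error", "Error", 0}})
      in
          try Typed otherwise AlternativeOutput
in
    LoadCsvSafely("C:\Users\Chris\Documents\SampleData.csv")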

You can download the example workbook and csv file here.

Written by Chris Webb

September 18, 2014 at 9:17 pm

Posted in Power Query

Counting Customers Who Have Bought All Selected Products


Market basket analysis is a fairly well understood problem for SSAS – it’s best handled using many-to-many relationships, as shown in the excellent The Many-to-Many Revolution white paper. However, the other week I was asked a question that was an interesting variation on this problem, and which shows up one of the limitations of the many-to-many approach.

Let me show you an example. The following PivotTable on the Adventure Works cube shows the number of distinct customers who bought products in the product category Bikes:

[screenshot: PivotTable showing the Customer Count measure for Bikes]

The Customer Count measure is a distinct count measure, the Country hierarchy from the Customer dimension is shown on rows, and the slicer here shows the Category hierarchy from the Product dimension. If you select Bikes and Clothing in the slicer, like so:

[screenshot: the slicer with Bikes and Clothing selected]

… the measure values are larger because what you’re seeing now is the number of distinct customers who bought products in either the category Bikes or the category Clothing.

But what if you want to see the number of customers who bought Bikes AND Clothing? Well, setting up a many-to-many relationship will give you this (and a lot more useful stuff too) but there are some limitations with this approach:

  • Adding the m2m relationship and the extra measure groups it needs increases processing time
  • You have to select your two product categories from two different (but identical) hierarchies
  • You can only select two product categories – if you want to add a third to your AND filter you need to add a third m2m relationship, and so on. It’s not very flexible.

It is possible to get around these issues with a pure MDX approach, and apply an AND filter that works with however many product categories that the user selects.

The key to understanding how this works is to see how you would write an MDX calculated member that does an AND filter on two product categories:

WITH
MEMBER MEASURES.[Bikes And Clothing] AS
COUNT(
EXISTING
NONEMPTY(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS,
([Measures].[Internet Sales Amount],[Product].[Category].&[3])),
([Measures].[Internet Sales Amount],[Product].[Category].&[1]))
)
SELECT
{MEASURES.[Bikes And Clothing]}
ON 0,
NON EMPTY
[Customer].[Country].[Country].MEMBERS
ON 1
FROM
[Adventure Works]

[screenshot: the query results]

Here I’ve used two nested NonEmpty() functions to filter the set of all customers, first to get the ones that bought Clothing, then to filter these customers again to get the ones that bought Bikes; there’s also an EXISTING there to get only the ones in the current country.

Once again, to add more product categories to the AND filter you need to add more nested NonEmpty() functions… which means you need a way to dynamically generate the code, which of course you can do using StrToSet(). Now normally I avoid using StrToSet() inside MDX calculations because it can cause serious performance problems but in this case it’s the best choice.

The following named set and calculated measure show how to solve the problem for Adventure Works:

CREATE DYNAMIC SET [Selected Product Categories] as
EXISTING [Product].[Category].[Category].MEMBERS;

CREATE MEMBER CURRENTCUBE.MEASURES.[Distinct Customers with all selected Product Categories] AS
IIF(
[Measures].[Internet Sales Amount]=0,
NULL,
COUNT(
EXISTING
STRTOSET(
GENERATE(
[Selected Product Categories],
"NONEMPTY(", "")
+ "[Customer].[Customer].[Customer].MEMBERS" +
GENERATE(
[Selected Product Categories],
", ([Measures].[Internet Sales Amount], " +
[Selected Product Categories].CURRENT.UNIQUENAME
+ "))", "")
)
)
)
);

The dynamic named set is there to help work out what is selected in the slicer in my Excel worksheet, as shown in Mosha’s old post here. The calculated measure is where all the interesting stuff happens: it uses the ability of the Generate() function to iterate over a set (in this case the dynamic named set), evaluate a string expression and concatenate these strings. The output of this is a series of nested NonEmpty()s, which then goes to StrToSet() to be evaluated, and then the contents of that set are counted.

[screenshot: PivotTable using the calculated measure]

This technique does not replace using an m2m relationship, because it will not allow you to put product categories on both rows and columns and show the number of customers who bought each combination. However, if all you want to do is show the number of customers who have bought one specific combination, it has a lot of advantages.

For all of you MDX fans out there, here’s another approach I came up with which doesn’t perform quite as well but is so much fun I had to share it:

CREATE MEMBER CURRENTCUBE.MEASURES.[Distinct Customers with all selected Product Categories V2] AS
IIF(
[Measures].[Internet Sales Amount]=0,
NULL,
COUNT(
UNION(
INTERSECT(EXISTING [Customer].[Customer].[Customer].MEMBERS AS MYCUSTOMERS, {}),
GENERATE(
[Selected Product Categories],
IIF(
[Selected Product Categories].CURRENT.ITEM(0) IS TAIL([Selected Product Categories]).ITEM(0),
NONEMPTY(MYCUSTOMERS, [Measures].[Internet Sales Amount]),
INTERSECT(NONEMPTY(MYCUSTOMERS, [Measures].[Internet Sales Amount]) AS MYCUSTOMERS, {})
)
)
)
)
);

I’m using Generate() here again, but this time I’m using it with inline named sets (which are even worse for performance than StrToSet) in a way similar to this classic post. The theory is the same though: it’s a way of calling NonEmpty() multiple times over a set with an arbitrary number of items in.

Here’s what the calculation does:

  • Inside the first Intersect() I’m declaring an inline named set called MYCUSTOMERS, then doing the Intersect() with an empty set to return… an empty set. But I’ve managed to declare my inline named set, which is the point here.
  • I’m then iterating over the same dynamic named set shown in my earlier example using Generate(), and:
    • For all but the last product category in that set, I’m doing the NonEmpty() on the contents of MYCUSTOMERS on the current product category and then overwriting the contents of MYCUSTOMERS with the output. However, for these iterations of Generate() I’m returning an empty set.
    • For the last product category in the set I’m actually returning the output of NonEmpty() over MYCUSTOMERS for the current product category. The reason I’m only returning something other than an empty set on the last iteration of Generate() is that Generate() returns the union of all the sets returned by each iteration, and that would give me the wrong results! I only want to output the set from the final iteration.

Written by Chris Webb

September 12, 2014 at 9:30 am

Posted in Analysis Services, MDX

Power Pivot / Power Query Read-Only Connection Problems In Excel 2013 – And What To Do About Them


Anyone who has tried to do any serious work with Power Pivot and Power Query will know about this problem: you use Power Query to load some tables into the Data Model in Excel 2013; you make some changes in the Power Pivot window; you then go back to Power Query, make some changes there and you get the dreaded error

We couldn’t refresh the table ‘xyz’ from the connection ‘Power Query – xyz’. Here’s the error message we got:

COM Error: Microsoft.Mashup.OleDbProvider; The query ‘xyz’ or one of its inputs was modified in Power Query after this connection was added. Please disable and re-enable loading to the Data Model for this query..

[screenshot: the error dialog]

This post has a solution for the same problem in Excel 2010, but it doesn’t work for Excel 2013 unfortunately. There is a lot of helpful information out there on the web about this issue if you look around, though, and that’s why I thought it would be useful to bring it all together into one blog post and also pass on some hints and tips about how to recover from this error if you get it. This is the single biggest source of frustration among the Power Query users I speak to; a fix for it is being worked on, and I hope it gets released soon.

Problem Description

Why does this problem occur? Let’s take a simple repro.

  1. Import the data from a table in SQL Server using Power Query. Load it into the Excel Data Model.
  2. Open the PowerPivot window in Excel, then create measures/calculated fields, calculated columns, relationships with other tables as usual.
  3. Go back to the worksheet and build a PivotTable from data in this table, using whatever measures or calculated columns you have created.
  4. Go back to the PowerPivot window and rename one of the columns there. The column name change will be reflected in the PivotTable and everything will continue to work.
  5. Re-open the Power Query query editor, and then rename any of the columns in the table (not necessarily the one you changed in the previous step). Close the query editor window and when the query refreshes, bang! you see the error above. The table in the Excel Data Model is unaffected, however, and your PivotTable continues to work – it’s just that now you can’t refresh the data any more…
  6. Do what the error message suggests and change the Load To option on the Power Query query, unchecking the option to load to the Data Model. When you do this, on the very latest build of Power Query, you’ll see a “Possible Data Loss” warning dialog telling you that you’ll lose any customisations you made. Click Continue, and the query will be disabled. The destination table will be deleted from your Excel Data Model and your PivotTable, while it will still show data, will be frozen.
  7. Change the Load To option on the query to load the data into the Excel Data Model again. When you do this, and refresh the data, the table will be recreated in the Excel Data Model. However, your measures, calculated columns and relationships will all be gone. What’s more, although your PivotTable will now work again, any measures or calculated columns you were using in it will also have gone.
  8. Swear loudly at your computer and add all the measures, calculated columns and relationships to your Data Model all over again.

So what exactly happened here? The important step is step 4. As Miguel Llopis of the Power Query team explains here and here, when you make certain changes to a table in the Power Pivot window the connection from your Power Query query to the Excel Data Model goes into ‘read-only’ mode. This then stops Power Query from making any subsequent changes to the structure of the table.

What changes put the connection to the Excel Data Model in ‘read-only’ mode?

Here’s a list of changes (taken from Miguel’s posts that I linked to above) that you can make in the PowerPivot window that put the connection from your query to the Data Model into ‘read-only’ mode:

  • Edit Table Properties
  • Column-level changes: Rename, Data type change, Delete
  • Table-level changes: Rename, Delete
  • Import more tables using Power Pivot Import Wizard
  • Upgrade existing workbook

How can you tell whether your connection is in ‘read-only’ mode?

To find out whether your connection is in ‘read-only’ mode, go to the Data tab in Excel and click on the Connections button. Then, in the Workbook Connections dialog you’ll see the connection from Power Query to the Data Model listed – it will be called something like ‘Power Query – Query1’ and the description will be ‘Connection to the Query1 query in the Data Model’. Select this connection and click on the Properties button. When the Connection Properties dialog opens, go to the Definition tab. If the connection is in read-only mode the properties will be greyed out, and you’ll see the message ‘Some properties cannot be changed because this connection was modified using the PowerPivot Add-In’. If you do see this message, you’re already in trouble!

[screenshot: the Connection Properties dialog showing the read-only message]

How to avoid this problem

Avoiding this problem is pretty straightforward: if you’re using Power Query to load data into the Excel Data Model, don’t make any of the changes listed above in the PowerPivot window! Make them in Power Query instead.

How to recover from this problem

But what if your connection is already in ‘read-only’ mode? There is no magic solution, unfortunately: you are going to have to rebuild your model. However, there are two things you can do to reduce the amount of pain you have to go through to recreate your model.

First, you can use the DISCOVER_CALC_DEPENDENCY DMV to list out all of your measure and calculated column definitions to a table in Excel. Here’s some more information about the DMV:

http://cwebbbi.wordpress.com/2011/09/17/documenting-dependencies-between-dax-calculations/

To use this, all you need to do is to create a DAX query table in the way Kasper shows at the end of this post, and use the query:

select * from $system.discover_calc_dependency

Secondly, before you disable and re-enable your Power Query query (as in step 6 above), install the OLAP PivotTable Extensions add-in (if you don’t already have it) and use its option to disable auto-refresh on all of your PivotTables, as described here:

http://olappivottableextend.codeplex.com/wikipage?title=Disable%20Auto%20Refresh&referringTitle=Home

http://www.artisconsulting.com/blogs/greggalloway/Lists/Posts/Post.aspx?ID=26

Doing this prevents the PivotTables from auto-refreshing when the table is deleted from the Data Model when you disable the Power Query query. This means that they remember all of their references to your measures and calculated columns, so when you have recreated them in your Data Model (assuming that all of the names are still the same) and you re-enable auto-refresh the PivotTables will not have changed at all and will continue to work as before.

[After writing this post, I realised that Barbara Raney covered pretty much the same material in this post: http://www.girlswithpowertools.com/2014/06/power-query-refresh-fails/ . I probably read that post when it was published and then forgot about it. I usually don't blog about things that other people have already blogged about, but since I'd already done the hard work and the tip on using OLAP PivotTable Extensions is new, I thought I'd post anyway. Apologies...]

Written by Chris Webb

September 8, 2014 at 9:30 am

Create Your Own Relationships Between Tables In The Excel Data Model With Power Query


You probably know that, when you are importing data from multiple tables in SQL Server into the Excel Data Model in Excel 2013 using Power Query, Power Query will automatically create relationships between those tables in the Data Model. But did you know that you can get Power Query to do this for other data sources too?

Now wait – don’t get excited. I’ve known about this for a while but not blogged about it because I don’t think it works all that well. You have to follow some very precise steps to make it happen and even then there are some problems. That said, I think we’re stuck with the current behaviour (at least for the time being) so I thought I might as well document it.

Consider the following Excel worksheet with two tables in it, called Dimension and Fact:

[screenshot: the Dimension and Fact tables in the worksheet]

If you were to load these two tables into the Excel Data Model, you would probably want to create a relationship between the two tables based on the FruitID column. Here are the steps to use Power Query to create the relationship automatically:

  1. Click inside the Dimension table and then, on the Power Query tab in the Excel ribbon, click the From Table button to create a new query.
  2. When the Query Editor window opens, right click on the FruitID column and select Remove Duplicates.
    [screenshot: Remove Duplicates in the Query Editor]
    Why are we doing this when there clearly aren’t any duplicate values in this column? The new step contains the expression
    Table.Distinct(Source, {"FruitID"})
    …and one of the side-effects of using Table.Distinct() is that it adds a primary key to the table. Yes, tables in Power Query can have primary keys – the Table.AddKey() function is another way of doing this (there’s a quick sketch of it after these numbered steps). There’s a bit more information on this subject in my Power Query book, which I hope you have all bought!
  3. Click the Close & Load to.. button to close the Query Editor, and then choose the Only Create Connection option to make sure the output of the query is not loaded anywhere and the query is disabled, then click the Load button. (Am I the only person that doesn’t like this new dialog? I thought the old checkboxes were much simpler, although I do appreciate the new flexibility on where to put your Excel table output)
    [screenshot: the Load To dialog]
  4. Click inside the Fact table in the worksheet, click the From Table button again and this time do load it into the Data Model.
  5. Next, in the Power Query tab in the Excel ribbon, click the Merge button. In the Merge dialog select Dimension as the first table, Fact as the second, and in both select the FruitID column to join on.
    [screenshot: the Merge dialog]
  6. Click OK and the Query Editor window opens again. Click the Close & Load to.. button again, and load this new table into the Data Model.
  7. Open the Power Pivot window and you will see that not only have your two tables been loaded into the Data Model, but a relationship has been created between the two:
    [screenshot: the relationship created in the Power Pivot window]
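
Going back to what I said in step 2 about Table.AddKey(): here’s a sketch of what the Dimension query might look like if you marked the key column explicitly instead of relying on the Remove Duplicates trick. I’m assuming the table and column names from the example above, and I haven’t tested whether this gives you the automatic relationship in the same way that Table.Distinct() does:

let
    Source = Excel.CurrentWorkbook(){[Name="Dimension"]}[Content],
    //mark FruitID as a key on the table;
    //the third argument indicates that it is a primary key
    AddedKey = Table.AddKey(Source, {"FruitID"}, true)
in
    AddedKey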

What are the problems I talked about then? Well, for a start, if you don’t follow these instructions exactly then you won’t get the relationship created – it is much harder than I would like. There may be other ways to make sure the relationships are created but I haven’t found them yet (if you do know of an easier way, please leave a comment!). Secondly if you delete the two tables from the Data Model and delete the two Power Query queries, and then follow these steps again, you will find the relationship is not created. That can’t be right. Thirdly, I don’t like having to create a third query with the Merge, and would prefer it if I could just create two queries and define the relationship somewhere separately. With all of these issues I don’t think there’s any practical use for this functionality right now.

I guess the reason I think the ability to create relationships automatically is so important is because the one thing that the Excel Data Model/Power Pivot/SSAS Tabular sorely lacks is a simple way to script the structure of a model. Could Power Query and M one day be the modelling language that Marco asks for here? To be fair to the Power Query team this is not and should not be their core focus right now: Power Query is all about data acquisition, and this is data modelling. If this problem was solved properly it would take a lot of thought and a lot of effort. I would love to see it solved one day though.

You can download the sample workbook for this post here.

Written by Chris Webb

September 2, 2014 at 10:06 am
