Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Create Your Own Relationships Between Tables In The Excel Data Model With Power Query

with 10 comments

You probably know that, when you are importing data from multiple tables in SQL Server into the Excel Data Model in Excel 2013 using Power Query, Power Query will automatically create relationships between those tables in the Data Model. But did you know that you can get Power Query to do this for other data sources too?

Now wait – don’t get excited. I’ve known about this for a while but not blogged about it because I don’t think it works all that well. You have to follow some very precise steps to make it happen and even then there are some problems. That said, I think we’re stuck with the current behaviour (at least for the time being) so I thought I might as well document it.

Consider the following Excel worksheet with two tables in it, called Dimension and Fact:

image

If you were to load these two tables into the Excel Data Model, you would probably want to create a relationship between the two tables based on the FruitID column. Here are the steps to use Power Query to create the relationship automatically:

  1. Click inside the Dimension table and then, on the Power Query tab in the Excel ribbon, click the From Table button to create a new query.
  2. When the Query Editor window opens, right click on the FruitID column and select Remove Duplicates.
    image
    Why are we doing this when there clearly aren’t any duplicate values in this column? The new step contains the expression
    Table.Distinct(Source, {"FruitID"})
    …and one of the side-effects of using Table.Distinct() is that it adds a primary key to the table. Yes, tables in Power Query can have primary keys – the Table.AddKey() function is another way of doing this. There’s a bit more information on this subject in my Power Query book, which I hope you have all bought!
  3. Click the Close & Load to.. button to close the Query Editor, and then choose the Only Create Connection option to make sure the output of the query is not loaded anywhere and the query is disabled, then click the Load button. (Am I the only person that doesn’t like this new dialog? I thought the old checkboxes were much simpler, although I do appreciate the new flexibility on where to put your Excel table output)
    image
  4. Click inside the Fact table in the worksheet, click the From Table button again and this time do load it into the Data Model.
  5. Next, in the Power Query tab in the Excel ribbon, click the Merge button. In the Merge dialog select Dimension as the first table, Fact as the second, and in both select the FruitID column to join on.
    image
  6. Click OK and the Query Editor window opens again. Click the Close & Load to.. button again, and load this new table into the Data Model.
  7. Open the Power Pivot window and you will see that not only have your two tables been loaded into the Data Model, but a relationship has been created between the two:
    image

What are the problems I talked about then? Well, for a start, if you don’t follow these instructions exactly then you won’t get the relationship created – it is much harder than I would like. There may be other ways to make sure the relationships are created but I haven’t found them yet (if you do know of an easier way, please leave a comment!). Secondly if you delete the two tables from the Data Model and delete the two Power Query queries, and then follow these steps again, you will find the relationship is not created. That can’t be right. Thirdly, I don’t like having to create a third query with the Merge, and would prefer it if I could just create two queries and define the relationship somewhere separately. With all of these issues I don’t think there’s any practical use for this functionality right now.

I guess the reason I think the ability to create relationships automatically is so important is because the one thing that the Excel Data Model/Power Pivot/SSAS Tabular sorely lacks is a simple way to script the structure of a model. Could Power Query and M one day be the modelling language that Marco asks for here? To be fair to the Power Query team this is not and should not be their core focus right now: Power Query is all about data acquisition, and this is data modelling. If this problem was solved properly it would take a lot of thought and a lot of effort. I would love to see it solved one day though.

You can download the sample workbook for this post here.

Written by Chris Webb

September 2, 2014 at 10:06 am

10 Responses

Subscribe to comments with RSS.

  1. Interesting post! I agree this could be useful for automation, though most other times I’ll prefer to create the relationship in Power Pivot.

    And I find the new Load To dialog less intuitive, but appreciate the additional flexibility and having it pointed out that I’m creating a connection in the workbook regardless of whether I load it.

    Your post reminds me of a tangentially related logical question I’ve been trying to answer for myself.

    I will often load from a particular SQL database two related tables, one with general ledger journal numbers and dates, the other with the journal detail (account numbers, amounts, transaction descriptions), and need only a particular date range and a particular account number range. I can filter each table on the respective attributes and load to the data model, but I end up loading a lot of extraneous data (i.e. journal records without the account range in question and detail records outside the date range).

    I know I can merge the tables but I am trying to avoid flattening. Although I’m not intimately familiar with SQL I’m sure there must be a relatively simple way to push the filtering to the database server. Is there a simple way to access that in Power Query?

    Meanwhile, I somehow missed that your book was released. That is exciting. I will be purchasing and hopefully will have a chance to dive into it ASAP.

    Jim

    September 2, 2014 at 12:16 pm

    • It would be possible in Power Query, but maybe not simple – you’d need to write some M code to do the filtering, which would not be difficult in itself, but you would have to be sure that you did it in such a way that Power Query generated SQL to do the filtering rather than grabbing all the data and doing the filtering on the desktop. When you get my book look for the section on “Query Folding” and you’ll see what I mean; this post covers similar ground: http://cwebbbi.wordpress.com/2014/06/04/join-conditions-in-power-query-part-2-events-in-progress-performance-and-query-folding/

      Chris Webb

      September 2, 2014 at 12:41 pm

      • Just read the post and will check out that section of the book. This gets me going in the right direction. Thanks.

        Jim

        September 2, 2014 at 2:37 pm

  2. it’s not just you, we don’t like the new load interface either, along with some of the other interface changes that is made every month
    as long as it is for the better I suppose we can’t complain

    Pieter

    September 3, 2014 at 9:13 am

  3. Hi, thanks for the awesome post.

    But I still have some problems with defining relationships in Power Query/Power Pivot.

    I am getting data from the REST api on the web. It is something about 5 dimensional tables and 1 fact table.
    Everything worked fine, I loaded everything to Power Pivot and set the relationships there.

    But problem is when some error occurs during the refresh of the query. Then PQ will load a table to power pivot with one row containing the description of the error, so every relationship set in power pivot is lost.

    Because of this, the report is unusable, because even if I will refresh data again and everything will load correctly, I dont have relationships between tables.

    Is there some kind of workaround?

    Maybe something like

    let
    Source = try Json.Document(Web.Contents(“http://bla.service.net/api/Tickets/GetTickets”))
    in
    Result = if Source[HasError] then /*Generate table with same columns as Source with no data*/ else /* Continue working with this Source*/

    Thanks for your answer

    Ondra Plánička

    September 3, 2014 at 12:55 pm

    • I see exactly what your problem is, and it’s a good question! My suggestion is this: use a try…otherwise statement to catch the error, and if the error is present return an empty table that has exactly the same structure as the one you are expecting. You can return an empty table using the Table.FromRows() function. This is a good topic for a future blog post I think!

      Chris Webb

      September 3, 2014 at 1:22 pm

      • So I tried this and it seems to work:

        let
        ErrorTable = Table.FromRecords({[Name="Error", IDTicketSummaryState=1, Descritpion="Error"]}),
        Source = try Web.Page(Web.Contents(“http://bla.service.net/Help/ResoursceModel?modelName=TicketSummaryEnum”)),
        #”Changed Type” = Table.TransformColumnTypes(Source[Value],{{“Name”, type text}, {“Value”, Int64.Type}, {“Description”, type text}}),
        #”Reorder Columns” = Table.ReorderColumns(#”Changed Type”,{“Value”, “Name”, “Description”}),
        #”Renamed Columns” = Table.RenameColumns(#”Reorder Columns”,{{“Value”, “IDTicketSummaryState”}}),
        #”Changed Type1″ = Table.TransformColumnTypes(#”Renamed Columns”,{{“IDTicketSummaryState”, Int64.Type}, {“Name”, type text}, {“Description”, type text}}),
        Result = if Source[HasError] then ErrorTable else #”Changed Type1″,
        in
        Result

        But I think this could use more explanation in your blog post, many people can have problem with that.

        Maybe this should be a part of PQ, if there is some kind of error in query. dont load it to data model or excel and show user error message. That way you dont screw up the data model and user knows what happened.

        Same problem is with autentication. Imagine, that you are loading data from Odata feed with autentication, PQ prompts user to type login and password, he clicks cancel and PQ loads error message to data model. So all relationships are gone.

        What do you think about this problem?

        Thank you for helping me

        Ondra Plánička

        September 4, 2014 at 7:19 am

      • Great, I’m glad to see it works! I agree, this is something that should definitely be part of Power Query. I’ll write a blog post about this when I get the chance…

        Regarding authentication: maybe you can get around this with a multiple query approach? Here’s my idea. You have query A that connects to the OData source and may fail if the user does not enter the login and password; this query should be disabled so it does not load to either a table in the worksheet or the Data Model. You have query B that returns an empty table using Table.FromRecords() (thinking about it some more, using #table might be even more elegant); this query is also disabled. Then you have query C which uses query A as a source, but if query A returns an error returns the output of query B; the output of query C loads to the data model. I wonder if this will solve the problem? I’m not sure but it’s worth trying.

        Chris Webb

        September 4, 2014 at 3:00 pm

  4. […] I was asked an interesting question by Ondra Plánička in the comments of a blog post: how can you handle errors caused by unavailable or missing data sources in Power […]

  5. […] I was asked an interesting question by Ondra Plánička in the comments of a blog post: how can you handle errors caused by unavailable or missing data sources in Power […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,240 other followers

%d bloggers like this: