Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for the ‘PowerPivot’ Category

Power Pivot / Power Query Read-Only Connection Problems In Excel 2013 – And What To Do About Them

with 5 comments

Anyone who has tried to do any serious work with Power Pivot and Power Query will know about this problem: you use Power Query to load some tables into the Data Model in Excel 2013; you make some changes in the Power Pivot window; you then go back to Power Query, make some changes there and you get the dreaded error

We couldn’t refresh the table ‘xyz’ from the connection ‘Power Query – xyz’. Here’s the error message we got:

COM Error: Microsoft.Mashup.OleDbProvider; The query ‘xyz’ or one of its inputs was modified in Power Query after this connection was added. Please disable and re-enable loading to the Data Model for this query..

image

This post has a solution for the same problem in Excel 2010, but unfortunately it doesn’t work for Excel 2013. There is a lot of helpful information about this issue scattered around the web, though, so I thought it would be useful to bring it all together into one blog post and pass on some hints and tips on how to recover from this error if you hit it. This is the single biggest source of frustration among the Power Query users I speak to; a fix for it is being worked on, and I hope it gets released soon.

Problem Description

Why does this problem occur? Let’s take a simple repro.

  1. Import the data from a table in SQL Server using Power Query. Load it into the Excel Data Model.
  2. Open the PowerPivot window in Excel, then create measures/calculated fields, calculated columns, relationships with other tables as usual.
  3. Go back to the worksheet and build a PivotTable from data in this table, using whatever measures or calculated columns you have created.
  4. Go back to the PowerPivot window and rename one of the columns there. The column name change will be reflected in the PivotTable and everything will continue to work.
  5. Re-open the Power Query query editor, and then rename any of the columns in the table (not necessarily the one you changed in the previous step). Close the query editor window and when the query refreshes, bang! you see the error above. The table in the Excel Data Model is unaffected, however, and your PivotTable continues to work – it’s just that now you can’t refresh the data any more…
  6. Do what the error message suggests and change the Load To option on the Power Query query, unchecking the option to load to the Data Model. When you do this, on the very latest build of Power Query, you’ll see a “Possible Data Loss” warning dialog telling you that you’ll lose any customisations you made. Click Continue, and the query will be disabled. The destination table will be deleted from your Excel Data Model and your PivotTable, while it will still show data, will be frozen.
  7. Change the Load To option on the query to load the data into the Excel Data Model again. When you do this, and refresh the data, the table will be recreated in the Excel Data Model. However, your measures, calculated columns and relationships will all be gone. What’s more, although your PivotTable will now work again, any measures or calculated columns you were using in it will also have gone.
  8. Swear loudly at your computer and add all the measures, calculated columns and relationships to your Data Model all over again.

So what exactly happened here? The important step is step 4. As Miguel Llopis of the Power Query team explains here and here, when you make certain changes to a table in the Power Pivot window the connection from your Power Query query to the Excel Data Model goes into ‘read-only’ mode. This then stops Power Query from making any subsequent changes to the structure of the table.

What changes put the connection to the Excel Data Model in ‘read-only’ mode?

Here’s a list of changes (taken from Miguel’s posts that I linked to above) that you can make in the PowerPivot window that put the connection from your query to the Data Model into ‘read-only’ mode:

  • Edit Table Properties
  • Column-level changes: Rename, Data type change, Delete
  • Table-level changes: Rename, Delete
  • Import more tables using Power Pivot Import Wizard
  • Upgrade existing workbook

How can you tell whether your connection is in ‘read-only’ mode?

To find out whether your connection is in ‘read-only’ mode, go to the Data tab in Excel and click on the Connections button. Then, in the Workbook Connections dialog you’ll see the connection from Power Query to the Data Model listed – it will be called something like ‘Power Query – Query1’ and the description will be ‘Connection to the Query1 query in the Data Model’. Select this connection and click on the Properties button. When the Connection Properties dialog opens, go to the Definition tab. If the connection is in read-only mode the properties will be greyed out, and you’ll see the message ‘Some properties cannot be changed because this connection was modified using the PowerPivot Add-In’. If you do see this message, you’re already in trouble!

image

How to avoid this problem

Avoiding this problem is pretty straightforward: if you’re using Power Query to load data into the Excel Data Model, don’t make any of the changes listed above in the PowerPivot window! Make them in Power Query instead.

How to recover from this problem

But what if your connection is already in ‘read-only’ mode? There is no magic solution, unfortunately: you are going to have to rebuild your model. However, there are two things you can do to reduce the amount of pain involved in recreating it.

First, you can use the DISCOVER_CALC_DEPENDENCY DMV to list out all of your measure and calculated column definitions to a table in Excel. Here’s some more information about the DMV:

http://cwebbbi.wordpress.com/2011/09/17/documenting-dependencies-between-dax-calculations/

To use this, all you need to do is to create a DAX query table in the way Kasper shows at the end of this post, and use the query:

select * from $system.discover_calc_dependency
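The raw output of this DMV is quite wide, so you will probably want to cut it down to just the objects and columns you’re interested in. For example, a query along these lines returns just the measure definitions (I’m quoting the column and OBJECT_TYPE names from memory, so check them against the output of the select * query first):

select [table], [object], [expression]
from $system.discover_calc_dependency
where object_type = 'MEASURE'

Running the same query with 'CALC_COLUMN' instead gives you the calculated column definitions.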

Secondly, before you disable and re-enable your Power Query query (as in step 6 above), install the OLAP PivotTable Extensions add-in (if you don’t already have it) and use its option to disable auto-refresh on all of your PivotTables, as described here:

http://olappivottableextend.codeplex.com/wikipage?title=Disable%20Auto%20Refresh&referringTitle=Home

http://www.artisconsulting.com/blogs/greggalloway/Lists/Posts/Post.aspx?ID=26

Doing this prevents the PivotTables from auto-refreshing when the table is deleted from the Data Model as you disable the Power Query query. They therefore keep all of their references to your measures and calculated columns, so once you have recreated those in your Data Model (assuming that all of the names are still the same) and re-enabled auto-refresh, the PivotTables will not have changed at all and will continue to work as before.

[After writing this post, I realised that Barbara Raney covered pretty much the same material in this post: http://www.girlswithpowertools.com/2014/06/power-query-refresh-fails/ . I probably read that post when it was published and then forgot about it. I usually don't blog about things that other people have already blogged about, but since I'd already done the hard work and the tip on using OLAP PivotTable Extensions is new, I thought I'd post anyway. Apologies...]

Written by Chris Webb

September 8, 2014 at 9:30 am

Create Your Own Relationships Between Tables In The Excel Data Model With Power Query

with 8 comments

You probably know that, when you are importing data from multiple tables in SQL Server into the Excel Data Model in Excel 2013 using Power Query, Power Query will automatically create relationships between those tables in the Data Model. But did you know that you can get Power Query to do this for other data sources too?

Now wait – don’t get excited. I’ve known about this for a while but not blogged about it because I don’t think it works all that well. You have to follow some very precise steps to make it happen and even then there are some problems. That said, I think we’re stuck with the current behaviour (at least for the time being) so I thought I might as well document it.

Consider the following Excel worksheet with two tables in it, called Dimension and Fact:

image

If you were to load these two tables into the Excel Data Model, you would probably want to create a relationship between the two tables based on the FruitID column. Here are the steps to use Power Query to create the relationship automatically:

  1. Click inside the Dimension table and then, on the Power Query tab in the Excel ribbon, click the From Table button to create a new query.
  2. When the Query Editor window opens, right click on the FruitID column and select Remove Duplicates.
    image
    Why are we doing this when there clearly aren’t any duplicate values in this column? The new step contains the expression
    Table.Distinct(Source, {"FruitID"})
    …and one of the side-effects of using Table.Distinct() is that it adds a primary key to the table. Yes, tables in Power Query can have primary keys – the Table.AddKey() function is another way of doing this, and there’s a short sketch of it after this list. There’s a bit more information on this subject in my Power Query book, which I hope you have all bought!
  3. Click the Close & Load to.. button to close the Query Editor, and then choose the Only Create Connection option to make sure the output of the query is not loaded anywhere and the query is disabled, then click the Load button. (Am I the only person that doesn’t like this new dialog? I thought the old checkboxes were much simpler, although I do appreciate the new flexibility on where to put your Excel table output)
    image
  4. Click inside the Fact table in the worksheet, click the From Table button again and this time do load it into the Data Model.
  5. Next, in the Power Query tab in the Excel ribbon, click the Merge button. In the Merge dialog select Dimension as the first table, Fact as the second, and in both select the FruitID column to join on.
    image
  6. Click OK and the Query Editor window opens again. Click the Close & Load to.. button again, and load this new table into the Data Model.
  7. Open the Power Pivot window and you will see that not only have your two tables been loaded into the Data Model, but a relationship has been created between the two:
    image
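For the record, here’s roughly what the explicit alternative looks like. This is only a minimal sketch rather than code taken from the sample workbook: it assumes a query whose Source step returns the Dimension table, and it uses Table.AddKey() to mark FruitID as a primary key directly instead of relying on the Remove Duplicates side-effect.

let
    Source = Excel.CurrentWorkbook(){[Name="Dimension"]}[Content],
    //the third argument, true, marks this key as the primary key
    AddedKey = Table.AddKey(Source, {"FruitID"}, true)
in
    AddedKey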

What are the problems I talked about then? Well, for a start, if you don’t follow these instructions exactly then you won’t get the relationship created – it is much harder than I would like. There may be other ways to make sure the relationships are created but I haven’t found them yet (if you do know of an easier way, please leave a comment!). Secondly, if you delete the two tables from the Data Model and delete the two Power Query queries, and then follow these steps again, you will find the relationship is not created. That can’t be right. Thirdly, I don’t like having to create a third query with the Merge, and would prefer it if I could just create two queries and define the relationship somewhere separately. With all of these issues, I don’t think there’s any practical use for this functionality right now.

I guess the reason I think the ability to create relationships automatically is so important is because the one thing that the Excel Data Model/Power Pivot/SSAS Tabular sorely lacks is a simple way to script the structure of a model. Could Power Query and M one day be the modelling language that Marco asks for here? To be fair to the Power Query team this is not and should not be their core focus right now: Power Query is all about data acquisition, and this is data modelling. If this problem was solved properly it would take a lot of thought and a lot of effort. I would love to see it solved one day though.

You can download the sample workbook for this post here.

Written by Chris Webb

September 2, 2014 at 10:06 am

Using Slicer Selections In The CubeSet Function

with 8 comments

I had an interesting challenge from a customer yesterday – one of those problems that I’d known about for a long time but never got round to working out the solution for…

Consider the following PivotTable, based on a PowerPivot model using Adventure Works data, in Excel 2010:

image

It shows the top 10 products by the measure Sum of Sales. There are two slicers, and the top 10 shown in the PivotTable reflects the selections made in the slicers. All of this works fine. But what if you want to use Excel cube functions to do the same thing? You can write the MDX for the top 10 products quite easily and use it in the CubeSet() function in your worksheet, but how can you get your MDX set expression to respect the selection made in the slicers?

The solution to this problem is very similar to the trick I showed here – finding the selected items in a slicer is not easy! Here are the steps I followed to do it:

  • Add the slicers for EnglishOccupation and CalendarYear to a new worksheet
  • Go to Slicer Settings and uncheck the box for “Visually indicate items with no data”
  • Add two new PivotTables to the worksheet. Connect one to the EnglishOccupation slicer and put EnglishOccupation on rows; connect the other to the CalendarYear slicer and put CalendarYear on rows.
  • Use the OLAPPivotTableExtensions add-in (which you can download here) to add new MDX calculated measures to each PivotTable. For the EnglishOccupation PivotTable call the measure SelectedOccupations and use the following MDX:
    SetToStr(Except(Axis(0), {[Customer].[EnglishOccupation].[All]}))
    This expression does the following: it uses the Axis() function to find the set of members selected on what Excel thinks of as the rows axis in the PivotTable (actually the MDX columns axis), then uses Except() to remove the All Member from the hierarchy (which Excel uses for the Grand Totals) and then uses SetToStr() to take that set and return the string representation of it. Do the same thing for the PivotTable showing CalendarYear too, calling the calculated measure SelectedYears; the MDX in this case is:
    SetToStr(Except(Axis(0), {[Date].[CalendarYear].[All]}))
    This is what the EnglishOccupation PivotTable should look like:
    image
  • Next, to make things easy, use Excel formulas to get the values from the top cell inside each PivotTable into cells elsewhere in the worksheet, and give these cells the names SelectedOccupations and SelectedYears.
    image
  • Then enter a CubeSet() function into a new cell using the following formula:
    =CUBESET(
    "PowerPivot Data",
    "Topcount(
    [Product].[EnglishProductName].[EnglishProductName].members,
    10,
    Sum(" & SelectedOccupations & " * " & SelectedYears & ",[Measures].[Sum of Sales])
    )",
    "Top 10 Set")
    What this does is use the TopCount() function to find the top 10 Products, and in the third parameter of this function which is the numeric expression to find the top 10 by, it crossjoins the two sets of selected occupations and selected years and then sums the output of the crossjoin by the measure [Sum of Sales].
  • Last of all, build your report using the Excel cube functions as normal, using the CubeRankedMember() function to get each item from the top 10 set created in the previous step – there’s a rough sketch of these formulas below.

image
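To give a rough idea of what those report formulas look like (the cell references are purely illustrative: they assume the CubeSet() formula above is in cell B1 and that the product names are returned in column A starting at A3):

=CUBERANKEDMEMBER("PowerPivot Data", $B$1, ROW(A1))
=CUBEVALUE("PowerPivot Data", A3, "[Measures].[Sum of Sales]")

The first formula, filled down, returns the nth product from the top 10 set; the second returns the value of Sum of Sales for the product returned in A3.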

You can download my sample workbook here.

The bad news about this technique is that it doesn’t work in Excel 2013 and Power Pivot. It’s no longer possible to create MDX calculated measures on Power Pivot models in Excel 2013, alas. It will work if you’re using any version of Excel from 2007 on against Analysis Services and, as I show here, Excel 2010 and PowerPivot. If you are using Power Pivot and Excel 2013 it might be possible to create a DAX measure to do the same as the MDX I’ve used here (I’m wondering if the technique Jason describes here will work). It would certainly be possible to use CubeRankedMember() to find each item selected in the slicer, as Erik Svensen shows here, and then use Excel formulas to find the MDX unique name for each selected member and concatenate these unique names to create the set expression that my calculated measures return, but that’s a topic for another post. This really should be a lot easier than it is…
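Incidentally, here’s the kind of formula that Erik’s approach relies on in Excel 2013. This is only a sketch, and it assumes you have a slicer connected to the Data Model whose internal name is Slicer_CalendarYear:

=CUBERANKEDMEMBER("ThisWorkbookDataModel", Slicer_CalendarYear, 1)

It returns the first item selected in the slicer; changing the last argument returns the second, third and so on. You would then still need to work out each member’s unique name and concatenate them all into a set expression, which is the fiddly part I’m leaving for that other post.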

Written by Chris Webb

June 20, 2014 at 10:59 am

Point-In-Time Dimension Reporting In DAX

with 8 comments

Before I start, I have to state that the technique shown in this post isn’t mine but was developed by my colleague Andrew Simmans, who has very kindly allowed me to blog about it.

Over the last few months I’ve been working on an SSAS Tabular project that has not only presented some interesting modelling challenges, but has shown how DAX can offer some new and interesting solutions to these challenges. Consider the following scenario: a supermarket sells products, and we have a fact table showing sales of products by day. Here’s some sample data:

image

To complicate matters, each product has one product manager, but the product manager for a particular product changes from time to time. Normally this might be handled by adding the product manager name to the Product dimension table and implementing a Type 2 Slowly Changing Dimension. In this case, though, we want something slightly different. Instead of attributing each sale to the product manager who was in charge of the product at the time of the sale (and therefore seeing sales for the same product attributed to different product managers on different dates), we want to attribute all sales for a product to a single product manager, and use a second date dimension to choose the point in time, and therefore the product manager in charge of each product at that point in time, that we want to report as of. To put it another way, we want to be able to find the state of a dimension on any given date and use that version of the dimension to do our analysis.

For example, we have the following table showing which product manager was in charge of each product at any given point in time:

image

Between January 1st 2013 and January 3rd 2013 Jim was the product manager for Oranges, but from January 4th 2013 onwards Rob took over as product manager for Oranges; Fred was the product manager for Apples the whole time. We want a PivotTable that looks like this when we choose to report as of January 2nd 2013:

image

Notice how Jim is shown as the product manager for Oranges. If we wanted to report using the managers as of January 5th 2013, we would want to see Rob shown as the product manager for Oranges like so:

image

The solution to this problem involves a combination of two DAX techniques that have already been blogged about quite extensively and which I’d encourage you to read up on:

  • Many-to-many relationships, in this case the solution developed by Gerhard Brueckl, described on his blog here.
  • ‘Between’ date filters, which I wrote about recently but which Alberto has recently improved on in his must-read white paper here.

Here are the table relationships I’ve used for the sample scenario:

image

I’ve added a second date table called ReportingDate which contains the same rows as the Date table shown above; note that it has no relationship with any other table.

This problem is very similar to a many-to-many relationship in that a product can have many managers across time, and a manager can have many products. Indeed we could model this as a classic many-to-many relationship by creating a bridge table with one row for each valid combination of product and manager for each possible reporting date; on my project, however, this was not a viable solution because it would have resulted in a bridge table with billions of rows in it. Therefore, rather than joining the ReportingDate table directly to the ProductManager table, we filter ProductManager using the ‘between’ date filter technique.

Here’s the DAX for the Sum of Sales measure used in the PivotTables shown above:

Sum of Sales:=
IF(
    HASONEVALUE(ReportingDate[ReportingDate]),
    CALCULATE(
        SUM(Sales[Sales]),
        FILTER(
            ProductManager,
            MIN(ReportingDate[ReportingDate]) >= ProductManager[StartDate]
            && IF(
                   ISBLANK(ProductManager[EndDate]),
                   TRUE(),
                   MIN(ReportingDate[ReportingDate]) <= ProductManager[EndDate]
               )
        )
    ),
    BLANK()
)

This is not necessarily the best way to write the code from a performance point of view but it’s the most readable – if you need better performance I recommend you read Alberto’s white paper. What I’m doing is this:

  • Only return a value if a single reporting date is selected
  • Filter the ProductManager table so only the rows where the selected reporting date is between the start date and the end date are returned, ie we only get the rows where a manager was in charge of a product on the reporting date
  • Use the filtered ProductManager table to filter the main fact table using the Calculate() function, in exactly the same way that you would with a many-to-many relationship

You can download my sample workbook here.

Written by Chris Webb

July 19, 2013 at 11:41 pm

Posted in DAX, PowerPivot, Tabular

A New Events-In-Progress DAX Pattern

with 25 comments

I’ve been working on a very complex SSAS Tabular implementation recently, and as a result I’ve learned a few new DAX tricks. The one that I’m going to blog about today takes me back to my old favourite, the events-in-progress problem. I’ve blogged about it a lot of times, looking at solutions for MDX and DAX (see here and here), and for this project I had to do some performance tuning on a measure that uses a filter very much like this.

Using the Adventure Works Tabular model, the obvious way of finding the number of Orders on the Internet Sales table that are open on any given date (ie where the Date is between the dates given in the Order Date and the Ship Date column) is to write a query something like this:

EVALUATE
ADDCOLUMNS (
    VALUES ( 'Date'[Date] ),
    "OpenOrders",
    CALCULATE (
        COUNTROWS ( 'Internet Sales' ),
        FILTER ( 'Internet Sales', 'Internet Sales'[Ship Date] > 'Date'[Date] ),
        FILTER ( 'Internet Sales', 'Internet Sales'[Order Date] <= 'Date'[Date] )
    )
)
ORDER BY 'Date'[Date]

On my laptop this executes in around 1.9 seconds on a cold cache. However, after a bit of experimentation, I found the following query was substantially faster:

EVALUATE
ADDCOLUMNS (
    VALUES ( 'Date'[Date] ),
    "OpenOrders",
    COUNTROWS (
        FILTER (
            'Internet Sales',
            CONTAINS (
                DATESBETWEEN (
                    'Date'[Date],
                    'Internet Sales'[Order Date],
                    DATEADD ( 'Internet Sales'[Ship Date], -1, DAY )
                ),
                [Date],
                'Date'[Date]
            )
        )
    )
)
ORDER BY 'Date'[Date]

On a cold cache this version executes in just 0.2 seconds on my laptop. What’s different? In the first version of the calculation the FILTER() function is used to find the rows in Internet Sales where the Order Date is less than or equal to the Date on rows, and where the Ship Date is greater than the Date. This is the obvious way of solving the problem. In the new calculation the DATESBETWEEN() function is used to create a table of dates from the Order Date to the day before the Ship Date for each row on Internet Sales, and the CONTAINS() function is used to see if the Date we’re interested in appears in that table.

I’ll be honest and admit that I’m not sure why this version is so much faster, but if (as it seems) this is a generally applicable pattern then I think this is a very interesting discovery.

Thanks to Marco, Alberto and Marius for the discussion around this issue…

UPDATE: Scott Reachard has done some further testing on this technique, and found that the performance is linked to the size of the date ranges. So, the shorter your date ranges, the faster the performance; if you have large date ranges, this may not be the best performing solution. See https://twitter.com/swreachard/status/349881355900952576

UPDATE: Alberto has done a lot more research into this problem, and come up with an even faster solution. See: http://www.sqlbi.com/articles/understanding-dax-query-plans/

Written by Chris Webb

June 13, 2013 at 10:32 am

Comments And Descriptions In DAX

with 2 comments

With my Technitrain hat on I’m sitting in on Marco’s Advanced DAX course in London today, and the question of comments in DAX came up – which reminded me that this is something I’ve been meaning to blog about. DAX as a language supports comments, but unfortunately it’s not possible to add comments inside a DAX measure or calculated column expression in either PowerPivot or SSAS Tabular right now (which is where they’re most needed – I hope this changes in the future). That said, there are some other things you can do to add textual explanations and descriptions to your DAX measure code.

Before we get onto the workarounds, a quick word about comments in DAX. These can only be used in DAX queries, and the types of comment supported are the same as in MDX: double-dashes and double-forward-slashes for single line comments, and forward-slash-asterisk to start a multi-line comment and asterisk-forward-slash to close a multi-line comment. Here’s an example:

--single line comment
//another single line comment
/*a multi-line
comment*/
evaluate table1

What can be done with measures though? After all, that’s where the most complex DAX is usually written.

First of all, you can add a description to a measure by right-clicking on it in the measure grid and selecting Description:

image

image

Unfortunately this description is not easily accessible to end users anywhere (it would be great if it appeared as a tooltip in a PivotTable, for example) but it can be seen in an Excel worksheet by running a DMV query. DMV queries can be run in Excel 2013 in the same way as DAX queries, using a query table as described here; the DMV query to use is:

select measure_name as [Measure Name], [description], measure_is_visible
from $system.mdschema_measures

image

Unfortunately all hidden and implicit measures are returned, and even when the table is filtered so that only measure_is_visible=true there are still a lot of measures that probably shouldn’t be shown.

Similarly, descriptions can be added to any column (calculated or not) in your model, again by right-clicking on it and selecting Description.

image

This description can be displayed in the worksheet using the following DMV query:

select hierarchy_name as [Column Name], [description] as [Description]
from $system.mdschema_hierarchies
where cube_name='model'

image

You can also write text directly into cells in the measure grid. When I first saw a customer do this I was worried that it might not be supported, but I’ve been told that it is: as long as you don’t use the := used for defining measures, you should be ok.

image

This is probably the best way to add comments to your code, if only because it’s the most visible to anyone looking at your PowerPivot/SSAS Tabular model. Of course, for it to be effective you’ll need to have a system for arranging your measures in the measure grid; in “SQL Server Analysis Services 2012: The BISM Tabular Model”, Marco, Alberto and I recommended that you arrange all your measures in the top-left hand corner of the measure grid and I think that’s still a good idea, but the use of text in cells to create headings for groups of measures as well as descriptions can help a lot too.

Written by Chris Webb

May 15, 2013 at 12:01 pm

Posted in DAX, PowerPivot, Tabular

Accumulating Data In An Excel Table Using Data Explorer (Power Query) and PowerPivot

with 5 comments

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source; you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!

Here’s a super-simple example of how to accumulate data in a table then. Let’s start with a csv file that contains the following data:

Product,Sales
Apples,1
Oranges,2

It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source:

image

 

Here’s the code that Data Explorer generates:

let
    Source = Csv.Document(File.Contents("C:\InputData.csv")),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
        {{"Product", type text}, {"Sales", type number}})
in
    ChangedType

Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:

Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow())

Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve now is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It’s a recursive scenario not unlike the one I grappled with here. The solution to this problem is to first of all load the data into the PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane:

image

Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you’ve just loaded data into. Kasper shows how to do this here; there’s no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog:

image

image

At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it appends data from the table you’ve just created (ie the table getting the data from PowerPivot) onto the data from the csv file. This can be done with three new steps: first, get the data from the new Excel table:

Excel.CurrentWorkbook(){[Name="ExistingData"]}[Content]

Then to make sure the Load Date is treated as a DateTimeZone type:

Table.TransformColumnTypes(GetExistingData, {{"Load Date", type datetimezone}})

Then finally to combine the two tables:

Table.Combine({ChangedType1,InsertedCustom})

Now, whenever you Refresh your Data Explorer query, you will see the data from the csv file appended to the data that has already been loaded:

image

image

Here’s the complete code:

let
    Source = Csv.Document(File.Contents("C:\InputData.csv")),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,
        {{"Product", type text}, {"Sales", type number}}),
    InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow()),
    Custom1 = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content],
    ChangedType1 = Table.TransformColumnTypes(Custom1, {{"Load Date", type datetimezone}}),
    Custom2 = Table.Combine({ChangedType1, InsertedCustom})
in
    Custom2

Now as I said, this is just a super-simple example and in the real world you’d need extra functionality to do things like delete rows you’ve already loaded and so on; but that’s all doable I think. It’s also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you’ll need to follow my instructions exactly.
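For example, if you wanted to make sure that refreshing more than once on the same day didn’t give you duplicate rows, you could throw away any previously-accumulated rows with today’s load date before the final combine. The following is only a rough sketch, written with current Power Query function names (which may not all have existed in the Data Explorer preview); it adds one extra step before the final Table.Combine() and changes that step to use it:

    FilteredExisting = Table.SelectRows(ChangedType1,
        //keep only the rows that were loaded on previous days
        each Date.From([Load Date]) <> Date.From(DateTimeZone.UtcNow())),
    Custom2 = Table.Combine({FilteredExisting, InsertedCustom})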

You can download the sample workbook here, and the csv file here.

Written by Chris Webb

May 13, 2013 at 12:40 pm
