Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Importing Data From Multiple Log Files Using Data Explorer (Power Query)

with 17 comments

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

It’s only been two days since the official Preview release of Data Explorer and already the blog posts about it are coming thick and fast. Here are some of the more interesting ones that I’ve seen that show what’s possible with it:
http://sqlblog.com/blogs/jamie_thomson/archive/2013/02/28/traversing-the-facebook-graph-using-data-explorer.aspx
http://www.mattmasson.com/index.php/2013/03/access-the-windows-azure-marketplace-from-data-explorer/
http://community.altiusconsulting.com/best-oscar-winning-film-my-first-data-explorer-adventure/
http://www.spsdemo.com/blog/Lists/Posts/Post.aspx?List=c67861cd-a0d9-4ed8-9d9d-9b29652a516f&ID=371&Web=f74569c2-ae3f-42c6-a3fa-9f099dfaeb7f

Obviously I can’t let everyone else have all the fun, so I thought I’d show how you can use Data Explorer to import data from multiple files, clean it, load it into a single table and then report on it.

First of all, the data. Like all bloggers I have an unhealthy interest in my blog stats, and one of the ways I monitor the hits on this site is using Statcounter. I’m also a bit of a miser, though, so I only use their freebie service and that means that I only get to see stats on the last 500 site visits. How can I analyse this data then? Well, Statcounter allow you to download log data as a csv file, so at about 2:30pm I downloaded one file and at 8:30pm I downloaded another.

Now, the first cool thing to show about Data Explorer is that you can import and merge data from multiple files with the same structure if they’re in the same folder. With both of my files in a folder called Blog Logs, and Excel open, the first thing you need to do is to the Data Explorer tab and hit From File/From Folder:

image

The next step is to enter the name of the folder with the files in in the dialog:

image

With that done, a new Query screen appears with a list of the files in the folder:

image

You then need to hit the icon with the two down arrows and a horizontal line that I’ve highlighted in the screenshot above, next to the Content heading. This then shows the data in the files (obviously I’ve had to scrub out the sensitive data here):

image

You can then use the first row as the column headers:

image

Filter the data so that the row with the second set of column headers is removed (I wonder if there’s a way to do this automatically when importing multiple csv files?) by clicking on the Date and Time column and deselecting the value “Date and Time” as shown:

image

Right-click on each column you don’t want to import (such as IP Address) and selecting Hide:

image

Right-click on the Date and Time column and select Remove Duplicates to remove any records that appear in both log files (I’m assuming that there were no cases where two people hit a page at exactly the same date and time, which of course may not be completely correct):

image

And force the Date and Time column to be treated as a Date/Time type:

image

And bingo, you’re done. Here are all the steps in the import, all of which can be edited, deleted, reordered etc:

image

The data is then loaded into a table in a worksheet (though you can turn that off), and by clicking “Load to data model” in the Query Settings pane you can load the data into the Excel data model:

image

(NB I found some issues with loading date data into the data model and US/European date formats that I’ve reported here, but don’t forget this is beta software so there are bound to be problems like this)

You can build cool Power View reports using this data:

image

Or even explore it on a 3D map with GeoFlow:

image

Fun, isn’t it?

Written by Chris Webb

March 1, 2013 at 10:48 pm

17 Responses

Subscribe to comments with RSS.

  1. Just a quick question…what’s the fundamental functional difference between this and PowerPivot? Aside from the available data sources (and the product architecture) what would cause people to use one over the other?

    Andy

    March 6, 2013 at 7:01 pm

    • Hi Andy,

      Basically, Data Explorer is all about moving data from external data sources, transforming it, and bringing it into Excel, and also making that a repeatable process. PowerPivot is very different: it’s all about being able to do fast, ad-hoc analysis and complex reporting on the data. Although I highlighted the scenarios where they are interchangeable in this post, in a lot of ways they are very different tools.

      HTH,

      Chris

      Chris Webb

      March 6, 2013 at 7:08 pm

      • Thanks for the response Chris, best way to see similarities/differences is using it. And with that I’m off to run through your tutorial posts.

        Andy

        March 7, 2013 at 12:51 pm

  2. I tried to do the same thing but with Excel files. All workbooks in the folder have the same structure – only the data differs. New workbooks are regularily added to the list. I would like to combine specific sheets in each file. I cannot get the method that you describe to work; only the first file is imported. would you have any idea on how to solve this issue ? (Excel 2010)

    Bertrand

    Bertrand

    July 4, 2013 at 5:40 pm

    • Hi Bertrand, wouldn’t it be easier to use the ‘Import From Folder’ option? That might be easier.

      Chris Webb

      July 4, 2013 at 10:31 pm

      • This is what I tried to do, as presented in your post. The pb I have is that after the binarycombine step (when you hit the icon with the two down arrows), only the first Excel file appears.

        Bertrand

        July 5, 2013 at 10:14 am

      • In the Data Explorer window you might only see a sample of the data – what happens when the query actually executes in Excel?

        Chris Webb

        July 5, 2013 at 10:29 am

      • Same result. Only one file is processed. But things are getting worse: As I was making some more tests, I edited the values in one file, and now the query returns an error “DataFormat.Error: The input could not be recognized as a valid Excel document. Details: Binary”. Yet the files in my test could not be simpler !

        It seems that the Binary.Combine(Source[Content]) statement does not work with Excel files and that a different approach must be used.

        I have struggled to find a comprehensive documentation of the Data Explorer syntax, and I do not know whether loops are supported, but one could try to iterate through the table created by the Folder.Files intial statement. But this is beyond my current knowledge…

        Bertrand

        July 5, 2013 at 11:01 am

      • Strange. I’ve also seen some buggy behaviour on the current build when importing from multiple text files, so this could be related. I see you’ve asked this question on the Data Explorer forum – I think the dev team are going to be the best people to help you with this.

        Chris Webb

        July 5, 2013 at 11:07 am

      • FYI, I finally found a way to do what I wanted from the combination of a recent blog post of Devin Knight with yours. See http://tinyurl.com/lpwtc7a
        Chris, Devin, you made my day ! Thanks.

        Bertrand

        August 21, 2013 at 5:09 pm

    • I have the same problem as Bertrand. How did he solve the problem?
      See http://tinyurl.com/lpwtc7a

      Willem Eek

      August 29, 2013 at 7:15 pm

  3. DataFormat.Error: The input could not be recognized as a valid Excel document. Details: Binary
    I an getting the same error. Help

    DeAnna Humphrey

    September 24, 2013 at 3:09 am

  4. Does anyone know if the use of 32 bit excel and 32 bit power query will make a difference if your operating system is a 64 bit?

    DeAnna Humphrey

    September 24, 2013 at 3:10 am

    • No, it should not make any difference whether you use 32 bit or 64 bit Excel I think.

      Chris Webb

      September 24, 2013 at 12:36 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 2,867 other followers

%d bloggers like this: