Importing Data From Multiple Log Files Using Data Explorer
It’s only been two days since the official Preview release of Data Explorer and already the blog posts about it are coming thick and fast. Here are some of the more interesting ones that I’ve seen that show what’s possible with it:
Obviously I can’t let everyone else have all the fun, so I thought I’d show how you can use Data Explorer to import data from multiple files, clean it, load it into a single table and then report on it.
First of all, the data. Like all bloggers I have an unhealthy interest in my blog stats, and one of the ways I monitor the hits on this site is using Statcounter. I’m also a bit of a miser, though, so I only use their freebie service and that means that I only get to see stats on the last 500 site visits. How can I analyse this data then? Well, Statcounter allow you to download log data as a csv file, so at about 2:30pm I downloaded one file and at 8:30pm I downloaded another.
Now, the first cool thing to show about Data Explorer is that you can import and merge data from multiple files with the same structure if they’re in the same folder. With both of my files in a folder called Blog Logs, and Excel open, the first thing you need to do is to the Data Explorer tab and hit From File/From Folder:
The next step is to enter the name of the folder with the files in in the dialog:
With that done, a new Query screen appears with a list of the files in the folder:
You then need to hit the icon with the two down arrows and a horizontal line that I’ve highlighted in the screenshot above, next to the Content heading. This then shows the data in the files (obviously I’ve had to scrub out the sensitive data here):
You can then use the first row as the column headers:
Filter the data so that the row with the second set of column headers is removed (I wonder if there’s a way to do this automatically when importing multiple csv files?) by clicking on the Date and Time column and deselecting the value “Date and Time” as shown:
Right-click on each column you don’t want to import (such as IP Address) and selecting Hide:
Right-click on the Date and Time column and select Remove Duplicates to remove any records that appear in both log files (I’m assuming that there were no cases where two people hit a page at exactly the same date and time, which of course may not be completely correct):
And force the Date and Time column to be treated as a Date/Time type:
And bingo, you’re done. Here are all the steps in the import, all of which can be edited, deleted, reordered etc:
The data is then loaded into a table in a worksheet (though you can turn that off), and by clicking “Load to data model” in the Query Settings pane you can load the data into the Excel data model:
(NB I found some issues with loading date data into the data model and US/European date formats that I’ve reported here, but don’t forget this is beta software so there are bound to be problems like this)
You can build cool Power View reports using this data:
Or even explore it on a 3D map with GeoFlow:
Fun, isn’t it?