Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Using PowerPivot to Analyse Windows Search Results

with 6 comments

My whole life is on my laptop hard drive, and as a consequence my laptop hard drive has a lot of stuff on it. Luckily, when I need to find something, there’s always Windows Search to help me out – but even so a single search query can return a lot of data. How can we make sense of all the data in the Windows Search Index? Well, we can use PowerPivot of course!

I first got the idea for doing this when I saw ‘Microsoft OLE DB Provider for Search’ in my list of OLE DB Providers; a quick look around on the net revealed the following useful sources of information about it:
http://msdn.microsoft.com/en-us/library/bb231256(v=VS.85).aspx
http://www.thejoyofcode.com/Using_Windows_Search_in_your_applications.aspx

So if you can query Windows Search using SQL, I thought, then I should be able to take the data that is returned from running one of these SQL queries and load it into PowerPivot. And after a lot of trial and error, I managed it – and it works rather well. Here’s how to do it…

First of all, you need to make sure you have the OLE DB Provider for Search installed. If you don’t, you need to download and install the Windows SDK. Then you can open up a new Excel workbook and open the PowerPivot window. Next you need to create a new connection by going to the Home tab, clicking the Get External Data/From Other Sources button, and then clicking the Others(OLEDB/ODBC) option.

image

Next, type in the following connection string:

Provider=Search.CollatorDSO;Extended Properties=Application=Windows

Do not try to click the Build button and select the provider from the list – when I did this, I got the error “Provider is no longer available. Ensure that the provider is installed properly”. Next, choose the “Write a query to specify the data to import” option (again, if you choose the “Select from a list of tables…” you’ll get an error) and enter your SQL query.

Here’s where the fun begins. From the two links above, you can see that there are loads of possibilities as to what you can query. Here’s a sample query that returns a list of all the files in the Documents folder on my c:\ drive and below, along with their file types, the folder path and the file size in bytes:

SELECT  
System.ItemTypeText,
System.Size, System.ItemFolderNameDisplay,
CAST(System.ItemFolderPathDisplay as DBTYPE_BSTR),
System.DateCreated,
System.DateAccessed,
System.FileExtension,
CAST(System.ItemName AS DBTYPE_BSTR) 
FROM SYSTEMINDEX
WHERE SCOPE=’file:C:\Users\Chris Webb\Documents’

Notice that I’ve had to cast some columns to DBTYPE_BSTR – I found that if I didn’t do this, the columns simply didn’t appear in the query results in PowerPivot. Other things you can do here in the query include searching for all items that include particular words or phrases, or which are above a certain size, or have a particular file extension.

With that done, you’re good to go. In a pivot table you can slice and dice all the data returned to your heart’s content. Here, for example, are the top five files with the .ABF extension (ie SSAS backup files) from my c:\ drive:

image

With a separate time dimension table joining to System.DateCreated you can do even more. Here’s the total size of files on my c:\ drive in bytes broken down by the year they were created:

image

You can also use the DAX time intelligence functionality. I added a running total calculation that shows the growth in the total size in MB of all files, over time, based on the creation date of each file. Here’s the formula:

=CALCULATE(SUM(Files[SYSTEMSIZE]), DATESBETWEEN(Time[Date], BLANK(), LASTDATE(Time[Date])))/1024/1024

This chart shows that running sum from November 2008, when I bought the laptop, to today:

image

There are plenty of tools out there that help you analyse this type of data but I doubt any of them can do what the PowerPivot/Excel combo can do. And it’s this kind of personal BI that PowerPivot makes easy. The only thing missing is an API which would allow you to build the SQL query used here dynamically: imagine having an interface where users could type their own search terms and then be able to analyse the results in PowerPivot at the click of a button. Hopefully PowerPivot will get an API soon. And as I’ve said before in the past, wouldn’t it be cool if Bing could do this kind of thing with web search results and data found on the web?

Written by Chris Webb

June 16, 2010 at 10:44 pm

Posted in PowerPivot

6 Responses

Subscribe to comments with RSS.

  1. very cool Chris!

    Frank

    June 21, 2010 at 6:53 pm

  2. Awesome post! I will definitely try this when I get home. I love home BI!

    Nicolas

    June 29, 2010 at 8:44 am

  3. Very interesting and nice use of the tools. Great job Chris.

    Dan

    June 30, 2010 at 7:38 pm

  4. One word. Awesome.

    Jamie Thomson

    October 26, 2010 at 1:33 pm

  5. it’s great job Chris
    I would like to know two things please:
    1. how do you come up with the full connection string? as you mentioned it’s not selected form drop down list.
    2. how do you write the Sql statement? I mean the name of windows fields search result.

    thanks

    Firas

    January 21, 2014 at 5:28 am

    • Doesn’t the connection string in the post work for you? Re the SQL, I just read the docs, but I wrote this post quite a long time ago…

      Chris Webb

      January 21, 2014 at 6:54 am


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,301 other followers

%d bloggers like this: