Archive for the ‘PowerPivot’ Category
Self-Service BI Mapping with Microsoft Research’s Layerscape–Part 2
In my last post I showed how to load data from Excel into Layerscape, the new data visualisation tool from Microsoft Research; in the post before that I showed how to load UK weather data from Windows Azure Datamarket into PowerPivot. Now let’s take that weather data and plot it on a map!
When doing this, the first decision I made with this data was about what to show. The main problem is that the data volumes involved here are very close to the upper limit of what Layerscape is able to handle: in a few cases I crashed Layerscape, but that was when I was trying to load around 150000 rows of data into it; I found that just over 110000 rows of data was ok however. As a result I made the decision to only show data for cloud or rain, not sun (which is ok – if you want to visualise a clear day, you don’t want to show anything on a map I think) or mist or fog (which I was less happy about).
To achieve this I created a calculated column on my Three Hourly Forecast table in PowerPivot called Cloud with the following definition:
=if([SignificantWeatherId]>=7, TRUE, BLANK())
Then, after having created a flattened PivotTable with the data I wanted to display, I dropped the Cloud field into the Values box in my PivotTable and was able to filter it to only show Weather Stations and time periods where there was cloud:
I also created a few other calculated columns:
- ActualPredictionForStart and ActualPredictionForEnd: the former is mentioned in my previous post, and the latter returns a value three hours after the former:
=[Date] + (([TimeStep]+3)/24)
These two values represent the start time and the end time for each row in the Three Hourly Forecast table. - WeatherDepth: in the Significant Weather table there’s a distinction made between low-altitude cloud and high altitude cloud, and in LayerScape when you’re plotting data you can control how high off the ground a point is displayed, so the WeatherDepth column contains some fairly arbitrary numbers for cloud altitudes based on the [Code] column. Here’s the definition:
=SWITCH([Code], 5, 150, 6, 150, 7, 100, 8, 120, 10) - WeatherColor: again, in Layerscape you can control the colour of your points and their opacity, which again is useful for displaying dark/light and thin/thick clouds. The only compromise I had to make was to display dark clouds (ie where clouds are described as either black or where the weather is rainy or snowy) in blue rather than black, because I couldn’t get Layerscape to display black – white clouds are shown in white. To calculate WeatherColor I created two intermediate calculated columns on the Significant Weather table, InitialColor (which contains colours for weather that I’m not displaying in this demo – I thought dust storms should be shown in yellow for instance):
=SWITCH([Code], 1, "yellow", 2, "white", 3, "yellow", 4, "yellow", 5, "white", 6, "white", 7, "white", 8, "blue", "blue")
…and Opacity:
=SWITCH([Code], -99, 0, 0, 0, 1,10, 2, 25, 3, 10, 4, 100, 5, 25, 6, 75, 7, 50, 8, 50, 9, 60, 10, 60, 11, 60, 12, 65, 13, 75, 14, 75, 15, 80, 90)
…making the definition of WeatherColor:
=[Opacity] & "% " & [InitialColor] - Depth and Color: calculated columns on the Three Hourly Forecast table that copied the WeatherDepth and WeatherColor values down to the main fact table:
=RELATED(‘Significant Weather’[WeatherDepth])
=RELATED(‘Significant Weather’[WeatherColor])
This screenshot gives you some idea of the values that the depth and colour calculations return:
With this all done I was able to load the data into Layerscape in the way I showed in my previous post, tweak some of the settings for the markers and time decay, and come up with a nice-looking visualisation. The big difference in this case compared to my previous examples is that here we have time series data and Layerscape is able to show values changing over time. The only thing I needed to do to make this happen was to check the Time Series box in the Layers pane on the main screen; with this done I could show the data for a particular point in time or let Layerscape cycle through time showing how the forecast weather changed.
With that done, was able to save my demo as a very basic guided tour and upload it to the Layerscape site from where you can download it yourself :
http://www.layerscape.org/Content/Index/620
I also created a video showing the tour:
I’m quite happy to admit that this tour doesn’t show off the full capabilities of Layerscape (I’m not sure my graphics card is up to the job, frankly) and I’ve not taken much time to ensure that the visualisation is as accurate as it could be, but I’m still quite proud if it!
Self-Service BI Mapping with Microsoft Research’s Layerscape–Part 1
Sometimes you find a tool that is so cool, you can’t believe no-one else has picked up on it before. This is one of those times: a few month or so ago I came across a new tool called Layerscape (http://www.layerscape.org) from Microsoft Research which allows you to overlay data from Excel onto maps in Microsoft WorldWide Telescope (http://www.worldwidetelescope.org). “What is WorldWide Telescope?” I hear you ask – well, it’s basically Microsoft Research’s answer to Google Earth, although it’s not limited to the Earth in that it also contains images of the universe from a wide range of ground and space-based telescopes. It’s a pretty cool toy in its own right, but Layerscape – which seems to be aimed at academics, despite the obvious business uses – turns it into a pretty amazing BI visualisation tool.
Layerscape is very easy to use: it’s an Excel addin, and once you have it and WWT installed all you need to do is select a range of data in Excel to be able to visualise it in WWT. For some cool examples of what it can do, take a look at the videos posted on the Layerscape website like this one (Silverlight required):
http://www.layerscape.org/Content/Index/384
Here are some screenshots of two sample datasets that come with Layerscape. First, here’s some data on earthquakes in Excel with the Layerscape addin open:
Here’s an example of what this looks like visualised:
Here’s a second dataset with polygon data for the outlines of the countries of the world:
Now maybe you don’t have any really sexy scientific data to explore, but it’s increasingly likely that a business will have spatial data that needs visualising somehow. There are no end of ways this is possible in the SQL Server BI stack (here’s a good post by Alex Whittles about using maps in SSRS for example) but I think the most exciting thing about a tool like Layerscape is that it’s so easy to use that most reasonable competent, non-technical Excel users would have no trouble with it; also, because it integrates with Excel, it also plays nicely with PowerPivot.
Here’s a simple example of how to get data from PowerPivot into Layerscape. Let’s start with a dataset I found courtesy of this post on Alastair Aitchison’s superb spatial data blog:
http://alastaira.wordpress.com/2012/02/20/load-garmin-poi-data-to-sql-server/
Let’s imagine you’re a tourist visiting the UK – perhaps you’ve come over for SQLBits and you want to see some sights while you’re here. If you like castles and stately homes you might want to visit a National Trust property: the National Trust owns and protects over 500 historic buildings in England, Wales and Northern Ireland. How can we find out where these properties are?
Using one of the datasets listed in Alastair’s post above, I downloaded a CSV file containing the names, latitudes and longitudes of all the National Trusts properties and imported it into a table in PowerPivot. The data’s very simple: just a latitude, longitude and a site name, and the only cleanup I did was to create a new calculated column that removed the string “NatTrust” from the beginning of each site name:
Then on a blank sheet in Excel I created a new flattened PivotTable:
Added the Latitude, Longitude and Site Name columns onto rows:
Turned off subtotals and grand totals on the PivotTable:
And ended up with a PivotTable that looked like this:
All I then needed to do was select the whole table, right-click and choose “Visualize in WWT”, then in the Layer Manager pane ensure the Latitude and Longitude columns were all mapped correctly:
And change the following properties on the Marker tab: Scale Type to Power, Scale Factor to 16, Scale Relative to Screen and Marker Type to Pushpin.
Then finally click on the View in WWT button at the bottom of the Layer Manager pane to push the data over to WWT. Here’s the result with all the National Trust properties plotted on a map:
Of course the problem with visiting a National Trust property is that you won’t be able to do much outdoors if it’s raining. I wonder where I can get some weather data and add that to my map? We’ll find out how in part 2…
Importing UK Weather Data from Azure Marketplace into PowerPivot
I don’t always agree with everything Rob Collie says, much as I respect him, but his recent post on the Windows Azure Marketplace (part of which used to be known as the Azure Datamarket) had me nodding my head. The WAM has been around for a while now and up until recently I didn’t find anything much there that I could use in my day job; I had the distinct feeling it was going to be yet another Microsoft white elephant. The appearance of the DateStream date dimension table (see here for more details) was for me a turning point, and a month ago I saw something really interesting: detailed weather data for the UK from the Met Office (the UK’s national weather service) is now available there too. OK, it’s not going to be very useful for anyone outside the UK, but the UK is my home market and for some of my customers the ability to do things like use weather forecasts to predict footfall in shops will be very useful. It’s exactly the kind of data that analysts want to find in a data market, and if the WAM guys can add other equally useful data sets they should soon reach the point where WAM is a regular destination for all PowerPivot users.
Importing this weather data into PowerPivot isn’t completely straightforward though – the data itself is quite complex. The Datamarket guys are working on some documentation for it but in the meantime I thought I’d blog about my experiences; I need to thank Max Uritsky and Ziv Kaspersky for helping me out on this.
The first step in the process of importing this data is to go to the Azure Marketplace and construct a query to get the slice of data that you want – this is a big dataset and you won’t want to get all of it. Once you’ve signed in, go to https://datamarket.azure.com/dataset/0f2cba12-e5cf-4c6d-83c9-83114d44387a, subscribe to the dataset and then click on the “Explore this Dataset” link:
This takes you to the query builder page, where you get to explore the data in the different tables that make up this dataset:
You choose the table you want to explore in the confusingly-named ‘Query’ dropdown box on the right-hand side of the screen. The available tables are:
- ThreeHourlyForecast, a fact tables containing three hourly weather forecasts
- ThreeHourlyForecastArchive, a fact table containing aggregated, averaged values for the various forecasts for a given date and time
- SignificantWeather, a dimension table containing the different types of weather that can be forecast
- Visibility, a dimension table containing the different levels of visibility
- DailyForecast, a fact table containing daily weather forecasts
- Observations, a fact table containing observed weather
- Site, a dimension table containing all the UK’s weather stations
As far as I can tell, this data is more or less the same as what’s available through the Met Office’s own DataPoint service, and the documentation for this is here: http://www.metoffice.gov.uk/public/ddc/datasets-documentation.html
Once you’ve selected a table you can construct a filter by entering values in the Optional Parameters boxes below the query dropdown. These changes are then reflected in the URL shown at the top of the screen:
This URL represents an OData query. One thing I didn’t notice initially is that the query that is generated here includes a top 100 filter in it which you’ll need to remove (by deleting &$top=100 from the end of the URL) if you want to do anything useful with the data; you might also want to build a more complex query than is supported by the query builder, and you can learn how to do this by reading this article.
In my case I decided to look at the full three hourly forecast data. As I said, this is a big dataset – initially I thought I’d download the whole thing, but 18 million rows and several hours later I cancelled the import into PowerPivot. Instead I opted to look at data for the whole of the UK for just one forecast made on one day, which worked out at a more manageable 250000 rows. What’s not clear from any of the current documentation is what all of the columns in the three hourly forecast fact table represent:
- Date is the date the forecast is issued
- StartTime is the time the forecast is issued and is either 0, 6, 12 or 18, representing midnight, 06:00, 12:00 and 18:00 – new forecasts are issued every 6 hours
- PredictionTime is the time that an incremental update to a forecast is issued; these incremental updates appear every hour. PredictionTime is an actual time value going from 0 to 23 representing the hour the incremental update was issued.
- TimeStep is an offset in hours from the StartTime, and represents the time that the forecast is predicting the weather for. It ranges in value from 0 to 120, going up in 3s (so the values go 0, 3, 6, 9… 120), meaning we have weather predictions for 5 days into the future for each forecast.
Therefore, for any given row in the ThreeHourlyForecast table, if the Date is April 10th 2012, StartTime is 6, PredictionTime is 8 and TimeStep is 9, then this is data from a forecast that was issued on April 10th 2012 at 8am (the second incremental update to the 6am forecast) and this row contains the prediction for the weather for the time StartTime+TimeStep = 6 + 9 = 15:00 on April 10th 2012.
Here’s the OData url I used to grab data for the three hourly forecast issued on April 10th at midnight (StartTime=0 and PredictionTime=0) for all weather stations and all time steps:
To use this URL in PowerPivot, you need to create a new PowerPivot workbook, open the PowerPivot window and then click the From Azure DataMarket button:
Then enter your query URL and Account Key (which you can find on the Query Explorer page by clicking on the Show link, as indicated in the screenshot above):
Having imported this data I also imported the whole of Site (renamed here to Weather Stations) and SignificantWeather tables to give the following PowerPivot model:
Here are the joins I used:
I also created a few calculated columns, including one called ActualPredictionForStart which added the TimeStep to the Start Time and the Date to get the actual date and time that the prediction is for:
=[Date] + (([StartTime] + [TimeStep])/24)
With this all done, I was able to find out what the predicted weather for the current time and my home town was in this (as of the time of writing) two-day old forecast:
…and do all the usual PivotTable-y and chart-y things you can do with data once it’s in Excel:
Incidentally, the forecast is wrong – it’s not raining outside right now!
PivotTables and Excel charts are all very well, but there’s a better way of visualising this data when it’s in Excel – and in my next post I’ll show you how…
UPDATE: First of all, I owe an apology to the Met Office – as soon as I hit publish on this post it started pouring with rain, so they were right after all. Secondly, in a weird co-incidence, Rob Collie posted about using the other weather dataset in the DataMarket on his blog: http://www.powerpivotpro.com/2012/04/download-10000-days-of-free-weather-data-for-almost-any-location-worldwide/
Natural and Unnatural Hierarchies in the SSAS 2012 Tabular Model
I’m sure you’re all aware of the difference between natural and unnatural user hierarchies in the Analysis Services Multidimensional model (if you’re confused as to what I mean when I say ‘Multidimensional model’, have a quick read of this post from a few weeks ago which explains the terminology). To recap, natural user hierarchies in Multidimensional look like this in BIDS:
There is a one-to-many set of attribute relationships between each level, so each Calendar Year has multiple Calendar Semesters but one Calendar Semester has only one Calendar Year, and they are a Good Thing as far as query performance is concerned. Unnatural user hierarchies look like this:
They work, but there aren’t one-to-many relationships between every level and query performance may be worse than with a natural user hierarchy.
But what does all this have to do with the Tabular model? In SSDT when you create a hierarchy there’s no indication as to whether it’s natural or unnatural:
…and up to recently I assumed that this was an issue that simply wasn’t relevant to Tabular. However, after a recent conversation with Marius Dumitru from the dev team and Greg Galloway I now know this isn’t the case!
If you query the MDSCHEMA_HIERARCHIES DMV as follows:
SELECT
[DIMENSION_UNIQUE_NAME], [HIERARCHY_NAME], [STRUCTURE_TYPE]
FROM $SYSTEM.MDSCHEMA_HIERARCHIES
You can see whether a hierarchy in a Tabular model is natural or unnatural:
In this case you can see that the Calendar hierarchy that I created on the DimDate table is unnatural; SSAS has determined this during processing (specifically during the Process ReCalc stage) by examining the data itself automatically.
It turns out that natural hierarchies in Tabular can result in faster query performance because certain MDX and Formula Engine code paths in the SSAS engine are still not optimised for unnatural hierarchies. I don’t have any specific examples of when this occurs at the moment but if I do find them I’ll be sure to update this post. And if anyone else using Tabular, or even PowerPivot (and I assume this is relevant to PowerPivot too) finds a good example of how changing to a natural user hierarchy improves performance please leave a comment.
In the example above, I created the Calendar hierarchy in the Tabular model by simply dragging the CalendarYear, CalendarSemester, CalendarQuarter, EnglishMonthName and FullDateAlternateKey columns underneath each other in the new hierarchy. It’s unnatural because there are only two distinct values in Calendar Semester (the semester numbers 1 and 2), four distinct values in Calendar Quarter (the quarter numbers 1 to 4) and there are only twelve distinct values in EnglishMonthName (the names of the months), so there is a many-to-many relationship between the values in all these columns. I can make it natural by creating three calculated columns that concatenate CalendarYear and CalendarSemester, CalendarYear and CalendarQuarter, and CalendarYear and EnglishMonthName as follows:
DimDate[Calendar Semester of Year] =DimDate[CalendarYear] & " " & DimDate[CalendarSemester]
DimDate[Calendar Quarter of Year] = DimDate[CalendarYear] & " Q" & DimDate[CalendarQuarter]
DimDate[Calendar Month of Year] = DimDate[CalendarYear] & " " & DimDate[EnglishMonthName]
Using these calculated columns for the Semester, Quarter and Month levels of the hierarchy as follows:
…will make the Calendar hierarchy natural:
Self-service ETL with Data Explorer
One of the most interesting things I saw last week at the PASS Summit was Data Explorer, the cloud-based data transformation and publishing tool that was demoed in the keynote on day 1. While it was roundly dismissed as ‘yet more Excel’ by the disgruntled DBA faction I thought it showed some potential (you can see a walkthrough of what was shown here) – even if the fact that it was a SQL Azure Labs project suggested it was not destined to be a real product.
Today, however, I came across this post on Tim Mallalieu’s blog with a 10 minute video demo of an Excel addin version of Data Explorer, made earlier this year. Tim notes in his blog that:
We still have both the client and the cloud service but we only showed the cloud bits at PASS last week.
I would urge you to go and watch the video, because what’s shown is a very substantial, capable tool: an Excel addin for doing self-service ETL. Tellingly the name of the tool in the demo is “PowerImport” – and although Tim suggests in his blog that “some names and concepts have evolved quite a bit since March”, the choice of name speaks volumes. It looks like this could be to SSIS what PowerPivot is to SSAS, and a big selling point for Microsoft’s self-service BI story if it does get released.
Excel subtotals when querying Multidimensional and Tabular models
As I mentioned briefly in a recent post, the fact that Excel generates some pretty rubbish MDX for detail-level reports has been well documented by Richard Lees and Rui Quintino. The new Excel 2010 named set functionality allows you to work around these problems if you can write your own MDX, but let’s face it most people who are building Excel reports will not be able to do this so this problem can be a major headache and cause severe performance problems. One interesting point to note, however, is that Excel 2010 will generate slightly better MDX when querying a Tabular Model (and I’m including PowerPivot models here) compared to when it’s querying a Multidimensional Model (ie a SSAS cube). Take the following pivot table built using Excel 2010 against the Adventure Works cube running on SQL 2008 R2:
I’ve put the Internet Sales Amount measure on columns, the Calendar Year and Day Name hierarchies from the Date dimension on rows, and turned off all subtotals and grand totals. Here’s the MDX that Excel generates for this pivot table:
SELECT
NON EMPTY
CrossJoin(
Hierarchize(
{DrilldownLevel(
{[Date].[Calendar Year].[All Periods]},,,INCLUDE_CALC_MEMBERS)})
, Hierarchize(
{DrilldownLevel({[Date].[Day Name].[All Periods]},,,INCLUDE_CALC_MEMBERS)}))
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS
FROM [Adventure Works]
WHERE ([Measures].[Internet Sales Amount])
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS
If you run this query you’ll see it returns 40 columns and that the first few columns contain the subtotals and grand totals that we specifically said we didn’t want, which can be the cause of performance problems:
However, if you create a PowerPivot model containing DimDate and FactInternetSales and build exactly the same pivot table, you’ll see a different pattern of MDX being produced. To enable easy comparison, I’ve taken the MDX that my PowerPivot model generated and changed the hierarchy names so it will work on the SSAS Adventure Works cube:
SELECT
NON EMPTY
Hierarchize(
DrilldownMember(
CrossJoin(
{[Date].[Calendar Year].[All],[Date].[Calendar Year].[Calendar Year].AllMembers}
, {([Date].[Day Name].[All])})
, [Date].[Calendar Year].[Calendar Year].AllMembers
, [Date].[Day Name]))
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME
ON COLUMNS
FROM [Adventure Works]
WHERE ([Measures].[Internet Sales Amount])
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS
If you look at the results, you see that while the grand total is still being returned, the most of the unwanted subtotals are not and we only get 33 columns returned:
So if all other things were equal (and they’re not – the Tabular and Multidimensional engines are very different) then this MDX query has a big advantage over the first one because it’s doing much less work. Excel knows to use this new type of query by looking at the PREFERRED_QUERY_PATTERNS value returned by the MDSCHEMA_CUBES rowset; I’m told that the reason it isn’t used with Multidimensional models is that in many cases it could lead to worse, rather than better performance. This is another good reason to prefer Tabular models over Multidimensional models for detail-level reporting if you’re using Excel as a front-end.
An interesting side-note is that this new style of MDX is only possible in SSAS 2008 R2 because the DrillDownMember has got a new parameter called Target_Hierarchy, which allows you to specify which hierarchy you want to drill down on in a tuple (note there were some other changes with this type of function that I blogged about here). You can see the documentation here:
http://msdn.microsoft.com/en-us/library/ms145580(v=SQL.105).aspx
Here’s a quick example on Adventure Works. Consider the following query, where we’re drilling down on a tuple containing the all members from the Day Name and the Calendar Year hierarchies:
SELECT {[Measures].[Internet Sales Amount]} ON 0,
DRILLDOWNMEMBER(
{([Date].[Calendar Year].[All Periods],[Date].[Day Name].[All Periods])}
,{([Date].[Calendar Year].[All Periods],[Date].[Day Name].[All Periods])}
)
ON 1
FROM [Adventure Works]
Here are the results:
You’ll see that only the Day Name hierarchy has been drilled down on, and this is (as far as I can see) because it’s the last hierarchy that appears in the tuple. However, with the new parameter, we can specify that we want the Calendar Year hierarchy drilled down on instead:
SELECT {[Measures].[Internet Sales Amount]} ON 0,
DRILLDOWNMEMBER(
{([Date].[Calendar Year].[All Periods],[Date].[Day Name].[All Periods])}
,{([Date].[Calendar Year].[All Periods],[Date].[Day Name].[All Periods])}
, [Date].[Calendar Year]
)
ON 1
FROM [Adventure Works]
Thanks, as always, to Akshai, Marius, Greg Galloway, Teo Lachev and the usual string of SSAS obsessives for providing the background info for this post…
Installing SSAS 2008R2 SP1 breaks PowerPivot
Here’s something I found out a few weeks ago, just before I went on holiday: installing SP1 for SSAS 2008R2 breaks a PowerPivot installation on the same machine. After I’d installed SP1 on my laptop I found I got an “Unable to open file” error message in PowerPivot whenever I tried to import data, at which point PowerPivot crashed. For me the fix was easy – reinstall PowerPivot and it worked again – but I’m glad I found this out before I needed to use PowerPivot in front of a customer.
Querying PowerPivot DMVs from Excel
One of the more popular posts on my blog is one I wrote just over a year ago on binding the results of an MDX query to a table inside Excel. I was thinking about it again recently when I was looking at the list of DMVs (=Dynamic Management Views – views that can be queried using SQL in SSAS and which contain all kinds of useful admin data) available in Analysis Services and noticed several new ones in 2008 R2 that are PowerPivot-related; I assume these are the DMVs that the Sharepoint management dashboard uses to track usage of PowerPivot models after they’ve been uploaded, but it struck me that it would also be cool to have this information available for PowerPivot models while they’re still in Excel. Wouldn’t it be good to query a DMV from Excel? Well, here’s how.
First of all, take an Excel workbook with a PowerPivot model in it. Go to the Data tab and click on Connections, and you’ll see the connection that is created automatically to the PowerPivot model:
This is the connection we want to use to run our DMVs. We now need to be able to use a table to show the results of our query, and this requires something similar to the method Greg Galloway described after I published the above post. First, on a new sheet open a connection to any relational data source you have handy such as SQL Server and import a table from that data source into a table in Excel. I used the DimProductCategory table from Adventure Works, and did this by going to the Data tab, clicking on From Other Data Sources and then From SQL Server, and running the wizard. The result is this:
Then go to the Connections dialog and copy the connection string from the PowerPivot connection shown in the first screenshot above (found when you click Properties and go to the Definition tab), then go to the SQL table you’ve just created, right-click and select Table and Edit Query, then paste the PowerPivot connection string into the Connection textbox, change the Command Type to Default, and then put your query into the Command Text box. I also had to add an extra connection string property setting Locale Identifier=1033 to get things working on my machine (and re-add it every time I edited the query), but I suspect this might not be necessary if you have a US English machine. Anyway, here’s what my connection string looked like:
Provider=MSOLAP.4;Persist Security Info=True;Initial Catalog=Microsoft_SQLServer_AnalysisServices;Data Source=$Embedded$;MDX Compatibility=1;Safety Options=2;MDX Missing Member Mode=Error;Optimize Response=3;Cell Error Mode=TextValue; locale identifier=1033
And here’s the dialog:
Having done this, when you click ok you’ll see the table update with the contents of the query.
Of course you can enter any MDX query here but I’m going to stick to talking about DMVs. So what useful information can you get from a DMV then? Vincent Rainardi has a great post on SSAS DMVs here which covers all the useful ones and has plenty of syntax examples, but here are some things you might want to do with PowerPivot.
First of all, to get a list of all the DMVs supported you can run the query:
select * from $system.discover_schema_rowsets
To get a list of tables in your model along with the dates they were last edited and when the data was last updated, use the following query:
select cube_name, last_schema_update, last_data_update from $system.mdschema_cubes
…although I’ve noticed some weird behaviour with the dates for some tables, so be careful using it.
To get a list of the number of distinct values in each column, use:
select dimension_name, table_id, rows_count from $system.discover_storage_tables
The query:
select * from $system.discover_storage_table_columns
gives more metadata on table columns; however:
select * from $system.discover_storage_table_column_segments
…although it gives some information on the amount of memory allocated to different columns, does not give the complete picture on memory usage. For that you need to use:
select * from $system.discover_object_memory_usage
This gives a full breakdown of memory usage (in the OBJECT_MEMORY_NONSHRINKABLE column) by each object in the PowerPivot model. It’s not all that easy to interpret this information though, because it only gives the memory used directly by each object and you also need to take into account the memory used by all the objects ‘owned’ by a given object too. It’s also worth pointing out that this is not the same view of memory usage that is given by looking at the temp folder created by Vertipaq, which Vidas has blogged about here and here; it shows the size of the database when it has been loaded into memory as opposed to the size of the database when it is persisted to disk, and there can be a big disparity between the two.
How can we make sense of the data returned by discover_object_memory_usage? We load it back into PowerPivot of course! I created a linked table and then a calculated column called OBJECT_PATH concatenating OBJECT_PARENT_PATH and OBJECT_ID using the following expression:
=[OBJECT_PARENT_PATH]&"."&[OBJECT_ID]
This gave me the full path of each object in a format that’s directly comparable with the object’s parent as stored in OBJECT_PARENT_PATH.
I then created a calculated measure with the following expression to return the amount of memory used by each object, including the objects it owns, in KB:
=(SUM(Memory[OBJECT_MEMORY_NONSHRINKABLE]) + CALCULATE(SUM(Memory[OBJECT_MEMORY_NONSHRINKABLE]),FILTER(ALL(Memory), COUNTROWS(FILTER(VALUES(Memory[OBJECT_PATH]), IFERROR(SEARCH(Memory[OBJECT_PATH],EARLIER(Memory[OBJECT_PARENT_PATH])), 0)=1))>0)))/1024
It’s then easy to see the memory used by the cubes and dimensions that make up the PowerPivot model:
And the memory used by objects associated with the columns in a particular table:
All of which is very useful if you’re trying to work out what’s eating memory in your PowerPivot model. If anyone comes across any other interesting thing to do with DMVs for PowerPivot then please let me know…
PowerPivot Videos from SQL Server Day 2010
Last month I had the pleasure of speaking at SQL Server Day 2010 in Belgium (I had a nightmare getting there and back because of the snow, but that’s another story). I’ve just seen that all the videos from the event are now available to view online and download here:
http://sqlserverday.be/video/
Both the sessions I gave are up: “Implementing Common Business Calculations in DAX” and “Comparing Analysis Services with PowerPivot”.
On a related note, we’ve aggregated all the content we’ve got for every single SQLBits on a new page on the SQLBits site:
http://sqlbits.com/content/
As far as I can see there are 152 videos on there alone!
PowerPivot Training in London (and elsewhere)
As you’re probably aware by now, PowerPivot and DAX are important parts of the Microsoft BI story going forward – self-service BI is the big new thing, while on the corporate side BISM will use DAX as its calculation language and BISM models will essentially be PowerPivot models. So it makes sense to learn PowerPivot and DAX as soon as possible to get a head start, right?
While that thought’s in your head, I’m pleased to mention that I’m helping Marco Russo and Alberto Ferrari (authors of the excellent book “PowerPivot for Excel 2010”) organise the London leg of their European PowerPivot training tour. Marco has more details on the course on his blog here, and you can find the course website here:
http://www.powerpivotworkshop.com/
Whether you’re a BI professional or an Excel power user, I think this represents an excellent opportunity to get training from acknowledged PowerPivot gurus. I’ll be there in the audience!
