Archive for April 2012
I’m pleased to announce that a whole bunch of new dates for SSAS and MDX training courses in London, Manchester and Dublin are now up on the Technitrain site, along with several other SQL Server BI and engine courses from the likes of Jen Stirrup, Andy Leonard and Christian Bolton.
Here are the details:
Server Analysis Services 2012 Tabular Workshop with Marco Russo and Chris Webb 28 – 29 May, London
In association with SQL BI (http://www.ssasworkshop.com/)
If you want to learn about creating Tabular models with Microsoft SQL Server Analysis Services 2012, then this is the course for you. It covers all the content in the new Tabular book that Marco, Alberto and I have been working on and it goes into a lot more detail than the standard tutorials. I’m co-promoting this course in London with Marco and Alberto, and as detailed below I’ll be teaching it solo in Dublin in the autumn; it’s going to be run in several other European cities so take a look here for more details.
Cost – £850 (exc VAT)
Data Visualisation with SQL Server 2012 Tabular Models, Excel and Power View with Jen Stirrup 30 May 2012, London
A one day course aimed at BI professionals who use Excel and Power View and want to learn how to create effective data visualisations. Jen is the acknowledged expert in the field of data visualisation for Microsoft BI, and this course is immediately after the SSAS 2012 Tabular course – so why not attend both?
Cost: – £299 (exc VAT)
Real World Cube Design and Performance Tuning with Analysis Services Multidimensional with Chris Webb 3-5 July 2012, Manchester
A three day course aimed at intermediate-to-experienced Analysis Services Multidimensional developers, looking at more advanced cube design topics and query performance tuning. I’ll be speaking at the Manchester and Leeds SQL Server user groups in the evenings while I’m up in there too.
Cost: – £749 (exc VAT)
Upgrade your SQL Server Integration Services skills to 2012 with Andy Leonard 12-14 September 2012, London
An introduction to the new features in SSIS 2012 for SQL Server Integration Services professionals. Andy’s last course for us went down very well indeed – take a look at this review of it!
Cost – £749 (exc VAT)
Server Analysis Services 2012 Tabular Workshop with Chris Webb 19 -20 September 2012, Dublin
This is the same course as the one listed above in London in May.
Cost – €950
Data Visualisation with SQL Server 2012 Tabular Models, Excel and Power View with Jen Stirrup 21 September 2012, Dublin
Once again, this is the same course as the one listed above in London in May.
Cost – €335
Advanced Internals and Troubleshooting Workshop for SQL Server with Christian Bolton 9 – 10 October 2012, London
The Advanced Troubleshooting Workshop for SQL Server 2005, 2008, R2 and 2012 provides attendees with SQL Server relational engine internals knowledge, practical troubleshooting skills and a proven methodical approach to problem solving. The workshop will enable attendees to tackle complex SQL Server problems with confidence.
Cost – £990 (exc VAT)
Introduction To MDX, with Chris Webb – 5-7 December 2012, London
A three day course designed for those with little or no experience of MDX, by the end of this course, you will be able to write MDX queries and calculations for Analysis Services with confidence.
Cost:- £749 (exc VAT)
I was studying a Profiler trace run on a SSAS 2012 Multidimensional instance today, and I noticed some new information was being returned in the Query Subcube Verbose event relating to caching. The Query Subcube Verbose event is fired every time the Formula Engine requests data from the Storage Engine, specifically when the Formula Engine requests a subcube of data from a single measure group, and one of these requests may or may not be answered using the cache associated with that measure group. If the request is answered from cache then performance is likely to be very good; however in some (but by no means all) cases, if the Storage Engine has to go to disk to answer the request then performance could be poor.
It has always been possible to tell whether the subcube request associated with a Query Subcube Verbose event is answered from cache or not. Let’s take as an example a query that returns a single cell of data from a super-simple cube. Here’s what I see in Profiler when I run my query on a cold cache:
Out of the three red boxes in this screenshot, in the top box you can see the Progress Report Begin/End events associated with the Query Subcube Verbose event that’s in the middle red box – these show the Storage Engine is reading data from disk. The Event Subclass column for the Query Subcube Verbose event shows that it is requesting non-cache data, and the Resource Usage event shows the number of rows scanned and returned. If I run the query a second time without clearing the cache – ie on a warm cache – then I see the following:
The Event Subclass column for the Query Subcube Verbose event shows the data it is requesting is being retrieved from cache, there are no Progress Report Begin/End events (as you would expect) and the Resource Usage event shows no rows read.
Looking at the Query Subcube Verbose event itself, there are now two groups of information returned:
The top red box shows information on the granularity of the request, as has been the case since SSAS 2005. The bottom red box contains the new stuff. These four values are reported at the point in time after the Query Subcube Verbose event has executed and are:
- CacheEntries: the number of entries in the measure group cache after the subcube request has been executed. This value will continue to increment and only be reset back to zero after the measure group cache has been cleared (either as a result of an XMLA clear cache or some other event, like processing, that clears the cache). The execution of the subcube request will itself usually result in an entry being made into the cache, so it’s unlikely you’ll see this value showing as zero. If you don’t see this value incrementing you might be running into this issue.
- CacheHits: the number of times that a subcube request has been answered from this particular measure group cache since the last time this measure group was processed (note, not the last time the cache was cleared).
- CacheSearches: the number of times that the measure group cache has been searched since the last time this measure group was processed. This value may increment by more than one for each subcube request.
- CacheEvictions: the number of times that an entry has been evicted from the measure group cache since the last time the measure group was processed. You’ll see this value go up, for instance, if you run an XMLA clear cache; if you see it go up at other times then Bad Things are likely to be happening – you may be running out of memory on the server for example, and SSAS might be evicting data from the cache as a result.
Together these values give us a much more detailed view of what’s going on in the individual storage engine caches than we’ve ever had before, and will prove to be very useful for performance tuning and also when building cache warmers. That said it’s not going to be easy to use this information when you’ve got complex queries that generate a lot of subcube requests, but it’s better than not having the information at all.
[Thanks once again to Akshai Mirchandani for answering my questions on this!]
In my last post I showed how to load data from Excel into Layerscape, the new data visualisation tool from Microsoft Research; in the post before that I showed how to load UK weather data from Windows Azure Datamarket into PowerPivot. Now let’s take that weather data and plot it on a map!
When doing this, the first decision I made with this data was about what to show. The main problem is that the data volumes involved here are very close to the upper limit of what Layerscape is able to handle: in a few cases I crashed Layerscape, but that was when I was trying to load around 150000 rows of data into it; I found that just over 110000 rows of data was ok however. As a result I made the decision to only show data for cloud or rain, not sun (which is ok – if you want to visualise a clear day, you don’t want to show anything on a map I think) or mist or fog (which I was less happy about).
To achieve this I created a calculated column on my Three Hourly Forecast table in PowerPivot called Cloud with the following definition:
=if([SignificantWeatherId]>=7, TRUE, BLANK())
Then, after having created a flattened PivotTable with the data I wanted to display, I dropped the Cloud field into the Values box in my PivotTable and was able to filter it to only show Weather Stations and time periods where there was cloud:
I also created a few other calculated columns:
- ActualPredictionForStart and ActualPredictionForEnd: the former is mentioned in my previous post, and the latter returns a value three hours after the former:
=[Date] + (([TimeStep]+3)/24)
These two values represent the start time and the end time for each row in the Three Hourly Forecast table.
- WeatherDepth: in the Significant Weather table there’s a distinction made between low-altitude cloud and high altitude cloud, and in LayerScape when you’re plotting data you can control how high off the ground a point is displayed, so the WeatherDepth column contains some fairly arbitrary numbers for cloud altitudes based on the [Code] column. Here’s the definition:
=SWITCH([Code], 5, 150, 6, 150, 7, 100, 8, 120, 10)
- WeatherColor: again, in Layerscape you can control the colour of your points and their opacity, which again is useful for displaying dark/light and thin/thick clouds. The only compromise I had to make was to display dark clouds (ie where clouds are described as either black or where the weather is rainy or snowy) in blue rather than black, because I couldn’t get Layerscape to display black – white clouds are shown in white. To calculate WeatherColor I created two intermediate calculated columns on the Significant Weather table, InitialColor (which contains colours for weather that I’m not displaying in this demo – I thought dust storms should be shown in yellow for instance):
=SWITCH([Code], 1, "yellow", 2, "white", 3, "yellow", 4, "yellow", 5, "white", 6, "white", 7, "white", 8, "blue", "blue")
=SWITCH([Code], -99, 0, 0, 0, 1,10, 2, 25, 3, 10, 4, 100, 5, 25, 6, 75, 7, 50, 8, 50, 9, 60, 10, 60, 11, 60, 12, 65, 13, 75, 14, 75, 15, 80, 90)
…making the definition of WeatherColor:
=[Opacity] & "% " & [InitialColor]
- Depth and Color: calculated columns on the Three Hourly Forecast table that copied the WeatherDepth and WeatherColor values down to the main fact table:
This screenshot gives you some idea of the values that the depth and colour calculations return:
With this all done I was able to load the data into Layerscape in the way I showed in my previous post, tweak some of the settings for the markers and time decay, and come up with a nice-looking visualisation. The big difference in this case compared to my previous examples is that here we have time series data and Layerscape is able to show values changing over time. The only thing I needed to do to make this happen was to check the Time Series box in the Layers pane on the main screen; with this done I could show the data for a particular point in time or let Layerscape cycle through time showing how the forecast weather changed.
I also created a video showing the tour:
I’m quite happy to admit that this tour doesn’t show off the full capabilities of Layerscape (I’m not sure my graphics card is up to the job, frankly) and I’ve not taken much time to ensure that the visualisation is as accurate as it could be, but I’m still quite proud if it!
Sometimes you find a tool that is so cool, you can’t believe no-one else has picked up on it before. This is one of those times: a few month or so ago I came across a new tool called Layerscape (http://www.layerscape.org) from Microsoft Research which allows you to overlay data from Excel onto maps in Microsoft WorldWide Telescope (http://www.worldwidetelescope.org). “What is WorldWide Telescope?” I hear you ask – well, it’s basically Microsoft Research’s answer to Google Earth, although it’s not limited to the Earth in that it also contains images of the universe from a wide range of ground and space-based telescopes. It’s a pretty cool toy in its own right, but Layerscape – which seems to be aimed at academics, despite the obvious business uses – turns it into a pretty amazing BI visualisation tool.
Layerscape is very easy to use: it’s an Excel addin, and once you have it and WWT installed all you need to do is select a range of data in Excel to be able to visualise it in WWT. For some cool examples of what it can do, take a look at the videos posted on the Layerscape website like this one (Silverlight required):
Here are some screenshots of two sample datasets that come with Layerscape. First, here’s some data on earthquakes in Excel with the Layerscape addin open:
Here’s an example of what this looks like visualised:
Here’s a second dataset with polygon data for the outlines of the countries of the world:
Now maybe you don’t have any really sexy scientific data to explore, but it’s increasingly likely that a business will have spatial data that needs visualising somehow. There are no end of ways this is possible in the SQL Server BI stack (here’s a good post by Alex Whittles about using maps in SSRS for example) but I think the most exciting thing about a tool like Layerscape is that it’s so easy to use that most reasonable competent, non-technical Excel users would have no trouble with it; also, because it integrates with Excel, it also plays nicely with PowerPivot.
Here’s a simple example of how to get data from PowerPivot into Layerscape. Let’s start with a dataset I found courtesy of this post on Alastair Aitchison’s superb spatial data blog:
Let’s imagine you’re a tourist visiting the UK – perhaps you’ve come over for SQLBits and you want to see some sights while you’re here. If you like castles and stately homes you might want to visit a National Trust property: the National Trust owns and protects over 500 historic buildings in England, Wales and Northern Ireland. How can we find out where these properties are?
Using one of the datasets listed in Alastair’s post above, I downloaded a CSV file containing the names, latitudes and longitudes of all the National Trusts properties and imported it into a table in PowerPivot. The data’s very simple: just a latitude, longitude and a site name, and the only cleanup I did was to create a new calculated column that removed the string “NatTrust” from the beginning of each site name:
Then on a blank sheet in Excel I created a new flattened PivotTable:
Added the Latitude, Longitude and Site Name columns onto rows:
Turned off subtotals and grand totals on the PivotTable:
And ended up with a PivotTable that looked like this:
All I then needed to do was select the whole table, right-click and choose “Visualize in WWT”, then in the Layer Manager pane ensure the Latitude and Longitude columns were all mapped correctly:
And change the following properties on the Marker tab: Scale Type to Power, Scale Factor to 16, Scale Relative to Screen and Marker Type to Pushpin.
Then finally click on the View in WWT button at the bottom of the Layer Manager pane to push the data over to WWT. Here’s the result with all the National Trust properties plotted on a map:
Of course the problem with visiting a National Trust property is that you won’t be able to do much outdoors if it’s raining. I wonder where I can get some weather data and add that to my map? We’ll find out how in part 2…
I don’t always agree with everything Rob Collie says, much as I respect him, but his recent post on the Windows Azure Marketplace (part of which used to be known as the Azure Datamarket) had me nodding my head. The WAM has been around for a while now and up until recently I didn’t find anything much there that I could use in my day job; I had the distinct feeling it was going to be yet another Microsoft white elephant. The appearance of the DateStream date dimension table (see here for more details) was for me a turning point, and a month ago I saw something really interesting: detailed weather data for the UK from the Met Office (the UK’s national weather service) is now available there too. OK, it’s not going to be very useful for anyone outside the UK, but the UK is my home market and for some of my customers the ability to do things like use weather forecasts to predict footfall in shops will be very useful. It’s exactly the kind of data that analysts want to find in a data market, and if the WAM guys can add other equally useful data sets they should soon reach the point where WAM is a regular destination for all PowerPivot users.
Importing this weather data into PowerPivot isn’t completely straightforward though – the data itself is quite complex. The Datamarket guys are working on some documentation for it but in the meantime I thought I’d blog about my experiences; I need to thank Max Uritsky and Ziv Kaspersky for helping me out on this.
The first step in the process of importing this data is to go to the Azure Marketplace and construct a query to get the slice of data that you want – this is a big dataset and you won’t want to get all of it. Once you’ve signed in, go to https://datamarket.azure.com/dataset/0f2cba12-e5cf-4c6d-83c9-83114d44387a, subscribe to the dataset and then click on the “Explore this Dataset” link:
This takes you to the query builder page, where you get to explore the data in the different tables that make up this dataset:
You choose the table you want to explore in the confusingly-named ‘Query’ dropdown box on the right-hand side of the screen. The available tables are:
- ThreeHourlyForecast, a fact tables containing three hourly weather forecasts
- ThreeHourlyForecastArchive, a fact table containing aggregated, averaged values for the various forecasts for a given date and time
- SignificantWeather, a dimension table containing the different types of weather that can be forecast
- Visibility, a dimension table containing the different levels of visibility
- DailyForecast, a fact table containing daily weather forecasts
- Observations, a fact table containing observed weather
- Site, a dimension table containing all the UK’s weather stations
As far as I can tell, this data is more or less the same as what’s available through the Met Office’s own DataPoint service, and the documentation for this is here: http://www.metoffice.gov.uk/public/ddc/datasets-documentation.html
Once you’ve selected a table you can construct a filter by entering values in the Optional Parameters boxes below the query dropdown. These changes are then reflected in the URL shown at the top of the screen:
This URL represents an OData query. One thing I didn’t notice initially is that the query that is generated here includes a top 100 filter in it which you’ll need to remove (by deleting &$top=100 from the end of the URL) if you want to do anything useful with the data; you might also want to build a more complex query than is supported by the query builder, and you can learn how to do this by reading this article.
In my case I decided to look at the full three hourly forecast data. As I said, this is a big dataset – initially I thought I’d download the whole thing, but 18 million rows and several hours later I cancelled the import into PowerPivot. Instead I opted to look at data for the whole of the UK for just one forecast made on one day, which worked out at a more manageable 250000 rows. What’s not clear from any of the current documentation is what all of the columns in the three hourly forecast fact table represent:
- Date is the date the forecast is issued
- StartTime is the time the forecast is issued and is either 0, 6, 12 or 18, representing midnight, 06:00, 12:00 and 18:00 – new forecasts are issued every 6 hours
- PredictionTime is the time that an incremental update to a forecast is issued; these incremental updates appear every hour. PredictionTime is an actual time value going from 0 to 23 representing the hour the incremental update was issued.
- TimeStep is an offset in hours from the StartTime, and represents the time that the forecast is predicting the weather for. It ranges in value from 0 to 120, going up in 3s (so the values go 0, 3, 6, 9… 120), meaning we have weather predictions for 5 days into the future for each forecast.
Therefore, for any given row in the ThreeHourlyForecast table, if the Date is April 10th 2012, StartTime is 6, PredictionTime is 8 and TimeStep is 9, then this is data from a forecast that was issued on April 10th 2012 at 8am (the second incremental update to the 6am forecast) and this row contains the prediction for the weather for the time StartTime+TimeStep = 6 + 9 = 15:00 on April 10th 2012.
Here’s the OData url I used to grab data for the three hourly forecast issued on April 10th at midnight (StartTime=0 and PredictionTime=0) for all weather stations and all time steps:
To use this URL in PowerPivot, you need to create a new PowerPivot workbook, open the PowerPivot window and then click the From Azure DataMarket button:
Then enter your query URL and Account Key (which you can find on the Query Explorer page by clicking on the Show link, as indicated in the screenshot above):
Having imported this data I also imported the whole of Site (renamed here to Weather Stations) and SignificantWeather tables to give the following PowerPivot model:
Here are the joins I used:
I also created a few calculated columns, including one called ActualPredictionForStart which added the TimeStep to the Start Time and the Date to get the actual date and time that the prediction is for:
=[Date] + (([StartTime] + [TimeStep])/24)
With this all done, I was able to find out what the predicted weather for the current time and my home town was in this (as of the time of writing) two-day old forecast:
…and do all the usual PivotTable-y and chart-y things you can do with data once it’s in Excel:
Incidentally, the forecast is wrong – it’s not raining outside right now!
PivotTables and Excel charts are all very well, but there’s a better way of visualising this data when it’s in Excel – and in my next post I’ll show you how…
UPDATE: First of all, I owe an apology to the Met Office – as soon as I hit publish on this post it started pouring with rain, so they were right after all. Secondly, in a weird co-incidence, Rob Collie posted about using the other weather dataset in the DataMarket on his blog: http://www.powerpivotpro.com/2012/04/download-10000-days-of-free-weather-data-for-almost-any-location-worldwide/
SQLBits X finished over a week ago and so it seems a bit late to be blogging about it. To be honest, though, I needed that week to recover from the conference, a cold I picked up the day before it started, and all of the other work-related stress that’s been building up over the last few months.
So, the obligatory bit of reportage. SQLBits X was indeed as big as we were hoping and behind the scenes everything went surprisingly smoothly; I guess after nine previous events we must have learned something about running a tech conference! I don’t know what the official numbers are but we packed 1400 swag bags on the Wednesday afternoon before it all started in a five-hour bag stuffing marathon, and on the Saturday evening we only had about 100 or so left, so that makes it easily 50% larger than any other SQLBits. We had a great line-up of speakers including a large number of international SQL celebrities, more than we’ve ever had before, and the parties on Thursday and Friday night were well-attended and more importantly good fun. I don’t know how SQLBits could get any better except by getting bigger and offering even more of the same, although if we did get bigger it would probably end up killing a lot of what makes SQLBits special and push us beyond the limits of what an amateur organisation can cope with. My thanks go out to my fellow committee members Simon Sabin, Allan Mitchell, Martin Bell, James Rowland-Jones, Darren Green, Chris Testa-O’Neill, Tim Kent and Christian Bolton, as well as all of the team of helpers who gave up their time free of charge including Annette Allan, Helen Lau, and many others.