Archive for February 2011
Late last year I blogged about icCube, a new OLAP server that supports MDX. I’ve just heard they’ve released version 1.1 and decided to give it away free (see here) – an interesting development for anyone looking at low-cost or open source OLAP tools, although I suspect it represents competition more for Mondrian than it does for SSAS.
This week, the SQLBits committee (which I’m a member of) announced our new event: SQLBits Insight. It’s aimed at CIOs, architects and other senior technical decision makers who are interested in finding out what’s new in SQL Server and how it can help their business. Although it will be taking place on the first day of SQLBits, on April 7th at the Grand Hotel in Brighton, it’s a separate event and very different to anything we’ve done before since it’s aimed at a completely new audience for us.
We have got an amazing line-up of speakers for it:
- Steve Wozniak, co-founder of Apple and Chief Scientist of Fusion-io
- Guy Lucchi, CTO of CSC
- Mark Souza, head of the SQLCAT team at Microsoft
- Ross Mistry, an Enterprise Architect for Microsoft in Silicon Valley
- Richard Tkachuk, formerly of the SSAS team and now a program manager on the Parallel Data Warehouse team
Topics covered during the day will include private clouds, handling big data with PDW, new storage technologies and more; there will also be a reception afterwards where attendees get to mingle with the speakers, including ‘the Woz’ (as apparently we should refer to him). For full details on the speakers, the agenda and how to register, go to http://insight.sqlbits.com
We’d also really like to get your help to make it a success. If you’re a fan of SQLBits it would be great if you could forward details of SQLBits Insight on to any senior IT people that you know who might be interested in attending. After all, if they come back to the office enthused about what they’ve seen it might mean some juicy new projects being started which you’ll get to work on…
Steve Wozniak will of course be sticking around for the evening to dole out the prizes at the Crappy Code Games, so if you want some free food and booze and a picture of yourself with the great man to make all those Apple fanboy friends of yours weep, then don’t forget to register for it. You don’t need to take part in the games if you’re feeling shy – you can just watch – but of course if you do that you won’t win anything!
One of the more popular posts on my blog is one I wrote just over a year ago on binding the results of an MDX query to a table inside Excel. I was thinking about it again recently while looking at the list of DMVs (Dynamic Management Views – views in Analysis Services that can be queried using SQL and which contain all kinds of useful admin data), and I noticed several new ones in 2008 R2 that are PowerPivot-related. I assume these are the DMVs that the SharePoint management dashboard uses to track usage of PowerPivot models after they’ve been uploaded, but it struck me that it would also be cool to have this information available for PowerPivot models while they’re still in Excel. Wouldn’t it be good to query a DMV from Excel? Well, here’s how.
First of all, take an Excel workbook with a PowerPivot model in it. Go to the Data tab and click on Connections, and you’ll see the connection that is created automatically to the PowerPivot model:
This is the connection we want to use to run our DMV queries. We now need a table in which to show the results, and this requires something similar to the method Greg Galloway described after I published the post above. First, on a new sheet, open a connection to any relational data source you have handy, such as SQL Server, and import a table from it into Excel. I used the DimProductCategory table from Adventure Works, importing it by going to the Data tab, clicking From Other Data Sources and then From SQL Server, and running the wizard. The result is this:
Then go to the Connections dialog and copy the connection string from the PowerPivot connection shown in the first screenshot above (you’ll find it by clicking Properties and going to the Definition tab). Next, go to the SQL table you’ve just created, right-click and select Table and then Edit Query, paste the PowerPivot connection string into the Connection textbox, change the Command Type to Default, and put your query into the Command Text box. I also had to add an extra connection string property, Locale Identifier=1033, to get things working on my machine (and re-add it every time I edited the query), but I suspect this might not be necessary if you have a US English machine. Anyway, here’s what my connection string looked like:
Provider=MSOLAP.4;Persist Security Info=True;Initial Catalog=Microsoft_SQLServer_AnalysisServices;Data Source=$Embedded$;MDX Compatibility=1;Safety Options=2;MDX Missing Member Mode=Error;Optimize Response=3;Cell Error Mode=TextValue; locale identifier=1033
And here’s the dialog:
Having done this, when you click OK you’ll see the table update with the results of the query.
Of course you can enter any MDX query here but I’m going to stick to talking about DMVs. So what useful information can you get from a DMV then? Vincent Rainardi has a great post on SSAS DMVs here which covers all the useful ones and has plenty of syntax examples, but here are some things you might want to do with PowerPivot.
First of all, to get a list of all the DMVs supported you can run the query:
select * from $system.discover_schema_rowsets
To get a list of tables in your model along with the dates they were last edited and when the data was last updated, use the following query:
select cube_name, last_schema_update, last_data_update from $system.mdschema_cubes
…although I’ve noticed some weird behaviour with the dates for some tables, so be careful using it.
To get a list of the number of distinct values in each column, use:
select dimension_name, table_id, rows_count from $system.discover_storage_tables
select * from $system.discover_storage_table_columns
gives more metadata on table columns. Meanwhile, the query:
select * from $system.discover_storage_table_column_segments
…gives some information on the amount of memory allocated to different columns, but it does not give the complete picture of memory usage. For that you need to use:
select * from $system.discover_object_memory_usage
This gives a full breakdown of memory usage (in the OBJECT_MEMORY_NONSHRINKABLE column) by each object in the PowerPivot model. It’s not all that easy to interpret this information though, because it only gives the memory used directly by each object and you also need to take into account the memory used by all the objects ‘owned’ by a given object too. It’s also worth pointing out that this is not the same view of memory usage that is given by looking at the temp folder created by Vertipaq, which Vidas has blogged about here and here; it shows the size of the database when it has been loaded into memory as opposed to the size of the database when it is persisted to disk, and there can be a big disparity between the two.
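Before doing anything cleverer, you can also simply sort this DMV to find the biggest individual objects. A quick sketch (bearing in mind it still only shows each object’s own memory, not that of the objects it owns):
select object_parent_path, object_id, object_memory_nonshrinkable from $system.discover_object_memory_usage order by object_memory_nonshrinkable desc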
How can we make sense of the data returned by discover_object_memory_usage? We load it back into PowerPivot of course! I created a linked table and then a calculated column called OBJECT_PATH concatenating OBJECT_PARENT_PATH and OBJECT_ID using the following expression:
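The expression is just a concatenation; assuming the linked table is called Memory and that path segments are separated by dots, as they appear in OBJECT_PARENT_PATH, it would look something like this:
=Memory[OBJECT_PARENT_PATH] & "." & Memory[OBJECT_ID]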
This gave me the full path of each object in a format that’s directly comparable with the object’s parent as stored in OBJECT_PARENT_PATH.
I then created a calculated measure with the following expression to return the amount of memory used by each object, including the objects it owns, in KB:
=(SUM(Memory[OBJECT_MEMORY_NONSHRINKABLE]) + CALCULATE(SUM(Memory[OBJECT_MEMORY_NONSHRINKABLE]),FILTER(ALL(Memory), COUNTROWS(FILTER(VALUES(Memory[OBJECT_PATH]), IFERROR(SEARCH(Memory[OBJECT_PATH],EARLIER(Memory[OBJECT_PARENT_PATH])), 0)=1))>0)))/1024
It’s then easy to see the memory used by the cubes and dimensions that make up the PowerPivot model:
And the memory used by objects associated with the columns in a particular table:
All of which is very useful if you’re trying to work out what’s eating memory in your PowerPivot model. If anyone comes across any other interesting thing to do with DMVs for PowerPivot then please let me know…
Over the last few years I’ve been tracking Google’s slow progress towards offering a cloud-based BI solution. Here’s a new development: I see from the Official Google Blog that you can now upload your own data to the Google Public Data Explorer (which I blogged about last year):
There’s more background here:
How long will it be, I wonder, before it can access data from BigQuery and is integrated into Google Docs?
I see on Amyn Rajan’s blog there’s another platform supporting MDX: Kognitio WX2 with its new Pablo (a pun on ‘Picasso’, which was an old code name for SSAS?) product. More details here:
It’s interesting that tight integration with Excel, which in theory should be Microsoft BI’s trump card, is being so widely copied by its competitors. After all, this product (which sounds similar to what Teradata released last year and to what’s possible with Oracle Exadata, both enabled through the efforts of Simba) is what SSAS in ROLAP mode delivers on top of PDW and what BISM in passthrough mode will also deliver on PDW. Looking at it from another angle, however, it’s beneficial for Microsoft: all of these solutions give users more reasons to stick with Excel instead of moving to web-based or open source competitors, and they help cement Excel’s position as the BI client tool of choice. I suspect this is the more important consideration for Microsoft.
I wonder if other companies will be allowed to implement DAX as a query interface for their products, or be interested in doing so if it is possible?
Today, the SQLBits committee made a really big announcement about SQLBits 8: we’ve got Steve Wozniak coming! Yes, Steve Wozniak, co-founder of Apple and tech industry legend, will be coming by virtue of his current job as Chief Scientist of our platinum sponsor Fusion-io. If you want to meet him and have the chance to win some seriously cool prizes, you’ll need to come to one of a series of events we’re running in conjunction with Fusion-io called “The Crappy Code Games”, a competition where DBAs and SSIS developers compete to write the worst-performing code possible. There will be qualifying events in Manchester on March 17th and London on March 31st, and the third qualifier, as well as the grand finale, will take place on the evening of April 7th at SQLBits in Brighton. Prizes include:
- Gold: A hands-on, high performance flying day for two at Ultimate High plus Fusion-io flight jackets
- Silver: One day racing experience at Palmer Sports where you will drive seven different high performance cars
- Bronze: A ten-person Pure Tech Racing package at PTR’s F1 racing facility, including FI tees, food and drinks
…plus iPods, Windows Mobile phones, Xbox 360s, t-shirts and much more.
If you want to take part you’ll need to register, and since places are limited we suggest you do so fast; it’s also worth bearing in mind that you’ll have a better chance of reaching the final if you go to the London or Manchester qualifier. For registration for the games and more details (and to kill some time playing a cool retro game) go to: http://www.crappycodegames.com/. Note that registration for the games is separate from the main SQLBits registration.
UPDATE Simon has a lot more information on the event here: http://sqlblogcasts.com/blogs/simons/archive/2011/02/08/what-are-the-crappy-code-games-the-background.aspx
I got an email earlier this week from Eric Nelson telling me about a new Silverlight parameter prompting application for Reporting Services called “Prompts for Reporting Services” that he’s developed and open-sourced, and since it’s got some features that look useful for anyone building SSRS reports on SSAS I thought I’d share it here.
Some of the features Eric highlighted in his mail are:
Internal/Global Prompts: An internal prompt is just a regular parameter. A global prompt is a report whose parameters are shared, so you can create the prompt once and reference it from multiple reports.
Tree Prompt: This prompt uses cascading parameters for fetching its data, which makes it perform really well compared to an indented hierarchy parameter.
Cascading Search Prompt: This prompt fetches no data to begin with and only queries the cube when a search is executed. I have found this really useful when a parameter is required that has 1,000+ members; parameters that big tend to lock up the web browser when rendering and are really hard for the user to navigate.
A few screenshots:
It’s available for download here:
As I’ve said before, I’m involved with the organisation of the SQLBits conferences here in the UK and at the moment the SQLBits committee is busy preparing for SQLBits 8 in April (make sure you come – it’s going to be great!). This eats up a lot of my spare time – spare time that I usually spend blogging – so I thought I’d kill two birds with one stone and blog about some of the BI-related stuff I’m doing for SQLBits (I’ve done this before but there’s plenty more mileage in this subject). It turns out a lot of the things SQLBits needs to do requires classic ‘self-service BI’: solve a business problem as best you can with whatever data and tools are to hand. It’s good to see things from the end user’s point of view for a change!
First of all, let’s take a look at scheduling: how can we make sure that we don’t run two sessions in the same time slot that are interesting to the same type of attendee? If attendees are put in a situation where they are forced to choose between two sessions they want to see they won’t be happy – we want to be able to create a schedule where there are as few difficult choices as possible. Unfortunately we don’t collect data about which sessions attendees actually go to, and even if we did it would be no use because of course by the time the session runs it’s too late to fix the agenda. However, well before the conference we allow people to vote for the ten sessions out of all those that have been submitted that they’d like to see (voting has just opened for SQLBits 8, incidentally), and we use this data to help us decide which ones make it onto the agenda; we can therefore use this data to help avoid overlaps.
This data can be visualised very effectively using NodeXL. To do this, I ran a SQL query on the SQLBits database that gave me every combination of two sessions that had been picked by the same user, so for example if a user had selected sessions A, B and C my query returned the pairs A-B, A-C and B-C. This gave me my list of edges for the graph and for the size of the edges I used the number of times the combination of sessions occurred, so I could see the most popular combinations. Unfortunately with 107 sessions on the list and thousands of edges, I got something that looked like one of my four-year-old daughter’s scribbles rather than a useful visualisation, so I decided to filter the data and look at one session at a time. Here’s what I got for my session ‘Implementing Common Business Calculations in DAX’:
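Incidentally, generating those pairs is just a self-join on the votes; here’s a sketch of the kind of query involved, using hypothetical table and column names:
select v1.SessionID as SessionA, v2.SessionID as SessionB, count(*) as Strength
from SessionVotes v1 inner join SessionVotes v2
on v1.UserID = v2.UserID and v1.SessionID < v2.SessionID
group by v1.SessionID, v2.SessionID
The v1.SessionID < v2.SessionID condition ensures each pair appears only once (A-B but never B-A), and the count gives the weight of each edge.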
Still not great, but at least with the thicker lines you can see where the strongest relationships are and when you select these relationships it highlights them and the nodes on either end, so you can read the names of the sessions. I then realised you could use the ‘dynamic filters’ functionality to filter out the weaker relationships, making it even easier to pick out the strongest ones:
So we can now see that the strongest relationships were with the sessions “You can create UK maps with SSRS 2008 R2” and “Data Mining with SQL Server 2008”. I’m still getting to grips with NodeXL which, I have to say, I like more and more and which deserves more visibility in the MS BI world.
Anyway, since this is a basket analysis problem I also thought of using the Data Mining Addin for Excel, but since I have Office 2010 64-bit I couldn’t. Luckily though the nice people at Predixion do have a version of their addin that works on 64-bit, and they gave me another eval license to use on my data. Getting useful results out of Predixion turned out to be ridiculously easy: I just copied the raw data into Excel, clicked the ‘Shopping Basket Analysis’ button on the ribbon and it spat out a pair of nicely-formatted reports. The first shows ‘Shopping Basket Recommendations’, ie if you select one session it recommends another one you might like:
And the second shows the most commonly-occurring ‘bundles’ of sessions that were picked together:
It almost feels too easy… but I think you can see that the results look correct and to be honest it’s much easier to do something useful with this than the NodeXL graph. When we close the voting for SQLBits 8 I’ll repeat the exercise and hand the results over to Allan, who’s in charge of speakers, and he’ll be able to use them to put together our agenda for Saturday April 9th.