Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for September 2010

Browse the SQLBits Agenda with PivotViewer

with 2 comments

I like the Silverlight PivotViewer control; I like SQLBits; so naturally I like something that combines the two. And that’s what the nice people at Xpert360 have done: they’ve taken the SQLBits agenda and put it into the Silverlight PivotViewer control. Take a look here:
http://xpert360.wordpress.com/2010/09/25/sqlbits-7-pivotview-demo-collection/
http://xpert360.com/SQLBits/SQLBits.htm

Now why isn’t my photo showing against my sessions?

Written by Chris Webb

September 27, 2010 at 12:04 pm

Posted in Events

Excel 2010 and Windows HPC Server 2008 R2… and PowerPivot?

with 6 comments

Something I first heard about last year was the ability to run Excel calculations in parallel on a Windows HPC cluster; it’s been in the news again recently, with the release of Windows HPC Server 2008 R2:
http://www.theregister.co.uk/2010/09/20/microsoft_hpc_server_r2/

I did some digging and found the following white papers that give more details on how exactly you can do this:
Accelerating Excel 2010 with Windows HPC Server 2008 R2: Technical Overview
Accelerating Excel 2010 with Windows HPC Server 2008 R2: Building VBA applications and workbooks for a Windows HPC Cluster

Here’s a summary of what you can do, taken from the first paper above:

As models grow larger and workbooks become more complex, the value of the information generated increases. However, more complex workbooks also require more time to calculate. For complex analyses, it is not uncommon for users to spend hours, days, or even weeks completing such complex workbooks. The problem this white paper addresses is how organizations can reduce the calculation time required for long-running workbooks to give users faster access to business-critical information.

One solution is to use Windows® HPC Server 2008 R2 to scale Office Excel 2010 calculations across multiple nodes in a Windows high-performance computing (HPC) cluster in parallel. This paper presents three methods for running Office Excel 2010 calculations in a Windows HPC Server 2008 R2-based cluster: running Office Excel 2010 workbooks in a cluster; running Office Excel 2010 user-defined functions (UDFs) in a cluster; and using Office Excel 2010 as a cluster service-oriented architecture (SOA) client.

Windows HPC Server 2008 R2 now enables running multiple instances of Office Excel 2010 in a Windows HPC cluster, where each instance is running an independent calculation or iteration of the same workbook with a different dataset or parameters. This solution allows near-linear performance increases for iterative spreadsheets, such as those running a Monte Carlo algorithm.

Running Office Excel 2010 UDFs in a cluster is a new ability of Office Excel 2010 for running complex or time-consuming UDFs—functions contained in Excel link libraries (XLLs)—in a Windows HPC Server 2008 R2–based cluster. If a workbook includes long-running UDFs, moving calculations to the cluster can result in significant performance improvements.

Using the Windows HPC Pack software development kit (SDK), Office Excel 2010 can function as a cluster SOA client to run complex and time-consuming calculations across a set of servers in a Windows HPC cluster. Any Microsoft .NET or Component Object Model (COM) application can use the Windows HPC Pack SDK: This paper uses Microsoft Visual Studio® Tools for Office (VSTO) to construct an Office Excel 2010 add-in that connects to the cluster as a cluster SOA client.

From a purely Excel point of view, this is very interesting – and it’s no wonder that Excel power users everywhere are drooling over it. The obvious next question is: can I use PowerPivot with this as well? From reading the papers I don’t see why not. I’m not talking about simply taking a pivot table connected to a single PowerPivot model and somehow making it all run faster – I don’t think that would work (yet?) – but I can imagine, as part of a larger solution, using VBA and the Excel cube functions to retrieve the values of DAX calculated measures from PowerPivot models distributed over a cluster. Now how cool would that be? Sadly I don’t have an HPC cluster to test this on, but if anyone does have one and has tried this, please let me know…!
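
To make that scenario slightly more concrete, here’s a rough sketch of the kind of cube formula I have in mind. The measure name and the [Scenario] hierarchy are made up for illustration, but “PowerPivot Data” is the connection name Excel 2010 normally uses for an embedded PowerPivot model, and the member arguments passed to CUBEVALUE are really just fragments of MDX:

=CUBEVALUE("PowerPivot Data", "[Measures].[Total Sales Amount]", "[Scenario].[Scenario].&[Simulation 042]")

Each instance of Excel running on a cluster node could then be handed a different scenario member by the VBA driving the workbook, recalculate, and pass its results back to the coordinating workbook.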

Written by Chris Webb

September 22, 2010 at 9:30 pm

Posted in Excel

Data Mining SQLBits Registration Data With Predixion Insight

with one comment

As you’re probably aware, in my spare time I’m one of the people who help organise the SQLBits SQL Server conference in the UK. For the last few conferences I’ve worked on marketing, which basically means writing all those SQLBits emails that get sent out every week and thinking of ways to get ever more people to come along to the conference. Given that we handle all our registrations via RegOnline and can download all of the data we capture during the registration process, I thought I ought to practice what I preach and do something BI-ish with it.

The first thing I did, of course, was to load all the data into PowerPivot. That worked well but, let’s be honest, you’ve already read 500 blog posts about loading data into PowerPivot and don’t want to read another one. Then I remembered the emails I’d been getting from Predixion, the data mining startup founded by, amongst others, Jamie MacLennan (who used to be on the SSAS data mining dev team), about their beta, and thought I’d see what I could do with it.

Predixion have got a number of videos on their site to help you get started:
http://www.predixionsoftware.com/spv-90.aspx
… and Kaspar has already blogged about what he’s done with Predixion twice (here and here), as has Mark Tabladillo, so I won’t repeat any of what they say and will just concentrate on my results. Suffice to say that anyone who used the free SSAS data mining add-in will feel very at home with Predixion Insight – in many respects it’s a much-improved version 2.0 of that tool.

The most useful results I got came from using the ‘Detect Categories’ tool on a table containing all my registration data to identify groups of attendees with similar profiles. For obvious reasons I can’t show you the raw data or anything derived from it that might identify individual attendees, but here’s a screenshot of the Excel report that it generated for me:

[Screenshot: the ‘Detect Categories’ report for the ‘Hardcore SQLBits Fans’ category]

This report also shows data on the first cluster, or category, that the tool identified, which I named ‘Hardcore SQLBits Fans’. You can see that these people are extremely likely to have registered very early: we opened registration at the end of July; the ‘RegDate’ column shows the date they registered (and ‘DaysToConference’ shows the number of days from RegDate to the conference date); and this group was highly likely to have registered within two weeks of registration opening. Similarly, they are very likely to be attending the full conference and not just individual days, and as a result they are spending a lot of money with us. We’ve always known there’s a core group of people who come to SQLBits regardless of where or when it is, and this is that group – they’re our biggest fans and therefore we’ve got a duty to keep them happy!

Here’s another group, equally important for us, which I named ‘European Big Spenders’:

[Screenshot: the report for the ‘European Big Spenders’ category]

As you can see, the people in this group also spend a lot of money with us (perhaps more than the hardcore fans); they registered slightly later, in mid-August; and they are coming from outside the UK. We made a special effort to attract more attendees from Europe this time, so this is all good to see, but interestingly this group is very likely not to be using a discount code to get money off the price of the paid parts of the conference. I spent hours setting up discount codes for various European user groups and it doesn’t look like this strategy has paid off; maybe the fact that these people are coming is totally unconnected to any of the efforts we made?

Last of all, let’s look at the ‘Delayed Friday’ group:

[Screenshot: the report for the ‘Delayed Friday’ category]

The people here are only coming for the Friday and not attending the training day; consequently they’re spending less money overall. They registered in mid-to-late August and the discount codes used (which I’ve had to black out, sorry) suggest they heard about us from user groups, emails that our sponsors sent out to their customers, or places like SQL Server Central. They’re not such rabid fans of SQLBits that they would come automatically, but they’ve been successfully marketed to, seen what we’ve got to offer and registered. These are the attendees we’ve had to work hard to attract, and so the ones we need to understand best to ensure they turn into the hardcore SQLBits fans of the future!

So, as you can see, with one click of a button Predixion Insight came up with loads of interesting, useful information for me – information I probably wouldn’t have found if I’d just been browsing through the data myself using SSAS or PowerPivot. And this is only a taste of what it can do; I didn’t even try out much of the more advanced functionality. I can imagine Predixion’s cloud-based data mining offering is going to generate a lot of interest, and maybe, after many years, the time has finally come for data mining for the masses?

Written by Chris Webb

September 20, 2010 at 9:00 pm

Posted in Data Mining

icCube

with 5 comments

A few months ago I got a message via LinkedIn inviting me to check out a new OLAP server called icCube that supported MDX:
http://www.iccube.com/

Of course, me being me, this was an invitation I couldn’t refuse and I duly downloaded a copy and then had a chat on Skype with some of the guys from the company. This blog post isn’t a review (in keeping with my new policy of not doing reviews here) but I will say that even though it’s very early days for them – they haven’t released a version 1.0 just yet – they’re making good progress and have some interesting ideas on where to take the product, too. Certainly the version I played around with worked fine and was very straightforward to use, and it’s worth checking out if you’re looking for a low-cost OLAP server to use or resell.

The thing that really interested me, though, was their willingness to add new functionality to MDX. Maybe it’s because we’ve not seen anything new in MDX from Microsoft for five years now (will we ever?) that I get so excited by things like this… Anyway, one thing they’ve done is add the ability to declare new MDX functions in MDX itself, similar to something I asked for in SSAS MDX a while ago:
http://www.iccube.com/support/documentation/mdx/DeclaredFunctions.html

Here’s an example:

WITH FUNCTION REVERSESET(MYSET) AS
GENERATE(MYSET AS TEMP, {MYSET.ITEM(COUNT(MYSET) - TEMP.CURRENTORDINAL)})
SELECT {[Measures].[Amount]} ON 0,
REVERSESET([Time].[Calendar].[Month].MEMBERS) ON 1
FROM [Sales]

What I’m doing here is declaring a function called REVERSESET that takes a set as a parameter and then returns the same set with the members in reverse order. I can see how this would be very useful for improving the readability and maintainability of code where the same logic is repeated across a number of complex MDX calculations.
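
For instance (and this is only a sketch that I haven’t actually run against icCube, using the same demo [Sales] cube as above), once REVERSESET has been declared it can be called as many times as you like in the same query, instead of the GENERATE expression being repeated everywhere:

WITH FUNCTION REVERSESET(MYSET) AS
GENERATE(MYSET AS TEMP, {MYSET.ITEM(COUNT(MYSET) - TEMP.CURRENTORDINAL)})
// first reuse: a measure that sums the last three months, by reversing the month set and taking its head
MEMBER [Measures].[Last 3 Months Amount] AS
SUM(HEAD(REVERSESET([Time].[Calendar].[Month].MEMBERS), 3), [Measures].[Amount])
SELECT {[Measures].[Amount], [Measures].[Last 3 Months Amount]} ON 0,
// second reuse: the months themselves in reverse order on rows
REVERSESET([Time].[Calendar].[Month].MEMBERS) ON 1
FROM [Sales]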

Oh well, we can keep our fingers crossed and hope that the SSAS dev team haven’t decided to let MDX stagnate in favour of building up DAX.  

Written by Chris Webb

September 13, 2010 at 10:27 pm

Posted in MDX

SQLCat ROLAP White Paper

with 3 comments

As you may or may not have seen, a month or so ago the SQLCat team published a white paper on optimising the performance of ROLAP inside SSAS. You can download it here:
 
I wouldn’t normally bother commenting on something like this – not because it’s not worth reading, but because so much good stuff comes out of the SQLCat team that you should be reading it all anyway! In summary, the paper shows some useful tricks to make ROLAP run faster, and points out not only that ROLAP is the only option at data volumes of more than 5TB but also that, surprisingly, it can sometimes outperform MOLAP at smaller data volumes.
 
That’s not what I wanted to talk about, though. While I was reading this paper it struck me that Parallel Data Warehouse, which I assume is going to be released relatively soon, was not mentioned. If you’ve got a lot of data (and cash) and you want to build a data warehouse on the Microsoft platform then you’re going to want to look at it, and if you’re investing this much in one area of the MS BI stack then it makes sense to invest in other areas like SSAS. As you probably know, ROLAP has always been a bit of a pain point for SSAS, so maybe the prospect of customers building ROLAP cubes on Parallel Data Warehouse has spurred the SQLCat team on to do some research in this area? Moving forward it would be good if the dev team spent some time optimising SSAS in ROLAP mode for both regular SQL Server and Parallel Data Warehouse, because I suspect there’s a lot of scope for optimisation. It will also be interesting to see where the dividing line is drawn for customers with large data volumes: should they be using ROLAP on Parallel Data Warehouse, or will the new super-scalable Vertipaq engine be able to handle multi-terabyte volumes even better?

Written by Chris Webb

September 10, 2010 at 6:07 pm

Posted in Analysis Services

BI Survey 9

with 11 comments

I’ve just got hold of my freebie copy of the latest edition of the BI Survey and, as I do every year, I get to blog about some of the juicier SSAS-related findings it contains. So what’s interesting in the BI Survey 9…? Here are some points/thoughts –

  • The survey finds that SSAS is most often used in companies of below-average size, both in terms of the number of employees and in terms of turnover. It’s also most popular in IT companies and with technical (as opposed to business) users. Note that this doesn’t mean that SSAS isn’t used in larger companies: the survey shows it is most likely to be evaluated in the largest companies and (as we see below) it has a very good win rate when it is evaluated. It’s just that it is used by a much wider range of company sizes, unlike many products which are only used in the largest companies, and this accounts for its dominant market share. This, I think, is also a result of Microsoft’s use of partners and its influence over internal IT departments to sell its products (as opposed to large, expensive teams of salespeople and consultants), and of its licensing strategy.
  • Only 1% of SSAS users in the survey were using AS2K; 45% were using 2005 and 48% 2008. This tallies with my experience out in the field; in fact I can’t remember when I last saw an AS2K installation in production. Good to see these levels of migration to the latest versions.
  • SSAS comes third in the rankings for the percentage of employees in an organisation using a BI tool: on average 26% of employees use it (SSRS comes second at 30%). This is surely ‘BI for the masses’ in action, and however much the likes of me might moan about the shortcomings of Excel as a client tool, high usage like this is only possible because everyone already has Excel on their desktop, so there are no extra costs involved in rolling BI out to large numbers of users.
  • Interestingly, SSAS is one of the least likely products to undergo a formal evaluation when organisations are buying BI tools, but when it does get formally evaluated it still has a very respectable 70% win rate, although this seems to be decreasing over time.
  • SSAS has the third-lowest number of technical problems reported and the second-lowest number of complaints about reliability. My feeling is that while SSAS was never all that buggy, at least in comparison to other software, it seems to have got even better recently; indeed it’s been a while now since I came across a really nasty bug. Maybe that’s because all the dev team have been away working on PowerPivot?
  • As far as SSAS client tools go, Excel again takes the top spot with 76% of users using it. Depressing but believable: 22% of users still have Proclarity, and are probably wondering how to migrate to something else. More surprising is that almost 25% of users claim to be using Excel Services – I know it’s out there, but I haven’t seen a single customer of mine use it yet.

That’s probably enough – if you want to know more, go out and buy the survey! But the generally positive ratings that SSAS has received have cheered me up somewhat; it’s always nice to see the product your livelihood depends on getting good reviews.

Written by Chris Webb

September 9, 2010 at 11:46 pm

Posted in Analysis Services

One Cube vs Multiple Cubes

with 25 comments

One of the questions discussed in the book that Marco, Alberto and I wrote last year, “Expert Cube Development with SSAS 2008” (available in all good bookshops, folks!), was whether, if you have multiple fact tables, you should create one big cube with multiple measure groups or multiple cubes each with a single measure group. While I still stand by what we wrote then, I recently took part in an interesting debate on this subject on the MSDN forum with Akshai Mirchandani from the dev team about the pros and cons of each approach, and some new details came to light:

http://social.technet.microsoft.com/Forums/en-US/sqlanalysisservices/thread/681e59bd-93ca-4a91-9f26-8ed96e825553

Here are the main points you need to consider when deciding whether to use the single cube approach or the multiple cube approach. In favour of the multiple cube approach:

  • Having multiple, smaller cubes may result in faster query performance than one large cube in some cases, especially if your fact tables have very different dimensionality. This was true in SSAS 2005, and while it’s less obvious in 2008 it’s apparently still there. This is the point I’d previously not been sure about: I’d heard rumours about it, and seen it happen in some cases myself with 2005 – although in other cases when I’d tested it I’d seen no difference in performance – but I didn’t know what the situation was with 2008. You’d need to test the two approaches yourself on your cubes and queries to be sure. Here’s what Akshai says on the matter:

    if you keep adding lots of dimensions to a cube, then the virtual space of the cube grows — it does not add to the storage cost, but it does hurt formula engine performance in some scenarios because the cell coordinates are based on the number of attributes in the cube space. Increasing the number of attributes in the cube space will start costing performance in lots of small ways and result in performance regressions. Adding lots of unrelated measure groups would result in you adding lots of unrelated dimensions to the cube space and cause a performance slowdown — if you had 10 measure groups and they all shared lots of common dimensions, then one cube makes the most sense…

    …As I already explained… it affects the sizes of data structures inside the formula engine that are based on the number of attributes in the cube space. When those data structures get larger, there is an incremental cost that can add up (depending on your calculations and query patterns).

    For example, you see the Query Subcube Verbose events in Profiler — they show you the subcubes that are used for accessing measure groups. There are similar subcubes that are used for calculating formulas and cell coordinates — all those subcubes get wider and wider as you start adding more attributes into the cube. The cost of accessing and indexing those data structures is what we’re talking about here. If adding new measure groups doesn’t require adding new attributes/dimensions, then there is no problem…

    We had measured the difference before 2005 shipped for some real customer cubes and found that there was a noticeable performance improvement to split up into multiple cubes…

  • While it is possible to apply dimension security to the Measures dimension, it is much easier to allow or deny access to a cube with the multiple cube approach than it is to apply security to all the measures in a measure group using the single cube approach (see the sketch after this list).
  • Having multiple, simpler cubes can be much more user friendly than one monster cube with loads of dimensions and measure groups. If you have Enterprise Edition you can of course use Perspectives to counter this, but if you are using Standard Edition then Perspectives aren’t available.
  • Maintenance can be easier and less disruptive with multiple cubes: if you need to make changes to a cube while users are querying it, you might end up invalidating users’ connections and dropping caches. With one cube the chances of this disruption affecting more users increases.
  • It’s easier to scale out with multiple cubes: if you find your server is maxing out, you can simply buy another server and distribute your cubes equally between the two. With a single cube approach you end up having to look at (admittedly not that much) more complex scale-out scenarios like network load balancing.
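
To illustrate the point about security above: with the single cube approach you end up maintaining dimension security on the Measures dimension for each role, with allowed sets along these lines (the measure names here are invented, and this is just a sketch of the idea, not something taken from a real role definition):

// hypothetical allowed set for one role, applied to the Measures dimension of the single big cube
{[Measures].[Sales Amount], [Measures].[Sales Quantity]}
// ...or the deny-based equivalent: everything except the sensitive measures
Except([Measures].MEMBERS, {[Measures].[Cost Amount], [Measures].[Margin Amount]})

With the multiple cube approach the equivalent is simply granting or denying each role access to the relevant cube.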

On the other side, here are the arguments in favour of the single cube approach:

  • If you ever need to work with data from two fact tables in the same query or calculation, or if you think you might ever need to in the future, you should go with the single cube approach. The two options for cross-cube querying, linked measure groups and the LookUpCube MDX function, should be avoided. Linked measure groups are a pain to manage, carry a slight query performance overhead, and can result in the same MDX calculations being duplicated across the original cube and the cube containing the linked measure group (which means maintenance becomes more difficult). The LookUpCube function is probably the worst MDX function to use in a calculation from a performance point of view and should be avoided at all costs. So a single cube is the only feasible option (see the sketch after this list).
  • Even if your users tell you they will not ever need to analyse data from two fact tables in the same query, be prepared for them to change their minds. In my experience, SSAS projects have a tendency to grow in complexity over time, and cubes that start out simple in a first release often grow lots of new functionality as time goes on – and the more successful the project, the quicker things get complicated. As soon as your users see what’s possible with SSAS they will start to have new, more ambitious ideas about the kind of analysis they want to do with their data, and it’s very likely that they will realise they do need to do cross-measure-group queries and calculations. If you started out on the multiple cube approach and then this happens you will have no choice but to use linked measure groups, and as I said this can make maintenance difficult; using the single-cube approach from the start means you won’t have this problem.
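
And to illustrate the cross-measure-group point from the first bullet: with everything in one cube, combining measures from two measure groups is just an ordinary calculated member. The measure and dimension names below are invented; imagine [Sales Amount] and [Headcount] sitting in two different measure groups that share a [Date] dimension:

WITH MEMBER [Measures].[Sales Per Employee] AS
[Measures].[Sales Amount] / [Measures].[Headcount]
SELECT {[Measures].[Sales Amount], [Measures].[Headcount], [Measures].[Sales Per Employee]} ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS ON 1
FROM [Finance]

With two separate cubes, the only ways to get the same result would be a linked measure group or LookUpCube: exactly the options I’ve just said should be avoided.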

My personal preference is to use the single cube approach by default, and then move to multiple cubes if there are pressing reasons to do so, for example if query performance is a problem. This might seem a bit strange given the number of reasons I’ve given for the multiple cube approach, but frankly the need to support cross-measure-group querying and calculations trumps them all. As I said, if you need to do it (and 99% of the time you will), or you even half suspect you might need to do it sometime in the future, you have to go with the single cube approach. That said, I know other people are more inclined to the multiple cube approach than I am and to a certain extent it’s a matter of taste.

Written by Chris Webb

September 2, 2010 at 12:15 am

Posted in Analysis Services
