Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for November 2010

Two client tools – Bonavista Dimensions and Varigence Vivid – and yet more idle speculation

with 8 comments

In early September I attended two webcasts introducing new client tools. I’ve been meaning to blog about them ever since but haven’t got round to it until now (so I have some apologising to do); as I’ve said before, I’ve given up writing reviews on this blog but both products have some interesting features and are therefore worth a closer look.

First of all, Varigence Vivid. It’s an Excel addin that does all the kinds of complex query things that a proper SSAS client tool does but which Excel pivot tables don’t do. That’s all very well, but there are plenty of Excel addins like this already. However, its key selling point is this: unlike every other Excel addin client tool for SSAS, instead of trying to replace the native functionality completely it actually builds on and extends existing Excel pivot table functionality. This means that users who don’t have Vivid installed can still use worksheets and pivot tables created using it, which I think is pretty cool.

Second up, Bonavista Dimensions. It’s another Excel addin that can connect to SSAS but it can also create SSAS local cubes from a variety of data sources (I suspect if/when PowerPivot gets an API it will make sense to add support for creating PowerPivot models too). The main differentiating feature in this case is visualisation, and it supports a wide variety of Tableau-like charts which look very impressive; you can also export dashboards created in Excel up to a server to allow for web-based consumption, rather like Excel Services without the cost and hassle of Sharepoint.

Talking of Tableau, ever since it was launched I’ve thought Microsoft should buy the company – it would catapult MS into a genuine leadership position in BI, and almost incidentally solve the whole client tool problem for SSAS, PowerPivot and BISM (and incidentally, has anyone else noticed how much exposure Tableau is getting on Azure Datamarket?). The topic came up on Jen Stirrup’s blog recently in relation to Project Crescent and was dealt with very intelligently; unlike Jen, though, I don’t think Crescent is a reason for Microsoft not to buy Tableau. For a start Crescent comes from the SSRS team and if anyone in MS was going to buy Tableau it would be the Office group – and I don’t think they’d change their plans just because of what the SQL Server guys are doing. Can you imagine what a big deal it would be if Tableau appeared as a new tool in Office 2010? It would certainly be a major reason for many companies to upgrade, and therefore generate more cash for MS than Crescent will ever make – not that Crescent is bad, on the contrary it looks quite promising, but Office revenues are on a different scale to SQL BI. Hmm, however much sense it makes I’m not sure it will ever happen though…

Written by Chris Webb

November 23, 2010 at 10:45 pm

Posted in Client Tools

Missing Members and the Formula Engine Cache

with one comment

Continuing my occasional series on ways to prevent the Formula Engine cache from being able to cache values for longer than the lifetime of a query (ie forcing it to use ‘query’ scope instead of ‘global’ scope), here’s something new I found the other day: the presence of any ‘missing’ members in a query forces ‘query’ scope for the FE cache.

Take, for example, the following query in Adventure Works:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
{[Customer].[Country].&[Australia]
,[Customer].[Country].&[DoesNotExist]}
ON 1
FROM [Adventure Works]

When you run this query, assuming you’ve not altered any of the default property settings on Adventure Works, you’ll see results that include just one row for Australia. The country ‘DoesNotExist’ does not exist on the Country hierarchy of the Customer dimension; but you don’t get an error because of how the MDXMissingMemberMode property has been set (this blog post gives a good overview of what this property does).

However, if you regularly delete members from your dimensions and you rely on MDXMissingMemberMode to avoid errors in queries you’ve written that reference these members, there’s a hidden performance penalty. Here’s an illustration: if you add the following calculated measure onto the MDX Script of the Adventure Works cube -

CREATE MEMBER CURRENTCUBE.MEASURES.CACHEABLE AS 1;

And then clear the cache and run the following query twice:

SELECT MEASURES.CACHEABLE ON 0,
{[Customer].[Country].&[Australia]}
ON 1
FROM [Adventure Works]

If you run a Profiler trace that includes the Get Data From Cache event, you’ll see that event fire the second time the query runs, because SSAS is retrieving the value of the single cell returned in the cellset from the Formula Engine cache.

However, if you write a query that includes a member that does not exist and do exactly the same thing, you’ll see that the FE cache is no longer used in the second query:

SELECT MEASURES.CACHEABLE ON 0,
{[Customer].[Country].&[Australia]
,[Customer].[Country].&[DoesNotExist]}
ON 1
FROM [Adventure Works]

So, clearly, the presence of a ‘missing’ member is preventing the use of global scope in the FE cache. And as I said, if you’re relying on this functionality in production, you’re going to be seriously limiting the effectiveness of FE caching and overall query performance will be worse.
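
Incidentally, if you want to hunt down stale member references in your queries rather than rely on MDXMissingMemberMode to hide them, one option is the old IsError/StrToMember trick with a throwaway calculated member. Here’s a rough sketch of the idea – I haven’t tested it exhaustively, and I’m assuming the ‘member not found’ error still gets raised when the unique name goes through StrToMember rather than being referenced directly, so treat it as a starting point rather than a guaranteed technique:

WITH MEMBER MEASURES.MEMBERCHECK AS
//returns 'Missing' if the unique name no longer resolves to a member
IIF(
ISERROR(STRTOMEMBER('[Customer].[Country].&[DoesNotExist]'))
,'Missing'
,'Exists')
SELECT MEASURES.MEMBERCHECK ON 0
FROM [Adventure Works]

Once you know which unique names no longer resolve, you can clean them out of your queries and get the benefit of global-scope FE caching back.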

Written by Chris Webb

November 22, 2010 at 9:26 pm

Posted in MDX

PowerPivot vs SSAS Quiz

with 9 comments

Last week, at the PASS Summit, I did a session on ‘Comparing PowerPivot with Analysis Services’. The aim of the session was to compare the two products in terms of positioning and functionality, and help people work out which tool would be appropriate for their project – and the reason I submitted this session was that I’ve seen an awful lot of people over the last year who are confused about this issue, and it’s not an easy question to answer. Although there are many things that both tools do equally well, there are some things that PowerPivot is good at and SSAS is not, and other things that SSAS is good at and PowerPivot is not.

Anyway, to make the presentation a bit more fun I came up with the idea of creating a quiz (like the kind you find in women’s magazines) to help make the decision. It took the form of an Excel spreadsheet with a series of yes/no questions, and once the questions had been answered the spreadsheet would tell you which tool you should use. A number of people have since asked me to share the workbook, so I’ve decided to make it available via the Excel Web App here:

http://cid-7b84b0f2c239489a.office.live.com/view.aspx/Public/PowerPivot%20SSAS%20Comparison/SSASvsPowerPivotQuiz.xlsx

All you need to do is answer each question by entering 1 under either the Yes or the No column, and then when you’re finished look in cell C67 for the answer. You might also want to download a local copy to Excel and play with it there, rather than edit the document online. The way it works is that each question has a weight attached to the yes or no answer, and that’s found in the hidden columns F and G. A positive weight favours SSAS, a negative weight favours PowerPivot; I should also point out that the weights aren’t always equal. So, for example, in the question about security, if you answer that everyone in your organisation should be able to see all your data that favours neither SSAS nor PowerPivot, but if you answer that you do need to restrict access to data then that favours SSAS (because only SSAS has features like dimension and cell security).

Before anyone complains to me about the questions being stacked in favour of SSAS or PowerPivot (and I’d like to point out one more time that I am not some kind of PowerPivot-hating BI Luddite, I do like PowerPivot and I’m also excited about using BISM too, so there), I’m going to add the following disclaimer: these questions should only act as a guide, and I cannot guarantee that this worksheet will give the correct answer in every case. It only represents a personal opinion! Before you use it, I suggest you review the weights associated with each question and change them according to your own ideas. Oh, and before you show this sheet to the boss you might want to delete the pictures of hearts at the top…

Written by Chris Webb

November 19, 2010 at 2:50 pm

PASS Summit Day 3

with one comment

After the distraction of the last few days, here’s a quick post about what I saw on day three of the PASS Summit last week. In fact it wasn’t quite as exciting as the previous two days but I did find out a bit more about two products I was interested in: Impact Analysis & Lineage, and Data Quality Services.

In fact I didn’t find out much about Impact Analysis & Lineage – it didn’t get much exposure at all for some reason; that’s a shame because I think it’s a potentially very useful service. It allows you to see what impact any changes you make in one place in your BI solution have elsewhere: for example, if you rename a column in a dimension table it could break SSIS packages, SSAS dimensions and SSRS reports, but today it’s pretty difficult to know what the impact of any changes will be. Impact Analysis & Lineage, as far as I gathered, is a service that crawls all the files associated with each part of your BI solution and builds up a list of dependencies between them to help solve this problem. I suspect it wasn’t demoed because it’s not finished yet.

We saw a lot more about Data Quality Services. DQS is a substantial new product that allows end users and BI developers to create rules about what data is valid in a given scenario, and then apply these rules to perform automated data cleansing; it’s not, apparently, a rebadged version of the Zoomix product that MS bought a while ago – it’s a lot more ambitious than that. Example scenarios include cleaning addresses by comparing them to a master address list, possibly sourced from the Azure Datamarket; using regular expressions to ensure that valid URLs and stock ticker symbols are stored in a table containing information about companies; and using fuzzy matching on names and addresses to find groups of customers who live together in the same household. Although some people I talked to were a bit put off by the bug-ridden demo, I was quite impressed by what I saw – a lot of thought seemed to have gone into it and the UI looks good. I think there’ll be an SSIS component that will allow you to apply your rules within a data flow too, but that wasn’t shown.

Written by Chris Webb

November 16, 2010 at 9:21 pm

Posted in ETL

Performance Tuning SSAS 2008 Webinar

with 4 comments

I probably should have mentioned this a while ago, but on Thursday at 8-9pm GMT (which is 3-4pm EST) I’m doing a webinar for the guys at Pragmatic Works, on “Performance Tuning Analysis Services 2008”. It’s going to be a basic-level introduction to the subject, covering topics such as partitioning, aggregations and MDX calculations. You can sign up here:
https://www1.gotomeeting.com/register/824246961

Written by Chris Webb

November 16, 2010 at 1:32 pm

Posted in Events

PASS Summit Day 2: The Aftermath

with 21 comments

Well, that last blog post sparked a bit of a discussion, didn’t it? Indeed, I’ve spent the last few days doing a lot of talking to various groups of people – PASS attendees, fellow MVPs, Microsoft – about what was or wasn’t said in the various announcements made at PASS, what I did or didn’t mean, and how people are interpreting or misinterpreting the news. And now it’s time to follow up with another blog post to explain myself better and say what’s happened since Thursday; you may also want to read this official statement about the roadmap from TK Anand here before carrying on reading this post:
http://blogs.technet.com/b/dataplatforminsider/archive/2010/11/12/analysis-services-roadmap-for-sql-server-denali-and-beyond.aspx

First of all, let me start by making it clear that Analysis Services overall is alive and well, and in fact has a greatly increased role to play in the BI stack in Denali. My original post pretty much said as much. Some of the confusion, though, stems from the fact that ‘Analysis Services’ in Denali will have two distinct parts:

1) The UDM, or Analysis Services cubes, which is what we have today. Some people refer to it as MOLAP SSAS but I don’t like this description: it highlights the storage mode when in fact I consider its distinguishing feature to be its multidimensional view of the world. Personally I couldn’t care less about storage modes and can’t wait to see Vertipaq replace MOLAP, but I do care about multidimensionality and its advantages when it comes to BI – some BI applications, typically ones which need complex calculations, can only be built using a true multidimensional OLAP database. I’d say anyone who thinks the point of using the UDM is that MOLAP is (or has been) faster than relational database engines has completely missed the point. However, multidimensionality is complex and somewhat inflexible and that’s what puts lots of people off.

2) The new BI Semantic Model, BISM. This is what’s new in Denali, and features a more relational, tabular way of modelling data as well as the new Vertipaq storage engine. BISM is a little bit multidimensional (it is after all still SSAS under the covers) but not much: that’s exactly why it’s easier to use, more flexible and appropriate for a wider range of BI applications. It will be a massive asset to the MS BI stack and make building many types of BI applications quicker and easier. It will probably not do everything that the UDM does, though, precisely because it is not as multidimensional.

The point I was trying to make in my original post was that the announcements made at PASS, as I and everyone I spoke to there interpreted them, made me very concerned (to say the least) for the future of the UDM and the multidimensional model. First of all there was the news that Microsoft was putting all of its development efforts into Vertipaq and BISM, while the UDM was (for yet another release) getting very few obvious improvements. Then there was the news that Project Crescent was only going to support BISM as a data source and not the UDM, which made it seem like the UDM was a second class citizen in this regard. And finally there was a lack of clarity in the roadmap which meant I wasn’t sure whether BISM was meant to replace the UDM or not, or whether BISM would ever be able to do the same things that the UDM can do today.

This is what caused all the commotion, and I’m pleased to say that after a lot of what’s generally referred to as ‘free and frank discussion’ behind the scenes the guys at Microsoft understand what happened. In part there was a failure of communication because I don’t think the Analysis Services team ever meant to send out a negative message about the UDM and were a bit surprised at my reaction. TK’s recent post that I link to above is a very clear and positive statement about the future of the UDM. But words need to be backed up by actions and Microsoft know there need to be some changes to the Denali roadmap so that customers receive the right signals. As a result I hope to see a little bit more love shown to the UDM in Denali, to prove to all of us who have invested in the UDM that Microsoft still cares about it; I also know that Microsoft are looking again at ways that Crescent can work with existing UDM applications; and I hope to see a clearer long-term vision to show how anyone investing in the UDM today will have the option, if they want, to move smoothly over to BISM when they feel they are ready. An argument about semantics is in no-one’s interests (I couldn’t help thinking of this); what I care about is that I’ll have all the cool new stuff that BISM will give me and I’ll still be able to do everything I can do today in the UDM, and that we’ll have all the power of relational and multidimensional modelling when we’re building our BI solutions.

So let’s be positive. There was a bit of a bust-up, but we’re all friends again now and I think the SSAS community is better off for having had this all come out now rather than later – and the fact that we can even have this type of discussion shows the strength and vibrancy of the community. I’m not afraid of change and I know it has to happen; I’m confident that the changes we see coming in Denali will be for the better. However, I’m also a lot happier now that existing Microsoft BI customers have had this reassurance that they won’t be left stranded by these changes.

Written by Chris Webb

November 14, 2010 at 11:26 pm

Posted in Analysis Services

PASS Summit Day 2

with 72 comments

UPDATE – after you read this post, you should also read the follow-up here:
http://cwebbbi.wordpress.com/2010/11/14/pass-summit-day-2-the-aftermath/

The last few days have been quite emotional for me. I’ve gone from being very angry, to just feeling sad, to being angry again; I’m grateful to the many members of the SSAS dev team who’ve let me rant and rave at them for hours on end and who have patiently explained their strategy – it’s certainly helped me deal with things. So what’s happened to make me feel like this? I’ll tell you: while it’s not true to say that Analysis Services cubes as we know them today and MDX are dead, they have a terminal illness. I’d give them two, maybe three more releases before they’re properly dead, based on the roadmap that was announced yesterday. And this is incredibly hard for me to write because I’ve spent the majority of my working life, about 12 years now, working with them; I live and breathe these technologies; and I have built up a successful consulting business around them.

Neither is it true to say that they are struggling in the marketplace: on the contrary they have gone from strength to strength even in spite of the fact that, apart from the important performance improvements in SSAS 2008, we haven’t had any substantial new functionality since SSAS 2005. SSAS has been the most popular OLAP tool on the market for years, has loads of very happy users, and continues to be used on new projects all the time. Hell, on stage the other day at the keynote there was a guy from Yahoo talking about his 12TB cube, which loaded 3.5 billion rows of data per day, and which he was planning to grow to 40TB! The SSD revolution has given SSAS cubes a massive boost.

So this is one very successful product, and no other company would be allowed to do what Microsoft is proposing to do with it, because if they did customers would be up in arms, calling their account managers, and the account managers would go straight to the CEO and demand that the product not only be retained but given the development resources it deserves. But this is Microsoft we’re talking about, and they have the luxury of being able to ignore this kind of pressure from their customers and partners and do whatever they want. And they have quite convincing reasons for doing what they’re doing, albeit ones I’m having severe difficulty coming to terms with.

So let me get round to explaining in detail what was announced yesterday at the PASS Summit. Quite a few BI-related things were aired that I won’t talk about in detail: the move to Visual Studio 2010 for all BI development, and the integration of SQL Management Studio functionality into VS2010 too; FileTable; the Master Data Services Excel addin; Data Quality Services; loads of new SSIS stuff including impact analysis and lineage; and there was yet more buzz on Project Crescent. But I’m going to concentrate on what came out in TK Anand’s presentation on the future of Analysis Services. Here are the main points:

  • The BISM – BI Semantic Model – is the name for the new type of Analysis Services database that gets created when you publish a PowerPivot model up to the server. It’s SSAS running in the special in-memory mode, and SSAS instances will either work in this mode or in regular MOLAP mode. In Denali we’ll be able to install a standalone instance of SSAS running in in-memory, BISM mode without needing Sharepoint around.
  • We’ll be able to create BISM models in BIDS, so we get full support for stuff like source control. The experience is very similar to what we get in PowerPivot today though; one of the points that was made again and again yesterday was that they wanted to make things as easy as possible for BI developers; the implication is that today the learning curve for SSAS is too steep, which is why many database people have been put off using it; I would argue that any rich, sophisticated tool is going to have a learning curve though and I bet nobody would dare to go to the C# community and tell them that C# is too intimidating, and wouldn’t it be nice if they had the friendliness and flexibility of VBA!
  • BISM models are the UDM 2.0. Everything that the UDM was meant to do in SSAS 2005, and didn’t, is a serious objective here: BISM aims to replace traditional SSAS and SSRS report models, and be good for the kind of low-level relational reporting that SSAS simply can’t do as well as the high-level, aggregated data it handles so well today. Business Objects universes were mentioned several times as being a very close comparison. Project Crescent will only work against BISM models.
  • BISM models will support MDX querying in some cases (see below) but DAX has grown to become a query language. We only had a brief look at it and basically it seems like you use a CalculateTable DAX function to return a tabular result set. You can also define query-scoped calculations just as you do with the WITH clause in MDX today. That’s a gross simplification, but you get the idea. DAX queries do not do any pivoting, so you only get measures on columns; it’s up to the client tool to do any pivoting. It was remarked that this made it much easier for SSRS to consume. SSRS couldn’t deal with multidimensional resultsets, and so instead of fixing this they made SSAS less multidimensional!
  • BISM models are massively scalable. They have no aggregations, there are no indexes to tweak, but they demoed instant querying on a 2 billion row fact table on a fairly average server, roughly the same spec that I see most people using for SSAS installations today. They’re achieving massive compression on the data, often anything up to 100 times or more. Of course all the data has to sit in memory after it’s been loaded, but they’re going to support paging to disk if it won’t fit; we’ll also be able to partition tables in the BISM so we can control what gets loaded when. There will also be perspectives.
  • Miscellaneous PowerPivot functionality that was demoed included: a nice new window for creating KPIs easily; new DAX functions for flattening out parent/child hierarchies, similar to what the ‘Parent Child Naturaliser’ does today in BIDS Helper (plenty of people, including me, pointed out that this was not proper support for parent/child hierarchies); a new RANKX function for doing rank calculations; Distinct Count will be a native engine feature, and you’ll be able to have as many distinct count measures on a table as you want; drillthrough will also be supported.
  • There will be role-based security in BISM, where you can secure either tables, rows or columns.
  • BISM models will also be able to operate in ‘passthrough’ mode. This is essentially ROLAP done right, and a lot of work has gone on around this; in Denali it will only be available for SQL Server and only if you’re issuing DAX queries, not MDX. In the future other RDBMSs will be supported, plus possibly MDX querying of BISM when it’s in passthrough mode. Essentially in this scenario when you query the model your query is translated directly to SQL, and the results returned are (as far as possible) passed back to you with the minimum of interference. In some cases, for example where there are calculations, BISM will do some stuff with the resultset before it hands it over to you, but the aim is to push as much of the logic as possible into the SQL query that it generates. If it works well, it sounds like at long last we’ll have a serious ‘realtime’ BI option, though I’m still not sure how well it will perform; I suppose if there are Vertipaq indexes inside SQL Server and/or if you’re using PDW, the performance should be good.
  • There are only going to be a few improvements for regular, MOLAP-based SSAS – four bullet points in TK’s presentation! They are: the 4GB string store limit has been fixed; we’ll get XEvents and better monitoring functionality; Powershell support; and there’ll be some performance, scalability and reliability improvements.
  • BISM will not really handle advanced calculations in Denali. Yes, you’ll be able to do cool stuff in DAX expressions, but you won’t get the equivalent of calculated members on non-measures dimensions (so no time utility dimensions) or MDX Script assignments. ‘Advanced business logic’ is on the roadmap for post-Denali, whatever that means exactly; the aim will be to support things like assignments but not necessarily exactly what we have now. To me this is going to be one of the main reasons why people will not adopt BISM in Denali – most enterprise customers I see have pretty complex calculations.
  • Role-playing dimensions, translations, actions, writeback and a better API (AMO will still work for creating BISM objects in Denali, but it is going to be difficult to work with and an object model that’s more closely aligned to BISM concepts will be needed) are all planned for beyond Denali.
  • There are going to be some tools to help migration from SSAS cubes to BISM, but they won’t get you all the way. Some redesigning/rethinking is going to be needed, and it’s likely that some of the things you can do today with SSAS cubes you might never be able to do in the same way with BISM.

MS are clear that BISM is the priority now. While MOLAP SSAS isn’t deprecated, the efforts of the SSAS dev team are concentrated on BISM and PowerPivot and we shouldn’t expect any radical new changes. I asked why they couldn’t have just kept SSAS as it is today and bolted Vertipaq storage on as a new storage mode (we will, of course, be able to use SSAS cubes in ROLAP mode against SQL Server/PDW with Vertipaq relational indexes), and I was told that although this was seriously considered, it didn’t turn out to be at all easy to implement. The other question I asked was why they are abandoning the concept of cubes and explicitly multidimensional ideas in favour of a simpler, relational model, and they told me that it’s because multidimensionality put a lot of people off; I can see that’s true – yes, a lot of people have been converted to the OLAP cause over the years, but we all know that many relational people just can’t stomach/understand SSAS today. The vast majority of people who use SSRS do so directly on relational sources, and as we know, while there’s great demand for things like Report Builder, Microsoft has never had anything that worked really well to enable end-user reporting in SSRS; BISM, as I said, is aimed at solving this problem.

So this is a radical departure for Microsoft BI, one that could go badly wrong, but I can understand the reasoning for it. I’ve been impressed with the technology I’ve seen over the last few days and I know that if anyone can pull this off, the SSAS dev team can. However, the fact remains that in the short term BISM models won’t be able to handle many enterprise projects; SSAS cubes, which can, will be seen as a bad choice because they have no long-term future; and we’re all going to have to tie ourselves in knots explaining the roadmap and the positioning of these products to all of our customers. There’s going to be a lot of pain and unpleasantness over the next few years for me and all Microsoft BI partners. Hohum. As I said, I’ve felt pretty angry over the last few days about all this, but now that’s turned to resignation – I can see why it’s happening, it’s going to happen whether I like it or not, and whether I kick up a fuss or not (I did consider trying to whip up a kind of popular rebellion of SSAS users to protest about this, but doubt it would have had any impact), so I might as well get on with learning the new stuff and making sure I still have a career in MS BI in two or three years’ time.

What do you think? I would really be interested in hearing your questions and comments, and I know the guys at Microsoft who read this blog would also want to see them too. I’m going to be in Seattle for the next two days and I’ll have the chance to pass on any comments that you leave here to the dev team, although I suspect some of them might be too rude to repeat. I certainly feel better just for having written this post and gotten things off my chest, and maybe you will too.

Written by Chris Webb

November 11, 2010 at 1:35 pm

Posted in Analysis Services
