Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for March 2009

SQL Solutions OLAP Heartbeat and OLAP Performance Advisor

Hmm, you wait for years for a commercial tool for monitoring Analysis Services (the only one I’d ever seen before was Companion for Analysis Services from SQLMinds) and then two come along at once. One of these tools I’ll be blogging about towards the end of this week, hopefully – I’ve had a sneak preview and it looks very cool – but today I found out that SQL Solutions have just released two new products of their own: OLAP Heartbeat and OLAP Performance Advisor.

I’ll download both and give them a thorough test as soon as I can.

Written by Chris Webb

March 31, 2009 at 2:36 pm

Posted in Analysis Services

EXISTING AND NONEMPTY

Here’s an interesting MDX nugget. The other day I was working on something similar to the following query, which finds the number of customers who bought something in both the current period and the previous one:

WITH
MEMBER MEASURES.TEST AS
COUNT(
  NONEMPTY(
    NONEMPTY(
      EXISTING [Customer].[Customer].[Customer].MEMBERS
      , [Measures].[Internet Sales Amount])
    , ([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER))
)
SELECT MEASURES.TEST
*
[Product].[Subcategory].[Subcategory].MEMBERS ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
*
[Customer].[Country].[Country].MEMBERS ON 1
FROM [Adventure Works]

On my laptop this runs in almost 16 seconds – not good. I had assumed it was a good idea to apply the EXISTING first, to filter the set of customers down as much as possible, and then do the two NONEMPTYs. In fact, I found that moving the EXISTING so that it was applied after the two NONEMPTYs made the query run much faster:

WITH
MEMBER MEASURES.TEST AS
COUNT(
  EXISTING
  NONEMPTY(
    NONEMPTY(
      [Customer].[Customer].[Customer].MEMBERS
      , [Measures].[Internet Sales Amount])
    , ([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER))
)
SELECT MEASURES.TEST
*
[Product].[Subcategory].[Subcategory].MEMBERS ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
*
[Customer].[Country].[Country].MEMBERS ON 1
FROM [Adventure Works]

On my laptop the above query now executes in 3 seconds. Why is this? To be honest, I’m not completely sure… looking in Profiler and Perfmon I see differences, but not many; these things are better left to the likes of Mosha, who know what’s going on inside the formula engine. Anyway, I thought I’d post this up because it’s a common form of calculation – so if you are using NonEmpty and Existing together, do the NonEmpty first and then the Existing.

Written by Chris Webb

March 31, 2009 at 12:27 pm

Posted in MDX

SQLBits IV

So, yet another SQLBits finished, and judging by the feedback we got it was the best yet. We certainly had our highest ever attendance, and the sponsors all seemed very happy (which is important because, after all, they’re the ones paying for it); we’ve also had a number of very kind emails thanking us. Now for a few months of peace and quiet before we start planning for the next one, which I suppose will be sometime in the autumn. Any suggestions on where to hold it? Bristol, Oxford, Leeds?

Incidentally, if you took any photos or videos at SQLBits why not share them on the SQLBits Facebook group? There’s also a SQLBits LinkedIn group and the #sqlbits tag on Twitter that you might want to check out too. Oh, and while you’re at it why not join the UK SQL Server User Group Facebook group as well? We’re planning some BI user group events soon, I promise.

I suspect the current economic problems have had a lot to do with the success of SQLBits. Apart from the fact that a free conference is the perfect answer to a reduced training budget, I guess everyone’s worried about job security and therefore keen to invest in their own education – Simon Munro wrote a good blog entry on exactly this topic last week which is worth a read. However, I found it interesting talking to other BI professionals at the conference (and yes, networking is another great reason to come along) about their pipelines, because everyone was saying how busy they were at the moment. When the recession hit I was sceptical of predictions that the BI sector would survive unscathed but, although I’ll admit I’ve not been as busy this year as I was last, I’m still doing more than OK. Looking at the visitor stats for this blog and my company website, I’ve noticed hits from a number of companies that have been in the news for their, ahem, financial problems – and the conclusion I draw is that there are a lot of organisations out there who have realised how little they know about their own financial situation and hope BI software is the answer.

Written by Chris Webb

March 29, 2009 at 10:55 pm

Posted in Events

Aleri Live OLAP

There’s no real reason for this post except to indulge my trainspotterish habit of making a note whenever I find another OLAP product that supports MDX. From a recent post on Curt Monash’s blog I came across Aleri Live OLAP (see also here), an MDX-capable OLAP server that works on streaming data coming from their complex event processing platform.

Written by Chris Webb

March 26, 2009 at 11:56 am

Posted in MDX

SDS, Analysis Services in the cloud, and local cubes?

I was just reading through a list of questions and answers about the new, more SQL Server-like SDS on the SDS team blog and had a thought. Here are three points made in the post:

When? or to quote JamieT “When do I get to party on this new service”?

We’re on track to deliver a public CTP mid-calendar year 2009 and ship in the second half of calendar year 2009.

The blog entry states “If it works with SQL Server, it will largely work with SQL Data Services.”. That word “largely” bothers me a little – it suggests the functionality is going to be reduced slightly. Details please?

We will be providing documentation soon on what is and is not supported in SDS. I’ll post an entry to the blog once the guidance is available and you can also keep an eye out for it on our MSDN Dev Center. But, to answer the question – We say *largely* due to the fact that there are things that just don’t apply in a cloud based world like setting the location of a data or log file or making server wide configuration changes. In v1 we expect to deliver a surface area that will support the vast majority of SQL Server database applications.

Will you offer hosted SSIS/SSAS/SSRS?

It’s on the product roadmap, but I can’t comment on specifics or timing.

So, we’ll get a CTP in a few months, it’s going to be mostly compatible with existing SQL Server apps, but we’re not going to get Analysis Services in the cloud just yet. What can we do while we’re waiting for cloud-based SSAS then? Well…

  • It seems highly likely that we’ll be able to hook a normal, server-based instance of SSAS up to SDS and use it as a data source for building cubes. It would be a pretty silly thing to do though, I’m sure, because it would take ages to process a cube, but…
  • Wouldn’t that make ROLAP a more attractive option as a storage mode? No processing needed then, just SQL queries generated whenever the data is needed. However, ROLAP is slow now and is likely to be even slower when you’re querying SDS, but…
  • For some OLAP apps, you could dispense with a server-based instance of SSAS altogether. One little-known (and little-used) feature of SSAS is the ability to build ROLAP local cubes. As you probably know, a local cube (.cub file) is a portable cube that doesn’t need full Analysis Services installed anywhere. Since storing a local cube file somewhere in the cloud would be dead easy, I can imagine a scenario where you create a ROLAP local cube file – which would be no bigger than a few KB in size – allow people to download it, and then when they connect to their cube from Excel or wherever the local cube would then in turn retrieve the data it needs from SDS. Not exactly SSAS in the cloud, and probably only likely to work for small amounts of data and simple queries, but it’s an approach and not unlike what CubeSlice has been offering for a few years.
  • You could always convert local cubes to MOLAP storage if you wanted faster query performance (at the expense of longer processing times), and of course local cubes seem to be an important part of the Gemini story too. What we’d really need are easier ways to create local cubes (the statement you’d use today is sketched below) and support for easy switching of storage modes (from ROLAP to MOLAP/Gemini) to make this as smooth as possible.
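
To make the local cube idea a bit more concrete, here’s a rough sketch of the CREATE GLOBAL CUBE statement you can use today to build a .cub file from a server-based cube. Note that this is the server-sourced, MOLAP flavour of local cube rather than the pure ROLAP-against-SDS scenario I’m imagining above (ROLAP local cubes need the older CREATE CUBE/INSERT INTO syntax), and the file path and the particular measures and dimensions are just for illustration:

// Run from an MDX query window while connected to the server cube;
// the .cub file is created on the client machine
CREATE GLOBAL CUBE [Adventure Works Local]
STORAGE 'C:\Temp\AdventureWorksLocal.cub'
FROM [Adventure Works]
(
    MEASURE [Adventure Works].[Internet Sales Amount],
    DIMENSION [Adventure Works].[Product],
    DIMENSION [Adventure Works].[Date]
    (
        HIERARCHY [Calendar]
    )
)

You’d then just distribute the .cub file and connect to it from Excel, or any other MDX client, using a file-based connection.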

Written by Chris Webb

March 23, 2009 at 10:40 am

Posted in Cloud

Query behaviour when you cross the LowMemoryLimit

Here’s something that had me stumped for quite a long time earlier this week…

I’m doing some performance tuning work for a customer at the moment, and following my usual approach I took a test query, cleared the cache, ran it and noted how long it took and what I saw in Profiler and MDX Studio. However, this time I saw some strange things happening: I’d run the same query on a cold cache more than once and the query times would differ by anything up to a minute; even more confusing, the information I got from Profiler and MDX Studio would be different too – for example, the number of Query Subcube Verbose events recorded in Profiler for a query might differ from run to run by several hundred. This had me completely confused. There was no-one else on the server and I was running on a cold cache. What was going on?

It turned out that I was running out of memory. I was working on a 32-bit box with a lot of databases on it, so even on a cold cache the amount of memory used by SSAS was approaching the LowMemoryLimit threshold (see Marco’s post here, the AS2005 Performance Guide and this white paper for an explanation of what this is). When I ran the query – which had a lot of calculations – memory usage went above the LowMemoryLimit, so the cleaner thread was waking up and evicting elements of shrinkable memory, which presumably included data that had been put in cache by the current query. Since the exact moment that this threshold was crossed varied from run to run, and since (I assume) what the cleaner thread evicted from memory also varied, this explained the different query times and Profiler activity. As soon as I moved onto a 64-bit box with no other databases, this stopped happening.

Of course nowadays I would expect that most people are running on 64-bit, so it’s much less likely that you’ll ever cross the LowMemoryLimit, but I do see it happen from time to time. So if you’re in the same position and suffer from inconsistent query performance, check your memory usage!
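
As an aside, if you’re on SSAS 2008 you can get a quick look at where memory is going straight from an MDX query window using the new DMVs. This is only a sketch – I’m quoting the rowset name from memory, so double-check it against the DMV documentation – and the SSAS memory counters in Perfmon are still the best way to watch the LowMemoryLimit threshold itself:

// DMV query; run from an MDX query window against an SSAS 2008 instance
SELECT * FROM $SYSTEM.DISCOVER_MEMORYUSAGE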

Written by Chris Webb

March 19, 2009 at 11:28 pm

Posted in Analysis Services

BI Survey 8

As I think I’ve probably said before, I don’t get many freebies as a blogger but one that I do get and I really appreciate is my review copy of the BI Survey. It being that time of year, I got the latest edition – the BI Survey 8 – last week along with an email encouraging me to blog about it, and who am I to refuse a request to blog about something as fascinating as this? It’s a whopping 489 pages long so I can’t even begin to summarise it, but there are some points regarding Microsoft BI that I’d like to pick up on.

In general, the findings show the Microsoft BI stack as a solid and successful suite, extremely good value for money, but by no means a stellar performer. There’s a long section at the end of the report showing a wide range of technical and project-related KPI ratings (for example query performance, business benefits achieved, quality of support, cost of deployment and so on) and in almost every respect Analysis Services and Reporting Services come out in the middle of the rankings. This doesn’t really surprise me much: let’s face it, what we’ve got works well, but from a technical perspective there’s not been much new and exciting in the world of Microsoft BI for a while now (although with Gemini and Madison coming soon that will change), the platform in general still has some glaring holes and overlaps, and as the PerformancePoint debacle showed recently Microsoft’s overall BI strategy is somewhat confused. In fact, I suspect Microsoft’s entire BI strategy is not a BI strategy at all but a get-people-to-upgrade-to-the-latest-version-of-Office strategy, but I digress…

For Analysis Services in particular, the survey showed that 15% of respondents were on AS2K, 79.6% on AS2005 and 3.5% on AS2008. Given that most people must have been surveyed last summer, well before the RTM of AS2008, that’s pretty good, and with the migration path from 2005 to 2008 being very smooth I should think the AS2008 percentage will look much better next year. Analysis Services remains the top-ranked BI tool used against Microsoft’s own databases, as you’d expect since it’s essentially free with SQL Server, but it also comes in a close third for Oracle (while Oracle’s own BI tools come in 12th!), second for IBM, third for Teradata and top for Open Source databases.

In terms of the client tools used with Analysis Services, just over 70% of people were using one of Microsoft’s own tools – mostly pivot tables and Reporting Services. 25% of SSAS users had Proclarity, and interestingly there were more people using the old free Excel addin than PerformancePoint, and a sizeable minority still using Data Analyzer. There’s clearly a lot of demand for a client tool from Microsoft but from the looks of things most people are still stuck with Office 2003; this just adds weight to my argument that Microsoft coupling its BI strategy so closely to Office might help Office adoption rates but has a serious negative impact on the success of the BI strategy itself… sorry… there I go again… As far as third-party tools go Panorama is still the #1 vendor, but it only has an 11% share and only 22% of SSAS users overall had any kind of third-party tool in use. Incidentally Panorama was treated as a BI vendor in its own right for the purposes of the survey and came out top in a lot of the high-level KPIs, including the overall KPI ranking – they obviously have some very enthusiastic customers.

So who, apart from Panorama, seems to be doing well? Qlikview certainly is, which at least validates Microsoft’s decision to go after that market with Gemini; Microstrategy does well too. Essbase comes out badly, with great query performance offset by poor support and product quality; Cognos, apart from TM1, doesn’t seem to do too well either. Clearly it’s the vendors who are 100% focused on BI that are the most successful, which is as you’d expect.

Written by Chris Webb

March 12, 2009 at 7:03 pm

Posted in Analysis Services

SDS: the new relational features announced

After all the rumours, here’s the official announcement of the new relational features that are coming to SQL Data Services:
http://blogs.msdn.com/ssds/archive/2009/03/10/9469228.aspx

Given that the team have already made noises about adding BI features to SDS soon, I can’t wait to see what form they’ll take. Of course there are already lots of ways of doing BI with data stored online, as my last blog entry showed; there are also a couple of startups like Birst and GoodData who do very sophisticated BI things in the cloud already. But I hope Microsoft has something up its sleeve, and that I can run an MDX query against it…

Written by Chris Webb

March 10, 2009 at 7:32 pm

Posted in Cloud

Guardian Data Store – free data, and some ideas on how to play with it

I was reading the Guardian (a UK newspaper) online today and saw that they have just launched something called Open Platform, basically a set of tools that allow you to access and build applications on top of their data and content. The thing that really caught my eye was the Data Store, which makes available all of the numeric data they would usually publish in tables and graphs in the paper, in Google Spreadsheet format. Being a data guy I find free, interesting data irresistible: I work with data all day long, and building systems to help other people analyse data is what I do for a living, but usually I’m not that interested in analysing the data I work with myself because it’s just a company’s sales figures or something equally dull. However, give me information on the best-selling singles of 2008 or crime stats, for example, and I start thinking of the fun stuff I could do with it. If you saw Donald Farmer’s fascinating presentation at PASS 2008, where he used data mining to analyse the Titanic passenger list to see if he could work out the rules governing who survived and who didn’t, you’ll know what I mean.

Given that all the data’s in Google Spreadsheets anyway, the first thing I thought of doing was using Panorama’s free pivot table gadget to analyse the data OLAP-style (incidentally, if you saw it when it first came out and thought it was a bit slow, like I did, take another look – it’s got a lot better in the last few months). Using the data I mentioned above on best-selling singles, here’s what I did to get the gadget working:

  1. Opened the link to the spreadsheet: http://spreadsheets.google.com/pub?key=phNtm3LmDZEP4i_r7RdVhUg
  2. Followed the link at the very bottom of the page to edit the page.
  3. In the new window, clicked File/Create a Copy on the menu to open yet another window, this time with a version of the data that could be edited (the previous window contained only read-only data)
  4. Right-clicked on column J and selected Insert 1 Right, to create a new column on the right-hand side.
  5. Typed Count into the header row of the new column, then filled the rest of the column with the value 1 by typing 1 into the first row and dragging it down. I needed this column to create a new measure for the pivot table.
  6. Edited the ‘Artist(s)’ column to be named ‘Artist’ because apparently Panorama doesn’t like brackets
  7. Selected the whole data set (the range I used was Sheet1!B2:K102) and then went to Insert/Gadget and chose Analytics for Google Spreadsheets. It took me a moment to work out I had to scroll to the top of the sheet to see the Panorama dialog that appeared.
  8. Clicked Apply and Close, waited a few seconds while the cube was built, ignored the tutorial that started, spent a few minutes learning how to use the tool the hard way having ignored the tutorial, and bingo! I had my pivot table open. Here’s a screenshot showing the count of singles broken down by gender and country of origin.

[Screenshot: count of singles broken down by gender and country of origin]

Of course, this isn’t the only way you can analyse data in Google spreadsheets. Sisense Prism, which I reviewed here a few months ago, has a free version which can connect to Google spreadsheets and work with limited amounts of data. I still have it installed on my laptop, so I had a go at connecting – it was pretty easy, so I won’t go through the steps, although I couldn’t work out how to get it to recognise the column headers as column headers and that polluted the data a bit. Here’s a screenshot of a dashboard I put together very quickly:

[Screenshot: a dashboard built from the same data in Sisense Prism]

Lastly, having mentioned Donald Farmer’s Titanic demo I thought it would be good to do some data mining. The easiest way for me was obviously to use the Microsoft Excel data mining addin. There are two flavours of this: the version (available here) that needs to be able to connect to an instance of Analysis Services, and the version that can connect to an instance of Analysis Services in the cloud (available here; Jamie MacLennan and Brent Ozar’s blog entries on this are worth reading, and there’s even a limited web-based interface for it too). Here’s what I did:

  1. Installed the data mining addin, obviously
  2. In the copy of the spreadsheet, I clicked File/Export/.xls to export to Excel, then clicked Open
  3. In Excel, selected the data and on the Home tab on the ribbon clicked the Format as a Table button
  4. The Table Tools tab having appeared on the ribbon automatically, I then pressed the Analyze Key Influencers button
  5. In the dialog that appeared, I chose Genre from the dropdown to try to work out which of the other columns influenced the genre of the music
  6. Clicked I Agree and Do Not Remind Me Again on the Connecting to the Internet dialog
  7. Added a report comparing Pop to Rock

Here’s what I got out:

[Screenshot: the Analyze Key Influencers report comparing Pop to Rock]

From this we can see very clearly that if you’re from the UK or under 25 you’re much more likely to be producing Pop, Groups are more likely to produce Rock, and various other interesting facts.
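
Incidentally, the Analyze Key Influencers button is just building a temporary mining model for you behind the scenes. If you wanted to do something similar yourself in DMX against a full SSAS instance, it would look roughly like the following – this is only a sketch: the column names are guesses based on the spreadsheet, and I’m going from memory in assuming the tool uses the Naive Bayes algorithm:

// A session mining model only lasts for the lifetime of your connection, and the
// server may need the AllowSessionMiningModels property enabled; you would then
// train it with an INSERT INTO ... SELECT from the source data
CREATE SESSION MINING MODEL [SinglesKeyInfluencers]
(
    [Artist] TEXT KEY,
    [Country] TEXT DISCRETE,
    [Gender] TEXT DISCRETE,
    [Genre] TEXT DISCRETE PREDICT
)
USING Microsoft_Naive_Bayes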

So, lots of fun certainly (at least for a data geek like me), but everything I’ve shown here is intended as a serious business tool. It’s not hard to imagine that, in a few years’ time when more and more data is available online through spreadsheets or cloud-based databases, we’ll be doing exactly what I’ve demonstrated here with that boring business data you and I have to deal with in our day jobs.

Written by Chris Webb

March 10, 2009 at 7:12 pm

Posted in On the internet

Analysis Services and the System File Cache

Earlier this week Greg Galloway sent me an email about some new code he’d added to the Analysis Services Stored Procedure Project to clear the Windows system file cache:
http://asstoredprocedures.codeplex.com/Wiki/View.aspx?title=FileSystemCache
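
For anyone who hasn’t used the Analysis Services Stored Procedure Project before, its sprocs are invoked with an MDX CALL statement. Something like the following should do the trick once the assembly has been registered on your server, although I’m going from memory on the assembly and method names, so treat this as a sketch and check the wiki page above for the exact signature:

// Assumes the assembly is registered with the name ASSP and that the new
// code exposes a method called ClearFileSystemCache
CALL ASSP.ClearFileSystemCache()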

I thought this was quite interesting: several times when I’ve been doing performance tuning I’ve noticed that the same query, running on a cold Analysis Services cache, runs much slower when the cube has just been processed. This I put down to some form of caching happening at the OS level or below that; I also vaguely knew that it was a good idea to limit the system file cache, having seen the following on Jesse Orosz’s blog:
http://jesseorosz.spaces.live.com/blog/cns!E322FD91218E57CF!295.entry

Anyway, doing some more research on this subject I came across the following blog entry that discusses the problem of excessive caching in more detail:
http://blogs.msdn.com/ntdebugging/archive/2009/02/06/microsoft-windows-dynamic-cache-service.aspx
…and announces a new tool called the Microsoft Windows Dynamic Cache Service that aims to provide a better way of managing the system file cache:
http://www.microsoft.com/downloads/details.aspx?FamilyID=e24ade0a-5efe-43c8-b9c3-5d0ecb2f39af&displaylang=en

Has anyone got any experience with this? From what I can see, installing the Dynamic Cache Service on a 64-bit SSAS box with a big cube on it looks like a good idea. If you have tried it, or are willing to, can you let me know how you get on? Comments are welcome…

Written by Chris Webb

March 4, 2009 at 10:40 pm

Posted in Analysis Services
