Archive for the ‘Uncategorized’ Category
The third and final keynote at the PASS Summit, and this morning I’ve been given a space on the official blogger table at the keynote! Actually this just means I’ve got a table to rest my laptop on and access to a power strip, but it’s an honour nonetheless.
There are several things that I saw yesterday that are worth mentioning. Probably the most interesting session was from Cathy Dumas about Tabular: among other things she demoed a DAX Editor plugin for BIDS SQL Server Data Tools that is going to make everyone’s life soooo much easier; it will give us something like an MDX Script editor, intellisense, colour coding and so on. She has blogged about it here and I can’t wait for it to be made available. Also I came across the Data Explorer team site and blog; if you are interested in getting to play with it when it’s ready then you get your email address added to the invite list.
Anyway, back to the keynote and today it’s PASS Summit favourite Dr David DeWitt covering Big Data. It’s not a standard marketing session, more of a lecture, and all the better for it; DeWitt is a very talented speaker and more importantly takes a balanced approach to describing the SQL vs NoSQL debate. Interesting points to note:
- He thinks that the arrival of NoSQL is not a paradigm shift, in the way that the move from hierarchical databases to relational databases was. The assertion that SQL is not dead, not surprisingly, goes down well with this audience.
- Hadoop. I’m not even going to try to summarise this section of the talk but it is an excellent introduction to how it works, and if you’re even vaguely interested in Hadoop (which you should be given Thursday’s announcements) then you need to watch this – I believe will be available to view on demand somewhere (the slide deck is here). It is, honestly, the best explanation of all this I’ve ever seen and there are several thousand people here in this room who agree with me…
- He does a comparison of Hive vs Parallel Data Warehouse v.next on the same queries, same data and same hardware, and shows that PDW can outperform Hive by up to 10x. This demonstrates that a parallel database still has advantages over a NoSQL approach as far as performance goes in many cases, although of course each has its own strengths and weaknesses and performance isn’t the only consideration.
This was not only an enthralling and educational talk, but it was also great marketing from Microsoft’s point of view. You can announce Hadoop for Windows to a room full of SQL Server types and however many whizzy demos you do, and however much woo-hooing goes on, if we don’t really understand the technology we’ll go back to our day jobs and ignore it. On the other hand, teach us what the technology actually does and you’ll get us interested enough to try it out for ourselves and maybe even use it on a project.
Finally, if you’re at the Summit today come and find me at the Birds of a Feather lunch on the SSAS Performance Tuning table, or later on the BI Expert Pod this afternoon.
Sorry for the relative quiet on here recently, I’ve been caught up with helping to organise SQLBits and other stuff… But anyway, this week I’m working with a company who asked me to tackle what must be a very common business problem: they have a fact table where one row represents an invoice they have generated, and they need to know at any point in time what the average age of all their unpaid invoices is, and monitor whether this value goes up or down. The point of this is, of course, is that if the average age of outstanding invoices starts to go up over time then it means that their customers are getting slower to pay them and they may potentially have a cashflow problem.
The solution is a variation on the ‘events in progress’ model that I blogged about recently but with an interesting twist that I thought was worth describing. First of all, we need to prepare some data so we can reproduce the scenario in Adventure Works – let’s pretend that the FactInternetSales table actually contains invoice data. We’ll need to massage the data a bit for two reasons:
- In Adventure Works if you look at the OrderDate, ShipDate and DueDate columns there’s always exactly the same number of days between the dates, which doesn’t make for a great demo
- In the real world some invoices will be unpaid when the fact table is loaded, others will be paid. We need to be able to handle both scenarios properly.
Once again, we need two fact tables. The first of these will contain all of our invoices, and for this I used the following named query in the Adventure Works DSV:
SELECT CustomerKey, OrderDateKey
The second will be a fact table that contains information on invoices that have been paid:
CASE WHEN SalesTerritoryKey > 5 THEN ShipDateKey ELSE DueDateKey END AS PaidDateKey,
CASE WHEN SalesTerritoryKey > 5 THEN 7 ELSE 12 END AS DaysToPayment
WHERE (PromotionKey < 3)
A few things to notice in this second query:
- I’ve filtered the fact table to remove a very small number of rows where PromotionKey is less than 3 – this will give us a few unpaid invoices to make the numbers more interesting.
- I’ve created a new date key called PaidDateKey to make the invoices have different payment lengths.
- Because these are paid invoices, we know how long they took to be paid and we can precalculate this and put it in the fact table in the DaysToPayment column.
We can now build the cube. It needs two measure groups, one based on each fact table, and we can add the Customer dimension too. Three measures need to be built: one count measure on each measure group and a sum measure on the DaysToPayment column. The Date dimension joins to the OrderDateKey on the All Invoices measure group, and to the PaidDateKey on the Paid Invoices measure group.
Now for the MDX. The first thing we need to do is find the total number of invoices generated to date and the total number of invoices that have been paid to date (and also to make sure that these totals aggregate up properly through the time dimension); we can then subtract the second from the first to find the total number of unpaid invoices:
CREATE MEMBER CURRENTCUBE.MEASURES.[Total Invoices To Date] AS NULL; SCOPE(MEASURES.[Total Invoices To Date]); SCOPE([Date].[Date].[Date].MEMBERS); THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[All Invoices Count]); END SCOPE; SCOPE([Date].[Date].[All Periods]); THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0) , MEASURES.[Total Invoices To Date]); END SCOPE; END SCOPE; CREATE MEMBER CURRENTCUBE.MEASURES.[Total Paid Invoices To Date] AS NULL; SCOPE(MEASURES.[Total Paid Invoices To Date]); SCOPE([Date].[Date].[Date].MEMBERS);
THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[Paid Invoices Count]); END SCOPE; SCOPE([Date].[Date].[All Periods]); THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0) , MEASURES.[Total Paid Invoices To Date]); END SCOPE; END SCOPE; CREATE MEMBER CURRENTCUBE.MEASURES.[Unpaid Invoices Count] AS MEASURES.[Total Invoices To Date] - MEASURES.[Total Paid Invoices To Date];
This is the easy bit. In order to find out what the average of the number of days the unpaid invoices have actually been unpaid, we need to find the number of days that each unpaid invoice has been open for, sum that up, and divide it by the total number of unpaid invoices. And to find this sum of the number of days that all unpaid invoices are open, we need to do the following:
- Find the total up to yesterday of the [Unpaid Invoices Count] measure. This gives the total number of days that every invoice has ever been open.
- Then sum the total to date of the Days to Payment measure we created on the Paid Invoices fact table, to find the total number of days that all paid invoices were ever open
- Then subtract this second figure from the first, which will give us the sum of the number of days all currently unpaid invoices have been open
Here’s the code:
CREATE MEMBER CURRENTCUBE.MEASURES.[Sum of Ages of Invoices To Date] AS NULL; SCOPE(MEASURES.[Sum of Ages of Invoices To Date]); SCOPE([Date].[Date].[Date].MEMBERS); THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER.PREVMEMBER , MEASURES.[Unpaid Invoices Count]); END SCOPE; SCOPE([Date].[Date].[All Periods]); THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0) , MEASURES.[Sum of Ages of Invoices To Date]); END SCOPE; END SCOPE; CREATE MEMBER CURRENTCUBE.MEASURES.[Sum of Ages of Paid Invoices To Date] AS NULL; SCOPE(MEASURES.[Sum of Ages of Paid Invoices To Date]); SCOPE([Date].[Date].[Date].MEMBERS); THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[Days To Payment]); END SCOPE; SCOPE([Date].[Date].[All Periods]); THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0) , MEASURES.[Sum of Ages of Paid Invoices To Date]); END SCOPE; END SCOPE; CREATE MEMBER CURRENTCUBE.MEASURES.[Difference of Sum of Ages] AS MEASURES.[Sum of Ages of Invoices To Date] - MEASURES.[Sum of Ages of Paid Invoices To Date];
Finally, we can calculate the average like so:
CREATE MEMBER CURRENTCUBE.MEASURES.[Average Age of Unpaid Invoices] AS IIF(MEASURES.[Unpaid Invoices Count]=0, NULL, MEASURES.[Difference of Sum of Ages]/MEASURES.[Unpaid Invoices Count]) , FORMAT_STRING='#,#.00';
It’s pretty complex I’ll admit, but it does perform well. Here’s what the results look like in Excel:
I only wish I could come up with a way of getting some of my customers to pay this quickly…!
Every year on December 30th (or thereabouts) I write a blog post summarising what I’ve been up to this year, to mark this blog’s birthday. Today is my blog’s sixth birthday which, to be honest, I find pretty amazing – but then every year I’m amazed at how long I’ve kept at this! I only wish I could apply the same kind of willpower to making cash (I’d be a billionaire by now) or dieting, exercise or anything else…
From a blog point of view, the big change this year was moving away from Windows Live Spaces, where I’d started out, to WordPress. I’d been wanting to make the move for ages but had been too lazy to do so, and also unwilling to lose all my old posts and traffic; luckily when MS made the decision to retire Live Spaces they made sure it was very easy to migrate all my old content to WordPress and for that I’m very grateful – it almost makes up for the shabby treatment Live Spaces users had been getting in the preceding years.
I also got one of my biggest ever days from a blog traffic point of view during the controversy over the future of SSAS at PASS; I’m not even going to link to that particular post because it still gets plenty of hits and it’s something I’d prefer to move on from. It’s better to look forward to the undoubtedly cool stuff that will be coming with Denali rather than brood over any disappointment about what has happened, and one thing I’m certainly relieved about is that in 2011 there will be lots of cool stuff to blog about – at some points in the last year I’ve struggled to find anything new or interesting to blog about for SSAS (although as Greg Galloway has just shown, there’s still lots of interesting SSAS topics to blog about if you look). Bring on BISM!
Professionally things go from strength to strength and I’m going to be busier in the first half of 2011 than I’ve ever been in January/February, which are usually relatively quiet times for me. Again, with significant new technology arriving in the BI stack in Denali I would expect a lot of consultancy and training work to follow; I’m going to be expanding my training activities in other ways next year too, and I’ll be making a separate announcement about that here any day soon. SQLBits also continues to grow and grown, and I’m really excited about the plans for SQLBits 8 in Brighton this April. Since it doesn’t look like there’s going to be a PASS Europe this year I would say that anyone who’s serious about SQL Server in Europe should plan to come to what’s going to be the biggest and best SQL conference in Europe bar none; and even if you live outside Europe, but fancy a vacation, why not come too? We have an ever increasing number of US attendees and speakers!
Anyway, it’s time for me to go and do some real work and stop being self-indulgent. I hope everyone reading this has a great 2011. Thanks for reading!
I’m just making my final preparations for my trip to Seattle and the PASS Summit 2010. There are going to be some big announcements this year so I’m really looking forward to going! If you see me around make sure you say hello – I like to meet people who read my blog.
I’m going to be pretty busy this year. I’m speaking on Tuesday, a session on “Comparing Analysis Services and PowerPivot” (it’s made it onto several people’s must-see lists); also on Tuesday I’ll be hosting a table at the birds of a feather lunch, where the theme will be “Performance Tuning SSAS“; and Marco, Alberto and I will be doing a book signing session on Thursday from 12:30pm to 1:00pm, at the summit bookstore, where we’ll be hoping to flog a few more copies of “Expert Cube Development” and Marco and Alberto’s excellent new PowerPivot book.
Hope to see you there!
Today I was with a customer and created a calculated measure that was formatted as a percentage. The formatting showed up fine in the cube browser but did not in Excel, which was a bit strange given that other percentage calculated measures on the cube seemed to be working fine. Now I knew that Excel doesn’t pick up the formatted_value of a cell in a cellset, but instead takes the format_string associated with each cell and interprets this as an Excel format inside a pivot table, and this explains why sometimes SSAS formats don’t work in Excel. So clearly something was going wrong with Excel interpreting the format string I’d defined.
Finally the guy I was working with, Andrew Baker, worked it out. Look at the two following calculated measures:
CREATE MEMBER CURRENTCUBE.MEASURES.C1 AS 0.99, FORMAT_STRING=’PERCENT’;
CREATE MEMBER CURRENTCUBE.MEASURES.C2 AS 0.99, FORMAT_STRING=’Percent’;
When you add them to a cube and browse the cube in the cube browser, you see this:
In Excel you see this:
What’s the difference? Yes, you guessed it, when it comes to the built-in SSAS format string types like ‘Percent’, Excel is case-sensitive. So a format string like ‘PERCENT’, while it’s valid from an MDX point of view, gets ignored by Excel whereas ‘Percent’ is correctly interpreted as a percentage format.
And Vidas has just noticed there’s a new release of PowerPivot that integrates with Windows Azure Marketplace – Microsoft’s data marketplace that was previously known as Project Dallas.
There’s also a separate Excel addin that allows you to import data from Windows Azure Marketplace direct into Excel:
One last thing to note – among all the cool data that’s available at the Windows Azure Marketplace, I see that you can get access to data from Wolfram Alpha too!
At long last I’ve got round to organising another BI user group meeting in London, on the evening of November 17th. We’ve got Ian Marrit talking about Master Data Services in the first session, that’s followed by Jamie Thomson and me talking about all the cool new stuff that will be announced for BI at PASS the week after next. For more details and registration see:
Hope to see you there!
So my blog birthday has come around again, and five years seems like some kind of milestone – I’m surprised anything in IT lasts five years. But here I am still going…
It’s been a pretty good year. Half of it was taken up with writing “Expert Cube Development with Analysis Services 2008” with Marco and Alberto, which was published in the summer; I was very happy with how it turned out, and we got some really good reviews on various blogs (see here for a list) as well as six five-star reviews on Amazon.com. Even the one bad review we got on Amazon, in my opinion, proves that we achieved our objective to write a book specifically for more advanced users rather than beginners. We heard the other day that we’ve already sold more than a thousand copies which I don’t think is bad at all for a book with a relatively small target readership.
The blog itself did quite well too: I was very proud when won an award for best BI blog post in the PASS Log Reader Awards for my post on SSRS Drilldown, and I was equally chuffed when Donald Farmer named me in his top ten SQL bloggers of 2009 the other day. It’s been harder to find SSAS and MDX-related issues to write about because, let’s face it, there hasn’t been much new SSAS and MDX functionality in the past few years, but I’ve got a lot of PowerPivot and DAX posts planned and there’s plenty of new stuff in the wider MS BI stack that will be worth a look. So far I’ve resisted the temptation to move with the times and start Twittering as well, not because I have anything against it but more because I’m sure I’d like it too much, and I spend far too much time tapping away at my laptop already.
Business-wise things have certainly been slower in 2009 than they were in 2008 or 2007, which was only to be expected. Luckily the quietest months for work were also the months when I was working hardest on the book, and it now looks like things have turned a corner (for the European market, if not the UK) and I’m extremely busy once again. I’m not sure whether PowerPivot and DAX will present many opportunities for me to make money from consultancy, but given that I’ve been doing a lot more training recently I’ll probably set aside some time to write a course on them too; I’ll be on the lookout for other opportunities to diversify away from my core expertise of Analysis Services and MDX, but I don’t think I’ve found a new technology that really grabs me yet.
Last of all, I couldn’t let Mosha’s ‘farewell to BI’ blog post from earlier today pass without comment. I started working with OLAP Services as it then was during the beta for SQL Server 7.0, more than ten years ago now, and right from the beginning Mosha made an incredible effort to engage with the user community, giving us advice, answering our questions and taking the time to understand what we were trying to do with the product. His posts on the microsoft.public.sqlserver.olap newsgroup, and latterly on his blog, were a goldmine of information and I firmly believe his efforts were one of the main reasons why Analysis Services has been as successful as it has been; his example is one other development teams at Microsoft would do well to follow. Of course it’s not like he’s died or anything and I don’t want to sound as though I’m writing his obituary, but he’ll be missed and Bing’s gain is our loss.
Here’s to 2010!
I thought I’d do a quick round-up on some of the interesting links that I’ve found concerning Gemini:
- Donald Farmer has a number of short screencasts on different areas of Gemini functionality:
- Kasper de Jonge has a pretty detailed overview of Gemini:
- Vidas Matelis tests Gemini’s scalability, load speeds and levels of compression:
- Aurelien Koppel compares Gemini with its obvious close competitor, Qlikview (in French but Google translates pretty well):
Something else I’m a bit late on, having been away on holiday this week, is the announcement that there won’t be a Microsoft BI Conference this year:
Apparently the same customers who told Microsoft to kill PerformancePoint planning have also been saying that they should only hold the BI Conference every other year. As an aside, did these marketing people ever get told on their MBAs not to treat their customers like idiots? Is there anyone out there that still falls for that ‘our customers have told us’ line? Surely the whole point of blogging – even when it’s done by a marketing person – is that it’s meant to be (or should appear to be) an honest, person-to-person type of communication and doesn’t work when you simply reuse the same content you would have stuck in a press release?
Anyway, while I agree with many of the points Marco makes about a separate BI Conference being necessary, it makes my life easier in a way – it means I no longer have to make a decision about whether to attend the PASS Summit or the BI Conference. PASS is the conference to attend and will have extra BI content; I’ve already submitted one abstract and will think of some more potential topics to talk about (any suggestions?), so with a bit of luck I’ll see you there.