Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for December 2011

Seventh Blog Birthday

with 4 comments

So here we are again, yet another blog birthday – the seventh anniversary of my first ever post on this blog and a chance to review 2011.

From a technical point of view this year has been dominated by learning SSAS 2012 Tabular. I’m right in the middle of writing a book about it with Marco and Alberto, and it’ll be published sometime in the first half of next year. As usual, writing a book has turned out to be ten times more effort than I’d originally planned, but it is undoubtedly the best way of learning a new technology: however well you think you know a product, you always find gaps in your knowledge when you have to write about it. My opinion of Tabular and DAX keeps improving the more I use them, so I’m feeling very positive about the future and I’m interested to see what the uptake will be like after RTM. I’m also very excited to see how Microsoft’s cloud-BI strategy pans out in the next year: products like Data Explorer suggest that Microsoft is serious about this area and, more importantly, is coming up with innovative new solutions to its challenges rather than just producing cloud-hosted versions of the software we’ve already got. There are going to be a lot of cool blog posts to write in 2012…

Business-wise I’ve continued to spend more and more time running training courses, and a few months ago I launched my new training website Technitrain to market not only my SSAS and MDX courses but also those run by other well-known SQL Server pros. Our first third-party course with Christian Bolton went well, and next year we’ve got several others booked in. You can see the full course catalogue here; I’d also like to point out the new, lower prices on Jeremy Kashel’s Master Data Services course and Andy Leonard’s SSIS course. Frankly, £1248 for a full week of training is a bargain, especially from someone of the calibre of Andy Leonard… although I’d argue it only looks that way because we’re too used to the inflated prices of most traditional training companies. More course dates will be announced very soon.

As far as the SQL community goes, I’ve been busy speaking at user groups and helping to run SQLBits. The two SQLBits conferences this year, in Brighton and Liverpool, were great successes, and I still can’t believe we had Steve Wozniak turn up to speak at Brighton; SQLBits 10, which will be running from the 29th-31st of March next year, is already looking like it will beat all previous records for attendance (we’ve got 582 registrations at the time of writing) and will be the biggest and best yet. Next month I’ll be speaking in the UK at the Southampton SQL Server User Group on January 11th, and on January 12th I’ll be speaking at the PASS BI Virtual Chapter, so I hope to see some of you there.

Anyway, that’s enough of me going on – it’s time to get back to my writing. Happy New Year everyone!

Written by Chris Webb

December 30, 2011 at 11:32 am

Posted in Off-topic

BISM Normalizer

with 4 comments

I’ve just come across a new, free utility for SSAS 2012 Tabular called BISM Normalizer:

http://visualstudiogallery.msdn.microsoft.com/5be8704f-3412-4048-bfb9-01a78f475c64

Basically it’s a tool that allows you to compare the metadata of two Tabular models and optionally copy parts of one model over to another. It’s very interesting because it addresses what I think is one of the big challenges encountered when bridging the gap between self-service and corporate BI: how can you take a bunch of independently-produced PowerPivot models and merge them into a single, coherent Analysis Services Tabular model? Ultimately this is a problem that Microsoft is going to have to deal with itself in future versions of SSAS, along with the related problem of how parts of a ‘corporate’ Tabular model can be reused inside PowerPivot models (I’d like some form of object inheritance or ‘linked tables’, rather than having to copy data and rebuild metadata from scratch as we do today).

Written by Chris Webb

December 29, 2011 at 10:36 am

Replacing Cell Security with Dimension Security

with 5 comments

Cell security, as you probably know, is best avoided because of the impact it can have on query performance; dimension security is much easier to work with and its performance impact is much smaller. Luckily, even when it seems as though cell security is the only option, in at least some scenarios dimension security can still do the job with a bit of extra cube modelling work. Here’s how…

Let me give you an example that I worked on with a customer recently, translated to Adventure Works. The Internet Sales measure group in the Adventure Works cube contains three role-playing dimensions based on the Date dimension: Date (which represents the Order Date), Ship Date and Delivery Date. Let’s imagine that we want to implement dynamic security so that a user can only see data if either the Date or the Ship Date falls in one particular year. For example, if a user has access to the year 2003 they should be able to see data if the year on the Date dimension is 2003 or if the year on the Ship Date dimension is 2003, as shown in the cells highlighted in the following screenshot (note that the grand totals should also only reflect values from cells where one of those two years is 2003, so on the last row below we should see a grand total of $387662.64 for the CY 2004 column):

[Screenshot: the cells where either the Date year or the Ship Date year is 2003, highlighted]
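
For comparison, solving this with cell security would mean putting a boolean MDX expression something like the following on the role’s Read permissions (just a static sketch using the standard Adventure Works hierarchy names; the dynamic equivalent, and getting the totals to behave properly, is where things get really ugly):

[Date].[Calendar Year].currentmember IS [Date].[Calendar Year].&[2003]
OR
[Ship Date].[Calendar Year].currentmember IS [Ship Date].[Calendar Year].&[2003]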

How can dimension security be used to implement this? The trick is that it can’t without creating a new, hidden dimension to apply security to. To create this dimension, go to the DSV and create a new named query called DimSecurity with the following SQL:

select distinct OrderDateKey, ShipDateKey
, d.CalendarYear as OrderDateYear, s.CalendarYear as ShipDateYear
from
dbo.FactInternetSales inner join DimDate d
on OrderDateKey = d.DateKey
inner join DimDate s
on ShipDateKey = s.DateKey

This gets all the distinct combinations of order date and ship date from the fact table plus their related calendar years. Obviously this exact query might not be practical on a large fact table, but you get the idea – in order to apply dimension security to the range of cells we want, we need to create a new dimension designed so that its members can be mapped onto those cells.

Next, create an SSAS dimension from this table as follows, with three attributes:

  • A key attribute called OrderShipDate that has a composite key based on the OrderDateKey and ShipDateKey columns. I used the OrderDateKey column as the name column, but that’s not too important because this dimension will be hidden anyway. This attribute represents the distinct combination of order date and ship date.

  • Two other attributes, Order Date Year and Ship Date Year, based on the OrderDateYear and ShipDateYear columns.

This new dimension should now be added to the Adventure Works cube and a relationship added with the Internet Sales measure group as follows:

[Screenshot: the Dimension Usage tab showing the new Security dimension related to the Internet Sales measure group]

You can now set this dimension’s Visible property to False in the Cube Structure tab of the Cube Editor.

Then create a new role, grant it access to the Adventure Works cube, go to the Dimension Data tab, select the Security dimension and the OrderShipDate attribute and go to the Advanced tab. Then enter the following MDX in the Allowed Member Set box:

union(
exists([Security].[OrderShipDate].[OrderShipDate].MEMBERS, {[Security].[Order Date Year].&[2003]})
,
exists([Security].[OrderShipDate].[OrderShipDate].MEMBERS, {[Security].[Ship Date Year].&[2003]})
)

This grants access to all combinations of Order Date and Ship Date – all the members on the OrderShipDate hierarchy – that are either in the Order Date year 2003 or the Ship Date year 2003. Don’t forget to check the Enable Visual Totals box too.

Deploy, then go to the cube browser and test the role. You should see the following results:

[Screenshot: the results of the test query in the cube browser]

So we have the basic role working, but how can we make it dynamic? We don’t want to create a factless fact table that contains rows for every combination of Order Date and Ship Date because that could get very large very quickly; we only want to grant access at the year level.

Going back to the DSV, create a new named query called FactSecurity using the following SQL (inserting your own username as appropriate):

SELECT 'Mydomain\Myusername' AS UName, 2003 AS CalendarYear

This will be our factless fact table that grants a user access to the year 2003. Then build a new dimension called User from the UName column to give you a dimension containing all your users, as you would in any normal dynamic security implementation; also build a new measure group in the Adventure Works cube from this table and give it a regular relationship with the User dimension and a regular relationship with the Security dimension at the OrderDateYear granularity:

[Screenshot: the Dimension Usage tab showing the Fact Security measure group’s relationships with the User and Security dimensions]
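
In a real implementation the named query would of course point at a proper security table rather than hard-coding a single row, so that each user can be granted more than one year; its contents would look something like this (the user names here are hypothetical):

SELECT 'Mydomain\UserA' AS UName, 2003 AS CalendarYear
UNION ALL
SELECT 'Mydomain\UserA', 2004
UNION ALL
SELECT 'Mydomain\UserB', 2004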

Process, then go back to the role and change the MDX as follows:

union(
exists([Security].[OrderShipDate].[OrderShipDate].MEMBERS
, {nonempty([Security].[Order Date Year].[Order Date Year].MEMBERS
, (strtomember("[User].[Security User Name].&[" + Username() + "]", constrained)
, [Measures].[Fact Security Count]) )})
,
exists([Security].[OrderShipDate].[OrderShipDate].MEMBERS
, generate({nonempty([Security].[Order Date Year].[Order Date Year].MEMBERS
, (strtomember("[User].[Security User Name].&[" + Username() + "]", constrained)
, [Measures].[Fact Security Count]) )}
, {linkmember([Security].[Order Date Year].currentmember, [Security].[Ship Date Year])}
)
)
)

This is basically the same as the previous expression, but it’s now dynamic and it supports securing more than one year per user in the FactSecurity measure group. Let’s look at the individual parts of this expression:

The following expression returns the set of members on the Order Date Year attribute of the Security dimension that the current user has access to:

nonempty([Security].[Order Date Year].[Order Date Year].MEMBERS
, (strtomember("[User].[Security User Name].&[" + Username() + "]", constrained)
, [Measures].[Fact Security Count]) )

We can use this directly in the first set passed into the Union function, but to find the equivalent Ship Date Years for each Order Date Year we need to use a combination of the Generate and LinkMember functions, as follows:

generate({nonempty([Security].[Order Date Year].[Order Date Year].MEMBERS
, (strtomember("[User].[Security User Name].&[" + Username() + "]", constrained)
, [Measures].[Fact Security Count]) )}
, {linkmember([Security].[Order Date Year].currentmember, [Security].[Ship Date Year])})

This takes the set returned by the previous expression and then loops over it using Generate; for each member in the set we then find the equivalent Ship Date Year by using LinkMember.
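
To make that concrete: LinkMember returns the member with the same key on the target hierarchy, so (since both year attributes here are keyed on the calendar year) the following expression returns the member [Security].[Ship Date Year].&[2003]:

linkmember([Security].[Order Date Year].&[2003], [Security].[Ship Date Year])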

In summary: while this might seem quite complicated, the MDX needed for the equivalent cell security implementation would be just as bad and the performance overhead much worse, so this is a very useful technique to have in your toolkit. Thanks are due to Peter Holzner, Ramon Mueller, Beat Stuenzi, Chris Fleming and Adam Widi for working through all this with me the other week.

Written by Chris Webb

December 22, 2011 at 9:31 pm

Bringing It All Together In The Cloud and Excel

with 2 comments

A few things got released in the past week or so that I would normally have devoted a short blog post to each; now that I’ve finally made it onto Twitter I tend to just tweet about them instead, but I still think blogging is the best way for me to get my thoughts together about what they actually mean. Let’s consider this ‘stuff’ in isolation first:

  • http://www.excelmashup.com/ went live. It’s a JavaScript API for the Excel Web App, which is useful, but as Jamie Thomson notes here it’s not the API he and I have long been asking for, alas.
  • The first CTP of Hadoop on Azure went live. Denny Lee’s blog post is probably the best place to start to learn about it; the video in the post is a very useful quick overview of what it does too. I noticed that it supports importing data from the Windows Azure Marketplace (what used to be called the Azure Datamarket).
  • The Azure Marketplace also got a speed upgrade, as Boyan Penev notes here. This is important because whenever I’d tried to use it in the past its appalling performance had stopped me doing anything interesting with it at all. This, plus the fact that you can now publish your own data there, turns what was something of a white elephant into what could be an integral part of the Microsoft cloud BI platform.
  • Version 2 of Google BigQuery went live, although it’s still in beta.
  • The CTP of Data Explorer got released, which of course I’ve blogged about already here and which Jamie blogged about here and here.
  • Microsoft announced Office 365 compliance with leading EU and US standards for data protection and security, which means fewer of those legal worries about whether you’re allowed to put all that interesting data you want to use for BI into the cloud.

From this it’s clear that Microsoft’s cloud BI platform is beginning to take shape, as are competing cloud BI platforms (if we assume that Google actually has a cloud BI strategy, and I think it has), and I think it’s fair to say Microsoft is well placed. There’s also yet more evidence, as if it weren’t blindingly obvious already, that Excel is at the heart of Microsoft’s BI strategy. Look at how data from Hadoop on Azure can be imported directly into Excel, and how this is flagged up as an important selling point, in the video on Denny’s blog post. However, I think Microsoft needs to make even more of an effort to make everything Excel-like: now that it’s building a new BI platform from scratch it has a very rare opportunity to do this and to produce a truly coherent set of tools rather than the traditional grab-bag of technologies that make up the current Microsoft BI stack, something the cloud platform could easily end up as too. Actually, I’d like to go further and say that rather than have a bunch of separate cloud BI tools, MS should make everything BI a feature of Excel, whether it be Excel on the desktop or Excel in the cloud. This might seem a controversial thing to say, but if MS is committed to true mass-market, self-service BI then Excel has to be the platform and MS needs to base its cloud BI strategy on it 100%.

Here are a couple of the things I’d like to see happen to achieve this:

  • Data Explorer’s expression language needs to be made consistent with Excel syntax, in the way that DAX is consistent with Excel syntax. While I like what Data Explorer can do at the moment I hate having to learn a new syntax and a new set of functions for each tool I use (think SSRS expressions, SSIS expressions, MDX and so on) and it makes me much less productive when I do. I want to use the same syntax to write formulas in my spreadsheet, calculations in DAX and calculations/transformations in Data Explorer too – after all, we want to do a lot of the same things in ETL and reporting like lookups, aggregation and pivoting.
  • Hadoop on Azure is an important tool to have, not least because Hadoop is so widely used in the wider world that it adds some credibility to the MS BI stack, but I’d like to be able to control massively parallel calculations from inside Excel and not just consume the results of them there. I’m thinking of something like DataScope, or the way you can scale out Excel calculations on Windows HPC, maybe driven through a PowerPivot-like interface with calculations expressed in DAX or Data Explorer (which links back to my last point, because if Data Explorer expressions were DAX it needn’t be an either/or choice). It could of course still be Hadoop at the back-end with a layer on top to make it consistent with Excel.
  • We desperately need a proper API for the Excel Web App (yes, here we go again…). An OData API for importing and exporting data from spreadsheets on the web is only the first step; in the long term I’d want the Excel Web App to have capabilities like Project Dirigible, so that Excel on the desktop could become a kind of client tool for a massively scalable Excel Server in the cloud (note that I didn’t say Excel Services, which is more like Excel-on-the-server than a real Excel Server in my opinion). I’d want to be able to store data in the cloud and automatically synchronise it with Excel on multiple desktops or elsewhere in the cloud. I’d also want to create Excel spreadsheets in the cloud that acted like functions (similar to Project Dirigible), where these functions could be called from other Excel spreadsheets which again could be on the desktop or be in the cloud.
  • We need Analysis Services in the cloud. We’ve known it’s been coming for a long time, but not what form it will take. Again, I’d like to see tight integration with Excel similar to how PowerPivot works on the client or tighter, where Analysis Services would not be a separate service but just be the functionality within Excel for manipulating tables of data for reporting purposes. I’d want Data Explorer to be able to create and load these tables.
  • We also need the full BI functionality of desktop Excel – pivot tables and cube formulas – in the Excel Web App. I would guess this is in the pipeline anyway since it’s a pretty obvious requirement if Microsoft’s cloud BI strategy is going to work. I also don’t see the point of Power View being a separate app – I’d like to see it become a data visualisation feature of Excel.
  • Finally, I’d like to see some way of tying together all the data held in this gigantic data store. You’d want to be able to search it, certainly, but also understand where it’s come from and what it actually represents. It sounds like maybe this is what Project Barcelona is aiming for.

Coordinating the work of multiple teams at Microsoft in the way this would demand is a particularly thankless task, I know. But maybe, just maybe, someone senior (maybe a Technical Fellow like Amir…?) could pull off something this ambitious?

Written by Chris Webb

December 16, 2011 at 9:55 pm

Posted in Cloud, Excel

SQLBits X–Registration Now Open

leave a comment »

This is just a quick post to tell you that registration for the 10th SQLBits conference, SQLBits X, is now open. It’s going to be taking place in London, UK, on March 29th-31st 2012. It’s also going to be the official launch event for SQL Server 2012 and 50% bigger than any previous SQLBits! For more details see:
http://sqlbits.com/

The agenda hasn’t been finalised (you’ll get to vote on the sessions you’d like to see first) but if you take a look at the sessions that have been submitted then you can see there’ll be a lot of famous SQL Server people presenting:
http://sqlbits.com/information/PublicSessions.aspx

I can also reveal we’re working on having several prominent members of the SSAS dev team come over too…

We’ve already had 254 registrations from people all round the UK, Europe and indeed the world in the two days since registration opened, so don’t leave your registration too late!

Written by Chris Webb

December 9, 2011 at 1:28 pm

Posted in Events

Using Google Docs, Data Explorer and PowerPivot for Questionnaires

with 9 comments

You may have already seen that the labs release of Data Explorer is now publicly available; there’s a whole load of really useful resources available on the learning page too if you’re interested in finding out more about it. I’ve been very lucky to have had early access to Data Explorer, and to test it out I put together a simple demo using the cloud service that shows off a typical use-case.

The first thing I did was to use Google Docs (just to have a cross-platform demo, not because I love Google in any way, honest…) to create a simple questionnaire using Google Forms. Before you read any further, please go and fill out the questionnaire I created here:

https://docs.google.com/spreadsheet/viewform?formkey=dDRnNi1fbkotLVd6Q0g4MmhsdFV2OGc6MQ

Don’t worry, there are only three questions and it’s all non-personal data! For those of you reading offline, here’s a screenshot:

[Screenshot: the questionnaire]

Now when you create a questionnaire like this in Google Forms, the responses get put inside a Google Docs spreadsheet. Here’s the link to the spreadsheet behind my questionnaire:

https://docs.google.com/spreadsheet/ccc?key=0Akv4XYo6s_Z2dDRnNi1fbkotLVd6Q0g4MmhsdFV2OGc

[Screenshot: the Google Docs spreadsheet containing the questionnaire responses]

The good thing about Google Docs (unlike, ahem, the Excel Web App) is that it has an API. The contents of this sheet could easily be exported to a number of formats including csv, which means I could get the data into PowerPivot very easily. But there was a problem: the last question is multiple choice, and for the results of that question I got a comma-delimited list of values in a single cell in the spreadsheet (see the above screenshot) – which was not going to be very useful for analysis purposes. What I really wanted was this data split out into separate columns, one column for each version, each containing a boolean value to show whether that version had been checked; so if I was going to analyse my responses by version I clearly needed to do some ETL work. I could do this with calculated columns inside PowerPivot, of course, but then everyone who wanted to work with this data in a new PowerPivot model would have to repeat the work, which is a pain, and clearly some users wouldn’t have the DAX skills to do it anyway. The best thing to do would be to perform the ETL somewhere up in the cloud so that everyone could benefit from it…
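
Just to show what that repeated work would look like, the calculated column for, say, the OLAP Services version would be something like this sketch (one such column per version, using the question’s column name as it appears after import):

=IF(ISERROR(SEARCH("OLAP Services", [What versions of Analysis Services have you used?], 1)), "False", "True")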

Enter Data Explorer. I created a simple mashup with the following steps:

  • Imported the data from the Google spreadsheet as a csv file
  • Cast that data as a table
  • Split the Timestamp column into two separate Date and Time columns
  • Added new columns to the table for each version of SSAS that contained the value True if that version had been checked in a particular response, False if not

Apart from the usual struggles that go with learning a new language, it was pretty straightforward and I was impressed with how easy it was to use. Here’s an example of an expression that adds a new column to show whether the respondent checked the “OLAP Services” box in the final question:

= Table.AddColumn(#"Rename Date Time", "Used OLAP Services", each if Text.Contains([#"What versions of Analysis Services have you used?"],"OLAP Services") then "True" else "False")

Finally, I published the output of the mashup publicly. This page contains all of the links to download the live data in various different formats:

https://ws18615032.dataexplorer.sqlazurelabs.com/Published/Chris%20Webb%20Questionnaire%20Demo

If you filled in the questionnaire you should be able to find your responses in there because it’s a live feed.

And you can of course import the data into PowerPivot now very easily, for example by using the OData feed from Data Explorer. First, start Excel, go into the PowerPivot window and click on the “From Data Feeds” button:

[Screenshot: the From Data Feeds button in the PowerPivot window]

Then, in the wizard, enter the URL of the OData feed:

[Screenshot: entering the OData feed URL in the wizard]

And you should then have no problems importing the data:

[Screenshot: the questionnaire data imported into the PowerPivot window]

…and then analysing the responses. To do this you’ll want to create a simple measure to count the number of responses, with a definition something like this:

=COUNTROWS('Questionnaire Data')

I’m looking forward to seeing the data come flooding in!

This approach could easily be applied to scenarios such as analysing feedback from user group meetings or events, and with the number of online data sources out there, there must be hundreds of other potential applications as well. And given that anyone can now publish and sell data on the Windows Azure Marketplace, there must be ways of making money from this too…

Written by Chris Webb

December 7, 2011 at 10:23 am

Posted in Data Explorer, ETL

User Group Dates and Online PowerPivot Courses

leave a comment »

It’s a bit short notice, but I thought I’d mention I’m doing an SSAS session at the Maidenhead SQL Server User Group (in the UK) tomorrow night, so if you’re in the area please come along! More details and registration here:

http://sqlserverfaq.com/events/340/Maidenhead-UG-meeting-with-Chris-Webb-and-David-Morrison.aspx

I’m also speaking at the Southampton SQL Server User Group on January 11th next year, although I don’t think registration is open for that yet, and hopefully I’ll get down to Exeter some time early next year too.

Also, if you’re looking for PowerPivot training then I can highly recommend my friends Marco and Alberto’s course, which is going to be run online (at times convenient for those of you living in the Americas) on December 19th-20th. More details can be found on Marco’s blog and on the PowerPivot Workshop site.

Written by Chris Webb

December 5, 2011 at 11:42 am

Posted in Events
