Archive for June 2010
The Silverlight PivotViewer Control (as Live Labs Pivot is now officially called) has just been officially released. There’s loads of great content on the website here:
There’s also a good post on how to use the Pivot Collection Tool for Excel here:
No sign of that tool for creating collections from SSRS that was demoed at TechEd yet, though.
It is a truly beautiful piece of software and puts to shame all of Microsoft’s previous attempts at BI client tools, although of course it doesn’t actually integrate with any of the rest of Microsoft’s BI stack (I’ve asked a question on the PivotViewer forum about whether there are any plans to fix this here – it really needs to happen). It’s also proves something I’ve said on this blog several times over the years: that the lessons learned in the business intelligence world for visualising and analysing large data sets could bring many benefits to the world of search. Look at this real example of how the PivotViewer control can be used to search for wedding venues in the UK, for instance:
And wouldn’t it be cool if you could use it to browse through the contents of your file system in the way I showed with Excel and PowerPivot recently?
Before I start, I need to declare that since Michael Brönnimann (one of the guys at IBAX behind this product) is a friend of mine I can’t really be considered an unbiased reviewer of this product; however, since I think it’s interesting and deserves wider exposure, I did think it was worth blogging about. In fact I’ve more or less given up reviewing products and books on this blog because I know so many people in the SSAS world that it makes impartiality on my part very difficult; it’s just easier to flag up interesting stuff and not try to make any kind of critical assessment.
Anyway, back to the matter in hand: a look at the IBAX SSAS Value Pack. The product’s web page is here:
Basically, it’s a template Analysis Services solution that contains solutions to many cube design and MDX calculation problems that you can adapt for your own project. The idea is that, rather than building everything yourself from scratch, if you bring your own data into an existing solution you’ll be able to build something that’s very feature-rich very quickly without having to spend too much time worrying about the design and implementation of the difficult stuff. And the template itself certainly is very feature rich: it has a full set of time series calculations, many other financial and analytical calculations, different types of currency conversion, showing measures in different scales, formatting and colour highlighting, actions and drillthrough.
The next question is, of course, will you actually get any value from a product like this (assuming you don’t buy it as a service offering)? I mean, if you don’t know how to implement this stuff in the first place are you going to be able to adapt someone else’s code to your own requirements? Well, there’s a lot of documentation and some introductory training included in the package to help get you up to speed; and I think if you’re going to learn how to learn SSAS it always helps to have some worked examples to try to understand; and even if you throw away 80% of what you get, the remaining 20% will have still proved useful. If the alternative is to spend months and months developing something, and then finding you’ve made some fundamental mistake in the design early on that you can’t now change, then it’s got to be better.
Arguably Microsoft should be providing something like this to its customers and partners; there have been various service-offerings and solution accelerators developed for SSAS over the years but nothing much has ever come of them. The Add Business Intelligence Wizard in BIDS does something similar too, but again that never seemed to be a development priority and it suffered in the earliest releases of SSAS 2005 from generating MDX that didn’t follow best-practices or, worse, just didn’t work. Adventure Works is probably the closest comparison, and although it does a good job as a sample database it’s not really a template for an enterprise solution. So it’s up to third parties like IBAX to fill the gap in the market…
It’s SQLBits time again! Here’s our press release with all the details:
The SQLBits committee is pleased to announce that the next SQLBits conference, SQLBits 7, will be taking place from September 30th to October 2nd at York University. SQLBits is the largest SQL Server conference in Europe and the previous six conferences have established it as a must-attend event for DBAs, developers and BI professionals who work with Microsoft SQL Server in the UK and Europe. This SQLBits will be the biggest ever, with capacity for over 500 attendees.
This time SQLBits will be running the multi-day format that was so successful in Manchester and South Wales last year. Thursday 30th will be a training day, featuring in-depth, full-day seminars delivered by a number of well-known SQL Server professionals; Friday 1st will be a deep-dive conference day with advanced sessions delivered by the best speakers from the SQL Server community; and Saturday 2nd will be the traditional SQLBits community conference day, with a wide range of sessions covered all aspects of SQL Server at all levels of ability. There will be a charge to attend days one and two, but day three will, as usual, be completely free to ensure everyone can have the opportunity to attend. The conference will be held at the Heslington Campus of York University, a modern venue easily accessible from all parts of the country by road and rail.
Session submission is open now and we are looking for presentations on the seven main parts of the SQL Server toolset: the SQL Server relational database engine and TSQL, Integration Services, Reporting Services, Analysis Services/PowerPivot, Master Data Services, StreamInsight and Parallel Data Warehouse. As a community organisation we are keen to have as many people involved as possible, so we welcome submissions from new or inexperienced speakers. To submit a session, please go to http://sqlbits.com/information/SessionSubmission.aspx.
If you have a SQL Server-related product or service, SQLBits is a unique way of promoting it to hundreds of SQL Server professionals and decision makers. We can tailor sponsorship packages for SQLBits to your specific business needs; anyone interested in sponsoring the conference can get more details by emailing us at firstname.lastname@example.org.
Just to sum up – speakers and sponsors, we need you!
The 2008 R2 version of Best Practices Analyzer has just been released – you can download it here:
I’ve not looked at it yet (it was only released a few hours ago) but I’m told it features an updated set of rules for SSAS. Running the BPA over your existing cube is a good way of spotting bad things you might have done by accident or because you didn’t know they were bad.
My whole life is on my laptop hard drive, and as a consequence my laptop hard drive has a lot of stuff on it. Luckily, when I need to find something, there’s always Windows Search to help me out – but even so a single search query can return a lot of data. How can we make sense of all the data in the Windows Search Index? Well, we can use PowerPivot of course!
I first got the idea for doing this when I saw ‘Microsoft OLE DB Provider for Search’ in my list of OLE DB Providers; a quick look around on the net revealed the following useful sources of information about it:
So if you can query Windows Search using SQL, I thought, then I should be able to take the data that is returned from running one of these SQL queries and load it into PowerPivot. And after a lot of trial and error, I managed it – and it works rather well. Here’s how to do it…
First of all, you need to make sure you have the OLE DB Provider for Search installed. If you don’t, you need to download and install the Windows SDK. Then you can open up a new Excel workbook and open the PowerPivot window. Next you need to create a new connection by going to the Home tab, clicking the Get External Data/From Other Sources button, and then clicking the Others(OLEDB/ODBC) option.
Next, type in the following connection string:
Do not try to click the Build button and select the provider from the list – when I did this, I got the error “Provider is no longer available. Ensure that the provider is installed properly”. Next, choose the “Write a query to specify the data to import” option (again, if you choose the “Select from a list of tables…” you’ll get an error) and enter your SQL query.
Here’s where the fun begins. From the two links above, you can see that there are loads of possibilities as to what you can query. Here’s a sample query that returns a list of all the files in the Documents folder on my c:\ drive and below, along with their file types, the folder path and the file size in bytes:
CAST(System.ItemFolderPathDisplay as DBTYPE_BSTR),
CAST(System.ItemName AS DBTYPE_BSTR)
WHERE SCOPE=’file:C:\Users\Chris Webb\Documents’
Notice that I’ve had to cast some columns to DBTYPE_BSTR – I found that if I didn’t do this, the columns simply didn’t appear in the query results in PowerPivot. Other things you can do here in the query include searching for all items that include particular words or phrases, or which are above a certain size, or have a particular file extension.
With that done, you’re good to go. In a pivot table you can slice and dice all the data returned to your heart’s content. Here, for example, are the top five files with the .ABF extension (ie SSAS backup files) from my c:\ drive:
With a separate time dimension table joining to System.DateCreated you can do even more. Here’s the total size of files on my c:\ drive in bytes broken down by the year they were created:
You can also use the DAX time intelligence functionality. I added a running total calculation that shows the growth in the total size in MB of all files, over time, based on the creation date of each file. Here’s the formula:
=CALCULATE(SUM(Files[SYSTEMSIZE]), DATESBETWEEN(Time[Date], BLANK(), LASTDATE(Time[Date])))/1024/1024
This chart shows that running sum from November 2008, when I bought the laptop, to today:
There are plenty of tools out there that help you analyse this type of data but I doubt any of them can do what the PowerPivot/Excel combo can do. And it’s this kind of personal BI that PowerPivot makes easy. The only thing missing is an API which would allow you to build the SQL query used here dynamically: imagine having an interface where users could type their own search terms and then be able to analyse the results in PowerPivot at the click of a button. Hopefully PowerPivot will get an API soon. And as I’ve said before in the past, wouldn’t it be cool if Bing could do this kind of thing with web search results and data found on the web?
I wasn’t able to go to the MS BI Conference/TechEd in New Orleans this year, unfortunately, but I’ve just watched Ted Kummert’s keynote from today and it’s got lots of interesting new BI-related stuff in it. You can view it here:
If you don’t have any patience with the normal keynote backslapping, I suggest you forward to around 01:20:00 when Amir Netz comes on stage to do some demos. Here’s what he shows:
- 01:25:20 Pivotviewer Extensions for Reporting Services. As far as I can see, this is a new bit of SSRS functionality that does something like the following: execute a data-driven subscription to generate multiple SSRS reports, then load them into the new Silverlight control-based version of Live Labs Pivot, so the output of these reports can be analysed. This will be available in the next thirty days, so I guess early July. It makes for a very visually appealing demo but if I’m right about what’s happening here then it seems a bit of a hack. Live Labs Pivot needs to be properly integrated into the BI stack!
- 01:32:30 the ability to define KPIs in PowerPivot. The funny thing is, ever since KPIs were introduced in SSAS 2005 I have worked with hundreds of companies using SSAS and I have never, ever seen anyone using KPIs in production! But this is slightly different and it’s got a nice visual designer, so I can see how KPIs in PowerPivot would be very useful.
- 01:34:45 new record view for viewing data in the PowerPivot designer, a better way of working with wide tables and editing the calculations on them. Handy.
- 01:36:30 importing a PowerPivot solution, from Excel, into BI Development Studio. You get the same PowerPivot designer UI in BIDS as you get in Excel, but now you are able to use source control, develop offline and so on.
- 01:37:55 lineage and impact analysis for DAX calculations – a diagram showing the dependencies between DAX calculations. Good, but lineage and impact analysis is needed for much more than just DAX calculations – it needs to cover everything from the structure of the relational data warehouse to SSIS, SSAS and SSRS reports.
- 01:39:30 2 billion rows of data loaded into a server-based instance of SSAS running in in-memory mode, from a project developed in BIDS. And of course, it’s fast!
So only a few hints at how Vertipaq will be used in corporate BI scenarios, but what’s here is encouraging…
So far this year I’ve indulged myself a few times in a bit of futurology (here and here, for example) regarding directions the Microsoft BI stack might take. The one area I haven’t touched on recently, though, is what Analysis Services in the cloud might look like; I did speculate a bit here but that was a while ago now and before several relevant technologies had been announced. It’s certainly coming, and presumably somebody somewhere is working on it right now in some top-secret bunker in Redmond, so maybe a few public comments on what we the user community would want from it would be helpful…? Anyway, welcome or not, here are some thoughts…
So why would you want or need Analysis Services in the cloud rather than regular Analysis Services? I can think of a few things it should be able to do to justify its existence:
- It should be cheaper than hosting all the infrastructure in-house and it should be scalable to the Nth degree. OK, so these are the standard reasons dragged out for cloud-based anything, but with Analysis Services there are two obvious times when resource usage peaks – processing and when a big/complex query runs – and equally there are times when the server can be completely quiet; so the idea of being able to make use of near infinite resources when you need them to make the processing/querying super-quick, but only pay for what you use, is very appealing. From this point it follows that when you need to use extra resources, you need a platform that can scale to be able to make use of those resources.
- As well as being able to work with ‘traditional’ data sources such as your corporate data warehouse, it should be able to work with cloud-based data sources be they relational or non-relational (like Amazon SimpleDB, Google’s recently-announced BigQuery, Azure Table Storage and all the rest), feeds (like OData or GData), linked data (RDF), web-based spreadsheets like Google Docs or the Excel web app, or completely unstructured data from anywhere on the web (maybe something like how Google Squared works). Supporting the integration of data modelled for all of these different types of database would be a challenge but I think it should be possible.
- It should be available as a data source for anywhere else on the web – your own apps, reports, web-based spreadsheets and so on – as well as desktop apps like Excel. The really important thing, for me at least, would be for it to expose an XMLA interface to allow ad-hoc querying (note that Excel can’t talk direct to an XMLA provider, it only does OLEDB for OLAP, but it’s possible to bridge the two and Simba already sell an OLEDB provider that does this); grudgingly, I’ll admit a SQL query interface would be useful too. The ability to expose data via an OData feed would be a must as well.
- I’d also like to see it support some basic ETL functionality too, because if it is as scalable and fast as I’d like to be then it would have an obvious secondary use for large-scale number crunching – aggregating data, doing lookups, sorting, many of the things that you might do today in the SSIS data flow or which you might look at Hadoop to do. Derived columns and lookups could all be done with DAX or MDX calculations; pivoting, sorting and filtering could all be done (and configured very easily with a good client tool) through the right MDX query. I can imagine it acting as a datasource for itself: you’d load data into a cube or a table or whatever, create a query on top of it which is made available as a feed, then take the data from that feed and load it into another cube/table, and so on.
- Following on from the last two points, it’s not enough to be able to act as a data source, Microsoft would need to come up with a decent web-based client tool specifically for use with it. And no, vanilla pivot tables on the web wouldn’t cut it, nor would SSRS in the cloud (not that that I wouldn’t want that) – you’d need to have the wow factor that something like Live Labs Pivot has as well as serious, power-user functionality like the Proclarity desktop client; it would probably need to be built using Silverlight or HTML5. I still think there’s an opportunity to rethink what a client tool could be here, blur the line between BI client tool, spreadsheet and database and come up with something really new.
I wouldn’t want, and don’t expect to get, a recognisable version of Analysis Services 2008 in the cloud in the way that SQL Azure is recognisable as server-based SQL Server. While I still see an important role for Analysis Services as we have it today as in corporate BI scenarios I don’t think there’s any point transferring it to the cloud with exactly the same functionality. Some things, like dimension security, would still be needed, but some things, like cell security, we could probably live without. PowerPivot in the cloud would make more sense as a starting point, so long as it was not just a straight copy of PowerPivot on the desktop: the ability to scale to really, really large data volumes, as in the first bullet above, would be the key feature and the only real reason why customers would want BI in the cloud. And it’s not just the scalability of simple queries either – queries that use complex calculations would need to scale too. You know what, though? When I look at DAX I can’t help but think it was designed with this requirement in mind; I can see how the evaluation of DAX expressions could be easily parallelised.
So all this sounds pretty ambitious, even a bit pie-in-the-sky. The way I see it, though, as far as BI-in-the-cloud goes there’s everything still to play for and if Microsoft doesn’t deliver then someone else will. Could it be Google, or Amazon, or some startup we’ve never heard of? Now that Google BigQuery has a SQL interface, it’s only going to be a matter of time before someone builds a BI app on top of it (I wonder if Mondrian can be made to work with it?) – and it certainly seems to be fast. Microsoft needs to think beyond using BI to defend the Excel/Sharepoint franchise and start thinking about the future! With ten years of BI experience behind it, Microsoft should be in a strong position to move forward into the cloud but it’s only going to succeed if it’s innovative. I understand there will be some new PowerPivot-related product announcements at the BI Conference this week; I’m keeping my fingers crossed.
As always, your comments and ideas are welcome…
In my last post on DAX, I mentioned I’d come across the following error when running a query in PowerPivot:
ERROR – CALCULATION ABORTED: MdxScript(Sandbox) (22, 4) Function ‘DATEADD’ only works with contiguous date selections.
The DAX expression I was using was as follows:
, DATEADD(DimDate[FullDateAlternateKey], -1, YEAR)
And the error was happening when I ran queries with Dates on either rows or columns and was using a slicer that had multiple years selected but only a few months, for example like this:
Clearly I had a non-contiguous date selection: I was expecting to see all the dates in January and February 2003 on rows, followed by all the dates in January and February 2004. But I couldn’t understand why I was getting an error on what, to me, seemed like a perfectly reasonable query!
So I asked my friends at Microsoft and Marius Dumitru explained that this wasn’t really an error, it was actually a deliberate feature designed to stop users running queries where calculations would return misleading results, or where the semantics of what was actually happening would be hard to understand. I can understand the motivation for doing this – for example, consider what would happen in the scenario above if there was no error, and I didn’t have dates on rows or columns – no time intelligence calculation would return any meaningful values here. But I still maintain that my scenario above should return values, because in this case the slicers are only serving to filter which dates are being displayed on a visible axis, and when the dates themselves are visible on an axis then it makes sense to see the result of the calculation.
Time to open a Connect, then, in the hope that this scenario will be allowed in a future version. Please vote here: