Archive for the ‘Random Thoughts’ Category
Some time ago I received a review copy of a book called “Ethics Of Big Data” from O’Reilly; I didn’t get round to writing a review of it here for a number of reasons but, despite its flaws (for example its brevity and limited scope), it’s worth reading. It deals with the ethics of data collection and data analysis from a purely corporate point of view: if organisations do not think carefully about what they are doing then
“Damage to your brand and customer relationships, privacy violations, running afoul of emerging legislation, and the possibility of unintentionally damaging reputations are all potential risks”
All of which is true, although I think what irked me about the book when I read it was that it did not tackle the wider and (to my mind) more important question of the social impact of new data technologies and their application. After all, this is what you and I do for a living – and I know that I haven’t spent nearly enough time thinking these issues through.
What prompted me to think about this again was a post by Adam Curtis which argues that the way that governments and corporations are using data is stifling us on a number of levels from the personal to the political:
“What Amazon and many other companies began to do in the late 1990s was build up a giant world of the past on their computer servers. A historical universe that is constantly mined to find new ways of giving back to you today what you liked yesterday – with variations.
Interestingly, one of the first people to criticise these kind of “recommender systems” for their unintended effect on society was Patti Maes who had invented RINGO. She said that the inevitable effect is to narrow and simplify your experience – leading people to get stuck in a static, ever-narrowing version of themselves.
Stuck in the endless you-loop.”
Once our tastes and opinions have been reduced to those of the cluster the k-means algorithm has placed us in we have become homogenised and easier to sell to, a slave to our past behaviour. Worse, the things we have in common with the people in other clusters become harder to see. Maybe all of this is inevitable, but if there is going to be an informed debate on this then shouldn’t we, as the people who actually implement these systems, take part in it?
Today marks eight years since my first ever post on this blog, and every year on this date I write a review of what’s happened to me professionally and what’s gone on in the world of Microsoft BI in the previous year.
For me, 2012 has been yet another busy year. The SSAS Tabular book that Marco, Alberto and I wrote – “SQL Server Analysis Services 2012: The BISM Tabular Model” – was published in July and has been selling well, and the balance of my consultancy and training work has started to move away from Multidimensional and MDX towards Tabular, PowerPivot and DAX. It’s always exciting to learn something new and, frankly, the lack of any significant new functionality in Multidimensional and MDX has meant they have got a bit boring for me; at the same time, though, moving out of my comfort zone has been disconcerting. It seems like I’m not the only Microsoft BI professional feeling like this though: the most popular post on my blog by a long chalk was this one on Corporate and Self-Service BI, and judging by the comments it resonated with a lot of people out there.
Whether or not Microsoft is neglecting corporate BI (and I’m not convinced it is), it’s definitely making a serious investment in self-service BI. The biggest Microsoft BI release of this year was for me not SQL Server 2012 but Office 2013. That’s not to say that SQL Server 2012 wasn’t a big release for BI, but that Office 2013 was massive because of the amount of functionality that was packed into it and because the functionality was so well executed. You can read this post if you want details on why I think it’s significant, but I’ve really enjoyed playing with Excel 2013, PowerPivot, Power View and Office 365; there’s more cool stuff in form of Mobile BI, GeoFlow and Data Explorer coming next year, all of which are very much part of the Office 2013 story too. No Microsoft BI professional can afford to ignore all this.
The other big theme in Microsoft BI this year, and indeed BI as a whole, was Big Data. I reckon that 90% of everything I read about Big Data at the moment is utter b*llocks and as a term it’s at the peak of its hype cycle; Stephen Few has it right when he says it’s essentially a marketing campaign. However, as with any over-hyped technological development there’s something important buried underneath all the white papers, and that’s the increasing use of tools like Hadoop for analysing the very large data sets that traditional BI/database tools can’t handle, and the convergence of the role of business analyst and BI professional in the form of the data scientist. I’m still not convinced that Hadoop and the other tools that currently get lumped in under the Big Data banner will take over the world though: recently, I’ve seen a few posts like this one that suggest that most companies don’t have the expertise necessary for using them. Indeed, Google, the pioneer of MapReduce, felt the need to invent Dremel/BigQuery (which is explicitly referred to as an OLAP tool here and elsewhere) to provide the easy, fast analysis of massive datasets that MapReduce/Hadoop cannot give you. My feeling is that the real future of Big Data lies with tools like Dremel/BigQuery and Apache Drill rather than Hadoop; certainly, when I played with BigQuery it clicked with me in a way that Hadoop/HDInsight didn’t. I hope someone at Microsoft has something similar planned… or maybe this is the market that PDW and Polybase are meant to address? In which case, I wonder if we’ll see a cloud-based PDW at some point?
I read an interesting article by Stephen Swoyer today on the TWDI site today, about a new Gartner report that suggests that companies should start selling the data they collect for BI purposes to third parties via public data marketplaces. This is a subject I’ve seen discussed a few times over the last year or so – indeed, I remember at the PASS Summit last year I overheard a member of the Windows Azure Marketplace dev team make a similar suggestion – and I couldn’t resist the opportunity to weigh in with my own thoughts on the matter.
The main problem that I had with the article is that it didn’t explore any of the reasons why companies would not want to sell the data they’re collecting in a public data marketplace. Obviously there are a lot of hurdles to overcome before you could sell any data: you’d need to make sure you weren’t selling your data to your competitors, for example; you’d need to make sure you weren’t breaking any data privacy laws with regard to your customers; and of course it would have to be financially worth your while to spend time building and maintaining the systems to extract the data and upload it to the marketplace – you’d need to be sure someone would actually want to buy the data you’re collecting at a reasonable price. Doing all of this would take a lot of time and effort. The main hurdle though, I think, would be disinterest: why would a company whose primary business is something else start up a side-line selling its internal data? It has better things to be spending its time doing, like focusing on its core business. If you sell cars or operate toll roads why are you going to branch out into selling data, especially when the revenue you’ll get from doing this is going to be relatively trivial in comparison?
What’s more, I think it’s a typical piece of tech utopianism to think that data will sell itself if you just dump it on a public data marketplace. Maybe apps on the Apple App Store can be sold in this way, but just about everything else in the world, whether it’s sold on the internet or face-to-face, needs to be actively marketed and this is something that the data generators themselves are not going to want to make the effort to do. As I said earlier, those companies that are interested in selling their data will still need to be careful about who they sell to, and the number of potential buyers for their particular data is in any case going to be limited. Someone needs to think about what the data can be used for, target potential customers and then show these potential customers how the data can be used to improve their bottom line.
For example, imagine if all the hotels around the Washington State Convention Centre were to aggregate and sell information on their bookings for the next six months into the future to all the nearby retailers and restaurants, so it was possible for them to predict when the centre of Seattle would be full of wealthy IT geeks in town for a Microsoft conference and therefore plan staffing and purchasing decisions appropriately. In these cases a middle man would be required to seek out the potential buyer and broker the deal. The guy that owns the restaurant by the convention centre isn’t going to know about this data unless someone tells him it’s available and convinces him it will be useful. And just handing over the data it isn’t really good enough either – it needs to be used effectively to prove its value, and the only companies who’ll be able to use this data effectively will be the ones who’ll be able to integrate it with their existing BI systems, even if that BI system is the Excel spreadsheet that the small restaurant uses to plan its purchases over the next few weeks. Which of course may well require outside consultancy… and when you’ve got to this point, you’re basically doing all of the same things that most existing companies in the market research/corporate data provider space do today, albeit on a much smaller scale.
I don’t want to seem too negative about the idea of companies selling their data, though. I know, as a BI consultant, that there is an immense amount of interesting data now being collected that has real value to companies other than the ones that have collected it. Rather than companies selling their own data, however, what I think we will see instead is an expansion in the number of intermediary companies who sell data (most of which will be very small), and much greater diversity in the types of data that they sell. Maybe this is an interesting opportunity for BI consultancies to diversify into – after all, we’re the ones who know which companies have good quality data, and who are already building the BI systems to move it around. Do public data marketplaces still have a role to play? I think they do, but they will end up being a single storefront for these small, new data providers to sell data in the same way that eBay and Amazon Marketplace act as a single storefront for much smaller companies to sell second-hand books and Dr Who memorabilia. It’s going to be a few years before this ecosystem of boutique data providers establishes itself though, and I suspect that the current crop of public data marketplaces will have died off before this happens.
After yesterday’s stream of consciousness on how PowerPivot could be used in SSRS, here’s a follow-up post on how PowerPivot and ‘traditional’ SSAS could be integrated. Hold on, you say, surely that’s a no-brainer? Surely all that would need to happen would be that Vertipaq would become a new storage mode inside SSAS, along with MOLAP, ROLAP and HOLAP, and everyone would be happy? Well, maybe. But here’s alternative idea that I bounced off some friends a while back and got good feedback on, which I thought I’d air here.
Before I go on, let me state my position on some things:
- I like PowerPivot, and the more I use PowerPivot the more I like it.
- I really like the power of the Vertipaq engine, and I want to be able to use it in a corporate BI environment.
- I really like DAX, and I want to be able to use it in a corporate BI environment.
- BUT SSAS as we have it today is a very mature, rich tool that I don’t want to lose. PowerPivot models will always be a little rough-and-ready; a good SSAS cube is a lot more ‘finished’ and user-friendly (I always liken building a cube to building a UI). SSAS dimension security is, for example, an absolute killer feature in many corporate BI solutions; PowerPivot won’t have anything like this until at least the next version, whenever that will be.
- I also love MDX and I don’t want to lose it. MDX Script assignments, calculated members on non-measures dimensions, all of the things that PowerPivot and DAX can’t do (and probably won’t ever do) are things that I use regularly and in my opinion are essential for many complex, enterprise BI implementations.
- I don’t want the dev team to abandon corporate SSAS, and neither do I want the dev team to waste time re-implementing things in PowerPivot that we already have in corporate SSAS. Already people are asking when they can have security and partitioning in PowerPivot. I want new stuff though!
So, like all users I want absolutely everything possible. How could it be done? Here’s my basic idea: let us be able to build regular SSAS cubes using PowerPivot models as data sources, with SSAS working in something similar to ROLAP mode so every request for data from the cube is translated into an MDX (or SQL – remember SSAS, and presumably PowerPivot, supports a basic version of SQL) query against the PowerPivot model.
In more detail, let’s imagine we have an instance of SSAS running in Vertipaq mode and an instance of SSAS running in normal mode. You’d be able to do the following:
- Fire up BIDS and create a new SSAS project.
- Create a data source, which was a PowerPivot database on your Vertipaq instance of SSAS.
- Create a new Data Source View, which showed all of the tables in your PowerPivot database already joined. Alternatively, here I can imagine connecting to other data sources like SQL Server, creating a Data Source View as normal and then taking the DSV and deploying it as a PowerPivot model onto the Vertipaq instance of SSAS. So in effect, the DSV designer becomes a development environment for PowerPivot models.
- Create a regular SSAS cube in the usual way, only using the PowerPivot tables in the DSV.
- Set the storage mode of your dimensions and partitions to the new ROLAP-like storage mode; each SSAS partition could then be based on a separate PowerPivot table. This would mean that when you queried the cube, the SSAS instance issued MDX or SQL queries against the Vertipaq instance of SSAS, just as it issues SQL queries in ROLAP mode today. I suppose though there would be an overhead to making an out-of-process call, so maybe it would be better if you only had one instance of SSAS that could host both Vertipaq and regular SSAS databases at the same time, so all these requests could stay in-process.
The first, obvious, point here is that with this approach we get the traditional, rich SSAS cubes that we know and love and the raw speed of Vertipaq. So one objective is achieved. But I think there would be a lot of other benefits:
- You’d get two cubes for the price of one: the PowerPivot cube and the SSAS cube. You could choose which one to query depending on your needs.
- The ability to turn DSVs into PowerPivot models also gives you a proper development environment for creating PowerPivot models, integrated with BIDS and Visual Studio (so you also get source control). The current Excel-based UI is all very well, but us developer types want a nice visual way of creating relationships between tables.
- You’re able to use all of the new data sources that PowerPivot can work with in traditional SSAS. Imagine being able to create a planning and budgeting solution where users wrote values into an Excel Services spreadsheet, which then fed into PowerPivot via the new Excel Services REST API, which then in turn fed into a SSAS planning and budgeting cube complete with custom rollups and all the complex financial calculations you can only do in MDX.
- If your users have already built an existing PowerPivot model that they like and want to turn into an ‘official’ BI solution, you can very easily take that model as the starting point for building your cube by importing it into a DSV.
- It would also make it relatively easy to upgrade existing SSAS projects to use PowerPivot storage – you’d just convert your existing DSV into a PowerPivot model.
- SSAS drillthrough would be much, much faster because you’d be drilling through to the PowerPivot model and not the underlying relational source.
- You’d also have the possibility of working in something like HOLAP mode. Vertipaq may be fast, but with really large data volumes some pre-calculated aggregations are always going to be useful.
- You could define calculated measures in DAX in the PowerPivot model, and then expose them as measures in the SSAS cube. Probably you’d need some special way of handling them so they didn’t get aggregated like regular measures, but in some cases you’d want to take a calculated measure and sum it up like a regular measure (kind of like SQL calculations defined in named calculations today); many more calculations, like year-to-dates, can be treated as semi-additive measures. Effectively this means you are performing some multidimensional calculations outside the Formula Engine, in the SSAS Storage Engine (which in this case is PowerPivot), in the same way I believe that measure expressions work at the moment.
- For such additive and semi-additive calculations, it also opens up the possibility of parallelism since these calculations can be done in parallel in each partition and the result summed at the end. It also means you get the option to use either DAX or MDX, and can choose the right language for the job.
- There’s no duplication of dev work needed. For users of PowerPivot who want features like security, partitioning or parent/child relationships, you tell them they have to upgrade to regular SSAS; PowerPivot becomes something like SSAS Express. For users of SSAS who want the speed of Vertipaq, you tell them they have to use a PowerPivot database as their data source. The two complement each other nicely, rather like twins… now where have I heard that analogy before?
- You also have a convincing story for BI professionals who are sceptical/hostile to PowerPivot to win them over: traditional, corporate SSAS does not go away but is able to build on the new features of PowerPivot.
So there we have it, another fantasy on the future of the MS BI stack sketched out. You may be wondering why I’ve taken the time to write these two posts – after all, I don’t work for Microsoft and I’m sure plenty of people on the dev team have their own ideas on what features they want to implement for Denali. Well, as the saying goes, if you don’t ask you don’t get! And with Kilimanjaro almost out of the door now’s the time to ask. If you agree with what I’ve said here, or you disagree, or you have a better idea, please leave a comment…
I’ve been doing a fair amount of work with SSRS over the last few days, and with PowerPivot also fresh in my mind it got me thinking about the amount of overlap between SSRS and the PowerPivot/Excel/Sharepoint stack. Of course anyone who’s had to try to sell a MS BI solution to a potential customer over the last few years will have had to deal with conversations like this:
Customer: So, what is Microsoft’s solution for building BI dashboards?
Consultant: Well there’s SSRS, or if you want to build an SSAS cube you can use PerformancePoint, or maybe Excel and Excel Services, or you can go with any of these 50 third-party tools…it depends…
Customer: I’m confused already!
But just about any large software company has a certain amount of overlap between their products, that’s just life. However, that doesn’t mean that sometimes some rationalisation of products isn’t a good idea.
Let’s take a look at some of the things you’d want to do when building a dashboard, and how you can achieve them with both stacks:
|Requirement||SSRS||PowerPivot/ Excel/ Sharepoint||Comments|
|Get data from a number of different sources||Create data sources and then datasets to return the data you want||Import data into PowerPivot ‘tables’ from Excel, RDBMSes, OData feeds||There’s a slight difference between the data sources here, but the most important case is always going to be getting data from a RDBMS, which both do well.
The key difference, though, is that in general with SSRS you get data on demand through parameterised queries, whereas with PowerPivot you import all the data you’re ever likely to need up front.
|Integrate that data||No real solution here, though SSRS developers have wanted to be able to do joins on datasets for a while. The new R2 lookup functions partly address this.||Create joins between PowerPivot tables||PowerPivot has the obvious advantage here, although for SSRS you can argue that in most cases any data integration should be happening upstream in your ETL.|
|Perform calculations on that data||Use SSRS expressions||Use DAX calculations||I’d say that SSRS expressions, while as not powerful as DAX, are easier for most people to understand; however there are a lot of things that only DAX can do.|
|Create reports from that data||Use BIDS if you’re a developer, or Report Builder if you’re a power user||Use Excel or any client tool that speaks MDX (including SSRS)||For developers, BIDS is a great tool for creating reports. However SSRS has always struggled with Report Builder – in my experience users find it too difficult. And that’s where Excel comes into its own: it’s a powerful tool and most end-users are familiar with it.|
|Publish reports to a wider audience||Deploy up to your SSRS server||Publish to Excel Services/Sharepoint||The advantage SSRS has here is that most companies have no problem with the IT department setting up an SSRS server. On the other hand, Sharepoint is a Big Deal. If your company has a Sharepoint strategy, and is planning on installing Sharepoint 2010 Enterprise Edition, you’ll be fine with PowerPivot. If not, and I guess many companies are in this position, you have a problem.|
|Export reports to a variety of formats||SSRS handles exporting to a lot of different formats||Export to Excel isn’t a problem, but other formats are a bit trickier (though doable)||SSRS has the clear advantage here|
|Schedule report refresh||Again, SSRS has a lot of options for controlling when reports are refreshed||PowerPivot’s functionality for scheduling when data is refreshed is a bit v1.0||SSRS has the advantage again|
Anyway, you get the idea – there’s a fair amount of overlap and some things are done better by one tool, some things are done better by the other. Isn’t it, though, a bit of Microsoft’s time, money and energy to develop two parallel BI stacks? If they could merge in some way, it would mean less effort spent developing duplicate functionality and a more coherent message for customers to hear.
How could this be done, you ask? Well here are some vague ideas I had about what you’d need:
- Inside SSRS – BIDS as well as Report Builder – in addition to the existing functionality for bring data into datasets, and possibly in the long term as a replacement for it, you get the option of building a PowerPivot model to act as the main source of data for your reports. For Report Builder especially I think this would be a winner, given that the PowerPivot UI for building models is already aimed at the same power users that Report Builder is aimed at.
- The fact that you need to load all of your data into a PowerPivot model upfront is both an advantage and a disadvantage, depending on your scenario. When you know the data’s not going to change much, or you’ve got relatively small amounts of data, it’s good because you get joins and fast query performance. But if the data changes a lot or you don’t want the overhead of loading it into PowerPivot then you’d need the option to pass requests straight through PowerPivot back to your sources – so maybe PowerPivot would need something like ROLAP mode, or proactive caching, or the ability to make its tables work like existing SSRS datasets and send parameters to them.
- Include proper support for MDX queries in SSRS reports (my old hobby horse). This would involve developing a proper, fully-functional MDX query builder (not the rubbish one SSRS has right now – a standard MDX query generator across all MS products which was also available as a control for custom development would be ideal) and the ability to bind the results of an MDX query direct to a tablix (and no messing around with mapping field names to rowgroupthingies in the tablix control please). If power users didn’t have to worry about tablixes and could just build their queries as easily as they could in an Excel pivot table, Report Builder would be a much more popular tool. I think many developers would appreciate it too. Once all the data you need for your report is in a PowerPivot model, and you have full MDX support in SSRS, the business of report design is much easier. You also no longer need to join datasets because the data is already joined in PowerPivot, you have a powerful calculation language in DAX, and query performance is extremely fast. Oh, and with this functionality in place you could probably kill off PerformancePoint too and no-one would complain…
- Blur the line between Excel and SSRS. There’s been talk about being able to author SSRS reports in Excel for a long time (whatever happened to the Softartisans technology MS licensed?), but nothing’s ever come of it. Why not also have the option within SSRS to take a range from Excel Services and make that the body of your report? So your report is essentially still a fragment of an Excel worksheet, but it’s just surfaced via SSRS which then handles the refreshing and rendering to different formats.
- You’d also need SSRS to be able to schedule the refresh of your PowerPivot model, but that should be very doable; it would be great if it could refresh different parts of the model at different times. SSRS would also maintain control over report security, rendering, folders etc etc.
The end result would be that this PowerPivot/Excel/SSRS hybrid would give you the best of both worlds. I also have some ideas about how PowerPivot and SSAS should be integrated which I might get round to blogging about soon too, that would fit nicely with this vision of the future.
What are the chances of all this happening? Practically zero. It would involve the SSRS team, the SSAS team and the Excel team setting aside their differences, co-operating, and possibly sacrificing large chunks of different products. But it’s a nice thought to kick around…
The PASS Summit is over for another year and I’m just starting out on the long trip back home, so there’s plenty of time to get my thoughts together on what’s happened over the past week. In fact there’s not much to say about the event itself: it was, as ever, a lot of fun and totally worthwhile. Hey, within 30 minutes of arriving at the conference I learned I’d won an award for the best BI-related blog entry, for my post on implementing real SSAS drilldown in SSRS!
Attendance was up from last year although probably the recession still took its toll: remember that there was no BI Conference this year and I would have thought that a lot of people who would have gone to it would have gone to PASS instead. To be honest I think not having a BI Conference is a good thing, actually. I don’t like having to choose which conference to attend, and part of the benefit of a conference is to get as many members of a tech community together in one place. And this was certainly the largest gathering of Analysis Services people I’ve ever seen: all the usual crowd were there, I met a lot of people who I’d only met a few times before, and I finally got to meet Darren Gosbell in person after having known him by email for at least five years. One complaint I would make about the event was that the sessions weren’t scheduled particularly well. I know everyone always complains about this but in this case it did seem worse than usual: my session, for example, was up against two other SSAS-specific sessions, but in other cases there were time slots with no SSAS content at all.
The other benefit of PASS is that you get to talk at length about what’s going on in the world of SQL Server with other like-minded people. As a result you get to crystallise your thoughts on a lot of matters and – guess what – I’m going to share mine here.
First of all, the topic that was on everyone’s lips was PowerPivot. In fact everyone at the conference must have seen the standard demo at least five times and there were also a lot of advanced sessions on it too. Don’t get me wrong, I really think PowerPivot it cool from a technology point of view, I am going to take the time to learn it, and I also think from a make-money-by-getting-people-to-upgrade-to-Office-2010 point of view it is a very clever move for Microsoft. But my feelings about it remain ambiguous. Quite apart from the arguments about it discouraging ‘one version of the truth’ and encouraging spreadmarts that have already been discussed ad nauseam, I have another problem with it: I don’t honestly know whether I, as a consultant, will be able to make any money from it. The very nature of it, as a self-service tool, means no expensive outside consultancy is necessary. I don’t think it will take business away from me though; it will be widely used and it will be used instead of regular SSAS for more basic projects, but the more serious stuff will stay with SSAS I hope. I think the need for sophisticated security and more complex calculations will be the deciding factor when people choose between SSAS and PowerPivot; I’m not sure I see many people upselling from PowerPivot to SSAS either. We’ll see.
Something that worries me more about PowerPivot is the fact that it seems to have diverted the attention of the SSAS dev team. For SSAS 2008 we had few new features, although the performance improvements were very welcome. For 2008 R2 I can only think of one new feature in SSAS, and that’s the ability to use calculated members in subselects that will allow Excel 2010 to use time utility dimensions properly (I’ll blog about that at some point). Even though work on good old server-side SSAS will resume for the next major release of SQL Server I worry that PowerPivot will take priority in the future. If this happened it would be bad for me and other BI partners from a business point of view, and seems crazy given that SSAS has been such a successful product in the enterprise sector; it’s not like there aren’t a lot of new features and fixes that could be done. Shades of IE6 and Microsoft getting complacent once it’s cornered a market, I think.
Last of all on PowerPivot, I suspect that there is something new relating to it in the roadmap that hasn’t been announced yet. David DeWitt devoted his keynote on Thursday to it, the specifics of column-store databases and the Vertipaq engine (which is the new in-memory storage engine that PowerPivot uses), and at the end hinted at this saying that although he couldn’t make any announcements, those people who had been paying attention might have some ideas on what the future held for it. Of course I hadn’t been paying attention properly, but the obvious thing would be to integrate it with the relational database somehow. Given that PowerPivot is now being hosted inside Sharepoint, why not host it in SQL Server too? It’s already very table and join friendly, and I could imagine a scenario where it was used inside SQL, pointed at a schema, some kind of proactive caching kept the data in SQL in synch with the data in the Vertipaq store, difficult BI calculations could be expressed in DAX, but the whole thing was transparent to TSQL. Imagine integrating that with Madison too!
Moving on, the other thing that has become clear to me is that I really have to sit down and learn Sharepoint (or at least the relevant bits of it) properly. It’s at the heart of Microsoft’s BI strategy and there’s no avoiding it. I have to admit to some mixed feelings about this move though, and I know other people I talked to at the conference share them. Partly it’s because, in the past, there were BI specialists and there were Sharepoint specialists and we didn’t necessarily have much to do with each other; now, though, the two worlds are colliding and I’m outside my comfort zone. You might say that Sharepoint has been part of the MS BI strategy for ages now, what with PerformancePoint etc, but I see an awful lot of MS BI customers in my work and I very rarely seem to see any Sharepoint, although it could be because I’m not looking out for it. A more valid objection is that the need for Sharepoint Enterprise Edition CALs adds a lot of extra cost to a project; and from a technical standpoint Sharepoint itself carries a very big overhead – its installation and maintenance may put a lot of customers off if they don’t already have a company-wide Sharepoint strategy, and if they do have one they may not be willing to go to 2010 for some time. Sharepoint might be just too big for some customers to swallow, and be a difficult sell for BI partners.
I’d like to stress though, once again, that I see the considerable technical benefits for using Sharepoint for BI, and even if the reception of the latest wave of PerformancePoint has been somewhat muted (eg the realisation that the decomposition tree has been tacked on at the last minute and isn’t properly integrated) I am impressed with what’s coming with Excel 2010 and Excel Services too; for example I think the Excel Services REST API is very cool indeed, and as a SSAS client Excel 2010 is a big improvement on 2007 (which wasn’t all that bad either). I’ve decided I also need to learn Excel properly now as well – get to know all those advanced Excel functions, use Solver and all that. Once again two worlds are colliding: the Excel guys and the SSAS guys are going to have to learn a lot more about each others’ technologies for truly effective BI applications to get built.
Anyway, I think this post has gone on quite long enough now. As always, your comments on everything I’ve written here would be much appreciated.
So, like everyone else this week I was impressed with the Google Wave demo, and like everyone else in the BI industry had some rudimentary thoughts about how it could be used in a BI context. Certainly a collaboration/discussion/information sharing tool like Wave is very relevant to BI: Microsoft is of course heavily promoting Sharepoint for BI (although I don’t see it used all that much at my customers, and indeed many BI consultants don’t like using it because it adds a lot of extra complexity) and cloud-based BI tools like Good Data are already doing something similar. What it could be used for is one thing; whether it will actually gain any BI functionality is another and that’s why I was interested to see the folks at DSPanel not only blog about the BI applications of Wave:
…but also announce that their Performance Canvas product will support it:
It turns out that the Wave API (this article has a good discussion of it) makes it very easy for them to do this. A lot of people are talking about Wave as a Sharepoint-killer, and while I’m not sure that’s a fair comparison I think it’s significant that DSPanel, a company that has a strong history in Sharepoint and Microsoft BI, is making this move. It’s not only an intelligent, positive step for them, but I can’t help but wonder whether Microsoft’s encroachment onto DSPanel’s old market with PerformancePoint has helped spur them on. It’s reminiscent of how Panorama started looking towards SAP and Google after the Proclarity acquisition put them in direct competition with Microsoft…
Meanwhile, Google Squared has also gone live and I had a play with it yesterday (see here for a quick overview). I wasn’t particularly impressed with the quality of the data I was getting back in my squares though. Take the following search:
The first results displayed are very good, but then click Add Next Ten Items and take a look at the description for the TopCount function, or the picture for the VarianceP function:
That said, it’s still early days and of course it does a much better job with this search than Wolfram Alpha, which has no idea what MDX is and won’t until someone deliberately loads that data into it. I guess tools like Google Squared will return better data the closer we get to a semantic web.
I suppose what I (and everyone else) like about both of these tools is that they are different, they represent a new take on a problem, unencumbered by the past. With regard to Wave, a lot of people have been pointing out how Microsoft could not come up with something similar because they are weighed down by their investment in existing enterprise software and the existing way of doing things; the need to keep existing customers of Exchange, Office, Live Messenger etc happy by doing more of the same thing, adding more features, means they can’t take a step back and do something radically new. Take the example of how, after overwhelming pressure from existing SQL Server users, SQL Data Services has basically become a cloud-based, hosted version of SQL Server with all the limitations that kind of fudge involves. I’m sure cloud-based databases will one day be able to do all of the kind of things we can do today with databases, but I very much doubt they will look like today’s databases just running on the cloud. It seems like a failure of imagination and of nerve on the part of Microsoft.
It follows from what I’ve just said that while I would like to see some kind of cloud-based Analysis Services one day, I would be more excited by some radically new form of cloud-based database for BI. With all the emphasis today on collaboration and doing BI in Excel (as with Gemini), I can’t help but think that I’d like to see some kind of hybrid of OLAP and spreadsheets – after all, in the past they were much more closely interlinked. When I saw the demos of Fluidinfo on Robert Scoble’s blog I had a sense of this being something like what I’d want, with the emphasis more on spreadsheet than Wiki; similarly when I see what eXpresso is doing with Excel collaboration it also seems to be another part of the solution; and there are any number of other tools out that I could mention that do OLAP-y, spreadsheet-y type stuff (Gemini again, for example) that are almost there but somehow don’t fuse the database and spreadsheet as tightly as I’d like. Probably the closest I’ve seen anyone come to what I’ve got in mind is Richard Tanler in this article:
But even then he makes a distinction between the spreadsheet and the data warehouse. I’d like to see, instead of an Analysis Services cube, a kind of cloud-based mega-spreadsheet, parts of which I could structure in a cube-like way, that I could load data into, where only I could modify the cube-like structures containing the data, where I could define multi-dimensional queries and calculations in an MDX-y but also Excel-y and perhaps SQL-y type way – where a range or a worksheet also behaved like a table, and where multiple ranges or worksheets could be joined, where they could be stacked together into multidimensional structures, where they could even be made to represent objects. It would also be important that my users worked in essentially the same environment, accessing this data in what would in effect be their own part of the spreadsheet, entering their own data into other parts of it, and doing the things they love to do in Excel today with data either through formulas, tables bound to queries, pivot tables or charts. The spreadsheet database would of course be integrated into the rest of the online environment so users could take that data, share it, comment on it and collaborate using something like Wave; and also so that I as a developer could suck in data in from other cloud-based data stores and other places on the (semantic) web – for example being able to bind a Google Square into a range in a worksheet.
Ah well, enough dreaming. I’m glad I’ve got that off my chest: some of those ideas have been floating around my head for a few months now. Time to get on with some real work!
For the second year running I’m late celebrating my blog birthday (it was yesterday); my only excuse is that I’m still reeling from the amount I’ve eaten over the last two weeks. But four years of blogging… wow… it feels like ages.
And to a certain extent I feel that, after all this time, I’m running out of things to say here. The actual writing of blog entries isn’t a problem, it’s more the problem of having something to write about. Part of the problem is me: I don’t want to write about things I don’t find interesting so I haven’t gone down the route of turning the blog into a MDX tutorial (Bill Pearson does that much better than I ever could, and only Mosha could ever cover the advanced stuff properly), but at the same time I’m not coming across so many MDX/SSAS issues or obscure features as I used to. Part of the problem is, too, that SSAS2008 was so light on new features that it didn’t provide me with much new to write about. So I’m hoping that Gemini, Kilimanjaro, Excel 14, Azure etc will give me something to get my teeth into in 2009; I’m sure they will. If not, well, I’ve always wanted to spend some time getting into Mondrian and other open source BI technologies. And with the economy the way it is I suppose I’ll have a lot more spare time for learning new stuff in the coming twelve months…
But anyway, bear with me and keep reading! For those of you who have stuck with me for the last four years, thanks, and best wishes for 2009.
Here’s a way-out thought I had over the Xmas break for a new approach to building BI reports….
Have you, when you’ve asked a typical non-technical business user what they want a report to look like, asked them to draw a quick sketch? I do all the time – I find seeing what the user wants the report layout to look like is much the best way to understand what they want and for the user it’s the best way to express their requirements. So on the back of the proverbial envelope you’d get something like this:
…and then go back to your desk and write the query and design the report layout in something like SSRS. So – why can’t we cut out a step and go direct from the sketch to the report design? I can see two options for the first step here:
- Using a tablet PC you write some software that works a bit like OneNote, but where the user can draw the outline of a report freehand. Unfortunately tablet PCs just aren’t that common and the really tech-phobic end user wouldn’t feel comfortable using one.
- The user draws the report on paper and then the drawing gets scanned; definitely something the most computer illiterate manager would be comfortable with.
You would then take the freehand drawing and:
- Interpret the freehand lines into the borders of a table, and
- Interpret the text on columns and rows as either
- Explicit selections of members, or
- Set expressions
- Apply a lot of smarts to format the report in ways that conform to the best practices laid down by the likes of Stephen Few et al. Almost no business or IT people (me included) have any idea on how to format reports properly, and while you could argue that this might be intrusive I think users would appreciate a tool that did this for them.
- Apply a standard corporate template, with the appropriate logos etc in place.
Working out what the borders of a table should be from a freehand drawing must be possible (although implementation would be well beyond me). Interpreting what the user has written they want on columns and rows would present more problems:
- Handwriting recognition is notoriously difficult to do well, and usually the software needs to have a bit of practice to get good. On the other hand, in this particular scenario it should be easier because the user won’t be writing just any old text. For instance, if we assume that we’re working with an Analysis Services data source, we know that any text is either going to be the name of a member or something that will resolve to a set expression; we also know, for instance, if there are what look like multiple member names on the same position on the same axis they will all have to come from the same hierarchy.
- Resolving text to member names is all very well, but turning sentences into set expressions would be trickier. It seems reasonable to think that phrases like "Products where sales is greater than £100" could be interpreted effectively, but whether your average business user can write something as clear as that is debatable. Any tool would certainly have to prompt the user to confirm that the interpretations it has made was correct, and do to that it would have to resort to the kind of techy interface that the tool is trying to get away from.
- Similarly there are going to be ambiguities that need to be resolved when looking at a the design. For example, the drawing might have the years 2006, 2007 and 2008 on columns. But does this mean the report should always have these three years on columns, or the last three years with data, or something else?
So it certainly wouldn’t work like magic, but at the same time I think it would offer some advantages over current report design tools, the designers of which have fallen into the trap of building a UI on top of the functionality they’ve got available in MDX or SQL, rather than building a UI for what the user actually wants to do. After all, don’t you think that it’s actually very difficult to lay out anything other than the most simplistic reports in most report design tools, compared to how easy it would be to draw the report?
A couple of interesting (and possibly BI-related) technologies are coming soon from Microsoft:
- I see today via Nick Carr that soon we’ll be able to run Windows and SQL Server on Amazon Elastic Compute Cloud (EC2). I wonder if that includes Analysis Services too? If so, that would be handy.
- Also announced today, the new MS cloud operating system, coming within the month
- Windows High Performance Computing (HPC) server is due to launch at the beginning of November (see here for a news report on it, here and here for some details). Hmm, I see SQL Server is listed on the ‘supported applications’ page… surely there’s got to be some kind of tie-in here with the whole MatrixDB/DATAllegro MPP stuff?