Archive for the ‘BI’ Category
I read an interesting article by Stephen Swoyer on the TDWI site today, about a new Gartner report that suggests that companies should start selling the data they collect for BI purposes to third parties via public data marketplaces. This is a subject I’ve seen discussed a few times over the last year or so – indeed, I remember at the PASS Summit last year I overheard a member of the Windows Azure Marketplace dev team make a similar suggestion – and I couldn’t resist the opportunity to weigh in with my own thoughts on the matter.
The main problem that I had with the article is that it didn’t explore any of the reasons why companies would not want to sell the data they’re collecting in a public data marketplace. Obviously there are a lot of hurdles to overcome before you could sell any data: you’d need to make sure you weren’t selling your data to your competitors, for example; you’d need to make sure you weren’t breaking any data privacy laws with regard to your customers; and of course it would have to be financially worth your while to spend time building and maintaining the systems to extract the data and upload it to the marketplace – you’d need to be sure someone would actually want to buy the data you’re collecting at a reasonable price. Doing all of this would take a lot of time and effort. The main hurdle though, I think, would be disinterest: why would a company whose primary business is something else start up a side-line selling its internal data? It has better things to be spending its time doing, like focusing on its core business. If you sell cars or operate toll roads why are you going to branch out into selling data, especially when the revenue you’ll get from doing this is going to be relatively trivial in comparison?
What’s more, I think it’s a typical piece of tech utopianism to think that data will sell itself if you just dump it on a public data marketplace. Maybe apps on the Apple App Store can be sold in this way, but just about everything else in the world, whether it’s sold on the internet or face-to-face, needs to be actively marketed and this is something that the data generators themselves are not going to want to make the effort to do. As I said earlier, those companies that are interested in selling their data will still need to be careful about who they sell to, and the number of potential buyers for their particular data is in any case going to be limited. Someone needs to think about what the data can be used for, target potential customers and then show these potential customers how the data can be used to improve their bottom line.
For example, imagine if all the hotels around the Washington State Convention Centre were to aggregate and sell information on their bookings for the next six months into the future to all the nearby retailers and restaurants, so it was possible for them to predict when the centre of Seattle would be full of wealthy IT geeks in town for a Microsoft conference and therefore plan staffing and purchasing decisions appropriately. In these cases a middle man would be required to seek out the potential buyer and broker the deal. The guy that owns the restaurant by the convention centre isn’t going to know about this data unless someone tells him it’s available and convinces him it will be useful. And just handing over the data isn’t really good enough either – it needs to be used effectively to prove its value, and the only companies who’ll be able to use this data effectively will be the ones who’ll be able to integrate it with their existing BI systems, even if that BI system is the Excel spreadsheet that the small restaurant uses to plan its purchases over the next few weeks. Which of course may well require outside consultancy… and when you’ve got to this point, you’re basically doing all of the same things that most existing companies in the market research/corporate data provider space do today, albeit on a much smaller scale.
I don’t want to seem too negative about the idea of companies selling their data, though. I know, as a BI consultant, that there is an immense amount of interesting data now being collected that has real value to companies other than the ones that have collected it. Rather than companies selling their own data, however, what I think we will see instead is an expansion in the number of intermediary companies who sell data (most of which will be very small), and much greater diversity in the types of data that they sell. Maybe this is an interesting opportunity for BI consultancies to diversify into – after all, we’re the ones who know which companies have good quality data, and who are already building the BI systems to move it around. Do public data marketplaces still have a role to play? I think they do, but they will end up being a single storefront for these small, new data providers to sell data in the same way that eBay and Amazon Marketplace act as a single storefront for much smaller companies to sell second-hand books and Dr Who memorabilia. It’s going to be a few years before this ecosystem of boutique data providers establishes itself though, and I suspect that the current crop of public data marketplaces will have died off before this happens.
By now you’ve probably already seen that the new Office Store, where you can get hold of apps for Office and Sharepoint, is now open. If you haven’t, check out the following blog posts:
The implications for BI are obvious: new apps for data visualisation (along the lines of what’s available in Sparklines for Excel maybe; perhaps also the long-lost decomposition tree from Proclarity?), analysis, importing and exporting data. I’ve already downloaded and had a play with the Bubbles app, which is quite fun:
Will it take off? Who knows; it’ll certainly be a while before enough people are on Office 2013 for us to be able to tell. Will anyone want to pay for apps? Again, who knows – I wonder if we’ll see something similar to OLAP PivotTable Extensions appear, and if free, open source apps will kill the paid app market at least in some areas? If you’ve got any ideas for a BI-related app, please leave a comment!
Jamie Thomson and I share a number of… obscure enthusiasms. For instance, last week when he spotted the new forms/surveys feature in the Excel 2013 Web App (see here for a mention) he knew I’d be excited. And I was. Excited enough to devote a whole blog post to them.
What is this feature? Basically a rip-off of (sorry, homage to) the Google docs functionality I mentioned here that allows you to create simple questionnaires and save the data back to a spreadsheet. To use it you need to create a new Excel spreadsheet in the Excel Web App (I can’t seem to find it in desktop Excel and it may not even exist there), then click on Form/New Form in the ribbon:
This opens a new dialog where you can create your form/survey:
It’s all pretty self-explanatory from there: you just enter a title and description and then some questions, which can be of various types (returning text, numbers, multiple choices etc):
You can then answer the questions yourself or send a link out to other people so they can too:
If you’d like to take the survey you can do so here btw.
The data then lands in a table in the original Excel spreadsheet, ready for you to do something useful with it:
For my next trick, and to go back to another issue that Jamie and I have been moaning about for years, I would have liked to consume the data in this table via an OData feed as detailed here:
Unfortunately I couldn’t get it to work. Whether this is a temporary problem or a limitation with Office 365 (as opposed to on-prem Sharepoint) I don’t know… if someone knows how to make it work, though, I’d be much obliged if you could leave a comment.
UPDATE: First of all, if you can’t see the survey don’t worry – the service seems to be very unreliable. Secondly I’ve got the OData feed working now and will blog about it later.
As you may already have read, the first public preview for Office 2013 is now available and there’s lots of new BI functionality in there – see here for an overview. Here’s a quick summary of what the two really important changes are:
- PowerPivot has been integrated into Excel, kind of. This means that the xVelocity (aka Vertipaq) engine is now native to Excel, and you can do all the basic PowerPivot stuff like loading vast amounts of data from multiple data sources and querying it via PivotTables directly in Excel, without installing any extra addins. PowerPivot does still exist as an optional extra however: you need it if you want to use the more advanced functionality that exists in PowerPivot today, such as filtering data before import, using diagram view, defining hierarchies and perspectives and so on.
- Power View has also been integrated into Excel: Power View reports become a new type of sheet inside a workbook, and you can use it against data held in the integrated xVelocity/PowerPivot engine; I’m not clear yet whether it will work on a SSAS 2012 Tabular model (and at some point a SSAS Multidimensional model, once support for DAX on Multidimensional models arrives) but I hope it does. No more need to moan about Power View being tied to Sharepoint!
There are a whole bunch of other BI-related changes in Excel which I’ll try to summarise in another post soon (stuff like the suggestions for charts and PivotTables, flash fill, timeline slicer). However I think that the two changes above represent a master-stroke on the part of Microsoft: they make Excel 2013 a serious contender in the self-service BI tool stakes. Certainly, other vendors will be quick to point out the features they have and that Excel doesn’t, and dedicated BI vendors will always be able to add new features faster and more frequently than Excel, but that’s not the point. It won’t happen overnight but at some point every company will upgrade to Office 2013 and when they do, all users will have a BI tool on their desktops which is vastly more capable than Excel today and will be good enough for the majority of BI scenarios – which means that the need to even look at third party tools will disappear.
UPDATE: A few clarifications:
- The PowerPivot addin, while still an addin, comes bundled with Excel – there’s no separate download
- As the comments below show, and I can confirm, Power View does work with SSAS 2012 Tabular models
I’ll be posting more details throughout the day on Twitter as I play with the new preview, and will post something more substantial here later
In my last post I showed how to load data from Excel into Layerscape, the new data visualisation tool from Microsoft Research; in the post before that I showed how to load UK weather data from Windows Azure Datamarket into PowerPivot. Now let’s take that weather data and plot it on a map!
When doing this, the first decision I made with this data was about what to show. The main problem is that the data volumes involved here are very close to the upper limit of what Layerscape is able to handle: in a few cases I crashed Layerscape, but that was when I was trying to load around 150000 rows of data into it; I found that just over 110000 rows of data was ok however. As a result I made the decision to only show data for cloud or rain, not sun (which is ok – if you want to visualise a clear day, you don’t want to show anything on a map I think) or mist or fog (which I was less happy about).
To achieve this I created a calculated column on my Three Hourly Forecast table in PowerPivot called Cloud with the following definition:
=if([SignificantWeatherId]>=7, TRUE, BLANK())
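For anyone who wants to check this logic outside PowerPivot, here is a rough Python sketch of the same test. It is only an illustration: the function name and the sample rows are made up, and None is used as a stand-in for BLANK().

```python
from typing import Optional


def cloud_flag(significant_weather_id: int) -> Optional[bool]:
    """Mirror of the Cloud column: True for ids of 7 and above,
    otherwise None (standing in for DAX's BLANK())."""
    return True if significant_weather_id >= 7 else None


# Hypothetical sample rows: (weather station, SignificantWeatherId)
rows = [("Station A", 1), ("Station B", 7), ("Station C", 12)]

# Keep only the rows where the flag is set, much as the PivotTable filter does
cloudy_rows = [row for row in rows if cloud_flag(row[1]) is not None]
```

Returning a blank rather than FALSE is what lets the rows without cloud drop out of the PivotTable once the field is in the Values box.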
Then, after having created a flattened PivotTable with the data I wanted to display, I dropped the Cloud field into the Values box in my PivotTable and was able to filter it to only show Weather Stations and time periods where there was cloud:
I also created a few other calculated columns:
- ActualPredictionForStart and ActualPredictionForEnd: the former is mentioned in my previous post, and the latter returns a value three hours after the former:
=[Date] + (([TimeStep]+3)/24)
These two values represent the start time and the end time for each row in the Three Hourly Forecast table.
- WeatherDepth: in the Significant Weather table there’s a distinction made between low-altitude cloud and high-altitude cloud, and in Layerscape when you’re plotting data you can control how high off the ground a point is displayed, so the WeatherDepth column contains some fairly arbitrary numbers for cloud altitudes based on the [Code] column. Here’s the definition:
=SWITCH([Code], 5, 150, 6, 150, 7, 100, 8, 120, 10)
- WeatherColor: again, in Layerscape you can control the colour of your points and their opacity, which again is useful for displaying dark/light and thin/thick clouds. The only compromise I had to make was to display dark clouds (ie where clouds are described as either black or where the weather is rainy or snowy) in blue rather than black, because I couldn’t get Layerscape to display black – white clouds are shown in white. To calculate WeatherColor I created two intermediate calculated columns on the Significant Weather table, InitialColor (which contains colours for weather that I’m not displaying in this demo – I thought dust storms should be shown in yellow for instance):
=SWITCH([Code], 1, "yellow", 2, "white", 3, "yellow", 4, "yellow", 5, "white", 6, "white", 7, "white", 8, "blue", "blue")
…and Opacity:
=SWITCH([Code], -99, 0, 0, 0, 1,10, 2, 25, 3, 10, 4, 100, 5, 25, 6, 75, 7, 50, 8, 50, 9, 60, 10, 60, 11, 60, 12, 65, 13, 75, 14, 75, 15, 80, 90)
…making the definition of WeatherColor:
=[Opacity] & "% " & [InitialColor]
- Depth and Color: calculated columns on the Three Hourly Forecast table that copied the WeatherDepth and WeatherColor values down to the main fact table:
This screenshot gives you some idea of the values that the depth and colour calculations return:
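To make those values concrete without PowerPivot, the lookups can be sketched in Python along the following lines. The numbers and colours are copied from the SWITCH formulas above; the dictionary and function names are illustrative only, and the sketch assumes that the second colour-related SWITCH formula is the Opacity column that WeatherColor refers to.

```python
# Values copied from the SWITCH formulas; all names here are illustrative only.
DEPTH_BY_CODE = {5: 150, 6: 150, 7: 100, 8: 120}
DEFAULT_DEPTH = 10  # the final "else" argument of the WeatherDepth SWITCH

COLOR_BY_CODE = {
    1: "yellow", 2: "white", 3: "yellow", 4: "yellow",
    5: "white", 6: "white", 7: "white", 8: "blue",
}
DEFAULT_COLOR = "blue"  # the final "else" argument of the InitialColor SWITCH

# Assumed to be the Opacity column used in the WeatherColor definition
OPACITY_BY_CODE = {
    -99: 0, 0: 0, 1: 10, 2: 25, 3: 10, 4: 100, 5: 25, 6: 75, 7: 50,
    8: 50, 9: 60, 10: 60, 11: 60, 12: 65, 13: 75, 14: 75, 15: 80,
}
DEFAULT_OPACITY = 90


def weather_depth(code: int) -> int:
    """Equivalent of the WeatherDepth SWITCH: look up the code, else use the default."""
    return DEPTH_BY_CODE.get(code, DEFAULT_DEPTH)


def weather_color(code: int) -> str:
    """Equivalent of [Opacity] & "% " & [InitialColor]."""
    opacity = OPACITY_BY_CODE.get(code, DEFAULT_OPACITY)
    color = COLOR_BY_CODE.get(code, DEFAULT_COLOR)
    return f"{opacity}% {color}"


for code in (7, 8, 12):
    print(code, weather_depth(code), weather_color(code))
```

A dictionary with a default is about the closest Python equivalent to SWITCH with a trailing else value, which is why codes missing from a table fall back to 10, "blue" and 90 respectively.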
With this all done I was able to load the data into Layerscape in the way I showed in my previous post, tweak some of the settings for the markers and time decay, and come up with a nice-looking visualisation. The big difference in this case compared to my previous examples is that here we have time series data and Layerscape is able to show values changing over time. The only thing I needed to do to make this happen was to check the Time Series box in the Layers pane on the main screen; with this done I could show the data for a particular point in time or let Layerscape cycle through time showing how the forecast weather changed.
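The time series relies on each row having a start and an end time, so it may help to see that arithmetic spelled out. This is a hedged Python sketch rather than anything Layerscape or PowerPivot runs: it assumes [TimeStep] is a number of hours and that ActualPredictionForStart is [Date] + [TimeStep]/24 (its definition is in the earlier post, not here), and the function name is invented.

```python
from datetime import datetime, timedelta
from typing import Tuple


def prediction_window(date: datetime, time_step_hours: float) -> Tuple[datetime, datetime]:
    """Return (start, end) for one three-hourly forecast row.

    Excel stores dates as days, so adding hours/24 to a date there is
    the same as adding a timedelta of that many hours here.
    """
    # Assumed definition of ActualPredictionForStart: [Date] + [TimeStep]/24
    start = date + timedelta(hours=time_step_hours)
    # ActualPredictionForEnd: [Date] + ([TimeStep] + 3)/24
    end = date + timedelta(hours=time_step_hours + 3)
    return start, end


# Hypothetical forecast row: a time step of 21 hours on 1 May 2012
start, end = prediction_window(datetime(2012, 5, 1), 21)
print(start, end)
```

Note that a window starting late in the day rolls over into the next date, which the day-fraction arithmetic in the calculated column handles in the same way.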
I also created a video showing the tour:
I’m quite happy to admit that this tour doesn’t show off the full capabilities of Layerscape (I’m not sure my graphics card is up to the job, frankly) and I’ve not taken much time to ensure that the visualisation is as accurate as it could be, but I’m still quite proud of it!
I had an interesting chat with Gary Short at SQLBits last week about social network analysis, and he pointed out that Microsoft Research’s graph database, Project Trinity, was now available to download. You can find out more about it here:
Gary suggested that Trinity might, at some point in the future, follow F# in being a MS Research project that becomes a commercial product and to be honest I think he might be right. If Analysis Services is a database that is optimised for multidimensional data, why not add Trinity to the SQL Server stack as a specialised database for graph data? After all, social network analysis is only going to become more and more important in corporate BI in the future, and that’s just one use case for a graph database. And if Trinity becomes a product, why not include features of a tool like NodeXL in Project Crescent?
It was only a matter of time, but here’s what I think is the first use of Kinect for Microsoft BI:
OK so it’s very, very basic and only works on Dynamics Business Analyzer, but it shows what could be done. Maybe something to think about for a BI Power Hour presentation on Crescent? Although until the day comes that we all work in CSI-style labs with large screens to wave our arms around in front of, I seriously doubt it would have any practical use. Anyway, if you wanted to do something truly cool you’d not even bother moving at all and build your reports using the power of your mind – I saw Guy Smith-Ferrier do a very entertaining session called “Mind Control Your Computer in C#” recently, so the idea isn’t as far-fetched as it seems!
My post about Sharepoint and the MS BI strategy last month generated a lot of debate; an equally hot topic is that of Microsoft’s mobile BI strategy, or lack of it. It’s something I’ve heard a lot of people sounding off about recently, which is why I found this article interesting:
Apart from a diplomatic quote from Donald Farmer in his current guise as QlikView evangelist-in-chief, the article picks up on a thread from the Microsoft BI Facebook page and perhaps tries to read too much into it…
It’s been clear for a while now that Sharepoint is at the heart of Microsoft’s BI strategy. The first sign was the way PerformancePoint was touted as a replacement for Proclarity. Then came the news that if you wanted to share models between multiple users in PowerPivot, you needed Sharepoint 2010 Enterprise Edition. In Denali, if you want to use cool new stuff like Crescent or the new SSRS alerting functionality you need Sharepoint. But is this a good thing for Microsoft BI? Behind the scenes a lot of people have been debating this question for a long time, so I thought it was an appropriate subject for a blog post – and you know how I like controversial topics…!
Let me start by saying that I have an opinion on this but not one that I feel 100% sure of asserting: I have the nagging feeling that my own view of the market is too limited to know whether the Sharepoint strategy is good or not, so my mind isn’t completely made up (and in fact the more I think about this issue, the more unsure about my opinion I am). Also, I don’t think anyone has objections to the purely technical reasons for the Sharepoint strategy – after all, why should the various Microsoft BI teams be in the business of building portals when Microsoft has its own, extremely successful portal they can integrate with, which gives them a lot of rich functionality for free? The question is essentially a commercial one: will more customers buy Microsoft BI as a result of increased integration with Sharepoint (in turn leading to Microsoft and its partners making more money, which is all anyone really cares about), or will a Sharepoint dependency actually put customers off and drive them into the arms of Microsoft’s competitors?
The argument in favour of the Sharepoint strategy goes something like this:
- Microsoft’s BI products need portal functionality. Time and money for development of BI products is limited, so if the portal functionality can be got from Sharepoint then it can be delivered quicker, at a lower cost, and with time and money left over for other new functionality that would not be possible otherwise. More and better functionality means the customer is more likely to buy.
- Integrating with Sharepoint also gives the wider Microsoft BI offering a coherence it wouldn’t otherwise have (and something it has historically lacked), and the whole ends up being greater than its constituent parts. This lack of overlapping functionality looks good in front of the customer, and also increases the opportunity to cross-sell BI to existing Sharepoint customers and vice versa.
- Sharepoint is massively successful, one of Microsoft’s leading server products, so most customers you want to sell BI to will have Sharepoint anyway; therefore there will be little resistance to buying new tools that have a dependency on Sharepoint. The Sharepoint market is so large that even if only a small percentage of it is interested in or able to use MS BI, that’s still a massive potential market.
- Presumably, at some point there will be a fully-featured “Sharepoint in the cloud” with all the BI features baked in, which means that it will be even easier for companies to adopt it.
- Microsoft is well aware of the arguments against Sharepoint that are listed below, and because it wants the Sharepoint strategy to work it is taking action to address these problems of cost, complexity and uptake. One example is the increasing number of Microsoft BI appliances that are available, where all of the tough configuration decisions are made for you.
The argument against is this:
- Sharepoint is expensive (or at least perceived as expensive) in terms of license costs, infrastructure and administration, so it makes the overall MS BI solution more expensive to have a dependency on it.
- Sharepoint is a complex product (or at least perceived as complex), and Microsoft’s BI tools are pretty complex as well; integrating the two makes something even more complex. As a result, whereas in the past a single BI guy could just install SSAS, SSRS and so on on a server, now you need a BI guy and a Sharepoint guy to do all the setup and admin, which doubles the cost of labour; the added complexity also makes it more likely that the setup and admin will take longer. Microsoft BI products have traditionally seen a lot of their adoption come from internal IT departments taking the ‘it’s effectively free, so let’s install it somewhere and see what it does’ path, and this will become much less common because of the added overhead of Sharepoint.
- The added dependencies between Sharepoint and BI could actually make it slower to deliver new features because now there are multiple MS dev teams that need to work together, co-ordinate functionality and release cycles, and deal with conflicting priorities. History has shown that MS dev teams don’t always do this well (think of Excel and SSAS support), and even when they do some compromises are inevitable.
- Many customers do have Sharepoint, but not all of them have the editions or versions that the MS BI stack requires. And very often due to political divisions, an internal corporate Sharepoint team have their own agenda to follow which has no place for BI, and aren’t interested in upgrading to a certain version or otherwise accommodating the BI team when it might impact on their own goals.
- Some customers do not have Sharepoint and have made a conscious decision not to have it; these customers include not only the die-hard anything-but-Microsoft shops but also some who would be interested in a solution with fewer dependencies. For these customers, a Sharepoint dependency removes all question of the use of MS BI.
- The MS partner ecosystem, at least at the mid-level, is segregated into BI partners and Sharepoint partners, and while there’s a certain amount of convergence you still tend to find that many consulting companies are BI partners who do a bit of Sharepoint on the side or Sharepoint partners who do a bit of BI on the side, so not all of them are capable of selling or implementing an overarching BI-Sharepoint solution.
The nature of my work means that I get to see a lot of different Microsoft BI implementations, probably more than the average consultant. I reckon I work with around 30-40 different customers every year, ranging in size from one-man-bands to the biggest enterprises, and in the five or so years I’ve been in business I’ve only ever seen a relatively small number who actively use Sharepoint in combination with the rest of the Microsoft BI stack. If you work for a large partner that specialises in and actively sells Microsoft BI and Sharepoint you may have seen much greater use of Sharepoint than I have, and if you work for a specialist Sharepoint partner I dare say you only ever work with customers who are very committed to Sharepoint, so I’ll admit my point of view is biased. On the other hand I can’t deny the evidence of my own experience and as a result my natural inclination is to be slightly sceptical about the Sharepoint BI strategy, because I don’t see any basis for the claims that Sharepoint is a ubiquitous platform and one that users actively want to integrate with BI. I’d also add that a couple of years ago I was equally sceptical about Excel’s central role in Microsoft’s SSAS and wider BI strategy, but now I see Excel used successfully on a wide range of SSAS projects and I’m very much more pro-Excel (although I’m not blind to Excel’s continuing shortcomings as an SSAS client tool for more advanced users). Maybe in a year or two’s time all my customers really will have Sharepoint, the Sharepoint strategy will bear fruit, and my fears will have been proved groundless.
So… what do you think about all this? What are your experiences with Sharepoint, do you have it in-house already or (if you’re a consultant) will you be able to sell a BI platform based on it? Please leave a comment…
So day one of the PASS Summit is drawing to a close, and what a day it’s been. I did a session myself today, which went well (I think), but the real news is all the cool new stuff that was announced at the keynote and at sessions throughout the day and that’s what I’d like to reflect on here. A lot of questions about what we’ve seen today remain unanswered (more will be revealed tomorrow, I’m told) but I thought I’d blog about what interested me and was relevant to BI.
All in all, I’m very excited. Let’s face it – over the last five years, in SSAS and in the wider BI stack, there’s been a distinct lack of anything really radical and new. But today saw several announcements that will completely change the Microsoft BI stack:
- There will be column-store indexes, provided by the Vertipaq engine, inside the SQL Server relational database. Simon Sabin has a link to more details here.
- There’s a new, corporate BI version of PowerPivot, BISM
- Project Crescent is a new ad hoc query tool coming from the Reporting Services team
Let’s consider what each of these means in turn. Based on the limited information we’ve got so far, column-store indexes in SQL Server massively increase the performance of reporting/OLAP style queries inside the relational engine; if we get column-store indexes in Parallel Data Warehouse, well, I can imagine we’ll get astounding performance over huge data volumes. The pdf linked to in Simon Sabin’s blog says:
Users who were using OLAP systems only to get fast query performance, but who prefer to use the T-SQL language to write queries, may find they can have one less moving part in their environment, reducing cost and complexity. Users who like the sophisticated reporting tools, dimensional modeling capability, forecasting facilities, and decision-support specific query languages that OLAP tools offer can continue to benefit from them. Moreover, they may now be able to use ROLAP against a columnstore-indexed SQL Server data warehouse, and meet or exceed the performance they were used to in the past with OLAP, but save time by eliminating the cube building process.
To paraphrase, if your data’s in SQL Server that’s where it should stay for querying and reporting – MOLAP no longer provides any performance benefit, so the concept of ‘processing’ a cube to get data into a different, OLAP database is gone. If you still want to do OLAP on SQL Server it will be a form of ROLAP, and to me this makes a lot of sense.
Very few details about BISM, the ‘corporate’ version of PowerPivot/Vertipaq, were released but we saw that we could develop BISM models in Visual Studio and the end result was exactly what PowerPivot creates when you deploy a PowerPivot model to Sharepoint – I guess it’s basically a different type of Analysis Services database. BISM can either hold the data itself in its own Vertipaq store (useful when the data comes from anywhere other than SQL Server, eg Excel, Oracle, text files and so on) or it can act purely as a semantic layer and allow querying data in SQL Server in a ROLAP, multidimensional way. So we’ll be able to create an Excel pivot table, connect to BISM and use that as a thin layer to query data that’s directly in SQL Server. This is clearly what the future of Analysis Services is going to be – while the Analysis Services we know and love today might hang around for a few versions, it’s not got a long-term future in my opinion.
There was a full session on Project Crescent later on today, which I went to, so I have more details on this. It’s essentially a new ad hoc query tool being built by the SSRS team – interestingly it seems to have nothing to do with the rest of Reporting Services, and it doesn’t create rdl files that can be edited in Report Builder or Report Designer. It’s all about flashy visualisation and interactivity and reminds me a lot of Tableau and other fashionable BI tools; I can also see how it can be a replacement for PerformancePoint for creating dashboards. It is only surfaced in Sharepoint (boo! yet another Sharepoint dependency!) and is built in Silverlight; also, it can only work with data sourced from BISM/PowerPivot models. Once you get past the flashy stuff it does much the same thing that every other ad hoc query tool has been doing since the year dot (Marco, sitting next to me, commented that it was doing much the same thing that Data Analyzer was doing 10 years ago) but the flashy stuff is very flashy indeed, and very impressive – for instance the ability to export views out to PowerPoint slides looks cool. I do wonder whether it will be as practically useful as something like Tableau, though, so we’ll have to wait and see. I’m pleased to see that someone at MS has finally woken up to the fact that end users might want to use something other than Excel for ad hoc querying.
OK, time for a few drinks and to have some fun! There’ll be another post with more news tomorrow…