Archive for the ‘BI’ Category
Self-Service BI Mapping with Microsoft Research’s Layerscape–Part 2
In my last post I showed how to load data from Excel into Layerscape, the new data visualisation tool from Microsoft Research; in the post before that I showed how to load UK weather data from Windows Azure Datamarket into PowerPivot. Now let’s take that weather data and plot it on a map!
When doing this, the first decision I made with this data was about what to show. The main problem is that the data volumes involved here are very close to the upper limit of what Layerscape is able to handle: in a few cases I crashed Layerscape, but that was when I was trying to load around 150000 rows of data into it; I found that just over 110000 rows of data was ok however. As a result I made the decision to only show data for cloud or rain, not sun (which is ok – if you want to visualise a clear day, you don’t want to show anything on a map I think) or mist or fog (which I was less happy about).
To achieve this I created a calculated column on my Three Hourly Forecast table in PowerPivot called Cloud with the following definition:
=if([SignificantWeatherId]>=7, TRUE, BLANK())
Then, after having created a flattened PivotTable with the data I wanted to display, I dropped the Cloud field into the Values box in my PivotTable and was able to filter it to only show Weather Stations and time periods where there was cloud:
I also created a few other calculated columns:
- ActualPredictionForStart and ActualPredictionForEnd: the former is mentioned in my previous post, and the latter returns a value three hours after the former:
=[Date] + (([TimeStep]+3)/24)
These two values represent the start time and the end time for each row in the Three Hourly Forecast table. - WeatherDepth: in the Significant Weather table there’s a distinction made between low-altitude cloud and high altitude cloud, and in LayerScape when you’re plotting data you can control how high off the ground a point is displayed, so the WeatherDepth column contains some fairly arbitrary numbers for cloud altitudes based on the [Code] column. Here’s the definition:
=SWITCH([Code], 5, 150, 6, 150, 7, 100, 8, 120, 10) - WeatherColor: again, in Layerscape you can control the colour of your points and their opacity, which again is useful for displaying dark/light and thin/thick clouds. The only compromise I had to make was to display dark clouds (ie where clouds are described as either black or where the weather is rainy or snowy) in blue rather than black, because I couldn’t get Layerscape to display black – white clouds are shown in white. To calculate WeatherColor I created two intermediate calculated columns on the Significant Weather table, InitialColor (which contains colours for weather that I’m not displaying in this demo – I thought dust storms should be shown in yellow for instance):
=SWITCH([Code], 1, "yellow", 2, "white", 3, "yellow", 4, "yellow", 5, "white", 6, "white", 7, "white", 8, "blue", "blue")
…and Opacity:
=SWITCH([Code], -99, 0, 0, 0, 1,10, 2, 25, 3, 10, 4, 100, 5, 25, 6, 75, 7, 50, 8, 50, 9, 60, 10, 60, 11, 60, 12, 65, 13, 75, 14, 75, 15, 80, 90)
…making the definition of WeatherColor:
=[Opacity] & "% " & [InitialColor] - Depth and Color: calculated columns on the Three Hourly Forecast table that copied the WeatherDepth and WeatherColor values down to the main fact table:
=RELATED(‘Significant Weather’[WeatherDepth])
=RELATED(‘Significant Weather’[WeatherColor])
This screenshot gives you some idea of the values that the depth and colour calculations return:
With this all done I was able to load the data into Layerscape in the way I showed in my previous post, tweak some of the settings for the markers and time decay, and come up with a nice-looking visualisation. The big difference in this case compared to my previous examples is that here we have time series data and Layerscape is able to show values changing over time. The only thing I needed to do to make this happen was to check the Time Series box in the Layers pane on the main screen; with this done I could show the data for a particular point in time or let Layerscape cycle through time showing how the forecast weather changed.
With that done, was able to save my demo as a very basic guided tour and upload it to the Layerscape site from where you can download it yourself :
http://www.layerscape.org/Content/Index/620
I also created a video showing the tour:
I’m quite happy to admit that this tour doesn’t show off the full capabilities of Layerscape (I’m not sure my graphics card is up to the job, frankly) and I’ve not taken much time to ensure that the visualisation is as accurate as it could be, but I’m still quite proud if it!
Project Trinity
I had an interesting chat with Gary Short at SQLBits last week about social network analysis, and he pointed out that Microsoft Research’s graph database, Project Trinity, was now available to download. You can find out more about it here:
http://research.microsoft.com/en-us/projects/trinity/default.aspx
Gary suggested that Trinity might, at some point in the future, follow F# in being a MS Research project that becomes a commercial project and to be honest I think he might be right. If Analysis Services is a database that is optimised for multidimensional data, why not add Trinity to the SQL Server stack as a specialised database for graph data? After all, social network analysis is only going to become more and more important in corporate BI in the future, and that’s just one use case for a graph database. And if Trinity becomes a product, why not include features of a tool like NodeXL in Project Crescent?
Kinect for BI?
It was only a matter of time, but here’s what I think is the first use of Kinect for Microsoft BI:
OK so it’s very, very basic and only works on Dynamics Business Analyzer, but it shows what could be done. Maybe something to think about for a BI Power Hour presentation on Crescent? Although until the day comes that we all work in CSI-style labs with large screens to wave our arms around in front of, I seriously doubt it would have any practical use. Anyway, if you wanted to do something truly cool you’d not even bother moving at all and build your reports using the power of your mind – I saw Guy Smith-Ferrier do a very entertaining session called “Mind Control Your Computer in C#” recently, so the idea isn’t as far-fetched as it seems!
What is Microsoft’s mobile BI strategy?
My post about Sharepoint and the MS BI strategy last month generated a lot of debate; an equally hot topic is that of Microsoft’s mobile BI strategy, or lack of it. It’s something I’ve heard a lot of people sounding off about recently which is why I found the this article interesting:
http://www.v3.co.uk/v3-uk/the-frontline-blog/2102901/wheres-mobile-bi-strategy-microsoft
Apart from a diplomatic quote from Donald Farmer in his current guise as QlikView evangelist-in-chief, the article picks up on a thread from the Microsoft BI Facebook page and perhaps tries to read too much into it…
The question of Sharepoint and the Microsoft BI Strategy
It’s been clear for a while now that Sharepoint is at the heart of Microsoft’s BI strategy. The first sign was the way PerformancePoint was touted as a replacement for Proclarity. Then came the news that if you wanted to share models between multiple users in PowerPivot, you needed Sharepoint 2010 Enterprise Edition. In Denali, if you want to use cool new stuff like Crescent or the new SSRS alerting functionality you need Sharepoint. But is this a good thing for Microsoft BI? Behind the scenes a lot of people have been debating this question for a long time, so I thought it was an appropriate subject for a blog post – and you know how I like controversial topics…!
Let me start by saying that I have I have an opinion on this but not one that I feel 100% sure of asserting: I have the nagging feeling that my own view of the market is too limited to know whether the Sharepoint strategy is good or not, so my mind isn’t completely made up (and in fact the more I think about this issue, the more unsure about my opinion I am). Also, I don’t think anyone has objections to the purely technical reasons for the Sharepoint strategy – after all, why should the various Microsoft BI teams be in the business of building portals when Microsoft has its own, extremely successful portal they can integrate with, which gives them a lot of rich functionality for free? The question is essentially a commercial one: will more customers buy Microsoft BI as a result of increased integration with Sharepoint (in turn leading to Microsoft and its partners making more money, which is all anyone really cares about), or will a Sharepoint dependency actually put customers off and drive them into the arms of Microsoft’s competitors?
The argument in favour of the Sharepoint strategy goes something like this:
- Microsoft’s BI products need portal functionality. Time and money for development of BI products is limited, so if the portal functionality can be got from Sharepoint then it can be delivered quicker, at a lower cost, and with time and money left over for other new functionality that would not be possible otherwise. More and better functionality means the customer is more likely to buy.
- Integrating with Sharepoint also gives the wider Microsoft BI offering a coherence it wouldn’t otherwise have (and something it has historically lacked), and the whole ends up being greater than its constituent parts. This lack of overlapping functionality looks good in front of the customer, and also increases the opportunity to cross-sell BI to existing Sharepoint customers and vice versa.
- Sharepoint is massively successful, one of Microsoft’s leading server products, so most customers you want to sell BI to will have Sharepoint anyway; therefore there will be little resistance to buying new tools that have a dependency on Sharepoint. The Sharepoint market is so large that even if only a small percentage of it is interested in or able to use MS BI, that’s still a massive potential market.
- Presumably, at some point there will be a fully-featured “Sharepoint in the cloud” with all the BI features baked in, which means that it will be even easier for companies to adopt it.
- Microsoft is well aware of the arguments against Sharepoint that are listed below, and because it wants the Sharepoint strategy to work it is taking action to address these problems of cost, complexity and uptake. One example is the increasing number of Microsoft BI appliances that are available, where all of the tough configuration decisions are made for you.
The argument against is this:
- Sharepoint is expensive (or at least perceived as expensive) in terms of license costs, infrastructure and administration, so it makes the overall MS BI solution more expensive to have a dependency on it.
- Sharepoint is a complex product (or at least perceived as complex), and Microsoft’s BI tools are pretty complex as well; integrating the two makes something even more complex. As a result, whereas in the past a single BI guy could just install SSAS, SSRS and so on on a server, now you need a BI guy and a Sharepoint guy to do all the setup and admin, which doubles the cost of labour; the added complexity also makes it more likely that the setup and admin will take longer. Microsoft BI products have traditionally seen a lot of their adoption come from internal IT departments taking the ‘it’s effectively free, so let’s install it somewhere and see what it does’ path, and this will become much less common because of the added overhead of Sharepoint.
- The added dependencies between Sharepoint and BI could actually make it slower to deliver new features because now there are multiple MS dev teams that need to work together, co-ordinate functionality and release cycles, and deal with conflicting priorities. History has shown that MS dev teams don’t always do this well (think of Excel and SSAS support), and even when they do some compromises are inevitable.
- Many customers do have Sharepoint, but not all of them have the editions or versions that the MS BI stack requires. And very often due to political divisions, an internal corporate Sharepoint team have their own agenda to follow which has no place for BI, and aren’t interested in upgrading to a certain version or otherwise accommodating the BI team when it might impact on their own goals.
- Some customers do not have Sharepoint and have made a conscious decision not to have it; these customers include not only the die-hard anything-but-Microsoft shops but also some who would be interested in a solution with fewer dependencies. For these customers, a Sharepoint dependency removes all question of the use of MS BI.
- The MS partner ecosystem, at least at the mid-level, is segregated into BI partners and Sharepoint partners, and while there’s a certain amount of convergence you still tend to find that many consulting companies are BI partners who do a bit of Sharepoint on the side or Sharepoint partners who do a bit of BI on the side, so not all of them are capable of selling or implementing an overarching BI-Sharepoint solution.
The nature of my work means that I get to see a lot of different Microsoft BI implementations, probably more than the average consultant. I reckon I work with around 30-40 different customers every year, ranging in size from one-man-bands to the biggest enterprises, and in the five or so years I’ve been in business I’ve only ever seen a relatively small number who actively use Sharepoint in combination with the rest of the Microsoft BI stack. If you work for a large partner that specialises in and actively sells Microsoft BI and Sharepoint you may have seen much greater use of Sharepoint than I have, and if you work for a specialist Sharepoint partner I dare say you only ever work with customers who are very committed to Sharepoint, so I’ll admit my point of view is biased. On the other hand I can’t deny the evidence of my own experience and as a result my natural inclination is to be slightly sceptical about the Sharepoint BI strategy, because I don’t see any basis for the claims that Sharepoint is a ubiquitous platform and one that users actively want to integrate with BI. I’d also add that a couple of years ago I was equally sceptical about Excel’s central role in the Microsoft’s SSAS and wider BI strategy, but now I see Excel used successfully on a wide range of SSAS projects and I’m very much more pro-Excel (although I’m not blind to Excel’s continuing shortcomings as an SSAS client tool for more advanced users). Maybe in a year or two’s time all my customers really will have Sharepoint, the Sharepoint strategy will bear fruit, and my fears will have been proved groundless.
So… what do you think about all this? What are your experiences with Sharepoint, do you have it in-house already or (if you’re a consultant) will you be able to sell a BI platform based on it? Please leave a comment…
PASS Summit 2010 Day 1
So day one of the PASS Summit is drawing to a close, and what a day it’s been. I did a session myself today, which went well (I think), but the real news is all the cool new stuff that was announced at the keynote and at sessions throughout the day and that’s what I’d like to reflect on here. A lot of questions about what we’ve seen today remain unanswered (more will be revealed tomorrow, I’m told) but I thought I’d blog about what interested me and was relevant to BI.
All in all, I’m very excited. Let’s face it – over the last five years, in SSAS and in the wider BI stack, there’s been a distinct lack of anything really radical and new. But today saw several announcements that will completely change the Microsoft BI stack:
- There will be column-store indexes, provided by the Vertipaq engine, inside the SQL Server relational database. Simon Sabin has a link to more details here.
- There’s a new, corporate BI version of PowerPivot, BISM
- Project Crescent is a new ad hoc query tool coming from the Reporting Services team
Let’s consider what each of these means in turn. Based on the limited information we’ve got so far, column-store indexes in SQL Server massively increase the performance of reporting/OLAP style queries inside the relational engine; if we get column-store indexes in Parallel Data Warehouse, well, I can imagine we’ll get astounding performance over huge data volumes. The pdf linked to in Simon Sabin’s blog says:
Users who were using OLAP systems only to get fast query performance, but who prefer to use the T-
SQL language to write queries, may find they can have one less moving part in their environment,
reducing cost and complexity. Users who like the sophisticated reporting tools, dimensional modeling
capability, forecasting facilities, and decision-support specific query languages that OLAP tools offer can
continue to benefit from them. Moreover, they may now be able to use ROLAP against a columnstore-
indexed SQL Server data warehouse, and meet or exceed the performance they were used to in the past
with OLAP, but save time by eliminating the cube building process.
To paraphrase, if your data’s in SQL Server that’s where it should stay for querying and reporting – MOLAP no longer provides any performance benefit, so the concept of ‘processing’ a cube to get data into a different, OLAP database is gone. If you still want to do OLAP on SQL Server it will be a form of ROLAP, and to me this makes a lot of sense.
Very few details about BISM, the ‘corporate’ version of Powerpivot/Vertipaq, were released but we saw that we could develop BISM models in Visual Studio and the end result was exactly what PowerPivot creates when you deploy a PowerPivot mode to Sharepoint – I guess it’s basically a different type of Analysis Services database. BISM can either hold the data itself in its own Vertipaq store (useful when the data comes from anywhere other than SQL Server, eg Excel, Oracle, text files and so on) or it can act purely as a semantic layer and allow querying data in SQL Server in a ROLAP, multidimensional way. So we’ll be able to create an Excel pivot table, connect to BISM and use that as a thin layer to query data that’s directly in SQL Server. This is clearly what the future of Analysis Services is going to be – while the Analysis Services we know and love today might hang around for a few versions, it’s not got a long-term future in my opinion.
There was a full session on Project Crescent later on today, which I went to, so I have more details on this. It’s essentially a new ad hoc query tool being built by the SSRS team – interestingly it seems to have nothing to do with the rest of Reporting Services, and it doesn’t create rdl files that can be edited in Report Builder or Report Designer. It’s all about flashy visualisation and interactivity and reminds me a lot of Tableau and other fashionable BI tools; I can also see how it can be a replacement for PerformancePoint for creating dashboards. It is only surfaced in Sharepoint (boo! yet another Sharepoint dependency!) and is built in Silverlight; also, it can only work with data sourced from BISM/PowerPivot models. Once you get past the flashy stuff it does much the same that every other ad hoc query tool has been doing since the year dot (Marco, sitting next to me, commented that it was doing much the same thing that Data Analyzer was doing 10 years ago) but the flashy stuff is very flashy indeed, and very impressive – for instance the ability to export views out to PowerPoint slides looks cool; but I do wonder whether it will be as practically useful as something like Tableau so we’ll have to wait and see. I’m pleased to see that someone at MS has finally woken up to the fact that end users might want to use something other than Excel for ad hoc querying.
OK, time for a few drinks and to have some fun! There’ll be another post with more news tomorrow…
Dryad goes commercial?
I’m in the middle of my summer holiday at the moment, hence the relative silence here, but I just noticed this article by Mary Jo Foley suggesting that Microsoft might be turning Dryad into a commercial product next year and couldn’t resist posting it:
http://www.zdnet.com/blog/microsoft/microsoft-research-parallel-programming-project-set-to-go-commercial-in-2011/7161
There was a flurry of interest in it a few years ago but I thought all had gone quiet; maybe this is this the source of the rumours that Microsoft/Hadoop rumours too? I also notice that SQL Server/SSIS has disappeared from the ‘software stack’ diagram; is there going to be any tie-in with the rest of Microsoft’s BI efforts?
OData and Microsoft BI
I first came across OData last year when Jamie Thomson blogged about it; since then I hadn’t really thought about it much until I came across Douglas Purdy’s blog and specifically his post from yesterday which really brought home how important it is to the future of Microsoft BI. I would urge you to watch the video “OData: The Movie” that he mentions in his post because it gives a really good introduction to the way that OData can be used in BI scenarios; if you don’t have the time, all you really need to know is that it’s OData that makes it possible for PowerPivot to consume data from SSRS reports and Sharepoint lists.
Just watching this video made my mind boggle with the possibilities of OData, although since I’m a long way from being an expert in this area I won’t bother to detail all of these fantasies as they’re probably rubbish. However, just consider how much easier life would be for the PowerPivot user of the future if the internet was full of sites that supported OData; certainly, when I’ve looked at the new UK government websites such as the recently-launched http://data.gov.uk/ and http://data.london.gov.uk/, or even sites like the Guardian Data Store that I blogged about last year, I’ve felt that the lack of a standard format to consume this data (a badly-formatted Google spreadsheet is not ideal) has seriously limited these sites’ usefulness. And what if Bing came up with a service like the sadly-useless Google Squared, where you could search for data on the web and return it in a structured, OData format?
I’d also love to see Analysis Services support OData too, in some shape or form. Perhaps it could be used to solve some of the same problems with XMLA that Julian Hyde lists in his recent post on xmla4js. Wouldn’t it also be cool if there was a standard interface for publishing an MDX query up to SSAS from any client tool, and then be able to consume it via OData (similar to what I suggested here)? You’d then enable scenarios like this: power user creates query using an advanced SSAS client tool, then publishes it up to SSAS whereupon it becomes something similar to a SQL stored proc with the results available as an OData feed, and therefore can be consumed either in your own code (eg in a website), by PowerPivot users, or by something like SSRS (which would then know nothing about the query used but would just be used to format the resultset).
Kapow Technologies
In all of the Gemini demos seen so far, a big thing is made of the fact that it can be used to integrate data from outside sources (eg some stuff you found on the web) with internal data (eg data from your data warehouse). This is all very well, but it assumes you have some way of actually capturing the data that’s out there on the web in a usable form and automating the capture of new data from these sources on a regular basis.
For example, if you’ve found a table containing some data you want on a website, what do you need to do to actually use it? You’d need to copy the table and then paste it into Excel perhaps; you’d then need to do some reformatting and editing, copy it again and then paste it to wherever you need it such as Gemini. All very manual; in fact, a right pain and not something you’d want to do on a regular basis. There might be an RSS or an Atom feed, and you might be able to use a tool like Yahoo Pipes (see also Jamie Thomson’s recent post), but there isn’t always one and even when there is it might not contain all the data you need or be in the right format.
Last week I had a call with Kapow Technologies, and they’ve got a really cool tool that addresses this problem:
http://kapowtech.com/index.php/solutions/web-and-business-intelligence
It’s something like a cross between a screenscraper and an ETL tool, and it allows you to harvest data from web pages, do some transformation and cleansing, and then move it somewhere else like a database table (where more traditional ETL tools can take over) or another application. There’s a bit of background on this type of technology here:
http://en.wikipedia.org/wiki/Web_scraping
I got hold of an eval version and had a play with it, and while I’m hardly the right person to give an expert review it does seem very powerful. It’s certainly easy to use: I had a simple scenario working within twenty minutes, and I think anyone with some SSIS experience will find it fairly straightforward. First of all you have to declare a set of objects (which can then be mapped onto relational tables) that will hold the data you want to collect; you then create a robot that will harvest data from web pages and load it into the objects; finally you can run the robot either from the command line or on a schedule. More information can be found in the help:
http://help.kapowtech.com/7.0/index.jsp
In my example, I took a web page from the BBC website that shows the weather where I live:
http://news.bbc.co.uk/weather/forecast/2302?&search=amersham&itemsPerPage=10®ion=uk&area=Amersham
I wanted to harvest the time, the basic outlook and the temperature from the first box in the “first 24 hours” section:
Here’s a screenshot of what that page looks like in the RoboMaker tool:
You can see that I’ve selected the tag containing the time in the central browser window, and that it will be mapped to the attribute weather.forecastTime.
I think the thing that I find really exciting about this tool is that, when you think about it, there is a whole load of useful data out there on the web that would be great to use as part of your BI if you can extract it quickly and easily – and indeed legally, because I suspect there might be some legal objections to doing this on a large scale. One of the examples Kapow gave me was of a customer that harvested comments on their products from places like Amazon, then fed that data through a text mining application, to monitor public opinion on their products (today’s Dilbert shows how this information could be used!); I would have thought that this is something a lot of companies would like to do. The weather forecasts I was looking at could also be useful for predicting retail sales in the short term; if you’re an online retailer you’d probably want to compare your prices for certain products with those of your competitors. I’m sure we’ll be seeing much more use of web data in BI, captured using tools like this, in the future…
Microsoft Solver Foundation
Via SoCalDevGirl, I’ve just discovered another interesting piece in the somewhat fragmented Microsoft Business Intelligence story: Microsoft Solver Foundation. Here’s the official website:
http://www.solverfoundation.com/Default.aspx
What is it then? Ahem, well, if you can’t make much sense of the blurb on the website (like me) it’s probably not aimed at you. Here’s a sample quote:
Solver Foundation is a Microsoft framework designed to deliver critical business insight tools to CxOs, quantitative analysts and developers of mission-critical systems. Traditionally referred to as mathematical programming, these tools provide business intelligence and planning support to organizations seeking maximal competitive advantage.
I suggest you read the full overview to get a better idea of what it does. What I do understand, though, is that anyone who uses this is going to be interested in using the rest of the Microsoft BI stack; I sincerely hope that the Solver Foundation team is talking to the other BI teams and that some kind of coherent BI strategy will emerge. If one does it’s clearly going to be Excel-centric (which makes a lot of sense): Solver Foundation has an Excel addin; there’s also the data mining addin; there’s SSAS’s own integration with Excel; and of course Gemini will be surfaced through Excel, tying up SSAS, some kind of data cleansing functionality, and possibly some data mining functionality too into one compelling package.
