Archive for March 2008
Possibly a little early to blog as this startup seems to be a while away from RTM, but it looks cool and BI as a service is very topical…
What they’ve got is a "a complete, on-demand business intelligence platform combining analytics, reporting, data warehousing and data integration". The workflow seems to be that you upload your data, then you have an online environment where you have OLAP and collaboration tools; similar, I guess, to what Panorama are working on with Google docs? I guess they’re going to be making their money designing the initial data warehouse/ETL/OLAP design for each customer as well as subscription costs?
This raises some fundamental questions about BI as a service in my mind:
- Will companies be willing to upload their most valuable and secret data to a third party on the web? If they’re willing to use online data stores like Amazon S3 or SQL Server Data Services or other hosted solutions then I suppose so, but that’s a big if – and these online data stores aren’t necessarily positioned to hold the kind of ultra-sensitive data that you work with in a BI system.
- However complete a platform you build, the old adage that all data ends up in Excel needs to be considered. Remember it’s Excel I said here, not any old online spreadsheet. Will the guys in Finance abandon Excel to do spreadsheets on the net? I’m not sure. This is where Microsoft have a locked-in advantage, but it’s not an insurmountable problem for BI service-providers. Why not expose an XMLA interface for your hosted OLAP and let people connect to it directly from Excel? It’s possible for Analysis Services and SAP BW, and I’m sure companies like Simba will be only too happy to Excel-enable other OLAP tools.
- Unless BI-as-a-service companies let customers or partners design and build their own solutions on their platform, will these companies be able to scale out to meet the demand of designing enough apps for potentially hundreds of customers? That’s a lot of BI projects for any one consultancy to run. More importantly, will they be able to do an adequate job of designing each app? One of the arguments that’s always used against outsourcing BI projects is that face-to-face contact between designer and business users is essential, and that’s a view I tend to agree with.
Clever old Panorama… just when you thought that it was about time for them to roll over and die, having been jilted by Microsoft and SAP, they team up with Google to do BI. From their blog, here’s the press release:
and more information is here:
and there’s a tutorial here:
This is part of the wider Google announcements on Google Gadgets:
and the Google Visualization API:
I need some time to digest this, but my first impression is that this is BIG. The PowerApps stuff sounds very interesting indeed.
Those community conferences keep coming: it’s DeveloperDay Scotland next, on May 10th. You can see the agenda here:
I’m not speaking (unfortunately – to my shame I’ve never been to Scotland) but SQLBits is sponsoring a track and Tony Rogerson, Martin Bell and Allan Mitchell will be there. Also speaking is Gary Short who, when I was in my first IT job with no coding experience beyond BBC Basic and Turbo Pascal, took the time to teach me VB6. I owe him a few beers…
More fun with Reporting Services. The other week I was working with a customer who I suspected was experiencing query performance problems as a result of the issue described here:
Certainly in Profiler I could see some fairly trivial queries which were taking much longer than they should have done, and which when I ran them on a test system ran much faster. And certainly when I ran a ‘big query’ and a ‘small query’ simultaneously I saw the ‘small query’ run slow, and when I got the secret ini file settings from PSS my ‘small queries’ ran much faster. But there was still some doubt in my mind over the extent to which the customer was suffering this problem – just by looking at my Profiler traces from the Production servers I couldn’t tell whether lots of small queries were overlapping with the big queries.
What I then thought was that it would be useful to be able to visualise the information from a Profiler trace in some way. I did some experiments with different chart types and different tools, mainly Excel 2007, but in the end I went with Reporting Services 2005 and the Dundas Gantt chart control. I’m not sure whether the Gantt chart has made it into the current RS2008 CTP but I guess it will soon if it hasn’t; in any case the data was coming from a 2005 system so that’s what I used.
The first step was to get the data out of Profiler into some useful format; the easiest way turned out to be to get Profiler to output directly to a SQL Server table and for the problem here all I needed was the Query End events, and specifically the Duration, SPID, TextData, ConnectionID, EndTime, EventClass, EventSubClass, and NTCanonicalUserName columns. I could now run my Profiler trace against the Production system and capture some data. The next problem was to manipulate the data into a useable format. Gantt charts need a start time and a duration for each event and the problem with trace data is that you only know how long something takes when it’s finished, ie when you have an ‘End’ event. Subtracting the EndTime from the duration gives the derived start time of the event, which is one way of dealing with the problem of working out when queries started and finished. There are Query Begin and Query End events and perhaps another, better, solution would have relied on the knowledge that you can only have one query executing on any given session at any one time and matching, so it should be possible to match each Query End event to a Query Begin event. In fact I started off more ambitiously by including other types of events, such a Progress Reports, but the problem here I found was that the StartTimes and EndTimes given by Profiler were rounded to the nearest second which meant that subtracting the Duration from the EndTime didn’t give accurate derived start times and I was finding events were coming out in what was clearly the wrong order when I tried to order them by derived Start Time (some of the SQL I’ll give below is more complex than necessary for just the Query End event because it was written for these other event types too). It would be nice to visualise the various internal events that are generated when you run an individual query, such as reads from partition and cache, but I need to do some more work before that’s possible.
Anyway, with my Query End events I wrote the following SQL query to use in Reporting Services which finds the derived start time for each query, then makes the first query to start at time 0 and calculates all other derived start times relative to that first query. As always I need to apologise for my SQL in advance:
with RelativeStartTimes (OriginalRowNumber, RelativeStartTime, EventClass, DurationMSecs
, TextData, ConnectionID, EventSubClass, NTCanonicalUserName)
–Find the Relative Start Time of each event
–by finding the difference in seconds between the End Time
–of the event and the Earliest End Time in ms
–then subtracting the Duration
) * 1000) – isnull(a.Duration, 0) as RelativeStartTime,
isnull(a.duration, 0) as DurationMSecs,
from dbo.[102QE] a cross join
–Get the earliest End Time in the Trace
(select min(isnull(b.EndTime, b.StartTime)) as EarliestEnd
from dbo.[102QE] b
row_number() over(order by (r.RelativeStartTime – x.EarliestRelStartTime) asc) as row,
r.RelativeStartTime – x.EarliestRelStartTime as BeginTime,
r.RelativeStartTime – x.EarliestRelStartTime + r.DurationMSecs as EndTime,
from RelativeStartTimes r
(–Find the Earliest Relative Start Time
min(RelativeStartTimes.RelativeStartTime) as EarliestRelStartTime from
I could then use this as the basis for my Reporting Services report. As always with these things I spent most of my time working out how to use the Gantt chart but when I did, and after I’d done some other funky stuff that allowed me to filter by row number and start/end time, and click on an item in the Gantt chart to view the MDX for the query that a particular line represented, I got a report looking like this:
Another problem to mention here is that, as with this report, when you’ve got a lot of queries to look at the Gantt chart takes a loooooong time to render. Anyway, from this initial view you can see that almost all queries executed very quickly but fairly early on there was a very long query, represented by the longish brown line at around event 250 that I’ve circled. Filtering the view down to show this query and those around it in more detail gets you this:
Although the vast majority of queries here execute immediately (they’re just small queries the front end uses to retrieve metadata etc and are represented as single blobs) I’ve circled a few queries around the main query that look like they are taking a few seconds to run. And indeed, running at least one of these queries on the test server showed it runs faster on its own than it did when this Profiler trace was running. Which proved my point.
Moving away from this particular exercise, I’m fairly surprised that none of the big SQL Server tool vendors have tackled the problem of monitoring Analysis Services performance and activity. The only tool I know of that does this is Companion for Analysis Services:
Where are the others though? I would have though Analysis Services would be the obvious next step for them, given that many SQL Server relational shops also use the MS BI stack too. Whenever I talk to enterprise customers they always ask me what tools are available, certainly… I guess there’s three things going on here: first of all the all-too-common ignorance of Analysis Services in the wider SQL Server community; second the problem that no-one (not even the SSAS experts) has really sat down and thought about what needs to be monitored in an Analysis Services deployment; and thirdly, at least before AS2008, the hooks for getting at data that can be used for monitoring either haven’t been available or have been well hidden. This and my last post on viewing and killing queries in Reporting Services have given a few ideas on what’s possible, though, and hopefully within the next few years Microsoft and/or third party vendors will give us the tools we need.
I guess anyone who has had to demo the new DMVs in Analysis Services 2008 has, like me, put together a few Reporting Services reports to show off what they can do. Now I don’t have anything much to add about the content of the DMVs and schema rowsets beyond what Darren and Vidas have already blogged, but I did discover an interesting trick while I was creating my reports that I thought would be good to share.
One of my example reports was displaying the results of the following DMV query:
select * from $system.discover_commands
What this does is display information on currently executing command or the last command that executed on every open session on the server (see here for more details). Pretty useful information in that if someone is running the query from hell on your AS server you’ll be able to see it here; what you’d really want to do though is act on this information and be able to kill the query. You can certainly get the same information and can kill the query if you’re using the Activity Viewer sample app that comes with the SQL Server samples (Jesse Orosz blogged about its features and shortcomings here); but can you do the same thing from within Reporting Services without writing any code? It turns out you can.
Here’s what my report looks like in BIDS 2008:
So far, so straightforward. Notice the column on the far right that contains the text "Kill This": this is static text and is a link to another report which takes one parameter, the SPID of the session that represents the current row. When you click on this link you jump to another report and it’s the act of running this report that kills the query on the session whose SPID you pass in. This second report looks like this in BIDS:
You’ll see that the design surface of the report contains a single text box with the message that your query has been killed successfully. The killing is done by the query in the sole dataset, and the query is generated by a Reporting Services expression that looks like this:
<SPID>" + Parameters!SPID.Value.ToString() + "</SPID>
It’s an XMLA Cancel command with the SPID injected into it from the report parameter mentioned above. Two things to know to make it work:
- You have to execute it on an OLEDB-type connection, ie not Reporting Services’ "Analysis Services" connection
- The command doesn’t return any data and RS expects all its datasets to return at least one field. To work around this, you need to create a new dummy calculated field on the dataset that returns any value you like (you may see some error messages along the way but they can safely be ignored):
And bingo, you’ve got a basic activity monitor implemented in RS2008! I haven’t tested it, but you should be able to do exactly the same thing in RS2005 if you use the functionality in the XMLADiscover class Darren put into the Analysis Services Stored Procedure Project; the reason I like the approach I’ve just described more than this option is that whenever I’ve wanted to use some ASSP functionality in a production environment the customer has, for obvious reasons, not been comfortable uploading a dll onto their server.
I’m always interested to learn about the different dialects of MDX, and here’s a document on using MDX with TM1:
Thanks to Paul Simon for sending me this…
BARC, the publishers of the BI Survey, have asked me to publicise a new survey they’re doing on BI Competency Centres:
It should only take a few minutes to complete, and they donate €1 to charity for each participant.
Simon Sabin has a good list of the new features in SQL Server tools for 2008:
Top of the list is the fact that you will be able to set connection string properties when you connect to AS in SQL Management Studio; it’s not there yet, but I guess it’ll appear in CTP7. This means (sniff) I’ll no longer need MDX Sample App any more…