Comparing MDX Query Results With Tablediff
Recently I was working on a project where the customer wanted to upgrade from SP1 to SP2 but found that when they did so the results of some of the calculations on their cube changed. Unfortunately these changes weren’t easy to spot: their MDX Script was >1500 lines long and the bugs they were running into were dependent on the query being run and whether the cache was cold or warm (in fact one of them was the bug I blogged about here). When testing various workarounds and hotfix builds, the problem I ran into was a problem I’ve run into before on numerous occasions: when I changed something on the MDX Script, how did I know it didn’t break something else?
In the past what I’ve done was take a few MDX queries, run them, copy and paste the results into Excel, make my changes and then rerun the same queries and paste the new results into Excel again and use Excel formulas to compare the old and new resultsets. However in this case this approach just wasn’t feasible and I started looking for a better way of comparing large numbers of MDX resultsets. Now there is definitely a tool to be written here, probably one which runs batches of queries, serialises the cellsets and saves them to disk, and then compares the cellsets and highlights any cells with differences, but I didn’t have time to write it. I was talking the problem over with Didier Simon of PSS, though, and he made a great suggestion – use tablediff.exe, the command line utility you get with SQL Server to compare relational tables. You can read more about it here:
This turned out to be the key. I cannibalised bits of my Analysis Services cache warmer and created two SSIS packages, the first of which ran a batch of MDX queries retrieved from a relational table (originally saved from a Profiler trace) through SQL Server using OpenRowset and saving the results in a table using a SELECT INTO. Here’s what the control flow looked like:
And here’s an example of the kind of SQL query I was generating:
select identity(int, 1,1) as idcolumn, * into Query1
from openrowset(‘MSOLAP’, ‘Datasource=localhost; Initial Catalog=Adventure Works DW’,
‘ with cell calculation test for ”(*)” as ”cstr(round(measures.currentmember,5))”
select Measures.[Internet Sales Amount] on 0,
[Date].[Calendar Year].Members on 1
from [Adventure Works]‘)
A couple of things to note – Tablediff needs a column which can uniquely identify each row in the tables its comparing, hence the identity column. I also found that I was running into the precision problems described here and as a result, tablediff was flagging up cells as having different values when in fact the values were differing only by a miniscule amount, so I created a calculated cell assignment in the WITH clause which rounded up every value to five decimal places. I could do this because I knew none of my queries had WITH clauses, but another solution would have been to create a session calculated cell assignment which did the same thing.
After running this package I had a SQL Server database with as many tables containing results as I had queries. I could then make my changes to the cube and rerun the same package to dump a new set of results to a different database, and then run my second package to compare the results. This was, again, a fairly simple package to put together:
All I did was use a ForEach loop to loop over the tables in the database containing the first set of results, and then use an Execute Process task to call tablediff to compare each table with the equivalent table in the second database; if it found any, I used the -et argument to save the output to another table. I also used the -c argument to get column-level differences and the -b argument to specify the number of bytes to compare for large objects, necessary because the SELECT INTO creates an ntext column for all the member names on the Rows axis.
Overall, it’s not an ideal solution (coding a proper app is definitely the way to go) but it did the job, and hopefully this will be useful to someone else…