Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Archive for June 2009

Kapow Technologies


In all of the Gemini demos seen so far, a big thing is made of the fact that it can be used to integrate data from outside sources (eg some stuff you found on the web) with internal data (eg data from your data warehouse). This is all very well, but it assumes you have some way of actually capturing the data that’s out there on the web in a usable form and automating the capture of new data from these sources on a regular basis.

For example, if you’ve found a table containing some data you want on a website, what do you need to do to actually use it? You’d need to copy the table and then paste it into Excel perhaps; you’d then need to do some reformatting and editing, copy it again and then paste it to wherever you need it such as Gemini. All very manual; in fact, a right pain and not something you’d want to do on a regular basis. There might be an RSS or an Atom feed, and you might be able to use a tool like Yahoo Pipes (see also Jamie Thomson’s recent post), but there isn’t always one and even when there is it might not contain all the data you need or be in the right format.

Last week I had a call with Kapow Technologies, and they’ve got a really cool tool that addresses this problem:
http://kapowtech.com/index.php/solutions/web-and-business-intelligence
It’s something like a cross between a screenscraper and an ETL tool, and it allows you to harvest data from web pages, do some transformation and cleansing, and then move it somewhere else like a database table (where more traditional ETL tools can take over) or another application. There’s a bit of background on this type of technology here:
http://en.wikipedia.org/wiki/Web_scraping

I got hold of an eval version and had a play with it, and while I’m hardly the right person to give an expert review it does seem very powerful. It’s certainly easy to use: I had a simple scenario working within twenty minutes, and I think anyone with some SSIS experience will find it fairly straightforward. First of all you have to declare a set of objects (which can then be mapped onto relational tables) that will hold the data you want to collect; you then create a robot that will harvest data from web pages and load it into the objects; finally you can run the robot either from the command line or on a schedule. More information can be found in the help:
http://help.kapowtech.com/7.0/index.jsp
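The declare-objects / harvest / load workflow described above can be sketched in ordinary code. Below is a minimal Python illustration of the same pattern, using only the standard library; the HTML snippet, table layout and column names are all invented for the example, and a real robot would of course fetch a live page (with urllib.request, say) rather than parse a literal string.

```python
# A sketch of the harvest / transform / load pattern: parse a table out of
# some HTML, tidy the values up, and load them into a relational table where
# a traditional ETL tool could take over. The page and column names below
# are hypothetical, not taken from Kapow or the BBC site.
import sqlite3
from html.parser import HTMLParser

SAMPLE_PAGE = """
<table id="forecast">
  <tr><th>Time</th><th>Outlook</th><th>Temp</th></tr>
  <tr><td>Mon 06:00</td><td>Sunny</td><td>18</td></tr>
  <tr><td>Mon 12:00</td><td>Cloudy</td><td>21</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped into rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True
    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

# Harvest: parse the page into rows of cells (the header row has no <td>
# cells, so it is skipped automatically)
scraper = TableScraper()
scraper.feed(SAMPLE_PAGE)

# Transform and load: cast the temperature to an integer and insert the
# rows into a database table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (forecastTime TEXT, outlook TEXT, tempC INT)")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?)",
                 [(t, o, int(temp)) for t, o, temp in scraper.rows])
print(conn.execute("SELECT COUNT(*) FROM weather").fetchone()[0])
```

The point of a tool like RoboMaker, of course, is that you build this visually by clicking on the parts of the page you want, instead of writing and maintaining parser code like this by hand.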

In my example, I took a web page from the BBC website that shows the weather where I live:
http://news.bbc.co.uk/weather/forecast/2302?&search=amersham&itemsPerPage=10&region=uk&area=Amersham

I wanted to harvest the time, the basic outlook and the temperature from the first box in the “first 24 hours” section:

[image: the first forecast box on the BBC weather page]

Here’s a screenshot of what that page looks like in the RoboMaker tool:
[image: the page loaded in RoboMaker, with the time tag selected]

You can see that I’ve selected the tag containing the time in the central browser window, and that it will be mapped to the attribute weather.forecastTime.

The thing I find really exciting about this tool is that there is a whole load of useful data out there on the web that would be great to use as part of your BI, if only you can extract it quickly and easily – and indeed legally, because I suspect there might be some legal objections to doing this on a large scale. One of the examples Kapow gave me was of a customer that harvested comments on its products from places like Amazon, then fed that data through a text mining application to monitor public opinion on those products (today’s Dilbert shows how this information could be used!); I would have thought this is something a lot of companies would like to do. The weather forecasts I was looking at could also be useful for predicting retail sales in the short term; and if you’re an online retailer you’d probably want to compare your prices for certain products with those of your competitors. I’m sure we’ll be seeing much more use of web data in BI, captured using tools like this, in the future…

Written by Chris Webb

June 29, 2009 at 10:03 pm

Posted in BI

SAP, MDX and the right language for BI


Via Amyn Rajan’s blog I’ve learnt about a forthcoming MDX book, “MDX Reporting and Analytics with SAP”, and also seen some encouraging news on the SAP Business Objects Roadmap, where MDX will be a core element. As an MDX guy – and more specifically as someone who makes his living from MDX – I’m pleased to see this level of support for MDX from someone other than Microsoft and Mondrian. Even if I never do any consultancy work on SAP MDX (although I’d be interested, since I hear rates are much higher than for Microsoft work!) I have a vested interest in the popularity of the MDX language in general. Yes, it’s difficult to learn and sometimes inconsistent, but I am 100% convinced it’s the best language for multidimensional queries and calculations and much better than SQL.

To a certain extent we’re all guilty of promoting the technology we know and love as the way of solving all problems; and whenever we come across a problem our technology can’t solve easily, the temptation is to extend it and improve it so that it can solve the problem. However in the case of SQL and BI there comes a point where no amount of extensions and improvements can make it do what you want it to do efficiently and easily. To paraphrase Dr Johnson, a complex BI-type calculation done in SQL is like a dog walking on its hind legs. It is not done well, but you are surprised to find it done at all. There are plenty of vendors and bloggers who would disagree with me on this though, I know.

One good example of where MDX scores over SQL is that it inherently knows about order on hierarchies: for example it knows (because you told it when you built your dimension) that Monday comes after Sunday, that 2009 comes after 2008, that June comes after May. This makes doing time-based calculations very easy, and time-based calculations are part of just about every BI solution. The ability to order a set and then work with it is also important for other types of problem, such as the grouped string concatenation puzzle that Adam Machanic blogged about recently. In MDX, to solve this problem and get an ordered list of product names, you just need to order a set and then use the Generate function, for example:

Generate(
  Order(
    NonEmpty(
      [Product].[Product].[Product].MEMBERS,
      [Measures].[Internet Sales Amount]),
    [Product].[Product].CurrentMember.Name,
    BASC),
  [Product].[Product].CurrentMember.Name,
  ",")

But I’m getting carried away again. Arguing the merits of a language is all very well, but it’s commercial support that is what actually matters of course. And to get back to my original point that’s why I’m pleased to see that SAP seems to agree with me that MDX is a good thing.

Written by Chris Webb

June 26, 2009 at 11:46 am

Posted in MDX

Speaking at the PASS Summit 2009


The agenda for the PASS Summit 2009 has been announced, and I’m pleased to say that I’ve been accepted as a speaker. A full listing of who’s speaking can be found here:
http://summit2009.sqlpass.org/Agenda/ProgramSessions.aspx
http://summit2009.sqlpass.org/Agenda/SpotlightSessions.aspx

I’ll be doing a session on ‘Designing Effective Aggregations in Analysis Services 2008’. Hope to see some of you there!

Written by Chris Webb

June 22, 2009 at 4:20 pm

Posted in Events

Error messages in MDX SELECT statements and what they mean


Anyone that has tried to learn MDX will know that, when you make a mistake somewhere in your code, the error messages that Analysis Services gives you are pretty unhelpful. It was suggested to me recently while I was teaching an MDX course that I should blog about common error messages and what they actually mean; so here’s a list of a few example queries using Adventure Works that return confusing errors, the error messages themselves, and details on how to solve the problems. I’ve deliberately concentrated on query-related errors rather than calculation-related errors (that can be a future blog post); if you can think of any more errors that I should cover please leave a comment.

1) Query causing error:

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS 
[Date].[Calendar Year].MEMBERS ON ROWS
FROM [Adventure Works]

Error message: Query (3, 1) Parser: The syntax for ‘[Date]’ is incorrect.

The first step to solving this fairly simple syntax error is understanding the values in brackets in the error message. (3,1) indicates that the error is at character 1 on the third line of the query, where we have the expression [Date].[Calendar Year].MEMBERS; we should also see a red squiggly underneath this text in SQL Management Studio. There’s nothing wrong with this expression though, apart from the fact that it’s in the wrong place: what has happened is that we’ve forgotten to include a comma after COLUMNS immediately beforehand. If we put one in, the query runs.

Solution:

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS, 
[Date].[Calendar Year].MEMBERS ON ROWS
FROM [Adventure Works]


2) Query causing error:

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS,
[Date].[Calendar].[Calendar Year].MEMBERS.CHILDREN ON ROWS
FROM [Adventure Works]

Error message: Query (3, 1) The CHILDREN function expects a member expression for the 1 argument. A tuple set expression was used.

This is a very common error that people encounter while learning MDX, and it all comes down to understanding the difference between sets, tuples and members. In a lot of situations Analysis Services is very forgiving: if it expects a set and you give it a single member, for example, it will cast that member into a set with one item in it. It can’t do this for you all the time, though, and you do need to understand what kind of object each function returns and/or expects as a parameter. In this case, the problem is that the .CHILDREN function needs to be passed a member while the .MEMBERS function returns a set (strictly speaking, as the error says, a set of tuples); therefore we can’t use the two functions together. If we want to find all of the children of all years, we can use the DESCENDANTS function instead, which can accept a set as its first parameter.

Solution:

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS,
DESCENDANTS(
[Date].[Calendar].[Calendar Year].MEMBERS
, [Date].[Calendar].[Calendar Semester])
ON ROWS
FROM [Adventure Works]


3) Query causing error:

SELECT
[Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount] ON COLUMNS,
[Date].[Calendar].[Calendar Year].MEMBERS
ON ROWS
FROM [Adventure Works]

Error message: Parser: The statement dialect could not be resolved due to ambiguity.

Analysis Services supports no fewer than three query languages: MDX, DMX and a very limited subset of SQL. As a result, when you run a query it needs to work out which query language you’re using, and it can easily get confused if you make a mistake. In the query above we’ve given a list of the two measures we want to see on the columns axis, but we’ve forgotten to surround this list in braces to turn it into a set – and it’s a set that is required for the axis definition. This is an error commonly made by people with a background in SQL, and indeed the problem here is that the missing braces have made the query look a bit too much like SQL or DMX. Putting in braces where they’re needed fixes the problem and removes the ambiguity.

Solution:

SELECT
{[Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount]} ON COLUMNS,
[Date].[Calendar].[Calendar Year].MEMBERS
ON ROWS
FROM [Adventure Works]

4) Query causing error:

SELECT
{[Measures].[Internet Sales Amount1], [Measures].[Internet Tax Amount]} ON COLUMNS,
[Date].[Calendar].[Calendar Year].MEMBERS
ON ROWS
FROM [Adventure Works]

Error message: Query (2, 2) The member ‘[Internet Sales Amount1]’ was not found in the cube when the string, [Measures].[Internet Sales Amount1], was parsed.

A fairly straightforward error this: we’ve tried to reference a member that doesn’t exist in our query – it’s the extra 1 on the end of the name that’s the problem. The way to avoid this is to always let Analysis Services generate unique names for you, and you can do this by dragging the member (or any other object) from the metadata pane in SQL Management Studio into the MDX query pane when you’re writing queries. Here, using the correct member unique name solves the problem.

Solution:

SELECT
{[Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount]} ON COLUMNS,
[Date].[Calendar].[Calendar Year].MEMBERS
ON ROWS
FROM [Adventure Works]

Note that for dimensions other than the Measures dimension, what happens in this scenario depends on how you’ve set the MDXMissingMemberMode property. By default if you write something that looks like it could be an MDX unique name, but which isn’t actually the unique name of a member on a hierarchy, Analysis Services will simply ignore it. So the following query returns nothing on rows because the year 2909 doesn’t exist in our Calendar hierarchy:

SELECT
{[Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount]} ON COLUMNS,
{[Date].[Calendar].[Calendar Year].&[2909]}
ON ROWS
FROM [Adventure Works]

And worse, in this query a genuine syntax error is completely ignored too:

SELECT
{[Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount]} ON COLUMNS,
[Date].[Calendar].[Calendar Year].MAMBERS
ON ROWS
FROM [Adventure Works]

Written by Chris Webb

June 19, 2009 at 10:59 am

Posted in MDX

Microsoft .NET Provider for Oracle dead


Products die, come back to life, die again… Anyway, those of you building cubes from Oracle data sources may be interested to know that the Microsoft .NET provider for Oracle has been killed off. Here’s the official announcement-disguised-as-a-blog-entry:
http://blogs.msdn.com/adonet/archive/2009/06/15/system-data-oracleclient-update.aspx

Yet again, it’s those persuasive “customers, partners, and MVPs” who are asking for the product to be canned. I just hope that there aren’t any “customers, partners, and MVPs” who are asking for Steve Ballmer to sell Microsoft to Google for $1, because then we’ll all be in trouble… Anyway, needless to say, not everyone is pleased:
http://www.theregister.co.uk/2009/06/18/microsoft_kills_oracle_connector/

One blog entry I’ve had half-written for a long time is a discussion of the problems faced when building cubes from Oracle data sources (thanks for your help on this, David; anyone else with any hints, please leave a comment). There are a lot of bugs and data type issues that need to be dealt with, and from what I’ve gathered, using the Microsoft .NET provider was part of how they could be worked around.

Written by Chris Webb

June 18, 2009 at 10:02 pm

Posted in Analysis Services

PerformancePoint Planning – Back from the Dead!

with 2 comments

Here’s the official announcement from http://www.microsoft.com/bi/partners/default.aspx :

Financial Planning Accelerator

Microsoft is pleased to make available the Financial Planning Accelerator (FPA). The FPA is source code and project files derived from the PerformancePoint Server 2007 Planning module. Based on requests from customers and partners, we are making this code available on a no-cost, individual license.
This is unsupported source code that customers and partners can use to support or change PerformancePoint Server Planning functionality. Derived object code files can be distributed to end users with Microsoft SharePoint Server Enterprise Client Access Licenses. To obtain access to the FPA a license agreement between Microsoft and the customer or partner is required. After that agreement is in place, download instructions will be made available.
Please e-mail fpasupp@microsoft.com to request the agreement.

It’s not exactly open source, but it does mean that the partners who were hit hardest when PerformancePoint Planning was killed off can now get their hands on the source code, modify it and sell it on to their customers, so long as those customers have the right SharePoint licences. The question now is: will anyone take Microsoft up on this offer?

Written by Chris Webb

June 18, 2009 at 12:50 pm

Posted in PerformancePoint

Implementing Analysis Services Drillthrough in Reporting Services


For some reason I’ve never needed to implement Analysis Services drillthrough (note: not the same thing as Reporting Services drillthrough; why can’t they use consistent terminology?) in a Reporting Services report. Of course, Reporting Services support for Analysis Services being what it is, it’s not a straightforward task, and since I’ve recently come across a few good blog posts that discuss the different ways you can do it I thought I’d link to them.

The main problem is that you can’t execute an MDX Drillthrough statement using the MDX query designer and the Analysis Services data source. You have four options then:

  1. You can execute the Drillthrough statement through an OLEDB data source instead. Gurvan Guyader shows how to do this in the following blog entry (in French, but with lots of screenshots):
    http://gurvang.blogspot.com/2009/05/drillthrough-ssas-dans-ssrs.html
    The problem with using an OLEDB data source is that you lose the ability to use parameters and have to use Reporting Services expressions to dynamically build your Drillthrough statement instead.
  2. It turns out you can also execute a Drillthrough statement by pretending it’s DMX, and so use regular MDX parameters, as Francois Jehl describes here (also in French):
    http://fjehl.blogspot.com/2009/06/drillthrough-ssas-dans-ssrs-ajout-au.html
  3. If you buy Intelligencia Query (which, as always, I need to state that I have a financial interest in) then Drillthrough statements now work with no tricks necessary:
    http://andrewwiles.spaces.live.com/Blog/cns!43141EE7B38A8A7A!562.entry
  4. Last of all, you can avoid the Drillthrough statement altogether and use an MDX query instead to get the same data. You will lose some functionality by doing this, though, most notably the MAXROWS option.

Written by Chris Webb

June 16, 2009 at 1:13 pm

Posted in Reporting Services
