Archive for the ‘Gemini’ Category
Gemini is of course only a codename, and it was announced today that it’s real name will be ‘PowerPivot’. Given that there have been some pretty awful Microsoft branding decisions over the years I think PowerPivot is actually a very good name (let’s be glad it’s not called something like “Microsoft Office 2010 SQL Server Analysis Services R2 Desktop Edition”), certainly one that will stick in the minds of its target users. There’s a new website, albeit with no new information I can see, here:
Also, here’s a blog entry summarising the new features that are coming in Sharepoint 2010:
Here’s an excerpt highlighting the BI-relevant features:
Historically, business intelligence has been a specialized toolset used by a small set of users with little ad-hoc interactivity. Our approach is to unlock data and enable collaboration on the analysis to help everyone in the organization get richer insights. Excel Services is one of the popular features of SharePoint 2007 as people like the ease of creating models in Excel and publishing them to server for broad access while maintaining central control and one version of the truth. We are expanding on this SharePoint 2010 with new visualization, navigation and BI features. The top five investment areas:
1. Excel Services – Excel rendering and interactivity in SharePoint gets better with richer pivoting, slicing and visualizations like heatmaps and sparklines. New REST support makes it easier to add server-based calculations and charts to web pages and mash-ups.
2. Performance Point Services – We enhanced scorecards, dashboard, key performance indicator and navigation features such as decomposition trees in SharePoint Server 2010 for the most sophisticated BI portals.
3. SQL Server – The SharePoint and SQL Server teams have worked together so SQL Server capabilities like Analysis Services and Reporting Services are easier to access from within SharePoint and Excel. We are exposing these interfaces and working with other BI vendors so they can plug in their solutions as well.
4. “Gemini” – “Gemini” is the name for a powerful new in memory database technology that lets Excel and Excel Services users navigate massive amounts of information without having to create or edit an OLAP cube. Imagine an Excel spreadsheet rendered (in the client or browser) with 100 million rows and you get the idea. Today at the SharePoint Conference, we announced the official name for “Gemini” is SQL Server PowerPivot for Excel and SharePoint.
5. Visio Services – As with Excel, users love the flexibility of creating rich diagrams in Visio. In 2010, we have added web rendering with interactivity and data binding including mashups from SharePoint with support for rendering Visio diagrams in a browser. We also added SharePoint workflow design support in Visio.
Now, I’ve not been following Sharepoint 2010, but two things strike me here. First of all, Excel Services does heatmaps? It’s the end of a long day, but I don’t remember seeing heatmaps in my Excel 2010 CTP. I wonder if this is a new charting feature…? Secondly, Visio Services – ok, a quick Google shows that this has been public knowledge for over a year now, but I think this is very interesting from a BI point of view. Remember that Visio can already consume data from SSAS (see here on how to do this); assuming that Visio Services will be able to do the same thing, I think we have here yet another way of creating BI dashboards.
UPDATE: before you leave a comment, it’s just struck me that what Excel means by a heatmap is that colour-scale cell formatting that’s been possible since Excel 2007. Hmm, so probably nothing to get excited about.
So after almost a year of hype I’ve finally got my hands on the first CTP of Gemini! I’m currently on holiday (and yes, I take my laptop on holiday, though at least my wife does too so we’re as bad as each other) but I couldn’t resist downloading it and taking a look. Here are my first impressions… and as soon as I get back home I’ll post something more detailed.
- Installation was pretty straightforward on my Windows 7/Excel 2010 VM. Note that you do need Excel 2010 to use Gemini, as I suspected. The good thing is that it’s an addin rather than native Excel functionality so at least Gemini isn’t tied to the overall Office release cycle. I wonder how long it will be between versions?
- When you open Excel, you see a new Gemini tab on the ribbon that looks like this:
- The ‘Load and Prepare Data’ button starts the fun and allows you to pull in data from various sources. You can either select entire tables or write your own SQL, and again I found it all very easy to do what I wanted; clearly a lot of what’s been learned from the SSAS cube design wizard has been applied here to make the process as smooth as possible. You can also get data from SSRS reports using the new Data Feed rendering functionality that Teo discusses here, paste data in from the clipboard, and link a Gemini table to an Excel table (this is what the Create Linked Table button in the screenshot above does).
I can’t see much evidence yet of data preparation rather than just plain old loading, but that may well be yet to come. I’d also like to see a visual way of managing the relationships between tables, as you get with the SSAS DSV. Gemini doesn’t handle parent/child relationships yet; I’m not quite sure it handles other more complex types of relationship either but I need to play around a bit more here.
- You can create new columns in each of the tables that you load into Gemini and define what values they display using the new Data Analysis eXpressions (DAX) language. Now MS are seemingly keen to stress that DAX isn’t an MDX replacement and I suppose that technically that’s true, but let’s be honest, it’s doing the same job as MDX but trying to be more Excel-user-friendly. I’ve not had a chance to go deep into it at all yet but it certainly looks like there’s a lot to learn here. In the meantime, from the tutorial doc I’ve got, here’s an example of a DAX calculation that returns an average of Sales per State:
AVERAGE(‘Total Sales’[SalesAmount]), ALLEXCEPT(‘Total Sales’, ‘Total Sales’[State or Province])
Is this easier than MDX? Will power users be able to write this? I honestly don’t know yet.
You can also control whether DAX calculations are updated automatically or only when you click the ‘Calculate’ button on the ribbon.
- Once this is done it’s back to the worksheet to create a pivot table, and I’ve already blogged about what’s new here. Unlike a SSAS-bound pivot table, however, you can use any column as a measure and choose how it’s aggregated, using either a Sum, Count, Min, Max or Average (data type permitting). From playing around with it a bit more, the new Slicer functionality does really come in useful here. You can flip back and forth between the worksheet and the Gemini UI very easily.
- The ‘Options and Diagnostics’ button doesn’t do much except to allow you to dump the current state of the Gemini cube to a .trc trace file for further analysis.
Overall, there aren’t actually any surprises really. As I said DAX is something I’m going to need to study in a lot more detail; I can’t really comment on the scalability and performance because I’m running on a VM and don’t have a large dataset handy; and I don’t have Sharepoint installed so I haven’t checked out the integration there (which in any case is NDA at the moment). So far I like it; it’s also less of a threat to the kind of SSAS/MDX work I do than I thought it might be – it’s a lot simpler than I’d expected and it doesn’t feel cube-like at all but much more relational.
Some more Gemini demos have appeared on the BI Blog, with more new Gemini features revealed, so let’s step through them and see what we can see…
- 2:45 Nothing much so far we haven’t seen already. However the toolbars are much easier to see in this video and the first thing pointed out is the list of tables loaded into Gemini listed at the bottom of the screen.
- 3:15 We can also see a lot more of the toolbar at the top here too. In the ribbon we can see the following areas:
- New Table, with buttons to import new tables from a database or from the clipboard
- Table Tools, with buttons to create relationships between tables and to manage relationships. So far I’m getting a very strong feeling of relational database concepts coming through – ok if the Gemini user is familiar with them (perhaps through Access), but is it asking too much of a user to think in terms of tables and joins?
- Columns Tools. Can’t see much here, but we saw a bit earlier in the demo that on the far right hand side of the data area it seems you can add new columns onto the end of the table, and the buttons here allow you to manage these columns, delete them, resize them etc.
- Sort and Filter, pretty much self-explanatory
- Calculations. Again can’t see much here, but the button at the top says Manual. I wonder if it’s going to give you the option to either automatically apply all your calculations, or only apply them when you press a button (in case calculation takes a long time).
- View. The options here are Pivot Table and Switch to Excel. I guess the demo is done in some kind of Table view, and we’ll have the option to view the data instead in a pivot table or go to Excel and work with the data there in the way we would with any other external data source.
- 4:17 The now inevitable OOHHH moment in a Gemini demo where we see 20 million rows of data being manipulated in memory. Of course, though, the amount of data we can work with will not only depend on how much memory we have but also how well it can be compressed. From what I understand of COP databases like Gemini, you get great compression because it only stores the distinct values held in each column; but if your data contains a lot of different values then you won’t be able to compress it as much and you won’t be able to work with as much of it. I think.
- 4:46 And not wishing to sound like Mr Sceptical, but watching all these demos of sorting and filtering large amounts of data very quickly raises a question in my mind: are all the rows in the table actually sorted and filtered, or does Gemini just do enough sorting and filtering to fill the screen? Finding the top 30 or so rows out of 20 million based on a value is certainly impressive, but it’s not the same as sorting all those 20 million rows.
- 6:23 The Manage Relationships dialog. Again, very relational and strangely non-visual as well; I’d have expected a graphical representation of the two tables joined, just like you’d get in any other database tool. Maybe it’s not ready yet though.
- 6:55 Looks like our first sight of DAX. The expression is:
Hmm, again seems more like a SQL expression (a sum/inner join) translated to Excel rather than anything resembling MDX. It does the calculation very quickly although it’s the first time something has been less than instant.
- 0:25 We’re in Excel now, using a pivot table, but notice that on the right-hand side we have the ‘Gemini task pane’ so perhaps it’s not a regular pivot table?
- 2:48 Create Relationship dialog. Again it doesn’t seem very graphical, and notice the use of relational database terminology again with the mention of primary keys and foreign keys; for someone who is used to working with databases this is fine, the obvious term to use, but are these concepts we should expect Gemini users to understand? Shouldn’t things be less technical, more user friendly?
- 2:59 Interesting that creating a relationship takes a few seconds and some crunching to do. I wonder what’s going on here exactly? Cube reprocessing?
- 3:43 Show Values As menu option – ok, this is what you get in Excel anyway, but am I right in thinking there are a lot more options here now than are available in 2007? Maybe I’m wrong, but this all seems to be Excel calculations rather than calculations happening in Analysis Services.
- 8:15 The Excel workbook containing this data is 203MB – interesting, because although Gemini is in-memory, it’s clearly possible to persist the data to disk if it’s being stored inside the workbook somehow.
One last point prompted by all the relational database-related terms we’ve seen: if I was a pure SQL Server relational database guy, with no interest in Analysis Services, I’d still like to get my hands on Gemini and use it server side if it’s this quick. Which goes back to a point I’ve made before in the past that if Analysis Services could be used inside SQL Server as an invisible layer to speed up the execution of data warehouse/BI style TSQL queries, in the same way as Oracle OLAP can be, it would be very cool. Just think of that working with Madison, in fact…
On the BI Blog on Monday a new set of Gemini demos were posted; they’re also available on YouTube. They look like the same demos I saw at PASS Europe a few weeks ago and while they don’t show much in the way of different functionality compared to what was shown late last year, I think there are a few interesting points to note. Unfortunately the quality of the picture is so poor you can’t make out much detail on the screen, so I can only really comment on what Donald Farmer (who’s presenting) specifically points out.
Let’s step through each demo and I’ll give you a running commentary on them…
- 0:11 Note that Gemini isn’t built into Excel, it’s an Excel addin. I’m not sure whether it will work in Excel 2007 or only Excel 2010 (or whatever it’s called) but this is significant for another reason: it means that Gemini release cycles are not tied to Office release cycles, so potentially new releases of Gemini can appear reasonably regularly.
- 0:35 We start off with sourcing data from a data warehouse – probably intentionally, to forestall some of the hostility that was seen when Gemini was first announced, when Gemini was seen as being yet another “you don’t need a data warehouse” type tool.
- 1:05 Arggh, why can’t I see the buttons on the Data-Cleaning ribbon? It looks like there’s a lot of stuff there, although it might not be all working properly yet.
- 1:45 The obligatory boast about how much data you can work with – in this case 20 million rows – on a regular desktop machine. In my experience that’s the average number of rows I see in a fact table underneath SSAS (though of course it can handle way more than that), so the number was probably deliberately chosen for that reason, as well of course to get the Excel users out there salivating. Suddenly posts like this seem less funny, more like a chilling prediction of things to come…
- 2:48 You can copy data into Gemini from the clipboard. Note that you don’t seem to be able to link to the data directly in Excel, at least not yet. Donald also mentions that ‘other data sources’ will be supported – it’ll be interesting to see which ones.
- 3:18 Creating a pivot table. We seem to be back in regular Excel here and out of the Gemini addin, although Donald says that ‘in Gemini we have some cool pivot tables we can handle’. Perhaps what we’re seeing here are Excel 2010 pivot tables.
- 3:50 Pointing out the inferred relationship between tables. I suspect this relationship was inferred well before this point; we already know you are going to be able to set these relationships up manually.
- 4:26 Showing data as a percentage of total. There seems to be a big button to do this; are there going to be any other easy calculations available? Where are the calculations taking place, and how are they expressed – in Excel or the underlying Gemini cube?
- 4:50 New slicer bars – mentioned as a ‘new feature in Excel’, specifically for Gemini but also available for other Excel users. So this must be new generic Excel pivot table, rather than Gemini functionality. This looks really good; I like the way they are aware of each other too, and aware of what data is available, though I wonder how exactly they know whether data is available and how this would work with cubes containing calculations etc.
- 0:20 Set theme – whoa, so you can apply a theme to an Excel spreadsheet? Hmm, turns out you can already do this. But it is a powerful feature when you want to create a report.
- 0:30 Publishing to Sharepoint, but notice how Donald mentions that publishing a model containing 20 million rows would take a bit of time. How long exactly? Minutes? Hours?
- 0:40 The Sharepoint report centre. OK, so we can rate reports with stars, yeah that’s going to be useful… but other ‘social tools for collaboration’ might be interesting.
- 1:03 Setting a refresh rate. Basically how often the local cube underneath Gemini gets processed, I suppose. How long will a refresh take though?
- 1:21 Seeing the report in a thin client. This is Excel Services, I think…? This will only make it harder to choose between Excel/Excel Services/Gemini on one hand and SSRS on the other. It would be nice if there was some kind of story linking the two.
- 2:57 The operations dashboard – again, I wish I could see more detail of what’s on screen. I can see some of the stuff you’d expect, like metrics on CPU usage and query response times. It’s all done in Excel Services again – I wonder if there’s a cube behind it all storing the performance data?
- 3:46 Upgrading and formalising a popular app. But notice that the option is ‘upgrade to PerformancePoint’…? The focus is on upgrading for better maintenance and management rather than performance; I guess in PerformancePoint you’ve got IT control over the report design. Possibly, when server-side SSAS gets the Gemini storage engine, you’ll be able to push the Gemini cube into an instance of SSAS. But when you’ve done this will you still be able to use the performance metrics dashboard we’ve just seen?
I’ve already linked to Nigel Pendse’s initial comments on Gemini, which you can see here:
However those nice people at the OLAP Report have just given me access to see the subscriber-only content, which is probably the most detailed write-up available (I assume Nigel has been briefed by the Gemini dev team):
It doesn’t say anything much new, but I guess if you’re someone like Qlikview you probably want to get as many details on this as you can! Nigel is much more positive about it all than I was; he may well have a better idea of MS’s proposed solutions to the management problems everyone’s been highlighting: "Microsoft has some clear ideas about the role IT will be able to play in Gemini deployments, but the details were slightly fuzzy during the October announcements. It promises more details will emerge in the following months".
He also notes the problem of Excel users needing to define multidimensional calculations and says "Microsoft has not provided details of how this will be done, except to emphasize that Gemini users will not need to learn MDX. It’s proposal is to allow simple dimensional tests in Excel-like formulas in Gemini. These Excel-like calculation rules will be equivalent to MDX, but far easier for an Excel user to understand". Hmm, we’ve already seen MDX and PEL try and fail to make multi-dimensional calculations easy to use… I wonder if we’ll have a third stab at the problem?
I saw a very interesting article the other day in Intelligent Enterprise by Seth Grimes, about a newly-published report on the semantic web by David Provost. Grimes is rightly sceptical about how close we are to these ideas reaching fruition and notes that many of the companies mentioned in the report are concentrating on ‘semantic data integration’. Frankly, to me the idea of being able to integrate data sourced from different parts of the web is still far-fetched, given the trials you have to go through to integrate data from different parts of the same company. But it did get me thinking: when users get Gemini, where will they get their data from? Yes, they’ll be downloading data on ‘industry trends’ etc from the web in the way we saw in the Gemini demo, but it won’t be that often. In well-run companies most of the time the data will come from four sources:
- The data warehouse, either from the relational source or Analysis Services.
- OLTP systems
- Data that lives in someone’s small, well-maintained, official, IT-department tolerated Excel spreadsheet. The kind of spreadsheet that couldn’t and shouldn’t be promoted to database form because it’s too small, or short-lived, or needs to be maintained by non-technical people, or needs the complete flexibility that Excel gives you. In fact, exactly the kind of thing that Excel is meant to be used for.
- Data that originally came from the data warehouse but was downloaded into someone’s local Access database, or exported to Excel, or sent to the user in something like a SSRS report; so data that has come to the user second hand.
How, then, can Gemini know (beyond its cleverness with column names and analysis of the data within those columns) what data can be integrated with what? Ideally it would have access to some form of metadata embedded in the source data that would help it make the correct decision all in all cases. Where this metadata comes from is a problem much larger than Gemini of course, and goes down to the fundamental question of how an enterprise can keep track of all of its data assets wherever they’re stored and understand what the data in each data store actually means; Microsoft is regularly criticised for its lack of a metadata tool and I guess MS is working on something in this area. If, in the first and second scenarios above, Gemini could connect to SQL Server or Analysis Services and see a common layer of metadata that would help it out, allowing it to join data from SQL Server with data from Analysis Services for instance. In a way, Analysis Services is already a metadata layer on top of the data warehouse, containing information on how tables need to be joined and how measures should be aggregated; perhaps this side of it will become more important as the MOLAP engine is superceded?
I also think some of the technologies of the semantic web, when applied to the enterprise, could be very useful here; I’ve only really just come across this stuff myself, but as an introduction I found the one-page explanation of RDF on Twine was very good, and the Microformats site was also full of interesting information. One of the companies mentioned in the semantic web report is Cambridge Semantics, whose product Anzo for Excel is aimed at imposing structure on all of those Excel spreadsheets I referred to in the third bullet point above. The demo on the web concentrates on using the tool for sharing data and collaboration, but as far as I can see the key to getting that to work is selecting data in the spreadsheet and linking it back to a common metadata layer. Of course the obvious criticism of a product like this is that you’re again relying on your users to make the effort to mark up their data and do so properly, but if there’s an incentive in the form of easier collaboration and – to put it bluntly – less of that boring cutting and pasting then maybe there’s a chance they could be persuaded to do so. I could imagine MS offering something very similar to this as part of Excel, with all the central management being done through Sharepoint, and extended throughout the Office suite to Access, Word etc. Gemini would then be able to do a better job of understanding what and how data in an Excel spreadsheet, say, could be integrated with data in the data warehouse.
So the bottom-up approach could be combined with the more traditional top-down metadata management approach, and crucially I’d want to see metadata automatically embedded in various the output formats of the server products. For example when you built a SSRS report or created an Analysis Services pivot table in Excel, in both cases I’d want the data by default to retain some record of what it was and where it had come from – the metadata would embedded in the document in the same way as if the user had marked up the document themselves. And this in turn would make it a lot more easily reusable and comprehensible by Gemini and other applications such as Enterprise Search. Metadata would accompany data as it travelled from document to document, data store to data store, format to format. This would then cover the scenario in the third bullet point, allowing you to integrate data from an SSRS report with data from the data warehouse it originally came from, or data from a user-generated Excel spreadsheet that had been marked up manually.
So much for structured data; ideally we’d want to be able to include unstructured data too. Some of the applications of natural language processing mentioned in the semantic web report looked very interesting, especially OpenCalais (which already has integration with MOSS 2007 I see). If you were looking at sales figures for a particular customer in Gemini, wouldn’t you also want to be able to look for documents and web pages that discussed sales for that customer too? And I’ve often thought that while the super-simple type-a-search-term approach for Search has worked brilliantly over the last few years, there’s a niche for a power-user interface for search too, kind of like a Proclarity Desktop for search where you could drag and drop combinations of terms from the central metadata repository and see what you found; could that be Gemini too? A tool for searching, integrating, aggregating and manipulating all enterprise data, not just numeric data?
Over the last few days I’ve seen the Gemini team (notably Amir) engaging with bloggers like me about our concerns through comments on our postings. I appreciate that, but my position hasn’t changed: I think the technology is cool but there’s too great a risk that it will be misused as it stands at the moment. Relying on oversight from the IT department isn’t enough; if MS had a convincing metadata story extending to all data types and data sources, metadata that Gemini could use, it would go a long way to addressing my concerns.