Excel 2010 and Windows HPC Server 2008 R2… and PowerPivot?
Something I first heard about last year was the ability to run Excel calculations in parallel on a Windows HPC cluster; it’s been in the news again recently, with the release of Windows HPC Server 2008 R2:
I did some digging and found the following white papers that give more details on how exactly you can do this:
Accelerating Excel 2010 with Windows HPC Server 2008 R2: Technical Overview
Accelerating Excel 2010 with Windows HPC Server 2008 R2: Building VBA applications and workbooks for a Windows HPC Cluster
Here’s a summary of what you can do, taken from the first paper above:
As models grow larger and workbooks become more complex, the value of the information generated increases. However, more complex workbooks also require more time to calculate. For complex analyses, it is not uncommon for users to spend hours, days, or even weeks completing such complex workbooks. The problem this white paper addresses is how organizations can reduce the calculation time required for long-running workbooks to give users faster access to business-critical information.
One solution is to use Windows® HPC Server 2008 R2 to scale Office Excel 2010 calculations across multiple nodes in a Windows high-performance computing (HPC) cluster in parallel. This paper presents three methods for running Office Excel 2010 calculations in a Windows HPC Server 2008 R2 based cluster: running Office Excel 2010 workbooks in a cluster, running Office Excel 2010 user-defined functions (UDFs) in a cluster; and using Office Excel 2010 as a cluster service-oriented architecture (SOA) client.
Windows HPC Server 2008 R2 now enables running multiple instances of Office Excel 2010 in a Windows HPC cluster, where each instance is running an independent calculation or iteration of the same workbook with a different dataset or parameters. This solution allows near-linear performance increases for iterative spreadsheets, such as those running a Monte Carlo algorithm.
Running Office Excel 2010 UDFs in a cluster is a new ability of Office Excel 2010 for running complex or time-consuming UDFs—functions contained in Excel link libraries (XLLs)—in a Windows HPC Server 2008 R2–based cluster. If a workbook includes long-running UDFs, moving calculations to the cluster can result in significant performance improvements.
Using the Windows HPC Pack software development kit (SDK), Office Excel 2010 can function as a cluster SOA client to run complex and time-consuming calculations across a set of servers in a Windows HPC cluster. Any Microsoft .NET or Component Object Model (COM) application can use the Windows HPC Pack SDK: This paper uses Microsoft Visual Studio® Tools for Office (VSTO) to construct an Office Excel 2010 add-in that connects to the cluster as a cluster SOA client.
From a purely Excel point of view, this is very interesting – and it’s no wonder that Excel power users everywhere are drooling over it. The obvious next question is: can I use PowerPivot with this as well? From reading the papers I don’t see why not. I’m not talking about simply taking a pivot table connected to a single PowerPivot model and somehow making it all run faster – I don’t think that would work (yet?) – but I can imagine a scenario where you used VBA and the Excel cube functions, which in turn got the value of DAX calculated measures in PowerPivot models distributed over a cluster, as part of a larger solution for example. Now how cool would that be? Sadly I don’t have a HPC cluster to test this on, but if anyone does have one and has tried this, please let me know…!