New modules are being added to my MDX online training course all the time, and now there’s another free video available: a short introduction to the CurrentMember function. You can view it here (registration required):
The previous free video on MDX SELECT statements is also still available here, now with no registration required. If you’d like to subscribe to this course you can get a 10% discount by using the code TECHNITRAIN2014 when registering.
Don’t forget, if you are looking for classroom training in London for MDX, DAX, Analysis Services Multidimensional or Tabular, Power BI, Integration Services or the SQL Server engine check out http://www.technitrain.com/courses.php
The recording of my session, “Power Query: Beyond The Basics”, from yesterday’s 24 Hours of PASS event is already available to watch here:
It’s a brief introduction to the M language that’s used in Power Query and shows some of the things you can do with it. You can download the slides and the demo material (a lot of which I didn’t have time to cover) from here:
All of the other sessions from the event can be found here:
One of the things that has intrigued me in the Power Query docs for a while now is the Expression.Evaluate() function, which takes some text and evaluates it as a Power Query expression (rather like good old StrToSet() in MDX). I can think of lots of fun things you can do with this, but here’s one very cool example: it allows you to store the M code for your Power Query query in a text file outside your Excel workbook.
Here’s an example. Consider the following Excel workbook, which has a table named FruitSales in it:
I can use this table as the source for the following simple Power Query query:
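The query itself looks something like this. This is a sketch rather than the exact code: the column names Fruit and Sales, and the choice of aggregation, are my assumptions about what a table called FruitSales might contain.

```
let
    //load the FruitSales table from the current workbook
    Source = Excel.CurrentWorkbook(){[Name="FruitSales"]}[Content],
    //group by fruit name and sum the sales value for each fruit
    Output = Table.Group(Source, {"Fruit"},
        {{"Sales", each List.Sum([Sales]), type number}})
in
    Output
```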
Here’s the output:
No surprises so far. Next, I copy the code for the Power Query query above and paste it into a text file; in my case I’ve saved my file at C:\PowerQueryQueries\DynamicQuery.txt
Back in Excel, I can now load the code stored in this text file in a new query using the following code:
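A minimal sketch of that query is below: it reads the file as text and then hands that text to Expression.Evaluate(), passing #shared as the second parameter so that the evaluated expression can see all of the standard library functions.

```
let
    //read the contents of the text file as text
    Source = Text.FromBinary(File.Contents("C:\PowerQueryQueries\DynamicQuery.txt")),
    //evaluate that text as an M expression; #shared supplies the
    //environment so that standard library functions resolve
    EvaluatedExpression = Expression.Evaluate(Source, #shared)
in
    EvaluatedExpression
```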
The end result is exactly the same as in the previous example, except that in this case the code to read the data from the table and to aggregate it is loaded from the text file and is not stored inside the workbook.
The Source step here is fairly straightforward – it just loads text from a file into Power Query. It’s the next step where the magic takes place: as I said, Expression.Evaluate() takes the text and evaluates it as an expression, but it’s the second parameter, which defines the environment that the expression is evaluated in, that seems to be the key to making this work. The chapter on “Sections” in the Power Query Formula Language specification document has a little bit more explanation of what environments are, but I have to admit I’m not 100% clear on how all this ties in to M as it is implemented in Power Query today.
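To get a feel for what the environment parameter does, you can pass a record instead of #shared: the field names of the record become the names that are in scope inside the evaluated text. A contrived sketch:

```
//the record [x = 1, y = 2] is the environment, so the names
//x and y are visible inside the evaluated text; this returns 3
Expression.Evaluate("x + y", [x = 1, y = 2])

//with no second parameter the names x and y would not resolve
//and the call would return an error
```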
For those of you who can’t afford a Power BI subscription, this technique allows you to share queries between multiple workbooks without a Power BI site. In fact it has some advantages over sharing a query in Power BI because it always reads the definition of the query from the file, and so it will always use the latest version of your query – in Power BI, when you use a shared query you take a copy of the query and it is not updated even if the original author shares a newer version.
One new feature of SQL Server PDW 2012 that hasn’t had the attention it deserves is the fact that it is now officially supported as a data source for Analysis Services, both Multidimensional (in ROLAP and MOLAP modes) and Tabular (in In-Memory and DirectQuery modes). If you are working with extremely large data volumes in SSAS then PDW might be something you want to do some research on. For SSAS Multidimensional in MOLAP mode or Tabular models in In-Memory mode, using PDW as a data source should make processing run extremely quickly. For SSAS Multidimensional in ROLAP mode or Tabular models in DirectQuery mode, it can give you interactive query access to data volumes that MOLAP/In-Memory simply couldn’t handle (remember though that DirectQuery only works with DAX queries, so Excel PivotTables don’t work with it, only Power View).
There are a few public sources of information on PDW/SSAS integration. One is the white paper on PDW that you can download from Henk van der Valk’s blog here:
Here’s the relevant section:
New in SQL Server 2012 PDW, you can use PDW as a high performance relational data source for building multidimensional OR tabular models with SQL Server Analysis Services (SSAS). For example, you can:
· Use DirectQuery to perform real-time queries against a SQL Server PDW data source from a tabular model.
· Reduce ROLAP query times by using clustered columnstore indexes on SQL Server PDW tables.
· Use new features of SSAS. For example, use EnableRolapDistinctCountOnDataSource to run distinct count operations on PDW tables.
· Use PDW as a data warehouse for aggregating Hadoop data for use in multidimensional or tabular models.
The DirectQuery page in Books Online says similar things:
In contrast, a tabular model in DirectQuery mode uses data that is stored in a SQL Server database, or in a SQL Server PDW data warehouse. At design time, you import all or a small sample of the data into the cache and build your model as usual. When you are ready to deploy the model, you change the operating mode to DirectQuery. After you change the operating mode, any queries against the model will use the specified relational data source (either SQL Server or SQL Server PDW), not the cached data.
DirectQuery can take advantage of provider-side query acceleration, such as that provided by xVelocity memory optimized column indexes. xVelocity columnstore indexes are provided in both SQL Server 2012 and SQL Server PDW, to support improved DirectQuery performance.
It’s a shame there isn’t more information out there though. This post from Michael Mukovskiy has some interesting findings on using PDW v1 as a data source for SSAS:
…presumably PDW 2012 would give even better results.
Voting has closed in the Power BI competition, and I just wanted to say thank you to everyone who supported me. I ended up in 6th place with 681 votes and 4,812 views – so I made it to the top ten, and I’m very pleased with that result.
There are too many people to thank individually, but I would like to single out Carmel Gunn, Jen Stirrup, and the whole of the UK, Irish, Belgian and French SQL Server communities, as well as the (literally) hundreds of people I contacted to ask for their support. Special thanks are reserved for my long-suffering wife Helen. I am extremely grateful, and I promise I won’t be bothering you like this ever again!
I am very much aware that the number of votes that I or anyone else received doesn’t necessarily correspond to the quality of the entry. There are a number of other entries that I thought were great that didn’t get any attention and I recommend you check them out while you still can here.
In case the Power BI competition site disappears at some point in the future, you can see my entry on YouTube here. If you want to see the workbook I created for the demos, that can be downloaded here; I’ll be doing a session on March 27th for the PASS Business Analytics Virtual Chapter showing in a lot more detail how I built it, exploring the data more thoroughly, and covering topics like Q&A that I wasn’t able to include in the original video.
A few weeks ago I wrote a post about comparing the contents of entire tables in Power Query, and a question came up in the comments about how you might go about comparing values in columns rather than whole tables. Of course this prompted me to investigate how different types of comparison might be done – and here’s the blog post with the results of the investigation.
Consider the following two single-column tables in an Excel worksheet:
Which items are present in one column and not in the other? Which are present in both? The easiest way to answer these questions is to take each table and turn it into a List object (using Table.ToList()); once you’ve done that you’ll find there are loads of really useful functions for this type of thing. Here’s a query that compares the values in each column:
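A sketch of that query is below. The table names Table1 and Table2 are my assumptions; the structure – lists, List.Difference(), List.Intersect(), and a manually constructed output table – is what matters.

```
let
    //turn each single-column table into a list
    List1 = Table.ToList(Excel.CurrentWorkbook(){[Name="Table1"]}[Content]),
    List2 = Table.ToList(Excel.CurrentWorkbook(){[Name="Table2"]}[Content]),
    //returns a function that joins the items in a list with commas
    CombineWithComma = Combiner.CombineTextByDelimiter(", "),
    //build a readable table of the comparison results
    Output = Table.FromRows(
        {
            {"In Table1 but not Table2", CombineWithComma(List.Difference(List1, List2))},
            {"In Table2 but not Table1", CombineWithComma(List.Difference(List2, List1))},
            {"In both tables", CombineWithComma(List.Intersect({List1, List2}))}
        },
        {"Comparison", "Values"})
in
    Output
```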
Here’s the output:
Fairly self-explanatory, I think. List.Difference() finds the items that are in one list and not another; List.Intersect() finds items that are in both. In fact it’s probably more interesting to look at how I’ve generated the output. Table.FromRows() returns a manually constructed table. The CombineWithComma step uses Combiner.CombineTextByDelimiter() to return a function that turns all of the items in a list into a single, comma-delimited piece of text, and I then use that function inside each row of the table I’m returning to get a readable version of what List.Difference() and List.Intersect() return.
Rather than looking at the distinct values in each column, though, you might want to do a row-by-row comparison. Here’s another query that does that:
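A sketch of the row-by-row query, again assuming the two tables are named Table1 and Table2:

```
let
    //turn each single-column table into a list
    List1 = Table.ToList(Excel.CurrentWorkbook(){[Name="Table1"]}[Content]),
    List2 = Table.ToList(Excel.CurrentWorkbook(){[Name="Table2"]}[Content]),
    //a list of index values, one for each item in List1
    Positions = List.Positions(List1),
    //compare the values at each index position in the two lists
    Comparison = List.Transform(Positions, each
        if List1{_} = List2{_} then "Match" else "No Match"),
    //show the original values side by side with the comparison result
    Output = Table.FromColumns({List1, List2, Comparison},
        {"Table1", "Table2", "Comparison"})
in
    Output
```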
Here’s the output:
Again I’m turning each table into a list, and then I’m using List.Positions() to generate a list of integer values from 0 to 9 representing the index of each item in the source list, then using List.Transform() to iterate over each item in this list and compare the values at the given index in the source and target list.
Frankly, an even easier way of doing this might have been to import both tables in separate queries, add an index column to both of them using the Insert Index Column button, then join the two tables together on the index column using the Merge button, and finally create some custom columns to do the comparison. This is certainly how any end-user would do it, but the resulting code is a bit less elegant and I didn’t learn anything interesting about M from doing it that way. I’ve left the example in the demo workbook, which you can download here.
PS Even if you have voted for me already in the Power BI competition, please vote for me again (you can vote once every 24 hours)! Here’s the link:
As I mentioned the other day, the public voting round of the Power BI competition has now started and the entrants are now competing for a place in the top ten and a whole bunch of cool prizes including XBoxes and Surface Pros. Having entered a demo myself I’m naturally very keen to see how I’m doing, but unfortunately doing this is a bit of a pain – you have to go to www.powerbicontest.com, click on the Entries tab, sort them, find your entry, and so on… So I thought to myself, isn’t there a better way? And of course there is… using Power BI!
Power Query is of course the tool to use to scrape the voting data from the Power BI site. It wasn’t straightforward to do, but it wasn’t impossible either: there’s no single table of results, and indeed the results are spread over four different pages. To load the data into the Excel Data Model I:
- Created a Power Query function to scrape the data from a single page of entries:
  - First I created a query to scrape the data from the first page using the “From Web” button
  - I then did a *lot* of searching around in the HTML to get the data (for this reason I’m not going to paste the code of the function here, because it’s ugly)
  - I then deleted a lot of columns, created some new custom columns, and pivoted the data until it was in a nice tabular form
  - Finally, once I was sure the query was working properly, I turned it into a function that could get data from any page
- Next I created a table with five rows in it and called the function once for each row, to get the data from each page
- Then I merged all the data into a single table of results
- I created another function to calculate the rank of an entry (which itself was an interesting challenge) and added a rank column to the merged table
- Last of all, I created some Power View sheets to analyse the data
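I’m not going to paste the scraping function itself, but the pattern of calling a per-page function and stacking the results is worth sketching. GetPage here is a hypothetical stand-in for my real function, and the page numbers are illustrative:

```
let
    //the numbers of the pages of entries on the site
    Pages = {1, 2, 3, 4, 5},
    //call the (hypothetical) per-page scraping function once per page
    PageTables = List.Transform(Pages, each GetPage(_)),
    //stack all of the per-page tables into one table of results
    AllEntries = Table.Combine(PageTables)
in
    AllEntries
```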
You can download the entire workbook here. I’ll warn you, the code isn’t pretty but it does the job and it’s got a few interesting features in for all you Power Query fans.
Now let’s have a look at what the data shows.
At the time of writing this I’m currently in 16th place – and not in line for an XBox alas. You can of course change this by voting for me – and that’s even if you’ve voted for me already, because you can vote once every 24 hours! But I digress…
The obvious first thing to do was to create a leader board sorted by rank:
As you can see, at the moment Carlos Costa is way out in front with a massive 180 votes; after that there’s a group of people fighting for a top five finish; and after that there’s a large group of people who might just squeeze into the lower reaches of the top ten.
Here’s a column chart showing this more clearly:
What are the top-ranked guys doing to get so many votes? Is it something to do with the number of people seeing their entry? If you put votes and views next to each other, it’s not easy to say what’s going on:
I’ve got the second largest number of views, but I’m in the middle of the pack. The problem for me is that I was one of the first people to enter and most of my views were well before voting opened; unfortunately there’s no way of knowing how many views each entry had after voting opened so I can’t say for sure what’s going on here. Certainly Alexander Pinkus submitted his demo “Few Facts About Dinosaurs” around the same time as me, has around the same number of views, and he’s currently ranked number 5. From that I can only deduce that his demo is better than mine (and having watched it I can say it is very good); he certainly has more exciting subject matter. Who doesn’t love dinosaurs? There are several other demos that have got particularly eye-catching subject matter and/or titles and they’re also doing well.
Here are views and votes plotted on a scatter chart, with the top 10, 20, 30 and so on shown in different colours:
I think this makes it a little easier to disregard outliers like me, and it’s probably fair to say that there is some kind of link between views and votes, at least for those in the top 20.
Now here’s one last column graph, showing the average number of views per vote for the top 20 ranked entrants:
You can see how badly I’m doing in this respect, despite my begging and pleading both here and on Twitter; conversely you can see that Mike Tetreault at the other end is obviously very good at getting his vote out.
I would like to say at this point that it’s tempting to moan and complain that people have ‘cheated’ and got their friends, family and colleagues to vote for them. To be honest, I think that everyone who’s got any significant number of votes will have ‘cheated’ like this to some extent. I certainly have – I’ve used my blog and other social media to try to get as many votes as possible. Indeed this post itself is a ruse to try to get more votes! At the end of the day the whole point of this competition is for Microsoft to get as many people as possible to see Power BI, so more people want to buy it. Therefore this mad scramble by entrants to get as many votes as possible will benefit all of us in the end.
Anyway, by the time you read this post there’s a strong chance that the patterns here will have changed completely. It’s a shame there isn’t more data available to play with – it would be great to have the time and date of each vote cast, and even the location of the person casting the vote. Given that you have to have a Facebook account to vote I suspect that someone somewhere does have all of this data, and more… so I wonder if they’re using Power BI too?