Book Review: The Microsoft Data Warehouse Toolkit, by Joy Mundy and Warren Thornthwaite
At long last I’ve got round to reviewing the last big SQL2005 BI book sitting on my bookshelf: "The Microsoft Data Warehouse Toolkit" by Joy Mundy and Warren Thornthwaite. It’s another very positive review too; although I should declare that I got the book as a freebie, hopefully you’ll believe me when I say I’m not biased by this and if anyone cares to send me a copy of a rubbish book on this topic I’ll be only too happy to slag it off in public!
Let me start by saying a word about the positioning of this book. If you’ve already got other Kimball Group classics like ‘The Data Warehouse Toolkit’ you may be worried about the overlap here; similarly if you’ve got a book like Teo Lachev’s ‘Applied Analysis Services’ you may be thinking that you don’t need another one like it. In my opinion ‘The Microsoft Data Warehouse Toolkit’ sits squarely between the two camps: it’s all about how you apply Kimball methodology to a data warehouse/BI project using the Microsoft platform, and while there is a certain amount of shared ground with the two books I’ve just mentioned I think there’s more than enough valuable information in here that you won’t find anywhere else to make this a worthwhile purchase. It means you no longer need to do as much work joining the dots between how Kimball et al tell you to design your system and what Lachev et al say you can actually do with the tools you’ve got at your disposal.
Let me give you two examples of the kind of issues it deals with. The chapter on real-time data has the most level-headed discussion of this subject that I’ve read, telling you what it means, when you do really need real-time data and when you don’t, as well as telling you how to design a real-time system and making some important technical points about the strengths and weaknesses of Analysis Services. Similarly there’s a chapter on the unsexy and usually neglected topic of operations and maintenance, telling you things like what you need to monitor (disk space, usage etc) and roughly how you need to go about doing all this. Again all the important technical points are made but there’s not too much technical detail – a good balance is struck between this and the higher-level design aspects. This also makes it as good a read for project managers as it is for architects and developers.
Another good thing about the book is the way it is structured. Although you can read each chapter on its own out of context, the book discusses issues in the same order as you’d encounter them in a project. So the first chapter covers defining business requirements, and we then move on to designing the dimensional model, building the relational data warehouse, ETL, Analysis Services, Reporting Services and so on. This means that the project manager or anyone new to BI has a clear list of the tasks that need to be undertaken on this kind of project and can plan ahead more effectively, and serves as an important checklist for people like me to tend to get carried away with the more interesting jobs to the detriment of the duller stuff. The breadth too is impressive, and since no-one can be an expert in every part of the SQL2005 toolset it’s useful to have a reference which can help fill the gaps in your knowledge.
If there’s one criticism I could make (I always try to make at least one) it’s that it’s almost too early in the lifecycle of SQL2005 to be able to write authoritatively on even high-level design. It’s not something that could have been avoided though: I learnt when co-writing ‘MDX Solutions’ that books written about new software have sales cycles, and that if you don’t get your book out within a few months of the release of the software you’re writing about then booksellers won’t place big orders for it, however good it is, so as an author you can’t afford to wait a year or so until you feel you know the product completely. Take the example of the recent about-turn on cubes whether you should design one cube with multiple measure groups or multiple cubes linked together that Teo blogged about recently here. This is a fundamental design decision and on P322 the authors state that "The best practice in AS2005 is to define a single cube for a database", which is exactly what I would have said up to a few weeks ago but which now seems to be wrong. The point here is that no-one could have known about this early enough to put it in a book – these things only emerge after months or years of experience with real implementations. Joy Mundy is clearly well ahead of me and just about everyone else on this issue though, as Deepak pointed out to me recently that in her recent webcast "Designing a Scalable Data Warehouse/ Business Intelligence System" that one of her bullets reads "Create several smaller cubes with related measure groups, rather than one big cube per database". Perhaps the advantage of publishing early is that it gives you the chance of writing a second edition at some point in the future.
But I digress. The vast amount of experience that the authors have got in designing BI systems is apparent throughout this book and I can wholeheartedly recommend it. The book’s websites can be found here if you want to find out more:
…and the second site in particular contains some useful links and downloads which are worth checking out even if you don’t buy the book.
If you’re interested in other SQL2005 BI books, then check out my book list here:
The only book on there that isn’t published and that I really want to read is "Microsoft SQL Server Analysis Services 2005" by Irina Gorbach and various other members of the dev team – I’ve heard it’s got good information on the internals of AS2005 that isn’t available anywhere else.