Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Calculating The Value Of Overdue Invoices Using Many-To-Many Relationships in SSAS Multidimensional

with 11 comments

Recently I had to solve the same, rather complex, problem for three different customers within the space of two weeks, a variation of the ‘events-in-progress’ problem I’ve blogged about a lot in the past. It’s this one: how can you calculate the value of your overdue invoices on any given date? It’s important to stress that we are not talking about the invoices that are overdue today – we want to be able to pick any date in the past and find out the value of invoices that were overdue at that point in time.

Let’s imagine you have a fact table containing payments against invoices: each row has an invoice number, the date of the payment, the payment amount, and the date that the invoice is due. A positive value in the Amount column indicates that this is the opening amount of the invoice; negative values in the Amount column are payments against the invoice. Multiple payments can be made against an invoice before it is fully closed. Here’s some example data:

image

Looking at the rows highlighted for invoice 5, you can see in the upper box that there is a positive amount of £35 shown on January 1st 2014 – this is the date that the invoice was opened, and £35 is the full value of the invoice. The invoice is due on January 10th 2014. In the lower box you can see there were four separate payments of £5, £5, £10 and £15 before the invoice was fully paid off on January 12th 2014.

Now, if you were to do a total-to-date in MDX (similar to what I describe here) it would be pretty easy to calculate the outstanding amount on all invoices on any given date. However the requirement here is not only to do that, but to break the value down so that you can see what the total value for overdue invoices and non-overdue (which I’m going to call backlog in this post) invoices is on any given date. This is tricky to do because we have to take the Due Date of each invoice into account as well as the Payment Date. The desired output for invoice 5 is this:

image

As you can see in this PivotTable, for invoice 5 £25 was still outstanding on January 9th 2014, but this is shown as backlog because this is before the due date of January 10th. On January 11th, one day after the due date, the remaining outstanding amount of £15 is shown as overdue. Of course, we also want to be able to calculate the correct values for all invoices:

image

One way of solving this problem would be to calculate the overdue and backlog values for each date that each invoice is open in your ETL, and store these values in a snapshot fact table. This works, and will give you the best possible query performance, but it has two major drawbacks: first, it makes your ETL much slower, and second it makes your fact table much larger. This post describes how you can calculate the overdue and non-overdue values on any given date using many-to-many relationships instead, without having to blow out the fact table.

To go along with the fact table (called FactInvoice) shown above, I have a date dimension table called DimDate (I have deliberately reduced the number of rows here to the dates I have data for, for reasons that will become clear soon):

image

I have a dimension table containing all of the days that an invoice can be overdue or not overdue for (which has one less than double the number of rows as the date dimension table), plus a second column classifying each row as ‘Backlog’ or ‘Overdue’:

image

… and an invoice dimension table that just contains the distinct invoice numbers called DimInvoice.

Now, let me explain how to build the SSAS cube.

Step 1

  • Build dimensions from all of the tables shown above, so you have dimensions called Date, Overdue Days and Invoice.
  • Build a cube with one measure group, based on FactInvoice, and create one measure with AggregateFunction Sum based on the Amount column in that table.
  • Add the Invoice dimension to the cube with a regular relationship. Add the Date dimension to the cube twice, as Payment Date and Due Date, with regular relationships on the PaymentDate and DueDate columns. The Dimension Usage tab should look like this:

image

Step 2

  • Create the following view in SQL Server:
  • CREATE VIEW [dbo].[FactDateToPaymentDate]
    AS
    SELECT        a.DateKey, b.DateKey AS PaymentDateKey
    FROM            dbo.DimDate AS a INNER JOIN
                             dbo.DimDate AS b ON a.DateKey >= b.DateKey

    This view returns all of the combinations of a given date and all dates up to and including the date.

  • Add this view to the DSV and create a new measure group from it; you’ll need to create a measure here, but it can just be a Count measure.
  • Add the Date dimension to the cube once again, this time leaving the name as Date (you will now have three role-playing copies of the Date dimension in the cube).
  • Set up regular relationships between the Date and Payment Date dimensions and the new measure group, then a many-to-many relationship between Date and the Invoice measure group. This means that when you query the cube by the Date dimension, the many-to-many relationship will mean you see the sum of all Amounts whose payment date is up to and including the date selected. The Dimension Usage tab will look like this:

image

Step 4

  • Create another view in SQL Server with this definition:
  • CREATE VIEW [dbo].[FactDueDateToAsOfDate]
    AS
    SELECT        a.DateKey AS DueDateKey, b.DateKey AS AsOfDateKey, DATEDIFF(dd, a.FullDate, b.FullDate) AS OverDueDays
    FROM            dbo.DimDate AS a CROSS JOIN
                             dbo.DimDate AS b

    Yes, I am cross joining the DimDate table with itself and yes, this could return a lot of rows. However you should find that the view is very fast to execute.

  • Add this view to the DSV and build another measure group from it, again with a single Count measure on it.
  • Set up a regular relationship between this new measure group and the Due Date dimension.
  • Add the Overdue Days dimension to the cube, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group.
  • Add the Date dimension to the cube yet again, creating a new role-playing dimension called As Of Date, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group. The Dimension Usage tab will now look like this:

image

The purpose of this measure group is this: if you select a date on the As Of Date dimension, you will be able to select ‘Overdue’ on the Overdue Days dimension and this will give you all of the dates on Due Date that were overdue on that date.

Step 5

  • You only want to have to select one date in your PivotTable, so create the following MDX calculated measures that take your selection on the Date dimension and applies it to the As Of Date dimension too:

CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount Hidden] AS
([Measures].[Amount],
LINKMEMBER([Date].[Date].CURRENTMEMBER, [As Of Date].[Date])
), VISIBLE=FALSE;

CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount] AS
IIF(MEASURES.[Open Amount Hidden]=0, NULL, MEASURES.[Open Amount Hidden]);

  • Hide all measures apart from the calculated measures you’ve just created
  • Hide the Payment Date, Due Date and As Of Date dimensions

Conclusion

This is a very complex pattern, I know, and this is after I’ve simplified it a lot (if you need currency conversion as well then things get even worse) but I also know it’s extremely useful from a business point of view. Query performance is also reasonably good, at least in the places where I have implemented this.

You can download my sample SQL Server database and VS 2012 project here.

Written by Chris Webb

July 28, 2014 at 10:00 am

11 Responses

Subscribe to comments with RSS.

  1. […] If you like challenges, roll up your sleeves and get to it. Come back and show us what you have Calculating The Value Of Overdue Invoices Using Many-To-Many Relationships in SSAS Multidimensional … sample file can be found here […]

  2. Chris,

    I remember working on an AR aging bucket problem a while back and if I’m not mistaken it seems to be similar to your challenge here. In most (if not all) financial reporting project I have worked on, the finance folks tend to run reports based on when a transaction posted into their ERP system since transactions that haven’t posted yet really don’t have much value in financial analysis. I tend to use the posing date of transactions as the default “Date” dimension for the cubes and thus the default date used in point in time analysis.

    With this in mind, I created two more dimensions in the cube (role paying dimensions). Payment Due Date and Payment Clearance Date (that are hidden from the cube user). Due dates for the original invoice posting are usually calculated based on the document date and the terms of payment. This information is usually available in the financial ERP systems are relatively easy to calculate during ETL. The original invoice posting (not payment yet) will have a Payment Clearance Date of something like 01/01/1900 just to make sure it has a date (or you can take the UNKNOWNMEMBER approach too).

    Since the payment postings have same invoice number as the original transactions, you will have all three sets of data for the payment transaction (posting date of the payment, due date of the original invoice and of course the payment clearance date). With this data, I’m able to introduce a new dimension – AR Bucket with no relationships to any measure group then define the relationship using MDS (just like the Business Intelligence wizard does with time series calculations. This dimension will derive values from a table that stores all the values of the AR Bucket – Current, 1 – 30, 31 – 60, 61 – 90, 91 – 120, 121+ and of course Default (meaningless but crucial for cases when the AR Bucket dimension is not selected in a report).

    With this approach you will be able to avoid a many to many relationship (executed on the fly and slower in nature) and if you create a custom aggregation on Date, Payment Due Date and Payment Clearance Date attributes, you can get very good performance.

    Below is sample MDX code that I used for the calculation of the “Current” and “1 – 30″ AR Bucket values. Note that the key for my date dimension (physical dimension used by all role playing dimension) is an integer in the format – yyyymmdd. This way it’s easy to use the STRTOMEMBER function on other date dimensions based on what the user selected on the default dimension.

    /*———————————————
    Fiscal Year – Current AR Bucket
    */———————————————

    SCOPE
    (
    [AR Bucket].[AR Bucket].[Current]
    );

    THIS =

    IIF(
    ISEMPTY([AR Bucket].[AR Bucket].[Default]),
    NULL,
    AGGREGATE(
    {
    [AR Bucket].[AR Bucket].[Default]
    } *
    NONEMPTYCROSSJOIN(
    {
    NULL : CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember)
    },
    {
    STRTOMEMBER(
    ‘[Payment Due Date].[Fiscal].[Day].&[‘ + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ‘]’
    ) : NULL
    },
    {
    STRTOMEMBER(
    ‘[Payment Clearance Date].[Fiscal].[Day].&[‘ + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ‘]’
    ).NextMember : NULL,
    [Payment Clearance Date].[Fiscal].[Day].&[19000101]
    }
    )
    )
    );

    END SCOPE;

    /*———————————————
    Fiscal Year – 1 – 30 AR Bucket
    */———————————————

    SCOPE
    (
    [AR Bucket].[AR Bucket].[1 - 30]
    );

    THIS =

    IIF(
    ISEMPTY([AR Bucket].[AR Bucket].[Default]),
    NULL,
    AGGREGATE(
    {
    [AR Bucket].[AR Bucket].[Default]
    } *
    NONEMPTYCROSSJOIN(
    {
    NULL : CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember)
    },
    {
    STRTOMEMBER(
    ‘[Payment Due Date].[Fiscal].[Day].&[‘ + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ‘]’
    ).Lag(30) :
    STRTOMEMBER(
    ‘[Payment Due Date].[Fiscal].[Day].&[‘ + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ‘]’
    ).Lag(1)
    },
    {
    STRTOMEMBER(
    ‘[Payment Clearance Date].[Fiscal].[Day].&[‘ + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ‘]’
    ).NextMember : NULL,
    [Payment Clearance Date].[Fiscal].[Day].&[19000101]
    }
    )
    )
    );

    END SCOPE;

    James Mburu

    July 30, 2014 at 7:42 pm

    • Thanks James, yes, this is a different solution to the same problem. However I think the m2m will perform better than a pure MDX-based solution. I’ve used this kind of approach for creating buckets before and never been completely satisfied with it – especially when there are other calculations, such as currency conversion, in play.

      BTW you shouldn’t be using NonEmptyCrossjoin() – it’s deprecated. See http://msdn.microsoft.com/en-us/library/ms144797.aspx

      Chris Webb

      July 30, 2014 at 9:04 pm

      • I agree pure MDX solutions can be significantly slower but if you look closer to my solution, I’m not really doing much dynamically. I’m only using STRTOMEMBER to map the date selected to slice the role playing dimensions accordingly. Most of these functions come return a single member to be used in the simplest set operation (colon operator – “:”). Additionally each calculation can easily utilize custom attribute level aggregates based on usage (reports) e.g. month + product category + sales region + etc. The issue with Many to Many is that every query will have to utilize the grain level attributes for the bridge table at run time (required to speed up queries that hit many to many related dimensions) regardless of whether the respective key attributes in the many to many are needed on the report or not.

        The solution above was implemented in quite the Enterprise solution that had Currency Intelligence, Time Intelligence, Account Intelligence and two Type 5 SCDs yet report performance was quite impressive given the complexity.

        James Mburu

        July 30, 2014 at 10:02 pm

      • I think the only way to know which one would perform better would be to implement both on the same data. However I still think that mine will perform best :-) I know that m2m carries a performance overhead, but since you are already forcing the granularity down to the date level in your code using ClosingPeriod(), this only what the m2m relationships in my technique are doing. Both techniques would be able to use the same aggregations.

        Chris Webb

        July 31, 2014 at 11:53 am

  3. The invoice ID 5 was opened on January 3rd not 1st

    Marian

    August 4, 2014 at 12:58 pm

  4. I think this is a very complex solution, to a very simple problem. Why not just a view on fact table, something like:

    select InvoiceId,
    iif(PaymentDayDueDate, Amount, o) as Overdue

    from fact_table

    1. Best possible query performance
    2. ETL is not slower, because you´re loading the same data you already have.
    3. Fact table has the same size.

    .. maybe longer at processing cube, but I don´t think so, even so, user query performance is #1 goal for me.

    ZZCube

    August 8, 2014 at 10:21 am

    • I agree with Chris. This is not that quite simple. To be able to use the solution you laid out, you would need a separate fact table composed of a record for each Invoice posting and any queryable date and using Kimball’s bus architecture link that measure group back to the original fact’s measure group via conforming dimensions. For every 10 years of queryable dates, this new fact table would require at least 3,652 records for every Invoice posting fact record in the current fact. This would result in an explosion of data that easily gets out of control and significantly impede any performance gains you have with pre-aggregated data. Also your ETL may get really complicated if you have to deal with Inter company invoice reversals. SAP handles this in a rather sophisticated manner and spins off compensating transactions (read new fact records here) reducing the degree of complexity one would need in ETL but since we introduced a new fact table here, we would have to add similar sophistication to our new fact table.

      To summarize, the challenge here would be best resolved dynamically as Chris did. The focus I believe would be to try and optimize his solution from an infrastructure standpoint if query response time is ever an issue like placing the intermediate measure groups for the many to many relationships that are joined at run time into solid state drives to boost.

      James Mburu

      August 8, 2014 at 2:15 pm

  5. Sorry, query has missed a line…

    select InvoiceId,
    iif(PaymentDayDueDate, Amount, o) as Overdue

    from fact_table

    ZZCube

    August 8, 2014 at 10:23 am

    • No, this is not the same thing at all. Your solution can only give the amount overdue at a single point in time, the point that your fact table is calculated. However an invoice that is overdue today may not have been overdue last week. What my solution does (and the reason it is complex) is tell you how much was overdue on any given date, not just today.

      Chris Webb

      August 8, 2014 at 10:28 am

  6. OK, something happens with this, query i (last try)

    missing line is: iif(PaymentDay<=DueDate, Amount,0) as Backlog,

    ZZCube

    August 8, 2014 at 10:24 am


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,240 other followers

%d bloggers like this: